  • MAZ challenge writeup

    2019-03-16 18:25:52

    1. Challenge Backup

    maz (extraction code: 9v16)

    2. Exploring the Challenge

    After downloading, the file turns out to be an .exe that can be run directly. Running it shows the following:
    [screenshot]

    There isn't much information, so I loaded it into 32-bit IDA:
    [screenshot]
    Curious, I noticed the comparison against surprise. IDA shows surprise is also a character array (char[ ]), so I gave it a try:
    [screenshot]
    The result caught me completely off guard. = =


    Studying the code carefully, I found a setmap function here:
    [screenshot]
    [screenshot] Since this is a maze walk, this should be the function that lays out the maze map. 0x20F is 527 in decimal, which means memcpy copies exactly the characters before gwht into map. Reading on, we find a loop that is essentially a validation routine.

    [screenshot]
    v5 is our input. Each input character goes through a check function, and the value that comes out is used as a step onto map, followed by an if test. Hovering over the 49 inside the if shows it is a char value corresponding to '1': the cell you land on must not be a '1' on the map, and you must not step outside the map, or it's game over. Only when the resulting value equals End do we get the Congratulation message. We can click into End to see its value:
    [screenshot]
    0x121 in hexadecimal is 289 in decimal.
    Stepping into the check function, we find that it actually manipulates a variable called start.
    Clicking into start shows that it has an initial value too:
    [screenshot]
    [screenshot]
    [screenshot]
    0x52 in hexadecimal is 82, so start begins at 82.
    [screenshot][screenshot]
    The check function also calls a change function, which turns out to XOR the input value:
    [screenshot]
    Putting all of this together, we arrive at roughly the following mental map of the challenge:
    [screenshot]

    3. Solution

    I first wanted to find which positions on the map are walkable, so I wrote a bit of code to roughly reconstruct the map:
    [screenshot]
    Here you can see an s (start) at position 82 and an e (end) at position 289: one is the maze entrance, the other the exit. We just need to drive the start value so that it ends up at 289.

    Then I kept digging into the check function that manipulates start, converting the numbers in its conditions to char with R
    and reversing the XOR applied to the values in its conditions.

    [screenshot]
    [screenshot]
    This gives the inputs: g subtracts 31 from start, w adds 31, h subtracts 1, and t adds 1.
    Then it's a matter of walking the maze step by step, following the map and these operations. = =
    [screenshot]
    Of course, solving it one step at a time by eye is painful and error-prone, so I didn't get the answer in one go; I even wrote some code to check my path against the map as I went.
    [screenshot]
    Finally, the answer:
    [screenshot]
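The manual walk above can also be automated with a breadth-first search. Below is a minimal sketch under stated assumptions: the maze here is a hypothetical 5x5 toy map, not the challenge's real 527-character map (which uses stride 31, start 82, end 289); only the move semantics (g/w/h/t meaning -width/+width/-1/+1, walls marked '1') come from the writeup.

```python
from collections import deque

def solve_maze(maze, width, start, end):
    """BFS over a flat maze string; returns the key sequence, or None.
    Moves mirror the challenge: g = -width, w = +width, h = -1, t = +1."""
    moves = {'g': -width, 'w': width, 'h': -1, 't': 1}
    queue = deque([(start, '')])
    seen = {start}
    while queue:
        pos, path = queue.popleft()
        if pos == end:
            return path
        for key, delta in moves.items():
            nxt = pos + delta
            # stay inside the map, don't wrap rows on h/t, avoid walls
            if not (0 <= nxt < len(maze)):
                continue
            if key in 'ht' and nxt // width != pos // width:
                continue
            if maze[nxt] == '1' or nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, path + key))
    return None

# Toy 5x5 maze (hypothetical, not the challenge's real map):
# start at index 0, end at index 24
toy = ('01111'
       '00011'
       '10001'
       '11101'
       '11100')
print(solve_maze(toy, 5, 0, 24))
```

BFS guarantees the first path found is a shortest one, so this also minimizes the number of keystrokes.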


    19. Data Wrangling (Part One)

    Author: Chris Albon

    Translator: 飞龙

    License: CC BY-NC-SA 4.0

    Applying a Function to Groups in Pandas

    import pandas as pd
    
    # Create an example dataframe
    data = {'Platoon': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
           'Casualties': [1,4,5,7,5,5,6,1,4,5,6,7,4,6,4,6]}
    df = pd.DataFrame(data)
    df
    
        Casualties Platoon
    0            1       A
    1            4       A
    2            5       A
    3            7       A
    4            5       A
    5            5       A
    6            6       B
    7            1       B
    8            4       B
    9            5       B
    10           6       B
    11           7       C
    12           4       C
    13           6       C
    14           4       C
    15           6       C
    # Group df by df.platoon,
    # then apply a rolling mean lambda function to df.casualties
    df.groupby('Platoon')['Casualties'].apply(lambda x:x.rolling(center=False,window=2).mean())
    
    '''
    0     NaN
    1     2.5
    2     4.5
    3     6.0
    4     6.0
    5     5.0
    6     NaN
    7     3.5
    8     2.5
    9     4.5
    10    5.5
    11    NaN
    12    5.5
    13    5.0
    14    5.0
    15    5.0
    dtype: float64
    ''' 
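In recent pandas versions, the same rolling mean can also be computed without a lambda through the groupby rolling API; note that the result gains a (Platoon, original index) MultiIndex. A minimal, self-contained sketch:

```python
import pandas as pd

data = {'Platoon': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
        'Casualties': [1,4,5,7,5,5,6,1,4,5,6,7,4,6,4,6]}
df = pd.DataFrame(data)

# rolling(2).mean() per group; the index becomes (Platoon, original row label)
rolled = df.groupby('Platoon')['Casualties'].rolling(window=2).mean()
print(rolled.head())
```

The values match the lambda version above; only the index shape differs.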
    

    Applying Operations to Groups in Pandas

    # Import modules
    import pandas as pd
    
    # Create a dataframe
    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
    df
    
          regiment company      name  preTestScore  postTestScore
    0   Nighthawks     1st    Miller             4             25
    1   Nighthawks     1st  Jacobson            24             94
    2   Nighthawks     2nd       Ali            31             57
    3   Nighthawks     2nd    Milner             2             62
    4     Dragoons     1st     Cooze             3             70
    5     Dragoons     1st     Jacon             4             25
    6     Dragoons     2nd    Ryaner            24             94
    7     Dragoons     2nd      Sone            31             57
    8       Scouts     1st     Sloan             2             62
    9       Scouts     1st     Piger             3             70
    10      Scouts     2nd     Riani             2             62
    11      Scouts     2nd       Ali             3             70
    # Create a groupby variable that groups preTestScore by regiment
    groupby_regiment = df['preTestScore'].groupby(df['regiment'])
    groupby_regiment
    
    # <pandas.core.groupby.SeriesGroupBy object at 0x113ddb550> 
    

    "This grouped variable is now a GroupBy object. It has not actually computed anything yet except for some intermediate data about the group key df['key1']. The idea is that this object has all of the information needed to then apply some operation to each of the groups." – PyDA

    Use list() to show what the grouping looks like.

    list(df['preTestScore'].groupby(df['regiment']))
    
    '''
    [('Dragoons', 4     3
      5     4
      6    24
      7    31
      Name: preTestScore, dtype: int64), ('Nighthawks', 0     4
      1    24
      2    31
      3     2
      Name: preTestScore, dtype: int64), ('Scouts', 8     2
      9     3
      10    2
      11    3
      Name: preTestScore, dtype: int64)] 
    '''
    
    df['preTestScore'].groupby(df['regiment']).describe()
    
                count   mean        std  min   25%   50%    75%   max
    regiment
    Dragoons      4.0  15.50  14.153916  3.0  3.75  14.0  25.75  31.0
    Nighthawks    4.0  15.25  14.453950  2.0  3.50  14.0  25.75  31.0
    Scouts        4.0   2.50   0.577350  2.0  2.00   2.5   3.00   3.0
    # Mean preTestScore for each regiment
    groupby_regiment.mean()
    
    '''
    regiment
    Dragoons      15.50
    Nighthawks    15.25
    Scouts         2.50
    Name: preTestScore, dtype: float64 
    '''
    
    df['preTestScore'].groupby([df['regiment'], df['company']]).mean()
    
    '''
    regiment    company
    Dragoons    1st         3.5
                2nd        27.5
    Nighthawks  1st        14.0
                2nd        16.5
    Scouts      1st         2.5
                2nd         2.5
    Name: preTestScore, dtype: float64 
    '''
    
    df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()
    
    company      1st   2nd
    regiment
    Dragoons     3.5  27.5
    Nighthawks  14.0  16.5
    Scouts       2.5   2.5
    # Group the entire dataframe by regiment and company
    df.groupby(['regiment', 'company']).mean()
    
                        preTestScore  postTestScore
    regiment   company
    Dragoons   1st               3.5           47.5
               2nd              27.5           75.5
    Nighthawks 1st              14.0           59.5
               2nd              16.5           59.5
    Scouts     1st               2.5           66.0
               2nd               2.5           66.0
    # Number of observations in each regiment and company
    df.groupby(['regiment', 'company']).size()
    
    '''
    regiment    company
    Dragoons    1st        2
                2nd        2
    Nighthawks  1st        2
                2nd        2
    Scouts      1st        2
                2nd        2
    dtype: int64 
    '''
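Beyond size() and mean(), several statistics can be requested in one call with agg. A minimal sketch using the same regiment data (re-created here so the snippet is self-contained):

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                         'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                         'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3]}
df = pd.DataFrame(raw_data)

# One row per regiment, one column per aggregate
stats = df.groupby('regiment')['preTestScore'].agg(['mean', 'min', 'max'])
print(stats)
```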
    
    # Group the dataframe by regiment; for each regiment,
    for name, group in df.groupby('regiment'): 
        # print the regiment name
        print(name)
        # print its data
        print(group)
    
    
    '''
    Dragoons
       regiment company    name  preTestScore  postTestScore
    4  Dragoons     1st   Cooze             3             70
    5  Dragoons     1st   Jacon             4             25
    6  Dragoons     2nd  Ryaner            24             94
    7  Dragoons     2nd    Sone            31             57
    Nighthawks
         regiment company      name  preTestScore  postTestScore
    0  Nighthawks     1st    Miller             4             25
    1  Nighthawks     1st  Jacobson            24             94
    2  Nighthawks     2nd       Ali            31             57
    3  Nighthawks     2nd    Milner             2             62
    Scouts
       regiment company   name  preTestScore  postTestScore
    8    Scouts     1st  Sloan             2             62
    9    Scouts     1st  Piger             3             70
    10   Scouts     2nd  Riani             2             62
    11   Scouts     2nd    Ali             3             70 
    '''
    

    Grouping by column:

    Specifically in this case: group the columns by their data type (i.e. axis=1), then use list() to see what that grouping looks like.

    list(df.groupby(df.dtypes, axis=1))
    
    '''
    [(dtype('int64'),     preTestScore  postTestScore
      0              4             25
      1             24             94
      2             31             57
      3              2             62
      4              3             70
      5              4             25
      6             24             94
      7             31             57
      8              2             62
      9              3             70
      10             2             62
      11             3             70),
     (dtype('O'),       regiment company      name
      0   Nighthawks     1st    Miller
      1   Nighthawks     1st  Jacobson
      2   Nighthawks     2nd       Ali
      3   Nighthawks     2nd    Milner
      4     Dragoons     1st     Cooze
      5     Dragoons     1st     Jacon
      6     Dragoons     2nd    Ryaner
      7     Dragoons     2nd      Sone
      8       Scouts     1st     Sloan
      9       Scouts     1st     Piger
      10      Scouts     2nd     Riani
      11      Scouts     2nd       Ali)] 
    '''
    df.groupby('regiment').mean().add_prefix('mean_')
    
                mean_preTestScore  mean_postTestScore
    regiment
    Dragoons                15.50                61.5
    Nighthawks              15.25                59.5
    Scouts                   2.50                66.0
    # Create a function to get the stats of a group
    def get_stats(group):
        return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
    
    bins = [0, 25, 50, 75, 100]
    group_names = ['Low', 'Okay', 'Good', 'Great']
    df['categories'] = pd.cut(df['postTestScore'], bins, labels=group_names)
    
    df['postTestScore'].groupby(df['categories']).apply(get_stats).unstack()
    
                count   max   mean   min
    categories
    Good          8.0  70.0  63.75  57.0
    Great         2.0  94.0  94.00  94.0
    Low           2.0  25.0  25.00  25.0
    Okay          0.0   NaN    NaN   NaN
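The same per-bin table can be produced without a helper function by aggregating over the binned groups with agg. A minimal, self-contained sketch (observed=False keeps empty bins such as Okay in the output):

```python
import pandas as pd

scores = pd.Series([25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
                   name='postTestScore')

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
categories = pd.cut(scores, bins, labels=group_names)

# agg builds the count/max/mean/min table directly
stats = scores.groupby(categories, observed=False).agg(['count', 'max', 'mean', 'min'])
print(stats)
```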

    Applying Operations Over a Pandas DataFrame

    # Import modules
    import pandas as pd
    import numpy as np
    
    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'year': [2012, 2012, 2013, 2014, 2014], 
            'reports': [4, 24, 31, 2, 3],
            'coverage': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
    df
    
                coverage   name  reports  year
    Cochice           25  Jason        4  2012
    Pima              94  Molly       24  2012
    Santa Cruz        57   Tina       31  2013
    Maricopa          62   Jake        2  2014
    Yuma              70    Amy        3  2014
    # Create an uppercasing lambda function
    capitalizer = lambda x: x.upper()
    

    Apply the capitalizer function to the name column.

    apply() can apply a function along any axis of the dataframe.

    df['name'].apply(capitalizer)
    
    '''
    Cochice       JASON
    Pima          MOLLY
    Santa Cruz     TINA
    Maricopa       JAKE
    Yuma            AMY
    Name: name, dtype: object 
    '''
    

    Map the capitalizer lambda function over each element in the series name.

    map() applies an operation to every element of a series.

    df['name'].map(capitalizer)
    
    '''
    Cochice       JASON
    Pima          MOLLY
    Santa Cruz     TINA
    Maricopa       JAKE
    Yuma            AMY
    Name: name, dtype: object 
    '''
    

    Apply the square root function to every cell in the entire dataframe.

    applymap() applies a function to every element of the entire dataframe.

    # Drop the string variable so that applymap() can run
    df = df.drop('name', axis=1)
    
    # Return the square root of every cell in the dataframe
    df.applymap(np.sqrt)
    
                coverage   reports       year
    Cochice     5.000000  2.000000  44.855323
    Pima        9.695360  4.898979  44.855323
    Santa Cruz  7.549834  5.567764  44.866469
    Maricopa    7.874008  1.414214  44.877611
    Yuma        8.366600  1.732051  44.877611

    Apply a function over the dataframe.

    # Create a function called times100
    def times100(x):
        # If x is a string,
        if type(x) is str:
            # return it as-is
            return x
        # if not, return it multiplied by 100
        elif x:
            return 100 * x
        # and leave everything else alone
        else:
            return
    
    df.applymap(times100)
    
                coverage  reports    year
    Cochice         2500      400  201200
    Pima            9400     2400  201200
    Santa Cruz      5700     3100  201300
    Maricopa        6200      200  201400
    Yuma            7000      300  201400
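A version note: as of pandas 2.1, DataFrame.applymap has been renamed to DataFrame.map (applymap still works but emits a deprecation warning). A minimal sketch that works on both old and new versions:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'reports': [4, 24], 'coverage': [25, 94]})

# Prefer DataFrame.map on pandas >= 2.1, fall back to applymap otherwise
if hasattr(df, 'map'):
    result = df.map(np.sqrt)
else:
    result = df.applymap(np.sqrt)
print(result)
```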

    Assigning a New Column to a Pandas DataFrame

    import pandas as pd
    
    # Create an empty dataframe
    df = pd.DataFrame()
    
    # Create a column
    df['name'] = ['John', 'Steve', 'Sarah']
    
    # View the dataframe
    df
    
        name
    0   John
    1  Steve
    2  Sarah
    # Assign a new column called age to df, containing a list of ages
    df.assign(age = [31, 32, 19])
    
        name  age
    0   John   31
    1  Steve   32
    2  Sarah   19
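Note that assign returns a new dataframe and leaves the original untouched; it also accepts callables, which is handy for columns derived from ones assigned in the same call. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Steve', 'Sarah']})

# assign does not modify df in place; capture the return value
df2 = df.assign(age=[31, 32, 19],
                is_adult=lambda d: d['age'] >= 18)  # callable sees the new age column
print(df2)
```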

    Breaking a List into Chunks of Size N

    In this snippet, we take a list and break it into chunks of size n. This is a very common practice when working with APIs that have a maximum request size.

    This nifty function is courtesy of Ned Batchelder, who published it on StackOverflow.

    # Create a list of first names
    first_names = ['Steve', 'Jane', 'Sara', 'Mary','Jack','Bob', 'Bily', 'Boni', 'Chris','Sori', 'Will', 'Won','Li']
    
    # Create a function called chunks, with two arguments, l and n
    def chunks(l, n):
        # For every index i from 0 to the length of l, in steps of n,
        for i in range(0, len(l), n):
            # yield the slice of l from i up to i + n
            yield l[i:i+n]
    
    # Create a list from the results of the chunks function
    list(chunks(first_names, 5))
    
    '''
    [['Steve', 'Jane', 'Sara', 'Mary', 'Jack'],
     ['Bob', 'Bily', 'Boni', 'Chris', 'Sori'],
     ['Will', 'Won', 'Li']] 
    '''
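When a generator is not needed, the same result can be built directly with slicing in a list comprehension; a minimal sketch:

```python
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack', 'Bob', 'Bily',
               'Boni', 'Chris', 'Sori', 'Will', 'Won', 'Li']

# Slice the list at every multiple of n; the last chunk may be shorter
n = 5
chunked = [first_names[i:i + n] for i in range(0, len(first_names), n)]
print(chunked)
```

The generator version is preferable for very large lists, since it produces one chunk at a time instead of materializing them all.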
    

    Breaking a String into Columns Using Regex in Pandas

    # Import modules
    import re
    import pandas as pd
    
    # Create a dataframe with a single column of strings
    data = {'raw': ['Arizona 1 2014-12-23       3242.0',
                    'Iowa 1 2010-02-23       3453.7',
                    'Oregon 0 2014-06-20       2123.0',
                    'Maryland 0 2014-03-14       1123.6',
                    'Florida 1 2013-01-15       2134.0',
                    'Georgia 0 2012-07-14       2345.6']}
    df = pd.DataFrame(data, columns = ['raw'])
    df
    
                                      raw
    0   Arizona 1 2014-12-23       3242.0
    1      Iowa 1 2010-02-23       3453.7
    2    Oregon 0 2014-06-20       2123.0
    3  Maryland 0 2014-03-14       1123.6
    4   Florida 1 2013-01-15       2134.0
    5   Georgia 0 2012-07-14       2345.6
    # Which rows of df['raw'] contain 'xxxx-xx-xx'?
    df['raw'].str.contains('....-..-..', regex=True)
    
    '''
    0    True
    1    True
    2    True
    3    True
    4    True
    5    True
    Name: raw, dtype: bool 
    '''
    
    # In the raw column, extract the single digit from the string
    df['female'] = df['raw'].str.extract('(\d)', expand=True)
    df['female']
    
    '''
    0    1
    1    1
    2    0
    3    0
    4    1
    5    0
    Name: female, dtype: object 
    '''
    
    # In the raw column, extract xxxx-xx-xx from the string
    df['date'] = df['raw'].str.extract('(....-..-..)', expand=True)
    df['date']
    
    '''
    0    2014-12-23
    1    2010-02-23
    2    2014-06-20
    3    2014-03-14
    4    2013-01-15
    5    2012-07-14
    Name: date, dtype: object 
    '''
    
    # In the raw column, extract ####.# from the string
    df['score'] = df['raw'].str.extract('(\d\d\d\d\.\d)', expand=True)
    df['score']
    
    '''
    0    3242.0
    1    3453.7
    2    2123.0
    3    1123.6
    4    2134.0
    5    2345.6
    Name: score, dtype: object 
    '''
    
    # In the raw column, extract the word from the string
    df['state'] = df['raw'].str.extract('([A-Z]\w{0,})', expand=True)
    df['state']
    
    '''
    0     Arizona
    1        Iowa
    2      Oregon
    3    Maryland
    4     Florida
    5     Georgia
    Name: state, dtype: object 
    '''
    
    df
    
                                      raw female        date   score     state
    0   Arizona 1 2014-12-23       3242.0      1  2014-12-23  3242.0   Arizona
    1      Iowa 1 2010-02-23       3453.7      1  2010-02-23  3453.7      Iowa
    2    Oregon 0 2014-06-20       2123.0      0  2014-06-20  2123.0    Oregon
    3  Maryland 0 2014-03-14       1123.6      0  2014-03-14  1123.6  Maryland
    4   Florida 1 2013-01-15       2134.0      1  2013-01-15  2134.0   Florida
    5   Georgia 0 2012-07-14       2345.6      0  2012-07-14  2345.6   Georgia
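All four columns can also be pulled out in a single extract call using named capture groups; each group becomes a column named after the group. The exact pattern below is an illustrative assumption based on the rows shown:

```python
import pandas as pd

df = pd.DataFrame({'raw': ['Arizona 1 2014-12-23       3242.0',
                           'Iowa 1 2010-02-23       3453.7']})

# One extract call; each named group becomes a column
pattern = r'(?P<state>[A-Z]\w*) (?P<female>\d) (?P<date>\d{4}-\d{2}-\d{2})\s+(?P<score>\d+\.\d)'
parts = df['raw'].str.extract(pattern)
print(parts)
```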

    Columns Shared by Two DataFrames

    # Import the library
    import pandas as pd
    
    # Create a dataframe
    dataframe_one = pd.DataFrame()
    dataframe_one['1'] = ['1', '1', '1']
    dataframe_one['B'] = ['b', 'b', 'b']
    
    # Create a second dataframe
    dataframe_two = pd.DataFrame()
    dataframe_two['2'] = ['2', '2', '2']
    dataframe_two['B'] = ['b', 'b', 'b']
    
    # Convert each dataframe's columns into a set,
    # then find the intersection of the two sets.
    # This is the set of columns shared by both dataframes.
    set.intersection(set(dataframe_one), set(dataframe_two))
    
    # {'B'} 
    

    Construct a Dictionary from Multiple Lists

    # Create a list of officer names
    officer_names = ['Sodoni Dogla', 'Chris Jefferson', 'Jessica Billars', 'Michael Mulligan', 'Steven Johnson']
    
    # Create a list of officer armies
    officer_armies = ['Purple Army', 'Orange Army', 'Green Army', 'Red Army', 'Blue Army']
    
    # Create a dictionary that is the zip of the two lists
    dict(zip(officer_names, officer_armies))
    
    '''
    {'Chris Jefferson': 'Orange Army',
     'Jessica Billars': 'Green Army',
     'Michael Mulligan': 'Red Army',
     'Sodoni Dogla': 'Purple Army',
     'Steven Johnson': 'Blue Army'} 
    '''
    

    Convert a CSV into Python Code to Recreate It

    # Import the pandas package
    import pandas as pd
    
    # Load the csv file as a dataframe
    df_original = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv')
    df = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv')
    
    # Print the code to create the dataframe
    print('==============================')
    print('RUN THE CODE BELOW THIS LINE')
    print('==============================')
    print('raw_data =', df.to_dict(orient='list'))
    print('df = pd.DataFrame(raw_data, columns = ' + str(list(df_original)) + ')')
    
    '''
    ==============================
    RUN THE CODE BELOW THIS LINE
    ==============================
    raw_data = {'Sepal.Length': [5.0999999999999996, 4.9000000000000004, 4.7000000000000002, ...], 'Petal.Width': [0.20000000000000001, 0.20000000000000001, ...], 'Petal.Length': [1.3999999999999999, 1.3999999999999999, ...], 'Species': ['setosa', ..., 'versicolor', ..., 'virginica', ...], 'Sepal.Width': [3.5, 3.0, ...], 'Unnamed: 0': [1, 2, 3, ..., 150]}
    (full 150-element lists omitted here for brevity)
    '''
    
    df = pd.DataFrame(raw_data, columns = ['Unnamed: 0', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']) 
    
    # If you want to check the results:
    # 1. Enter the code generated by the cell above into this cell
    raw_data = {'Petal.Width': [0.20000000000000001, 0.20000000000000001, ...], 'Sepal.Width': [3.5, 3.0, ...], 'Species': ['setosa', ..., 'versicolor', ..., 'virginica', ...], ...}
    (full 150-element lists omitted here for brevity)
'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica'], 'Unnamed: 0': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150], 'Sepal.Length': [5.0999999999999996, 4.9000000000000004, 4.7000000000000002, 4.5999999999999996, 5.0, 5.4000000000000004, 4.5999999999999996, 5.0, 4.4000000000000004, 4.9000000000000004, 5.4000000000000004, 4.7999999999999998, 4.7999999999999998, 4.2999999999999998, 5.7999999999999998, 5.7000000000000002, 5.4000000000000004, 5.0999999999999996, 5.7000000000000002, 5.0999999999999996, 5.4000000000000004, 5.0999999999999996, 4.5999999999999996, 5.0999999999999996, 4.7999999999999998, 5.0, 5.0, 5.2000000000000002, 5.2000000000000002, 4.7000000000000002, 4.7999999999999998, 5.4000000000000004, 5.2000000000000002, 5.5, 4.9000000000000004, 5.0, 5.5, 4.9000000000000004, 4.4000000000000004, 5.0999999999999996, 5.0, 4.5, 4.4000000000000004, 5.0, 5.0999999999999996, 4.7999999999999998, 5.0999999999999996, 4.5999999999999996, 5.2999999999999998, 5.0, 7.0, 6.4000000000000004, 6.9000000000000004, 5.5, 6.5, 
5.7000000000000002, 6.2999999999999998, 4.9000000000000004, 6.5999999999999996, 5.2000000000000002, 5.0, 5.9000000000000004, 6.0, 6.0999999999999996, 5.5999999999999996, 6.7000000000000002, 5.5999999999999996, 5.7999999999999998, 6.2000000000000002, 5.5999999999999996, 5.9000000000000004, 6.0999999999999996, 6.2999999999999998, 6.0999999999999996, 6.4000000000000004, 6.5999999999999996, 6.7999999999999998, 6.7000000000000002, 6.0, 5.7000000000000002, 5.5, 5.5, 5.7999999999999998, 6.0, 5.4000000000000004, 6.0, 6.7000000000000002, 6.2999999999999998, 5.5999999999999996, 5.5, 5.5, 6.0999999999999996, 5.7999999999999998, 5.0, 5.5999999999999996, 5.7000000000000002, 5.7000000000000002, 6.2000000000000002, 5.0999999999999996, 5.7000000000000002, 6.2999999999999998, 5.7999999999999998, 7.0999999999999996, 6.2999999999999998, 6.5, 7.5999999999999996, 4.9000000000000004, 7.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.5, 6.4000000000000004, 6.7999999999999998, 5.7000000000000002, 5.7999999999999998, 6.4000000000000004, 6.5, 7.7000000000000002, 7.7000000000000002, 6.0, 6.9000000000000004, 5.5999999999999996, 7.7000000000000002, 6.2999999999999998, 6.7000000000000002, 7.2000000000000002, 6.2000000000000002, 6.0999999999999996, 6.4000000000000004, 7.2000000000000002, 7.4000000000000004, 7.9000000000000004, 6.4000000000000004, 6.2999999999999998, 6.0999999999999996, 7.7000000000000002, 6.2999999999999998, 6.4000000000000004, 6.0, 6.9000000000000004, 6.7000000000000002, 6.9000000000000004, 5.7999999999999998, 6.7999999999999998, 6.7000000000000002, 6.7000000000000002, 6.2999999999999998, 6.5, 6.2000000000000002, 5.9000000000000004], 'Petal.Length': [1.3999999999999999, 1.3999999999999999, 1.3, 1.5, 1.3999999999999999, 1.7, 1.3999999999999999, 1.5, 1.3999999999999999, 1.5, 1.5, 1.6000000000000001, 1.3999999999999999, 1.1000000000000001, 1.2, 1.5, 1.3, 1.3999999999999999, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.8999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 
1.3999999999999999, 1.6000000000000001, 1.6000000000000001, 1.5, 1.5, 1.3999999999999999, 1.5, 1.2, 1.3, 1.3999999999999999, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6000000000000001, 1.8999999999999999, 1.3999999999999999, 1.6000000000000001, 1.3999999999999999, 1.5, 1.3999999999999999, 4.7000000000000002, 4.5, 4.9000000000000004, 4.0, 4.5999999999999996, 4.5, 4.7000000000000002, 3.2999999999999998, 4.5999999999999996, 3.8999999999999999, 3.5, 4.2000000000000002, 4.0, 4.7000000000000002, 3.6000000000000001, 4.4000000000000004, 4.5, 4.0999999999999996, 4.5, 3.8999999999999999, 4.7999999999999998, 4.0, 4.9000000000000004, 4.7000000000000002, 4.2999999999999998, 4.4000000000000004, 4.7999999999999998, 5.0, 4.5, 3.5, 3.7999999999999998, 3.7000000000000002, 3.8999999999999999, 5.0999999999999996, 4.5, 4.5, 4.7000000000000002, 4.4000000000000004, 4.0999999999999996, 4.0, 4.4000000000000004, 4.5999999999999996, 4.0, 3.2999999999999998, 4.2000000000000002, 4.2000000000000002, 4.2000000000000002, 4.2999999999999998, 3.0, 4.0999999999999996, 6.0, 5.0999999999999996, 5.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.5999999999999996, 4.5, 6.2999999999999998, 5.7999999999999998, 6.0999999999999996, 5.0999999999999996, 5.2999999999999998, 5.5, 5.0, 5.0999999999999996, 5.2999999999999998, 5.5, 6.7000000000000002, 6.9000000000000004, 5.0, 5.7000000000000002, 4.9000000000000004, 6.7000000000000002, 4.9000000000000004, 5.7000000000000002, 6.0, 4.7999999999999998, 4.9000000000000004, 5.5999999999999996, 5.7999999999999998, 6.0999999999999996, 6.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.5999999999999996, 6.0999999999999996, 5.5999999999999996, 5.5, 4.7999999999999998, 5.4000000000000004, 5.5999999999999996, 5.0999999999999996, 5.0999999999999996, 5.9000000000000004, 5.7000000000000002, 5.2000000000000002, 5.0, 5.2000000000000002, 5.4000000000000004, 5.0999999999999996]}
    df = pd.DataFrame(raw_data, columns = ['Unnamed: 0', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'])
    
    # 查看原始数据帧的前几行
    df.head()
    
       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    0           1           5.1          3.5           1.4          0.2  setosa
    1           2           4.9          3.0           1.4          0.2  setosa
    2           3           4.7          3.2           1.3          0.2  setosa
    3           4           4.6          3.1           1.5          0.2  setosa
    4           5           5.0          3.6           1.4          0.2  setosa
    # 查看使用我们的代码创建的,数据帧的前几行
    df_original.head()
    
       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
    0           1           5.1          3.5           1.4          0.2  setosa
    1           2           4.9          3.0           1.4          0.2  setosa
    2           3           4.7          3.2           1.3          0.2  setosa
    3           4           4.6          3.1           1.5          0.2  setosa
    4           5           5.0          3.6           1.4          0.2  setosa

    Convert a Categorical Variable into Dummy Variables

    # 导入模块
    import pandas as pd
    
    # 创建数据帧
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
            'sex': ['male', 'female', 'male', 'female', 'female']}
    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'sex'])
    df
    
      first_name last_name     sex
    0      Jason    Miller    male
    1      Molly  Jacobson  female
    2       Tina       Ali    male
    3       Jake    Milner  female
    4        Amy     Cooze  female
    # 从 sex 变量创建一组虚拟变量
    df_sex = pd.get_dummies(df['sex'])
    
    # 将虚拟变量连接到主数据帧
    df_new = pd.concat([df, df_sex], axis=1)
    df_new
    
      first_name last_name     sex  female  male
    0      Jason    Miller    male     0.0   1.0
    1      Molly  Jacobson  female     1.0   0.0
    2       Tina       Ali    male     0.0   1.0
    3       Jake    Milner  female     1.0   0.0
    4        Amy     Cooze  female     1.0   0.0
    # 连接新列的替代方案
    df_new = df.join(df_sex)
    df_new
    
      first_name last_name     sex  female  male
    0      Jason    Miller    male     0.0   1.0
    1      Molly  Jacobson  female     1.0   0.0
    2       Tina       Ali    male     0.0   1.0
    3       Jake    Milner  female     1.0   0.0
    4        Amy     Cooze  female     1.0   0.0
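`pd.get_dummies` can also operate on a whole DataFrame at once. A minimal sketch (re-creating the same `df` so it is self-contained); `drop_first=True` keeps only one dummy per variable, which avoids perfectly collinear columns in regression:

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                   'sex': ['male', 'female', 'male', 'female', 'female']})

# Encode the sex column in place; the original sex column is replaced
# by a single sex_male dummy (female is the dropped reference category)
df_encoded = pd.get_dummies(df, columns=['sex'], drop_first=True)
```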

    Convert a Categorical Variable into Dummy Variables (with patsy)

    # 导入模块
    import pandas as pd
    import patsy
    
    # 创建数据帧
    raw_data = {'countrycode': [1, 2, 3, 2, 1]} 
    df = pd.DataFrame(raw_data, columns = ['countrycode'])
    df
    
       countrycode
    0            1
    1            2
    2            3
    3            2
    4            1
    # 将 countrycode 变量转换为三个二元变量
    patsy.dmatrix('C(countrycode)-1', df, return_type='dataframe')
    
       C(countrycode)[1]  C(countrycode)[2]  C(countrycode)[3]
    0                1.0                0.0                0.0
    1                0.0                1.0                0.0
    2                0.0                0.0                1.0
    3                0.0                1.0                0.0
    4                1.0                0.0                0.0
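If you don't want to pull in patsy, the same one-hot encoding can be sketched with pandas alone (re-creating the `df` above so the snippet is self-contained):

```python
import pandas as pd

df = pd.DataFrame({'countrycode': [1, 2, 3, 2, 1]})

# astype('category') tells get_dummies to treat the integers as labels;
# prefix controls the generated column names
dummies = pd.get_dummies(df['countrycode'].astype('category'),
                         prefix='countrycode')
```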

    Convert a String Categorical Variable into a Numeric Variable

    # 导入模块
    import pandas as pd
    
    raw_data = {'patient': [1, 1, 1, 2, 2], 
            'obs': [1, 2, 3, 1, 2], 
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']} 
    df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
    df
    
       patient  obs  treatment   score
    0        1    1          0  strong
    1        1    2          1    weak
    2        1    3          0  normal
    3        2    1          1    weak
    4        2    2          0  strong
    # 创建一个函数,将 df['score'] 的所有值转换为数字
    def score_to_numeric(x):
        if x=='strong':
            return 3
        if x=='normal':
            return 2
        if x=='weak':
            return 1
    
    df['score_num'] = df['score'].apply(score_to_numeric)
    df
    
       patient  obs  treatment   score  score_num
    0        1    1          0  strong          3
    1        1    2          1    weak          1
    2        1    3          0  normal          2
    3        2    1          1    weak          1
    4        2    2          0  strong          3
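An `apply`-free alternative is `Series.map` with a lookup dictionary. A minimal sketch (re-creating the score column so it is self-contained):

```python
import pandas as pd

df = pd.DataFrame({'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# map() looks each value up in the dict; any value missing from the dict
# would become NaN, which makes unmapped categories easy to spot
score_map = {'strong': 3, 'normal': 2, 'weak': 1}
df['score_num'] = df['score'].map(score_map)
```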

    Convert a Variable into a Time Series

    # 导入库
    import pandas as pd
    
    # Create a dataset with a column of date strings
    raw_data = {'date': ['2014-06-01T01:21:38.004053', '2014-06-02T01:21:38.004053', '2014-06-03T01:21:38.004053'],
            'score': [25, 94, 57]}
    df = pd.DataFrame(raw_data, columns = ['date', 'score'])
    df
    
                             date  score
    0  2014-06-01T01:21:38.004053     25
    1  2014-06-02T01:21:38.004053     94
    2  2014-06-03T01:21:38.004053     57
    # Convert the date strings to datetime and set them as the index
    df["date"] = pd.to_datetime(df["date"])
    
    df = df.set_index(df["date"])
    
    df
    
                                                     date  score
    date
    2014-06-01 01:21:38.004053 2014-06-01 01:21:38.004053     25
    2014-06-02 01:21:38.004053 2014-06-02 01:21:38.004053     94
    2014-06-03 01:21:38.004053 2014-06-03 01:21:38.004053     57
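The payoff of a DatetimeIndex is date-aware selection. A minimal sketch (re-creating the frame above, and using `set_index('date')` so the redundant date column is dropped):

```python
import pandas as pd

df = pd.DataFrame({'date': ['2014-06-01T01:21:38.004053',
                            '2014-06-02T01:21:38.004053',
                            '2014-06-03T01:21:38.004053'],
                   'score': [25, 94, 57]})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# With a DatetimeIndex, rows can be selected with partial date strings
june_2 = df.loc['2014-06-02']
```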

    Counting in a Pandas DataFrame

    # 导入库
    import pandas as pd
    
    year = pd.Series([1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 
                      1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894])
    guardCorps = pd.Series([0,2,2,1,0,0,1,1,0,3,0,2,1,0,0,1,0,1,0,1])
    corps1 = pd.Series([0,0,0,2,0,3,0,2,0,0,0,1,1,1,0,2,0,3,1,0])
    corps2 = pd.Series([0,0,0,2,0,2,0,0,1,1,0,0,2,1,1,0,0,2,0,0])
    corps3 = pd.Series([0,0,0,1,1,1,2,0,2,0,0,0,1,0,1,2,1,0,0,0])
    corps4 = pd.Series([0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,0,0])
    corps5 = pd.Series([0,0,0,0,2,1,0,0,1,0,0,1,0,1,1,1,1,1,1,0])
    corps6 = pd.Series([0,0,1,0,2,0,0,1,2,0,1,1,3,1,1,1,0,3,0,0])
    corps7 = pd.Series([1,0,1,0,0,0,1,0,1,1,0,0,2,0,0,2,1,0,2,0])
    corps8 = pd.Series([1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,1])
    corps9 = pd.Series([0,0,0,0,0,2,1,1,1,0,2,1,1,0,1,2,0,1,0,0])
    corps10 = pd.Series([0,0,1,1,0,1,0,2,0,2,0,0,0,0,2,1,3,0,1,1])
    corps11 = pd.Series([0,0,0,0,2,4,0,1,3,0,1,1,1,1,2,1,3,1,3,1])
    corps14 = pd.Series([ 1,1,2,1,1,3,0,4,0,1,0,3,2,1,0,2,1,1,0,0])
    corps15 = pd.Series([0,1,0,0,0,0,0,1,0,1,1,0,0,0,2,2,0,0,0,0])
    
    variables = dict(guardCorps = guardCorps, corps1 = corps1, 
                     corps2 = corps2, corps3 = corps3, corps4 = corps4, 
                     corps5 = corps5, corps6 = corps6, corps7 = corps7, 
                     corps8 = corps8, corps9 = corps9, corps10 = corps10, 
                     corps11 = corps11 , corps14 = corps14, corps15 = corps15)
    
    horsekick = pd.DataFrame(variables, columns = ['guardCorps', 
                                                        'corps1', 'corps2', 
                                                        'corps3', 'corps4', 
                                                        'corps5', 'corps6', 
                                                        'corps7', 'corps8', 
                                                        'corps9', 'corps10', 
                                                        'corps11', 'corps14', 
                                                        'corps15'])
    
    horsekick.index = [1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 
                      1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894]
    
    horsekick
    
          guardCorps  corps1  corps2  corps3  corps4  corps5  corps6  corps7  corps8  corps9  corps10  corps11  corps14  corps15
    1875           0       0       0       0       0       0       0       1       1       0        0        0        1        0
    1876           2       0       0       0       1       0       0       0       0       0        0        0        1        1
    1877           2       0       0       0       0       0       1       1       0       0        1        0        2        0
    1878           1       2       2       1       1       0       0       0       0       0        1        0        1        0
    1879           0       0       0       1       1       2       2       0       1       0        0        2        1        0
    1880           0       3       2       1       1       1       0       0       0       2        1        4        3        0
    1881           1       0       0       2       1       0       0       1       0       1        0        0        0        0
    1882           1       2       0       0       0       0       1       0       1       1        2        1        4        1
    1883           0       0       1       2       0       1       2       1       0       1        0        3        0        0
    1884           3       0       1       0       0       0       0       1       0       0        2        0        1        1
    1885           0       0       0       0       0       0       1       0       0       2        0        1        0        1
    1886           2       1       0       0       1       1       1       0       0       1        0        1        3        0
    1887           1       1       2       1       0       0       3       2       1       1        0        1        2        0
    1888           0       1       1       0       0       1       1       0       0       0        0        1        1        0
    1889           0       0       1       1       0       1       1       0       0       1        2        2        0        2
    1890           1       2       0       2       0       1       1       2       0       2        1        1        2        2
    1891           0       0       0       1       1       1       0       1       1       0        3        3        1        0
    1892           1       3       2       0       1       1       3       0       1       1        0        1        1        0
    1893           0       1       0       0       0       1       0       2       0       0        1        3        0        0
    1894           1       0       0       0       0       0       0       0       1       0        1        1        0        0
    # 计算每个团队中每个死亡人数的次数
    result = horsekick.apply(pd.value_counts).fillna(0); result
    

    | | guardCorps | corps1 | corps2 | corps3 | corps4 | corps5 | corps6 | corps7 | corps8 | corps9 | corps10 | corps11 | corps14 | corps15 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 0 | 9.0 | 11.0 | 12.0 | 11.0 | 12.0 | 10.0 | 9.0 | 11.0 | 13.0 | 10.0 | 10.0 | 6 | 6 | 14.0 |
    | 1 | 7.0 | 4.0 | 4.0 | 6.0 | 8.0 | 9.0 | 7.0 | 6.0 | 7.0 | 7.0 | 6.0 | 8 | 8 | 4.0 |
    | 2 | 3.0 | 3.0 | 4.0 | 3.0 | 0.0 | 1.0 | 2.0 | 3.0 | 0.0 | 3.0 | 3.0 | 2 | 3 | 2.0 |
    | 3 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3 | 2 | 0.0 |
    | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 1 | 0.0 |

    # Count how many times each yearly death total appears in guardCorps
    pd.value_counts(horsekick['guardCorps'].values, sort=False)
    
    '''
    0    9
    1    7
    2    3
    3    1
    dtype: int64 
    '''
    
    horsekick['guardCorps'].unique()
    
    # array([0, 2, 1, 3]) 
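Beyond raw counts, `value_counts` can return proportions, and `nunique` counts distinct values directly. A minimal sketch, re-creating the `guardCorps` series above so it is self-contained:

```python
import pandas as pd

guardCorps = pd.Series([0,2,2,1,0,0,1,1,0,3,0,2,1,0,0,1,0,1,0,1])

# normalize=True returns each value's share of the series instead of a count
proportions = guardCorps.value_counts(normalize=True, sort=False)

# nunique() counts the distinct values directly
n_distinct = guardCorps.nunique()
```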
    

    Creating Pipelines in Pandas

    Pandas' pipe feature lets you chain Python functions together to build a data-processing pipeline.

    import pandas as pd
    
    # 创建空数据帧
    df = pd.DataFrame()
    
    # Create a column
    df['name'] = ['John', 'Steve', 'Sarah']
    df['gender'] = ['Male', 'Male', 'Female']
    df['age'] = [31, 32, 19]
    
    # 查看数据帧
    df
    
        name  gender  age
    0   John    Male   31
    1  Steve    Male   32
    2  Sarah  Female   19
    # 创建函数,
    def mean_age_by_group(dataframe, col):
        # 它按列分组数据,并返回每组的均值
        return dataframe.groupby(col).mean()
    
    # 创建函数,
    def uppercase_column_name(dataframe):
        # 它大写所有列标题
        dataframe.columns = dataframe.columns.str.upper()
        # 并返回它
        return dataframe
    
    # 创建流水线,它应用 mean_age_by_group 函数
    (df.pipe(mean_age_by_group, col='gender')
       # 之后应用 uppercase_column_name 函数
       .pipe(uppercase_column_name)
    )
    
             AGE
    gender
    Female  19.0
    Male    31.5
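`pipe` also forwards extra arguments to the piped function, which keeps the chain readable when a step is parameterized. A minimal sketch with hypothetical helper functions (`add_years`, `mean_age` are illustration names, not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Male', 'Female'],
                   'age': [31, 32, 19]})

def add_years(dataframe, years):
    # Return a copy with `years` added to every age
    out = dataframe.copy()
    out['age'] = out['age'] + years
    return out

def mean_age(dataframe):
    # Reduce the frame to a single number
    return dataframe['age'].mean()

# Keyword arguments after the function are passed through by pipe
result = df.pipe(add_years, years=10).pipe(mean_age)
```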

    Create a Pandas Column with a for Loop

    import pandas as pd
    import numpy as np
    
    raw_data = {'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
            'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.NaN]}
    df = pd.DataFrame(raw_data, columns = ['student_name', 'test_score'])
    
    # 创建列表来储存数据
    grades = []
    
    # 对于列中的每一行
    for row in df['test_score']:
        # 如果大于某个值
        if row > 95:
            # 添加字母分数
            grades.append('A')
        # 或者,如果大于某个值
        elif row > 90:
            # 添加字母分数
            grades.append('A-')
        # 或者,如果大于某个值
        elif row > 85:
            # 添加字母分数
            grades.append('B')
        # 或者,如果大于某个值
        elif row > 80:
            # 添加字母分数
            grades.append('B-')
        # 或者,如果大于某个值
        elif row > 75:
            # 添加字母分数
            grades.append('C')
        # 或者,如果大于某个值
        elif row > 70:
            # 添加字母分数
            grades.append('C-')
        # 或者,如果大于某个值
        elif row > 65:
            # 添加字母分数
            grades.append('D')
        # 或者,如果大于某个值
        elif row > 60:
            # 添加字母分数
            grades.append('D-')
        # 否则
        else:
            # 添加不及格分数
            grades.append('Failed')
    
    # 从列表创建一列
    df['grades'] = grades
    
    # 查看新数据帧
    df
    
       student_name  test_score  grades
    0        Miller        76.0       C
    1      Jacobson        88.0       B
    2           Ali        84.0      B-
    3        Milner        67.0       D
    4         Cooze        53.0  Failed
    5         Jacon        96.0       A
    6        Ryaner        64.0      D-
    7          Sone        91.0      A-
    8         Sloan        77.0       C
    9         Piger        73.0      C-
    10        Riani        52.0  Failed
    11          Ali         NaN  Failed
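The whole if/elif chain above can be replaced by a vectorized `pd.cut`, where each `(low, high]` bin maps to a letter grade. A minimal sketch, re-creating the scores so it is self-contained:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'test_score': [76, 88, 84, 67, 53, 96, 64,
                                  91, 77, 73, 52, np.nan]})

# Right-closed bins reproduce the strict "> threshold" checks in the loop
bins = [-np.inf, 60, 65, 70, 75, 80, 85, 90, 95, np.inf]
labels = ['Failed', 'D-', 'D', 'C-', 'C', 'B-', 'B', 'A-', 'A']
df['grades'] = pd.cut(df['test_score'], bins=bins, labels=labels)

# NaN scores fall outside every bin; treat them as failing, like the loop does
df['grades'] = df['grades'].fillna('Failed')
```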

    Create Counts of Items

    from collections import Counter
    
    # 创建一个今天吃的水果的计数器
    fruit_eaten = Counter(['Apple', 'Apple', 'Apple', 'Banana', 'Pear', 'Pineapple'])
    
    # 查看计数器
    fruit_eaten
    
    # Counter({'Apple': 3, 'Banana': 1, 'Pear': 1, 'Pineapple': 1}) 
    
    # 更新菠萝的计数(因为你只吃菠萝)
    fruit_eaten.update(['Pineapple'])
    
    # 查看计数器
    fruit_eaten
    
    # Counter({'Apple': 3, 'Banana': 1, 'Pear': 1, 'Pineapple': 2}) 
    
    # 查看计数最大的三个项目
    fruit_eaten.most_common(3)
    
    # [('Apple', 3), ('Pineapple', 2), ('Banana', 1)] 
    

    Create a Column Based on a Conditional

    # 导入所需模块
    import pandas as pd
    import numpy as np
    
    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'age': [42, 52, 36, 24, 73], 
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
    df
    
        name  age  preTestScore  postTestScore
    0  Jason   42             4             25
    1  Molly   52            24             94
    2   Tina   36            31             57
    3   Jake   24             2             62
    4    Amy   73             3             70
    # Create a new column called df['elderly'],
    # where the value is 'yes' if df['age'] is 50 or more, and 'no' otherwise
    df['elderly'] = np.where(df['age']>=50, 'yes', 'no')
    
    # 查看数据帧
    df
    
        name  age  preTestScore  postTestScore elderly
    0  Jason   42             4             25      no
    1  Molly   52            24             94     yes
    2   Tina   36            31             57      no
    3   Jake   24             2             62      no
    4    Amy   73             3             70     yes
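When there are more than two outcomes, `np.select` generalizes `np.where`: conditions are checked in order and the first match wins. A minimal sketch (the `age_group` labels are illustrative, not from the original):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [42, 52, 36, 24, 73]})

# Conditions are evaluated top to bottom; the default covers everything else
conditions = [df['age'] >= 65, df['age'] >= 50]
choices = ['senior', 'elderly']
df['age_group'] = np.select(conditions, choices, default='adult')
```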

    Create Lists from Dictionary Keys and Values

    # Create a dictionary
    # (named data rather than dict, which would shadow the built-in type)
    data = {'county': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'], 
            'year': [2012, 2012, 2013, 2014, 2014], 
            'fireReports': [4, 24, 31, 2, 3]}
    
    # Create a list of the keys
    # (in Python 3.7+ dictionaries preserve insertion order)
    list(data.keys())
    
    # ['county', 'year', 'fireReports'] 
    
    # Create a list of the values
    list(data.values())
    
    '''
    [['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
     [2012, 2012, 2013, 2014, 2014],
     [4, 24, 31, 2, 3]] 
    '''
    

    Crosstabs in Pandas

    # 导入库
    import pandas as pd
    
    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
            'company': ['infantry', 'infantry', 'cavalry', 'cavalry', 'infantry', 'infantry', 'cavalry', 'cavalry','infantry', 'infantry', 'cavalry', 'cavalry'], 
            'experience': ['veteran', 'rookie', 'veteran', 'rookie', 'veteran', 'rookie', 'veteran', 'rookie','veteran', 'rookie', 'veteran', 'rookie'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'experience', 'name', 'preTestScore', 'postTestScore'])
    df
    
          regiment   company experience      name  preTestScore  postTestScore
    0   Nighthawks  infantry    veteran    Miller             4             25
    1   Nighthawks  infantry     rookie  Jacobson            24             94
    2   Nighthawks   cavalry    veteran       Ali            31             57
    3   Nighthawks   cavalry     rookie    Milner             2             62
    4     Dragoons  infantry    veteran     Cooze             3             70
    5     Dragoons  infantry     rookie     Jacon             4             25
    6     Dragoons   cavalry    veteran    Ryaner            24             94
    7     Dragoons   cavalry     rookie      Sone            31             57
    8       Scouts  infantry    veteran     Sloan             2             62
    9       Scouts  infantry     rookie     Piger             3             70
    10      Scouts   cavalry    veteran     Riani             2             62
    11      Scouts   cavalry     rookie       Ali             3             70

    Create a crosstab of regiment by company, counting the number of observations for each combination.

    pd.crosstab(df.regiment, df.company, margins=True)
    
    company     cavalry  infantry  All
    regiment
    Dragoons          2         2    4
    Nighthawks        2         2    4
    Scouts            2         2    4
    All               6         6   12
    # 为每个团队创建公司和经验的交叉表
    pd.crosstab([df.company, df.experience], df.regiment,  margins=True)
    
    regiment             Dragoons  Nighthawks  Scouts  All
    company  experience
    cavalry  rookie             1           1       1    3
             veteran            1           1       1    3
    infantry rookie             1           1       1    3
             veteran            1           1       1    3
    All                         4           4       4   12
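`pd.crosstab` can also report proportions instead of counts via the `normalize` parameter. A minimal sketch using the same regiment/company structure:

```python
import pandas as pd

df = pd.DataFrame({'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
                   'company': ['infantry', 'infantry', 'cavalry', 'cavalry'] * 3})

# normalize='index' turns each row of counts into row proportions
shares = pd.crosstab(df.regiment, df.company, normalize='index')
```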

    Dropping Duplicates

    # 导入模块
    import pandas as pd
    
    raw_data = {'first_name': ['Jason', 'Jason', 'Jason','Tina', 'Jake', 'Amy'], 
            'last_name': ['Miller', 'Miller', 'Miller','Ali', 'Milner', 'Cooze'], 
            'age': [42, 42, 1111111, 36, 24, 73], 
            'preTestScore': [4, 4, 4, 31, 2, 3],
            'postTestScore': [25, 25, 25, 57, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
    df
    
      first_name last_name      age  preTestScore  postTestScore
    0      Jason    Miller       42             4             25
    1      Jason    Miller       42             4             25
    2      Jason    Miller  1111111             4             25
    3       Tina       Ali       36            31             57
    4       Jake    Milner       24             2             62
    5        Amy     Cooze       73             3             70
    # 确定哪些观测是重复的
    df.duplicated()
    
    '''
    0    False
    1     True
    2    False
    3    False
    4    False
    5    False
    dtype: bool 
    '''
    
    df.drop_duplicates()
    
      first_name last_name      age  preTestScore  postTestScore
    0      Jason    Miller       42             4             25
    2      Jason    Miller  1111111             4             25
    3       Tina       Ali       36            31             57
    4       Jake    Milner       24             2             62
    5        Amy     Cooze       73             3             70
    # 删除 first_name 列中的重复项
    # 但保留重复集中的最后一个观测
    df.drop_duplicates(['first_name'], keep='last')
    
      first_name last_name      age  preTestScore  postTestScore
    2      Jason    Miller  1111111             4             25
    3       Tina       Ali       36            31             57
    4       Jake    Milner       24             2             62
    5        Amy     Cooze       73             3             70
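`keep=False` goes one step further than `keep='last'`: it drops every member of a duplicate group rather than retaining one representative. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jason', 'Jason', 'Jason', 'Tina', 'Jake', 'Amy'],
                   'age': [42, 42, 1111111, 36, 24, 73]})

# All three Jason rows are duplicates on first_name, so all are removed
unique_names = df.drop_duplicates(subset=['first_name'], keep=False)
```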

    Descriptive Statistics for a Pandas DataFrame

    # 导入模块
    import pandas as pd
    
    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'age': [42, 52, 36, 24, 73], 
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
    df
    
        name  age  preTestScore  postTestScore
    0  Jason   42             4             25
    1  Molly   52            24             94
    2   Tina   36            31             57
    3   Jake   24             2             62
    4    Amy   73             3             70

    5 rows × 4 columns

    # 所有年龄之和
    df['age'].sum()
    
    # 227 
    
    df['preTestScore'].mean()
    
    # 12.800000000000001 
    
    df['preTestScore'].cumsum()
    
    '''
    0     4
    1    28
    2    59
    3    61
    4    64
    Name: preTestScore, dtype: int64 
    '''
    
    df['preTestScore'].describe()
    
    '''
    count     5.000000
    mean     12.800000
    std      13.663821
    min       2.000000
    25%       3.000000
    50%       4.000000
    75%      24.000000
    max      31.000000
    Name: preTestScore, dtype: float64 
    '''
    
    df['preTestScore'].count()
    
    # 5 
    
    df['preTestScore'].min()
    
    # 2 
    
    df['preTestScore'].max()
    
    # 31 
    
    df['preTestScore'].median()
    
    # 4.0 
    
    df['preTestScore'].var()
    
    # 186.69999999999999 
    
    df['preTestScore'].std()
    
    # 13.663820841916802 
    
    df['preTestScore'].skew()
    
    # 0.74334524573267591 
    
    df['preTestScore'].kurt()
    
    # -2.4673543738411525 
    
    df.corr()
    
                        age  preTestScore  postTestScore
    age            1.000000     -0.105651       0.328852
    preTestScore  -0.105651      1.000000       0.378039
    postTestScore  0.328852      0.378039       1.000000

    3 rows × 3 columns

    # 协方差矩阵
    df.cov()
    
                      age  preTestScore  postTestScore
    age            340.80        -26.65         151.20
    preTestScore   -26.65        186.70         128.65
    postTestScore  151.20        128.65         620.30

    3 rows × 3 columns

    Dropping Rows and Columns

    # 导入模块
    import pandas as pd
    
    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'year': [2012, 2012, 2013, 2014, 2014], 
            'reports': [4, 24, 31, 2, 3]}
    df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
    df
    
                 name  reports  year
    Cochice     Jason        4  2012
    Pima        Molly       24  2012
    Santa Cruz   Tina       31  2013
    Maricopa     Jake        2  2014
    Yuma          Amy        3  2014
    # 丢弃观测(行)
    df.drop(['Cochice', 'Pima'])
    
                name  reports  year
    Santa Cruz  Tina       31  2013
    Maricopa    Jake        2  2014
    Yuma         Amy        3  2014
    # 丢弃变量(列)
    # 注意:`axis = 1`表示我们指的是列,而不是行
    df.drop('reports', axis=1)
    
                 name  year
    Cochice     Jason  2012
    Pima        Molly  2012
    Santa Cruz   Tina  2013
    Maricopa     Jake  2014
    Yuma          Amy  2014

    Drop a row if it contains a certain value (in this case, Tina).

    Specifically: keep only the rows whose name column does not equal Tina.

    df[df.name != 'Tina']
    
               name  reports  year
    Cochice   Jason        4  2012
    Pima      Molly       24  2012
    Maricopa   Jake        2  2014
    Yuma        Amy        3  2014

    Drop a row by row number (in this case, row 3).

    Note that pandas uses zero-based numbering, so 0 is the first row, 1 is the second, and so on.

    df.drop(df.index[2])
    
               name  reports  year
    Cochice   Jason        4  2012
    Pima      Molly       24  2012
    Maricopa   Jake        2  2014
    Yuma        Amy        3  2014

    This extends to ranges of rows.

    df.drop(df.index[[2,3]])
    
              name  reports  year
    Cochice  Jason        4  2012
    Pima     Molly       24  2012
    Yuma       Amy        3  2014

    Or drop rows relative to the end of the DataFrame.

    df.drop(df.index[-2])
    
                 name  reports  year
    Cochice     Jason        4  2012
    Pima        Molly       24  2012
    Santa Cruz   Tina       31  2013
    Yuma          Amy        3  2014

    You can also select ranges relative to the start or end.

    df[:3] # 保留前三个
    
                 name  reports  year
    Cochice     Jason        4  2012
    Pima        Molly       24  2012
    Santa Cruz   Tina       31  2013
    df[:-3] # 丢掉后三个
    
              name  reports  year
    Cochice  Jason        4  2012
    Pima     Molly       24  2012

    Enumerate a List

    # 创建字符串列表
    data = ['One','Two','Three','Four','Five']
    
    # 对于 enumerate(data) 中的每个项目
    for item in enumerate(data):
        # 打印整个枚举的元素
        print(item)
        # 只打印值(没有索引)
        print(item[1])
    
    '''
    (0, 'One')
    One
    (1, 'Two')
    Two
    (2, 'Three')
    Three
    (3, 'Four')
    Four
    (4, 'Five')
    Five 
    '''
    

    Expand Cells Containing Lists into Their Own Variables in Pandas

    # 导入 pandas
    import pandas as pd
    
    # 创建数据集
    raw_data = {'score': [1,2,3], 
            'tags': [['apple','pear','guava'],['truck','car','plane'],['cat','dog','mouse']]}
    df = pd.DataFrame(raw_data, columns = ['score', 'tags'])
    
    # 查看数据集
    df
    
       score                  tags
    0      1  [apple, pear, guava]
    1      2   [truck, car, plane]
    2      3     [cat, dog, mouse]
    # 将 df.tags 扩展为自己的数据帧
    tags = df['tags'].apply(pd.Series)
    
    # 将每个变量重命名为标签
    tags = tags.rename(columns = lambda x : 'tag_' + str(x))
    
    # 查看 tags 数据帧
    tags
    
       tag_0 tag_1  tag_2
    0  apple  pear  guava
    1  truck   car  plane
    2    cat   dog  mouse
    # 将 tags 数据帧添加回原始数据帧
    pd.concat([df[:], tags[:]], axis=1)
    
       score                  tags  tag_0 tag_1  tag_2
    0      1  [apple, pear, guava]  apple  pear  guava
    1      2   [truck, car, plane]  truck   car  plane
    2      3     [cat, dog, mouse]    cat   dog  mouse
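If you want one row per list element ("long" layout) instead of new columns, `DataFrame.explode` (available since pandas 0.25) does this in one call. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'score': [1, 2, 3],
                   'tags': [['apple', 'pear', 'guava'],
                            ['truck', 'car', 'plane'],
                            ['cat', 'dog', 'mouse']]})

# Each list element becomes its own row; the other columns are repeated
long_df = df.explode('tags')
```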

    Filter a pandas DataFrame

    # 导入模块
    import pandas as pd
    
    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'year': [2012, 2012, 2013, 2014, 2014], 
            'reports': [4, 24, 31, 2, 3],
            'coverage': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
    df
    
                coverage   name  reports  year
    Cochice           25  Jason        4  2012
    Pima              94  Molly       24  2012
    Santa Cruz        57   Tina       31  2013
    Maricopa          62   Jake        2  2014
    Yuma              70    Amy        3  2014
    # 查看列
    df['name']
    
    '''
    Cochice       Jason
    Pima          Molly
    Santa Cruz     Tina
    Maricopa       Jake
    Yuma            Amy
    Name: name, dtype: object 
    '''
    
    df[['name', 'reports']]
    
                 name  reports
    Cochice     Jason        4
    Pima        Molly       24
    Santa Cruz   Tina       31
    Maricopa     Jake        2
    Yuma          Amy        3
    # 查看前两行
    df[:2]
    
             coverage   name  reports  year
    Cochice        25  Jason        4  2012
    Pima           94  Molly       24  2012
    # 查看 Coverage 大于 50 的行
    df[df['coverage'] > 50]
    
                coverage   name  reports  year
    Pima              94  Molly       24  2012
    Santa Cruz        57   Tina       31  2013
    Maricopa          62   Jake        2  2014
    Yuma              70    Amy        3  2014
    # 查看 Coverage 大于 50 并且 Reports 小于 4 的行
    df[(df['coverage']  > 50) & (df['reports'] < 4)]
    
              coverage  name  reports  year
    Maricopa        62  Jake        2  2014
    Yuma            70   Amy        3  2014
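The same boolean filter can be written as a string with `DataFrame.query`, which avoids the repeated `df[...]` and the parentheses around each condition. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'reports': [4, 24, 31, 2, 3],
                   'coverage': [25, 94, 57, 62, 70]})

# query() evaluates the expression against the column names
subset = df.query('coverage > 50 and reports < 4')
```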

    Find the Largest Value in a Column of a DataFrame

    # 导入模块
    %matplotlib inline
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    
    # 创建数据帧
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
            'age': [42, 52, 36, 24, 73], 
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
    df
    
      first_name last_name  age  preTestScore  postTestScore
    0      Jason    Miller   42             4             25
    1      Molly  Jacobson   52            24             94
    2       Tina       Ali   36            31             57
    3       Jake    Milner   24             2             62
    4        Amy     Cooze   73             3             70
    # 获取 preTestScore 列中的最大值的索引
    df['preTestScore'].idxmax()
    
    # 2 
    

    Find Unique Values in a DataFrame

    import pandas as pd
    import numpy as np
    
    raw_data = {'regiment': ['51st', '29th', '2nd', '19th', '12th', '101st', '90th', '30th', '193th', '1st', '94th', '91th'], 
                'trucks': ['MAZ-7310', np.nan, 'MAZ-7310', 'MAZ-7310', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'ZIS-150', 'Tatra 810', 'ZIS-150', 'ZIS-150'],
                'tanks': ['Merkava Mark 4', 'Merkava Mark 4', 'Merkava Mark 4', 'Leopard 2A6M', 'Leopard 2A6M', 'Leopard 2A6M', 'Arjun MBT', 'Leopard 2A6M', 'Arjun MBT', 'Arjun MBT', 'Arjun MBT', 'Arjun MBT'],
                'aircraft': ['none', 'none', 'none', 'Harbin Z-9', 'Harbin Z-9', 'none', 'Harbin Z-9', 'SH-60B Seahawk', 'SH-60B Seahawk', 'SH-60B Seahawk', 'SH-60B Seahawk', 'SH-60B Seahawk']}
    
    df = pd.DataFrame(raw_data, columns = ['regiment', 'trucks', 'tanks', 'aircraft'])
    
    # 查看前几行
    df.head()
    
      regiment     trucks           tanks    aircraft
    0     51st   MAZ-7310  Merkava Mark 4        none
    1     29th        NaN  Merkava Mark 4        none
    2      2nd   MAZ-7310  Merkava Mark 4        none
    3     19th   MAZ-7310    Leopard 2A6M  Harbin Z-9
    4     12th  Tatra 810    Leopard 2A6M  Harbin Z-9
    # 通过将 pandas 列转换为集合
    # 创建唯一值的列表
    list(set(df.trucks))
    
    # [nan, 'Tatra 810', 'MAZ-7310', 'ZIS-150'] 
    
    # 创建 df.trucks 中的唯一值的列表
    list(df['trucks'].unique())
    
    # ['MAZ-7310', nan, 'Tatra 810', 'ZIS-150'] 
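If you only need the number of distinct values, `nunique` is more direct than building a list; note that it ignores NaN unless told otherwise. A minimal sketch, re-creating the trucks column so it is self-contained:

```python
import pandas as pd
import numpy as np

trucks = pd.Series(['MAZ-7310', np.nan, 'MAZ-7310', 'MAZ-7310', 'Tatra 810',
                    'Tatra 810', 'Tatra 810', 'Tatra 810', 'ZIS-150',
                    'Tatra 810', 'ZIS-150', 'ZIS-150'])

# unique() keeps NaN in its result, but nunique() skips it by default
n_types = trucks.nunique()
n_with_nan = trucks.nunique(dropna=False)
```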
    

    Geocoding and Reverse Geocoding

    When working with geographic data, geocoding (converting a physical address or location into latitude and longitude) and reverse geocoding (converting latitude and longitude into a physical address or location) are common tasks.

    Python offers a number of packages that make these tasks remarkably easy. In the tutorial below I use pygeocoder, a wrapper around Google's geo-API, for both geocoding and reverse geocoding.

    First we load the packages we want to use in the script: pygeocoder for its geo functions, pandas for its DataFrame structure, and numpy for its missing-value (np.nan) functionality.

    # 加载包
    from pygeocoder import Geocoder
    import pandas as pd
    import numpy as np
    

    Geographic data comes in many forms; in this case we have a Python dictionary of five latitude/longitude strings, with each coordinate pair separated by a comma.

    # 创建原始数据的字典
    data = {'Site 1': '31.336968, -109.560959',
            'Site 2': '31.347745, -108.229963',
            'Site 3': '32.277621, -107.734724',
            'Site 4': '31.655494, -106.420484',
            'Site 5': '30.295053, -104.014528'}
    

    While not technically necessary, because I originally came from R I am a big fan of dataframes, so let's turn the dictionary of simulated data into a dataframe.

    # Convert the dictionary into a pandas dataframe
    df = pd.DataFrame.from_dict(data, orient='index')
    
    # View the dataframe
    df
    
                                 0
    Site 1  31.336968, -109.560959
    Site 2  31.347745, -108.229963
    Site 3  32.277621, -107.734724
    Site 4  31.655494, -106.420484
    Site 5  30.295053, -104.014528

    You can now see that we have a dataframe of five rows, each containing a latitude and longitude string. Before we can work with the data we need to 1) split the strings into latitude and longitude and 2) convert them to floats. The following code does just that.

    # Create two lists for the loop results
    lat = []
    lon = []
    
    # For each row in the variable
    for row in df[0]:
        # Try to
        try:
            # Split the row on the comma, convert to float,
            # and append everything before the comma to lat
            lat.append(float(row.split(',')[0]))
            # Split the row on the comma, convert to float,
            # and append everything after the comma to lon
            lon.append(float(row.split(',')[1]))
        # But if you get an error
        except:
            # Append a missing value to lat
            lat.append(np.NaN)
            # Append a missing value to lon
            lon.append(np.NaN)
    
    # Create two new columns from lat and lon
    df['latitude'] = lat
    df['longitude'] = lon
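For what it's worth, the loop above can also be written with pandas' vectorized string methods. This is just a sketch assuming the same layout (a single column named 0 holding 'lat, lon' strings):

```python
import pandas as pd

df = pd.DataFrame({0: ['31.336968, -109.560959',
                       '31.347745, -108.229963']},
                  index=['Site 1', 'Site 2'])

# Split each string on the comma into two columns, then cast both to float
coords = df[0].str.split(',', expand=True).astype(float)
df['latitude'] = coords[0]
df['longitude'] = coords[1]
```

Unlike the try/except loop, `.astype(float)` raises on malformed rows; `pd.to_numeric(..., errors='coerce')` would restore the NaN-on-error behavior.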
    

    Let's take a look at what we have now.

    # View the dataframe
    df
    
                                 0   latitude   longitude
    Site 1  31.336968, -109.560959  31.336968 -109.560959
    Site 2  31.347745, -108.229963  31.347745 -108.229963
    Site 3  32.277621, -107.734724  32.277621 -107.734724
    Site 4  31.655494, -106.420484  31.655494 -106.420484
    Site 5  30.295053, -104.014528  30.295053 -104.014528

    Awesome. This is exactly what we want to see: one column of latitude floats and one column of longitude floats.

    To reverse geocode, we feed a specific latitude and longitude pair (here the first row, index 0) into pygeocoder's reverse_geocode function.

    # Convert a latitude and longitude into a location
    results = Geocoder.reverse_geocode(df['latitude'][0], df['longitude'][0])
    

    Now we can begin pulling out the data we want.

    # Print the latitude and longitude
    results.coordinates
    
    # (31.3372728, -109.5609559) 
    
    # Print the city
    results.city
    
    # 'Douglas' 
    
    # Print the country
    results.country
    
    # 'United States' 
    
    # Print the street address (if available)
    results.street_address
    
    # Print the state
    results.administrative_area_level_1
    
    # 'Arizona' 
    

    For geocoding, we need to submit a string containing the address or location (such as a city) to the geocoding function. However, not every string is formatted in a way that Google's geo-API can make sense of. We can check whether an address is valid with the .geocode().valid_address method.

    # Verify that an address is valid (i.e. in Google's system)
    Geocoder.geocode("4207 N Washington Ave, Douglas, AZ 85607").valid_address
    
    # True 
    

    Because the output was True, we now know that this is a valid address and can therefore print its latitude and longitude coordinates.

    # Print the latitude and longitude
    Geocoder.geocode("4207 N Washington Ave, Douglas, AZ 85607").coordinates
    
    # (31.3372728, -109.5609559) 
    

    But even more interesting: once the address is processed by the Google geo-API, we can parse it and easily separate out street numbers, street names, etc.

    # Find the latitude and longitude of a specific address
    result = Geocoder.geocode("7250 South Tucson Boulevard, Tucson, AZ 85756")
    
    # Print the street number
    result.street_number
    
    # '7250' 
    
    # Print the street name
    result.route
    result.route
    
    # 'South Tucson Boulevard' 
    

    And there you have it. Python makes this entire process easy, taking only minutes to complete the analysis. Good luck!

    Geolocating cities and countries

    This tutorial creates a function that takes a city and country and tries to return their latitude and longitude. When the city is unavailable (which is often the case), the latitude and longitude of the center of the country are returned instead.

    from geopy.geocoders import Nominatim
    # Recent versions of geopy require a user_agent string identifying your application
    geolocator = Nominatim(user_agent="geolocate-tutorial")
    import numpy as np
    
    def geolocate(city=None, country=None):
        '''
        Inputs a city and a country, or just a country. Returns the latitude and
        longitude coordinates of the city if possible, otherwise of the center of the country.
        '''
    
        # If a city is specified
        if city != None:
            # Try to
            try:
                # Geolocate the city and country
                loc = geolocator.geocode(str(city + ',' + country))
                # And return latitude and longitude
                return (loc.latitude, loc.longitude)
            # Otherwise
            except:
                # Return a missing value
                return np.nan
        # If a city is not specified
        else:
            # Try to
            try:
                # Geolocate the center of the country
                loc = geolocator.geocode(country)
                # And return latitude and longitude
                return (loc.latitude, loc.longitude)
            # Otherwise
            except:
                # Return a missing value
                return np.nan
    
    # Geolocate a city and country
    geolocate(city='Austin', country='USA')
    
    # (30.2711286, -97.7436995) 
    
    # Geolocate the country only
    geolocate(country='USA')
    
    # (39.7837304, -100.4458824) 
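The function extends naturally to a whole dataframe column via .apply. To keep the sketch below runnable offline, a hypothetical lookup table stands in for the geolocator.geocode call; the coordinates are the ones returned above:

```python
import pandas as pd
import numpy as np

# Hypothetical offline stand-in for geolocator.geocode
KNOWN = {'Austin,USA': (30.2711286, -97.7436995),
         'USA': (39.7837304, -100.4458824)}

def geolocate(city=None, country=None):
    # Prefer city+country; fall back to the country; NaN when neither resolves
    key = city + ',' + country if city is not None else country
    return KNOWN.get(key, np.nan)

df = pd.DataFrame({'city': ['Austin', None], 'country': ['USA', 'USA']})
df['coords'] = df.apply(lambda row: geolocate(row['city'], row['country']), axis=1)
```

Because the lambda returns plain tuples, df['coords'] becomes an object column of coordinate pairs.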
    

    Group a time series with pandas

    # Import required modules
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame()
    
    df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
    df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
    df.index = pd.date_range('1/1/2014', periods=100, freq='H')
    
    df.head()
    
                         german_army  allied_army
    2014-01-01 00:00:00        28755        33938
    2014-01-01 01:00:00        25176        28631
    2014-01-01 02:00:00        23261        39685
    2014-01-01 03:00:00        28686        27756
    2014-01-01 04:00:00        24588        25681

    Truncate the dataframe

    df.truncate(before='1/2/2014', after='1/3/2014')
    
                         german_army  allied_army
    2014-01-02 00:00:00        26401        20189
    2014-01-02 01:00:00        29958        23934
    2014-01-02 02:00:00        24492        39075
    2014-01-02 03:00:00        25707        39262
    2014-01-02 04:00:00        27129        35961
    2014-01-02 05:00:00        27903        25418
    2014-01-02 06:00:00        20409        25163
    2014-01-02 07:00:00        25736        34794
    2014-01-02 08:00:00        24057        27209
    2014-01-02 09:00:00        26875        33402
    2014-01-02 10:00:00        23963        38575
    2014-01-02 11:00:00        27506        31859
    2014-01-02 12:00:00        23564        25750
    2014-01-02 13:00:00        27958        24365
    2014-01-02 14:00:00        24915        38866
    2014-01-02 15:00:00        23538        33820
    2014-01-02 16:00:00        23361        30080
    2014-01-02 17:00:00        27284        22922
    2014-01-02 18:00:00        24176        32155
    2014-01-02 19:00:00        23924        27763
    2014-01-02 20:00:00        23111        32343
    2014-01-02 21:00:00        20348        28907
    2014-01-02 22:00:00        27136        38634
    2014-01-02 23:00:00        28649        29950
    2014-01-03 00:00:00        21292        26395
    # Shift the dataframe's index forward by 4 months and 5 days
    df.index = df.index + pd.DateOffset(months=4, days=5)
    
    df.head()
    
                         german_army  allied_army
    2014-05-06 00:00:00        28755        33938
    2014-05-06 01:00:00        25176        28631
    2014-05-06 02:00:00        23261        39685
    2014-05-06 03:00:00        28686        27756
    2014-05-06 04:00:00        24588        25681
    # Lag the variables by one hour
    df.shift(1).head()
    
                         german_army  allied_army
    2014-05-06 00:00:00          NaN          NaN
    2014-05-06 01:00:00      28755.0      33938.0
    2014-05-06 02:00:00      25176.0      28631.0
    2014-05-06 03:00:00      23261.0      39685.0
    2014-05-06 04:00:00      28686.0      27756.0
    # Lead the variables by one hour
    df.shift(-1).tail()
    
                         german_army  allied_army
    2014-05-09 23:00:00      26903.0      39144.0
    2014-05-10 00:00:00      27576.0      39759.0
    2014-05-10 01:00:00      25232.0      35246.0
    2014-05-10 02:00:00      23391.0      21044.0
    2014-05-10 03:00:00          NaN          NaN
    # Aggregate into days by summing the hourly observations
    df.resample('D').sum()
    
                german_army  allied_army
    2014-05-06       605161       755962
    2014-05-07       608100       740396
    2014-05-08       589744       700297
    2014-05-09       607092       719283
    2014-05-10       103102       135193
    # Aggregate into days by averaging the hourly observations
    df.resample('D').mean()
    
                 german_army   allied_army
    2014-05-06  25215.041667  31498.416667
    2014-05-07  25337.500000  30849.833333
    2014-05-08  24572.666667  29179.041667
    2014-05-09  25295.500000  29970.125000
    2014-05-10  25775.500000  33798.250000
    # Aggregate into days by taking the minimum of the hourly observations
    df.resample('D').min()
    
                german_army  allied_army
    2014-05-06      24882.0      31310.0
    2014-05-07      25311.0      30969.5
    2014-05-08      24422.5      28318.0
    2014-05-09      24941.5      32082.5
    2014-05-10      26067.5      37195.0
    # Aggregate into days by taking the median of the hourly observations
    df.resample('D').median()
    
                german_army  allied_army
    2014-05-06      24882.0      31310.0
    2014-05-07      25311.0      30969.5
    2014-05-08      24422.5      28318.0
    2014-05-09      24941.5      32082.5
    2014-05-10      26067.5      37195.0
    # Aggregate into days by taking the first hourly observation
    df.resample('D').first()
    
                german_army  allied_army
    2014-05-06        28755        33938
    2014-05-07        26401        20189
    2014-05-08        21292        26395
    2014-05-09        25764        22613
    2014-05-10        26903        39144
    # Aggregate into days by taking the last hourly observation
    df.resample('D').last()
    
                german_army  allied_army
    2014-05-06        28214        32110
    2014-05-07        28649        29950
    2014-05-08        28379        32600
    2014-05-09        26752        22379
    2014-05-10        23391        21044
    # Aggregate into days by taking the open (first), high, low, and close (last) of the hourly observations
    df.resample('D').ohlc()
    
                german_army
                       open   high
    2014-05-06        28755  29206
    2014-05-07        26401  29958
    2014-05-08        21292  29786
    2014-05-09        25764  29952
    2014-05-10        26903  27576
    (the remaining ohlc columns, including the allied_army block, were truncated in the source)
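Each call above computes one statistic; .resample() also accepts .agg() to compute several at once. A minimal sketch on synthetic hourly data (the column name mirrors the example above):

```python
import pandas as pd
import numpy as np

rng = np.random.RandomState(0)
df = pd.DataFrame({'german_army': rng.randint(20000, 30000, size=48)},
                  index=pd.date_range('1/1/2014', periods=48, freq='H'))

# One pass computes the sum, mean, and max for each calendar day
daily = df.resample('D').agg(['sum', 'mean', 'max'])
```

The result has a column MultiIndex, one sub-column per statistic.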

    Group data by time

    On March 13, 2016, pandas version 0.18.0 was released, and the way resampling works changed significantly. This tutorial follows v0.18.0 and will not work with earlier versions of pandas.

    First let's load the modules we care about.

    # Import required modules
    import pandas as pd
    import datetime
    import numpy as np
    

    Next, let's create some sample data that we can group by time. In this example I create a dataframe of 365 rows with two columns: one of dates and one of numeric values.

    # Create a datetime variable for today
    base = datetime.datetime.today()
    # Create a list variable of
    # 365 days of datetime values
    date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]
    
    # Create a list of 365 numeric values
    score_list = list(np.random.randint(low=1, high=1000, size=365))
    
    # Create an empty dataframe
    df = pd.DataFrame()
    
    # Create a column from the datetime variable
    df['datetime'] = date_list
    # Convert the column to datetime dtype
    df['datetime'] = pd.to_datetime(df['datetime'])
    # Set the datetime column as the index
    df.index = df['datetime'] 
    # Create a column for the numeric score variable
    df['score'] = score_list
    
    # Let's take a look at the data
    df.head()
    
                                                  datetime  score
    datetime
    2016-06-02 09:57:54.793972  2016-06-02 09:57:54.793972    900
    2016-06-01 09:57:54.793972  2016-06-01 09:57:54.793972    121
    2016-05-31 09:57:54.793972  2016-05-31 09:57:54.793972    547
    2016-05-30 09:57:54.793972  2016-05-30 09:57:54.793972    504
    2016-05-29 09:57:54.793972  2016-05-29 09:57:54.793972    304

    In pandas, the most common way to group by time is to use the .resample() function. As of v0.18.0 this function is two-stage: df.resample('M') creates an object to which we can apply other functions (mean, count, sum, etc.).

    # Group the data by month and take the mean of each group (i.e. each month)
    df.resample('M').mean()
    
                     score
    datetime
    2015-06-30  513.629630
    2015-07-31  561.516129
    2015-08-31  448.032258
    2015-09-30  548.000000
    2015-10-31  480.419355
    2015-11-30  487.033333
    2015-12-31  499.935484
    2016-01-31  429.193548
    2016-02-29  520.413793
    2016-03-31  349.806452
    2016-04-30  395.500000
    2016-05-31  503.451613
    2016-06-30  510.500000
    2016-06-30510.500000
    # Group the data by month and take the sum of each group (i.e. each month)
    df.resample('M').sum()
    
                score
    datetime
    2015-06-30  13868
    2015-07-31  17407
    2015-08-31  13889
    2015-09-30  16440
    2015-10-31  14893
    2015-11-30  14611
    2015-12-31  15498
    2016-01-31  13305
    2016-02-29  15092
    2016-03-31  10844
    2016-04-30  11865
    2016-05-31  15607
    2016-06-30   1021

    There are many options for grouping. You can learn more about them in the pandas time series documentation, but for your convenience I have also listed them below.

    Alias  Description
    B      business day frequency
    C      custom business day frequency (experimental)
    D      calendar day frequency
    W      weekly frequency
    M      month end frequency
    BM     business month end frequency
    CBM    custom business month end frequency
    MS     month start frequency
    BMS    business month start frequency
    Q      quarter end frequency
    BQ     business quarter end frequency
    QS     quarter start frequency
    BQS    business quarter start frequency
    A      year end frequency
    BA     business year end frequency
    AS     year start frequency
    BAS    business year start frequency
    BH     business hour frequency
    H      hourly frequency
    T      minutely frequency
    S      secondly frequency
    L      milliseconds
    U      microseconds
    N      nanoseconds
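To see a couple of these aliases in action, here is a sketch that buckets a synthetic daily series by week ('W') and by month end ('M'):

```python
import pandas as pd
import numpy as np

s = pd.Series(np.arange(60),
              index=pd.date_range('1/1/2014', periods=60, freq='D'))

# 'W' groups into Sunday-ending weeks, 'M' into calendar months
weekly = s.resample('W').sum()
monthly = s.resample('M').sum()
```

Both groupings conserve the total: every observation lands in exactly one bucket.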

    Group data by hour

    # Import libraries
    import pandas as pd
    import numpy as np
    
    # Create a time series of 2000 elements,
    # one every five minutes starting on 1/1/2000
    time = pd.date_range('1/1/2000', periods=2000, freq='5min')
    
    # Create a pandas series with a random value between 0 and 100,
    # using 'time' as the index
    series = pd.Series(np.random.randint(100, size=2000), index=time)
    
    # View the first few rows
    series[0:10]
    
    '''
    2000-01-01 00:00:00    40
    2000-01-01 00:05:00    13
    2000-01-01 00:10:00    99
    2000-01-01 00:15:00    72
    2000-01-01 00:20:00     4
    2000-01-01 00:25:00    36
    2000-01-01 00:30:00    24
    2000-01-01 00:35:00    20
    2000-01-01 00:40:00    83
    2000-01-01 00:45:00    44
    Freq: 5T, dtype: int64 
    '''
    
    # Group the data by the hour of the index's timestamp, then aggregate by taking the mean
    series.groupby(series.index.hour).mean()
    
    '''
    0     50.380952
    1     49.380952
    2     49.904762
    3     53.273810
    4     47.178571
    5     46.095238
    6     49.047619
    7     44.297619
    8     53.119048
    9     48.261905
    10    45.166667
    11    54.214286
    12    50.714286
    13    56.130952
    14    50.916667
    15    42.428571
    16    46.880952
    17    56.892857
    18    54.071429
    19    47.607143
    20    50.940476
    21    50.511905
    22    44.550000
    23    50.250000
    dtype: float64 
    '''
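Note that grouping by series.index.hour pools the same clock hour across every day, which is different from resampling into consecutive hourly bins. A small deterministic sketch of the distinction:

```python
import pandas as pd
import numpy as np

time = pd.date_range('1/1/2000', periods=48, freq='H')
series = pd.Series(np.arange(48), index=time)

# Pools hour 0 of day one (value 0) with hour 0 of day two (value 24)
by_hour = series.groupby(series.index.hour).mean()

# By contrast, summing per calendar date keeps the days separate
by_date = series.groupby(series.index.date).sum()
```

by_hour has 24 entries (one per clock hour), while by_date has one entry per day.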
    

    Group rows

    # Import modules
    import pandas as pd
    
    # Example dataframe
    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
    df
    
        regiment    company  name      preTestScore  postTestScore
    0   Nighthawks  1st      Miller    4             25
    1   Nighthawks  1st      Jacobson  24            94
    2   Nighthawks  2nd      Ali       31            57
    3   Nighthawks  2nd      Milner    2             62
    4   Dragoons    1st      Cooze     3             70
    5   Dragoons    1st      Jacon     4             25
    6   Dragoons    2nd      Ryaner    24            94
    7   Dragoons    2nd      Sone      31            57
    8   Scouts      1st      Sloan     2             62
    9   Scouts      1st      Piger     3             70
    10  Scouts      2nd      Riani     2             62
    11  Scouts      2nd      Ali       3             70
    # Create a grouping object. In other words, create an object that
    # represents that particular grouping. Here we group
    # the pre-test scores by regiment.
    regiment_preScore = df['preTestScore'].groupby(df['regiment'])
    
    # Display the mean value of each regiment's pre-test score
    regiment_preScore.mean()
    
    '''
    regiment
    Dragoons      15.50
    Nighthawks    15.25
    Scouts         2.50
    Name: preTestScore, dtype: float64 
    '''
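The same grouping idea also supports several keys and several statistics at once; a minimal sketch with a hypothetical subset of the data above:

```python
import pandas as pd

df = pd.DataFrame({'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
                   'company':  ['1st', '2nd', '1st', '2nd'],
                   'preTestScore': [3, 24, 2, 2]})

# Group on two keys and compute two statistics per group in one call
stats = df.groupby(['regiment', 'company'])['preTestScore'].agg(['mean', 'max'])
```

The result is indexed by a (regiment, company) MultiIndex, one row per group.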
    

    Hierarchical data in pandas

    # Import modules
    import pandas as pd
    
    # Create dataframe
    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
    df
    
        regiment    company  name      preTestScore  postTestScore
    0   Nighthawks  1st      Miller    4             25
    1   Nighthawks  1st      Jacobson  24            94
    2   Nighthawks  2nd      Ali       31            57
    3   Nighthawks  2nd      Milner    2             62
    4   Dragoons    1st      Cooze     3             70
    5   Dragoons    1st      Jacon     4             25
    6   Dragoons    2nd      Ryaner    24            94
    7   Dragoons    2nd      Sone      31            57
    8   Scouts      1st      Sloan     2             62
    9   Scouts      1st      Piger     3             70
    10  Scouts      2nd      Riani     2             62
    11  Scouts      2nd      Ali       3             70
    # Set the hierarchical index but leave the columns in place
    df = df.set_index(['regiment', 'company'], drop=False)
    df
    
                         regiment    company  name      preTestScore  postTestScore
    regiment   company
    Nighthawks 1st       Nighthawks  1st      Miller    4             25
               1st       Nighthawks  1st      Jacobson  24            94
               2nd       Nighthawks  2nd      Ali       31            57
               2nd       Nighthawks  2nd      Milner    2             62
    Dragoons   1st       Dragoons    1st      Cooze     3             70
               1st       Dragoons    1st      Jacon     4             25
               2nd       Dragoons    2nd      Ryaner    24            94
               2nd       Dragoons    2nd      Sone      31            57
    Scouts     1st       Scouts      1st      Sloan     2             62
               1st       Scouts      1st      Piger     3             70
               2nd       Scouts      2nd      Riani     2             62
               2nd       Scouts      2nd      Ali       3             70
    # Set the hierarchical index to be by regiment, and then by company
    df = df.set_index(['regiment', 'company'])
    df
    
                         name      preTestScore  postTestScore
    regiment   company
    Nighthawks 1st       Miller    4             25
               1st       Jacobson  24            94
               2nd       Ali       31            57
               2nd       Milner    2             62
    Dragoons   1st       Cooze     3             70
               1st       Jacon     4             25
               2nd       Ryaner    24            94
               2nd       Sone      31            57
    Scouts     1st       Sloan     2             62
               1st       Piger     3             70
               2nd       Riani     2             62
               2nd       Ali       3             70
    # View the index
    df.index
    
    MultiIndex(levels=[['Dragoons', 'Nighthawks', 'Scouts'], ['1st', '2nd']],
               labels=[[1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]],
               names=['regiment', 'company']) 
    
    # Swap the levels in the index
    df.swaplevel('regiment', 'company')
    
                         name      preTestScore  postTestScore
    company  regiment
    1st      Nighthawks  Miller    4             25
             Nighthawks  Jacobson  24            94
    2nd      Nighthawks  Ali       31            57
             Nighthawks  Milner    2             62
    1st      Dragoons    Cooze     3             70
             Dragoons    Jacon     4             25
    2nd      Dragoons    Ryaner    24            94
             Dragoons    Sone      31            57
    1st      Scouts      Sloan     2             62
             Scouts      Piger     3             70
    2nd      Scouts      Riani     2             62
             Scouts      Ali       3             70
    # Summarize the data by regiment
    df.sum(level='regiment')
    
                preTestScore  postTestScore
    regiment
    Nighthawks  61            238
    Dragoons    62            246
    Scouts      10            264
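With the hierarchical index in place you can also pull out cross-sections with .xs. Note that df.sum(level=...) was deprecated in later pandas releases in favor of a groupby on the index level; both ideas are sketched below on a hypothetical slice of the data:

```python
import pandas as pd

df = pd.DataFrame({'preTestScore': [4, 24, 3, 24]},
                  index=pd.MultiIndex.from_tuples(
                      [('Nighthawks', '1st'), ('Nighthawks', '2nd'),
                       ('Dragoons', '1st'), ('Dragoons', '2nd')],
                      names=['regiment', 'company']))

# Cross-section: every row belonging to one regiment
nighthawks = df.xs('Nighthawks', level='regiment')

# Modern equivalent of df.sum(level='regiment')
totals = df.groupby(level='regiment').sum()
```

Here .xs drops the selected level, leaving the rows indexed by company alone.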
  • ngrok: tunneling a local server to the Internet

    Published 2018-12-13 08:45:48

    Expose a local web server to the Internet

    ngrok allows you to expose a web server running on your local machine to the Internet. Just tell ngrok which port your web server is listening on.

    If you don't know which port your web server is listening on, it is probably port 80, the default for HTTP.

    Example: expose the web server on port 80 of your local machine to the Internet

    ngrok http 80

    When you start ngrok, it displays a UI in your terminal with the public URL of your tunnel and other status and metrics about connections made over the tunnel.

    The ngrok console UI

    ngrok by @inconshreveable
    
    Tunnel Status                 online
    Version                       2.0/2.0
    Web Interface                 http://127.0.0.1:4040
    Forwarding                    http://92832de0.ngrok.io -> localhost:80
    Forwarding                    https://92832de0.ngrok.io -> localhost:80
    
    Connections                   ttl     opn     rt1     rt5     p50     p90
                                  0       0       0.00    0.00    0.00    0.00

    Inspect your traffic

    ngrok provides a real-time web UI where you can introspect all of the HTTP traffic running over your tunnels. After you've started ngrok, just open http://localhost:4040 in a web browser to inspect request details.

    Try making a request to your public URL. When you're done, look back at the inspection UI; you'll see all of the details of the request and response, including the time, duration, headers, query parameters and request payload, as well as the raw bytes on the wire.

    Detailed introspection of HTTP requests and responses

    Replaying requests

    Developing against webhooks issued by external APIs often requires you to do some work (like placing a phone call) to trigger the hook request, which slows down your development cycle. ngrok lets you replay any request with a single click, dramatically speeding up iteration. Click the Replay button in the top-right corner of any request in the web inspection UI to replay it.

    Replay any request against your tunnelled web server with one click

    Request body validation

    ngrok has special support for the most common data interchange formats in use on the web. Any XML or JSON data in request or response bodies is automatically pretty-printed for you and checked for syntax errors.

    The location of a JSON syntax error is highlighted

    Install your authtoken

    Many of the advanced features of the ngrok.com service described in other sections require that you sign up for an account. Once you have, you need to configure ngrok with the authtoken that appears on your dashboard; this grants you access to account-only features. ngrok has a simple 'authtoken' command to make this easy. Under the hood, all the authtoken command does is add (or modify) the authtoken property in your ngrok configuration file.

    Install your authtoken

    ngrok authtoken <YOUR_AUTHTOKEN>

    HTTP tunnels

    Custom subdomains

    ngrok assigns random hexadecimal names to the HTTP tunnels it opens for you. This is fine for one-off personal use, but if you are displaying the URL at a hackathon or integrating with a third-party webhook, it can be frustrating if the tunnel name changes or is difficult to read. You can specify a custom subdomain for your tunnel URL with the -subdomain switch.

    Example: open a tunnel on the subdomain 'inconshreveable'

    ngrok http -subdomain=inconshreveable 80
    ngrok by @inconshreveable
    
    ...
    Forwarding                    http://inconshreveable.ngrok.io -> 127.0.0.1:80
    Forwarding                    https://inconshreveable.ngrok.io -> 127.0.0.1:80

    Password protecting your tunnel

    Anyone who can guess your tunnel URL can access your local web server unless you protect it with a password. You can make your tunnels secure with the -auth switch. This enforces HTTP Basic Auth on all requests, with the username and password you specify as an argument.

    Example: password protect your tunnel

    ngrok http -auth="username:password" 8080

    Tunnels on custom domains (white-label URLs)

    Instead of appearing as a subdomain of ngrok.io, your tunnels can run on your own domain. To run a tunnel on dev.example.com, follow these steps:

    1. Enter the reserved domain dev.example.com on the 'Reserved' tab of your ngrok.com dashboard. This guarantees that no one else can hijack your domain name with a tunnel of their own.
    2. On your dashboard, click the 'CNAME' icon to copy your CNAME target.
    3. Create a DNS CNAME record from dev.example.com to your CNAME target. In this example, we point the CNAME record at 2w9c34maz.cname.ngrok.io.
    4. Invoke ngrok with the -hostname switch, specifying the name of your custom domain as an argument. Make sure the -region you specify matches the region in which you reserved your domain.

       Example: run a tunnel on a custom domain

       ngrok http -region=us -hostname=dev.example.com 8000

    Accessing a custom-domain tunnel over HTTPS will still work, but the certificate will not match. If you have a TLS certificate/key pair, try a TLS tunnel instead.

    Disable inspection

    ngrok records each HTTP request and response over your tunnels for inspection and replay. While this is extremely useful for development, when you run ngrok on production services you may wish to disable it for security and performance. Use the -inspect switch to disable inspection of your tunnel.

    Example: an HTTP tunnel with no inspection

    ngrok http -inspect=false 80

    Rewriting the Host header

    When forwarding to a local port, ngrok does not modify the tunnelled HTTP requests at all; they are copied to your server byte-for-byte as they are received. Some application servers like WAMP, MAMP and pow use the Host header to determine which development site to display. For this reason, ngrok can rewrite your requests with a modified Host header. Use the -host-header switch to rewrite incoming HTTP requests.

    If rewrite is specified, the Host header will be rewritten to match the hostname portion of the forwarding address. Any other value will cause the Host header to be rewritten to that value.

    Rewrite the Host header to 'site.dev'

    ngrok http -host-header=rewrite site.dev:80

    Rewrite the Host header to 'example.com'

    ngrok http -host-header=example.com 80

    Tunneling only HTTP or HTTPS

    By default, when ngrok runs an HTTP tunnel, it opens endpoints for both HTTP and HTTPS traffic. If you would like to forward only HTTP or only HTTPS traffic, but not both, you can toggle this behavior with the -bind-tls switch.

    Example: listen on an HTTP tunnel endpoint only

    ngrok http -bind-tls=false site.dev:80

    Example: listen on an HTTPS tunnel endpoint only

    ngrok http -bind-tls=true site.dev:80

    WebSockets

    WebSocket endpoints work through ngrok's HTTP tunnels without any changes. There is, however, currently no support for introspecting them beyond the initial 101 Switching Protocols response.

    Official site: https://ngrok.com

     

