  • Converting a pandas DataFrame to a list

    10,000+ views · Many likes · 2017-05-21 22:46:10

    First use np.array() to convert the DataFrame to an np.ndarray, then call tolist() on that array to get a list. Example code:

    # -*- coding: utf-8 -*-
    import numpy as np
    import pandas as pd

    # Read the feature columns (2, 3, 4) and the label column (5) as DataFrames
    data_x = pd.read_csv("E:/Tianchi/result/features.csv", usecols=[2, 3, 4])
    data_y = pd.read_csv("E:/Tianchi/result/features.csv", usecols=[5])

    train_data = np.array(data_x)        # np.ndarray
    train_x_list = train_data.tolist()   # list of rows
    print(train_x_list)
    print(type(train_x_list))
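
    The same conversion can be done without importing NumPy explicitly. The sketch below is not from the original post: it uses a small made-up frame in place of the CSV file and relies on DataFrame.to_numpy() and Series.tolist().

```python
import pandas as pd

# Small in-memory frame standing in for the CSV in the post (values are made up).
df = pd.DataFrame({"f1": [1, 2], "f2": [3, 4], "f3": [5, 6]})

# Each row of the frame becomes one inner list.
rows = df.to_numpy().tolist()

# A single column becomes a flat list.
col = df["f1"].tolist()

print(rows)  # [[1, 3, 5], [2, 4, 6]]
print(col)   # [1, 2]
```

    Both results are plain Python lists, so they can be passed to code that knows nothing about pandas.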
    
  • Deleting a row or column from a pandas DataFrame in Python: the drop function

    10,000+ views · Many likes · 2018-02-10 20:10:25

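    Usage: DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)

    Parameters:
    labels: the names of the rows or columns to delete, given as a list
    axis: defaults to 0, meaning rows are deleted; to delete columns, pass axis=1
    index: directly specifies the rows to delete
    columns: directly specifies the columns to delete
    inplace=False: the default; the original data is left unchanged and a new DataFrame with the deletion applied is returned
    inplace=True: the deletion is performed directly on the original data, and the call returns None

    So there are two ways to delete rows or columns:
    1) the combination of labels and axis
    2) index or columns, which name the rows or columns to delete directly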

    Example:

    >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
    >>> df
       A  B   C   D
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

    # Drop columns; the two forms are equivalent
    >>> df.drop(['B', 'C'], axis=1)
       A   D
    0  0   3
    1  4   7
    2  8  11
    >>> df.drop(columns=['B', 'C'])
       A   D
    0  0   3
    1  4   7
    2  8  11

    # With the first form, axis=1 is required when dropping columns; otherwise an error is raised
    # (recent pandas versions raise a KeyError with a "not found in axis" message instead)
    >>> df.drop(['B', 'C'])
    ValueError: labels ['B' 'C'] not contained in axis

    # Drop rows
    >>> df.drop([0, 1])
       A  B   C   D
    2  8  9  10  11
    >>> df.drop(index=[0, 1])
       A  B   C   D
    2  8  9  10  11
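
    The example above does not show the inplace parameter in action. The following is a minimal sketch, reusing the same 3x4 frame, of how the two settings differ; the variable names trimmed, work, and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

# Default (inplace=False): a new frame comes back and df is left untouched.
trimmed = df.drop(columns=["B", "C"])

# inplace=True: the frame itself is modified and the call returns None.
work = df.copy()
result = work.drop(index=[0, 1], inplace=True)

print(list(trimmed.columns))  # ['A', 'D']
print(list(df.columns))       # ['A', 'B', 'C', 'D']
print(result)                 # None
print(work.index.tolist())    # [2]
```

    Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the frame with None, which is a common mistake.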
    

    Life is short, You need Python~

  • Sometimes a DataFrame has so many rows or columns that print() does not show all of them.

    The original post illustrated this with two screenshots (not preserved here): one with truncated columns and one with truncated rows.

    Adding the following code fixes this.

    # Show all columns
    pd.set_option('display.max_columns', None)
    # Show all rows
    pd.set_option('display.max_rows', None)
    # Set the maximum display width of a value to 100 characters (default: 50)
    pd.set_option('display.max_colwidth', 100)
    

    Adjust these settings to suit your needs.
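
    set_option changes the settings for the rest of the session. If the full output is only needed once, pd.option_context can apply the same options temporarily. This is a sketch that is not part of the original post; the 100x30 frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# 100 rows x 30 columns: larger than the default display limits.
df = pd.DataFrame(np.arange(3000).reshape(100, 30))

rows_before = pd.get_option("display.max_rows")
truncated = repr(df)  # rendered with the current display settings

# Inside the block every row and column is rendered; when the block
# exits, the previous settings are restored automatically.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full = repr(df)

rows_after = pd.get_option("display.max_rows")
print(len(truncated.splitlines()), len(full.splitlines()))
print(rows_before == rows_after)
```

    The full rendering has many more lines than the truncated one, and the display.max_rows setting is unchanged after the with block.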

    P.S. The full documentation of set_option(), including every available option:

    Available options:
    
    - display.[chop_threshold, colheader_justify, column_space, date_dayfirst,
      date_yearfirst, encoding, expand_frame_repr, float_format, height, large_repr]
    - display.latex.[escape, longtable, repr]
    - display.[line_width, max_categories, max_columns, max_colwidth,
      max_info_columns, max_info_rows, max_rows, max_seq_items, memory_usage,
      mpl_style, multi_sparse, notebook_repr_html, pprint_nest_depth, precision,
      show_dimensions]
    - display.unicode.[ambiguous_as_wide, east_asian_width]
    - display.[width]
    - io.excel.xls.[writer]
    - io.excel.xlsm.[writer]
    - io.excel.xlsx.[writer]
    - io.hdf.[default_format, dropna_table]
    - mode.[chained_assignment, sim_interactive, use_inf_as_null]
    
    Parameters
    ----------
    pat : str
        Regexp which should match a single option.
        Note: partial matches are supported for convenience, but unless you use the
        full option name (e.g. x.y.z.option_name), your code may break in future
        versions if new options with similar names are introduced.
    value :
        new value of option.
    
    Returns
    -------
    None
    
    Raises
    ------
    OptionError if no such option exists
    
    Notes
    -----
    The available options with its descriptions:
    
    display.chop_threshold : float or None
        if set to a float value, all float values smaller then the given threshold
        will be displayed as exactly 0 by repr and friends.
        [default: None] [currently: None]
    
    display.colheader_justify : 'left'/'right'
        Controls the justification of column headers. used by DataFrameFormatter.
        [default: right] [currently: right]
    
    display.column_space No description available.
        [default: 12] [currently: 12]
    
    display.date_dayfirst : boolean
        When True, prints and parses dates with the day first, eg 20/01/2005
        [default: False] [currently: False]
    
    display.date_yearfirst : boolean
        When True, prints and parses dates with the year first, eg 2005/01/20
        [default: False] [currently: False]
    
    display.encoding : str/unicode
        Defaults to the detected encoding of the console.
        Specifies the encoding to be used for strings returned by to_string,
        these are generally strings meant to be displayed on the console.
        [default: UTF-8] [currently: UTF-8]
    
    display.expand_frame_repr : boolean
        Whether to print out the full DataFrame repr for wide DataFrames across
        multiple lines, `max_columns` is still respected, but the output will
        wrap-around across multiple "pages" if its width exceeds `display.width`.
        [default: True] [currently: True]
    
    display.float_format : callable
        The callable should accept a floating point number and return
        a string with the desired format of the number. This is used
        in some places like SeriesFormatter.
        See formats.format.EngFormatter for an example.
        [default: None] [currently: None]
    
    display.height : int
        Deprecated.
        [default: 60] [currently: 60]
        (Deprecated, use `display.max_rows` instead.)
    
    display.large_repr : 'truncate'/'info'
        For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
        show a truncated table (the default from 0.13), or switch to the view from
        df.info() (the behaviour in earlier versions of pandas).
        [default: truncate] [currently: truncate]
    
    display.latex.escape : bool
        This specifies if the to_latex method of a Dataframe uses escapes special
        characters.
        method. Valid values: False,True
        [default: True] [currently: True]
    
    display.latex.longtable :bool
        This specifies if the to_latex method of a Dataframe uses the longtable
        format.
        method. Valid values: False,True
        [default: False] [currently: False]
    
    display.latex.repr : boolean
        Whether to produce a latex DataFrame representation for jupyter
        environments that support it.
        (default: False)
        [default: False] [currently: False]
    
    display.line_width : int
        Deprecated.
        [default: 80] [currently: 80]
        (Deprecated, use `display.width` instead.)
    
    display.max_categories : int
        This sets the maximum number of categories pandas should output when
        printing out a `Categorical` or a Series of dtype "category".
        [default: 8] [currently: 8]
    
    display.max_columns : int
        If max_cols is exceeded, switch to truncate view. Depending on
        `large_repr`, objects are either centrally truncated or printed as
        a summary view. 'None' value means unlimited.
    
        In case python/IPython is running in a terminal and `large_repr`
        equals 'truncate' this can be set to 0 and pandas will auto-detect
        the width of the terminal and print a truncated object which fits
        the screen width. The IPython notebook, IPython qtconsole, or IDLE
        do not run in a terminal and hence it is not possible to do
        correct auto-detection.
        [default: 20] [currently: 20]
    
    display.max_colwidth : int
        The maximum width in characters of a column in the repr of
        a pandas data structure. When the column overflows, a "..."
        placeholder is embedded in the output.
        [default: 50] [currently: 200]
    
    display.max_info_columns : int
        max_info_columns is used in DataFrame.info method to decide if
        per column information will be printed.
        [default: 100] [currently: 100]
    
    display.max_info_rows : int or None
        df.info() will usually show null-counts for each column.
        For large frames this can be quite slow. max_info_rows and max_info_cols
        limit this null check only to frames with smaller dimensions than
        specified.
        [default: 1690785] [currently: 1690785]
    
    display.max_rows : int
        If max_rows is exceeded, switch to truncate view. Depending on
        `large_repr`, objects are either centrally truncated or printed as
        a summary view. 'None' value means unlimited.
    
        In case python/IPython is running in a terminal and `large_repr`
        equals 'truncate' this can be set to 0 and pandas will auto-detect
        the height of the terminal and print a truncated object which fits
        the screen height. The IPython notebook, IPython qtconsole, or
        IDLE do not run in a terminal and hence it is not possible to do
        correct auto-detection.
        [default: 60] [currently: 60]
    
    display.max_seq_items : int or None
        when pretty-printing a long sequence, no more then `max_seq_items`
        will be printed. If items are omitted, they will be denoted by the
        addition of "..." to the resulting string.
    
        If set to None, the number of items to be printed is unlimited.
        [default: 100] [currently: 100]
    
    display.memory_usage : bool, string or None
        This specifies if the memory usage of a DataFrame should be displayed when
        df.info() is called. Valid values True,False,'deep'
        [default: True] [currently: True]
    
    display.mpl_style : bool
        Setting this to 'default' will modify the rcParams used by matplotlib
        to give plots a more pleasing visual style by default.
        Setting this to None/False restores the values to their initial value.
        [default: None] [currently: None]
    
    display.multi_sparse : boolean
        "sparsify" MultiIndex display (don't display repeated
        elements in outer levels within groups)
        [default: True] [currently: True]
    
    display.notebook_repr_html : boolean
        When True, IPython notebook will use html representation for
        pandas objects (if it is available).
        [default: True] [currently: True]
    
    display.pprint_nest_depth : int
        Controls the number of nested levels to process when pretty-printing
        [default: 3] [currently: 3]
    
    display.precision : int
        Floating point output precision (number of significant digits). This is
        only a suggestion
        [default: 6] [currently: 6]
    
    display.show_dimensions : boolean or 'truncate'
        Whether to print out dimensions at the end of DataFrame repr.
        If 'truncate' is specified, only print out the dimensions if the
        frame is truncated (e.g. not display all rows and/or columns)
        [default: truncate] [currently: truncate]
    
    display.unicode.ambiguous_as_wide : boolean
        Whether to use the Unicode East Asian Width to calculate the display text
        width.
        Enabling this may affect to the performance (default: False)
        [default: False] [currently: False]
    
    display.unicode.east_asian_width : boolean
        Whether to use the Unicode East Asian Width to calculate the display text
        width.
        Enabling this may affect to the performance (default: False)
        [default: False] [currently: False]
    
    display.width : int
        Width of the display in characters. In case python/IPython is running in
        a terminal this can be set to None and pandas will correctly auto-detect
        the width.
        Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
        terminal and hence it is not possible to correctly detect the width.
        [default: 80] [currently: 80]
    
    io.excel.xls.writer : string
        The default Excel writer engine for 'xls' files. Available options:
        'xlwt' (the default).
        [default: xlwt] [currently: xlwt]
    
    io.excel.xlsm.writer : string
        The default Excel writer engine for 'xlsm' files. Available options:
        'openpyxl' (the default).
        [default: openpyxl] [currently: openpyxl]
    
    io.excel.xlsx.writer : string
        The default Excel writer engine for 'xlsx' files. Available options:
        'xlsxwriter' (the default), 'openpyxl'.
        [default: xlsxwriter] [currently: xlsxwriter]
    
    io.hdf.default_format : format
        default format writing format, if None, then
        put will default to 'fixed' and append will default to 'table'
        [default: None] [currently: None]
    
    io.hdf.dropna_table : boolean
        drop ALL nan rows when appending to a table
        [default: False] [currently: False]
    
    mode.chained_assignment : string
        Raise an exception, warn, or no action if trying to use chained assignment,
        The default is warn
        [default: warn] [currently: warn]
    
    mode.sim_interactive : boolean
        Whether to simulate interactive mode for purposes of testing
        [default: False] [currently: False]
    
    mode.use_inf_as_null : boolean
        True means treat None, NaN, INF, -INF as null (old way),
        False means None and NaN are null, but INF, -INF are not null
        (new way).
        [default: False] [currently: False]
    
  • Pandas: basic operations on DataFrame objects

    10,000+ views · Many likes · 2017-07-18 22:36:17

    Creating, modifying, and merging DataFrame objects

    
    import pandas as pd
    import numpy as np
    

    Creating DataFrame objects

    # Create a DataFrame object
    df = pd.DataFrame([1, 2, 3, 4, 5], columns=['cols'], index=['a','b','c','d','e'])
    print(df)

       cols
    a     1
    b     2
    c     3
    d     4
    e     5

    df2 = pd.DataFrame([[1, 2, 3],[4, 5, 6]], columns=['col1','col2','col3'], index=['a','b'])
    print(df2)

       col1  col2  col3
    a     1     2     3
    b     4     5     6

    df3 = pd.DataFrame(np.array([[1,2],[3,4]]), columns=['col1','col2'], index=['a','b'])
    print(df3)

       col1  col2
    a     1     2
    b     3     4

    df4 = pd.DataFrame({'col1':[1,3],'col2':[2,4]},index=['a','b'])
    print(df4)

       col1  col2
    a     1     2
    b     3     4

    The data used to create a DataFrame can be a list, an array, or a dict; the column names and index are given as lists
    

    Basic operations

    # Basic operations on a DataFrame object
    df2.index

    Index(['a', 'b'], dtype='object')

    df2.columns

    Index(['col1', 'col2', 'col3'], dtype='object')

    # Look up data by index
    df2.loc['a']
    # The row whose index label is 'a'
    # df2.iloc[0] is equivalent here: loc looks up by index label, iloc by integer position

    col1    1
    col2    2
    col3    3
    Name: a, dtype: int64

    print(df2.loc[['a','b']])    # Access multiple rows; the argument is a list of labels

       col1  col2  col3
    a     1     2     3
    b     4     5     6

    print(df.loc[df.index[1:3]])

       cols
    b     2
    c     3

    # Access columns
    print(df2[['col1','col3']])

       col1  col3
    a     1     3
    b     4     6
    

    Computation

    # Sum the elements of a DataFrame
    # By default, each column is summed
    print(df2.sum())

    col1    5
    col2    7
    col3    9
    dtype: int64

    # Sum each row
    print(df2.sum(axis=1))

    a     6
    b    15
    dtype: int64

    # Multiply every element by 2
    print(df2.apply(lambda x: x*2))

       col1  col2  col3
    a     2     4     6
    b     8    10    12

    # Square every element (vectorized operations work as with an ndarray)
    print(df2**2)

       col1  col2  col3
    a     1     4     9
    b    16    25    36
    

    Adding columns

    # Add a column to a DataFrame
    df2['col4'] = ['cnn','rnn']
    print(df2)

       col1  col2  col3 col4
    a     1     2     3  cnn
    b     4     5     6  rnn

    # A new column can also be defined from another DataFrame; rows are aligned by index automatically
    df2['col5'] = pd.DataFrame(['MachineLearning','DeepLearning'],index=['a','b'])
    print(df2)

       col1  col2  col3 col4             col5
    a     1     2     3  cnn  MachineLearning
    b     4     5     6  rnn     DeepLearning
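
    The index alignment mentioned above is easier to see when the labels are out of order or incomplete. The sketch below is illustrative only and assigns a Series rather than a DataFrame; the column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 4]}, index=["a", "b"])

# The Series is matched to df by index label, not by position.
df["col2"] = pd.Series(["rnn", "cnn"], index=["b", "a"])

# Labels missing from the right-hand side become NaN.
df["col3"] = pd.Series(["only_a"], index=["a"])

print(df)
```

    Row a receives 'cnn' and row b receives 'rnn' even though the Series lists them in the opposite order, and row b has NaN in col3. Assigning a plain list, as with col4 above, is positional instead and requires the length to match.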
    

    Adding rows

    # Add a row
    # Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on those versions use pd.concat
    print(df2.append(pd.DataFrame({'col1':7,'col2':8,'col3':9,'col4':'rcnn','col5':'ReinforcementLearning'},index=['c'])))

       col1  col2  col3  col4                   col5
    a     1     2     3   cnn        MachineLearning
    b     4     5     6   rnn           DeepLearning
    c     7     8     9  rcnn  ReinforcementLearning


    Note!

    # If no index is given when adding a row (a dict with ignore_index=True), the index is replaced with integers
    print(df2.append({'col1':10,'col2':11,'col3':12,'col4':'frnn','col5':'DRL'},ignore_index=True))

       col1  col2  col3  col4             col5
    0     1     2     3   cnn  MachineLearning
    1     4     5     6   rnn     DeepLearning
    2    10    11    12  frnn              DRL

    # The row additions above do not actually modify df2, unless the result is assigned back
    df2 = df2.append(pd.DataFrame({'col1':7,'col2':8,'col3':9,'col4':'rcnn','col5':'ReinforcementLearning'},index=['c']))
    print(df2)

    # (the output below has two 'c' rows, which suggests the assignment above was run twice;
    #  append does not reject duplicate index labels)
       col1  col2  col3  col4                   col5
    a     1     2     3   cnn        MachineLearning
    b     4     5     6   rnn           DeepLearning
    c     7     8     9  rcnn  ReinforcementLearning
    c     7     8     9  rcnn  ReinforcementLearning

    print(df2.loc['c'])

       col1  col2  col3  col4                   col5
    c     7     8     9  rcnn  ReinforcementLearning
    c     7     8     9  rcnn  ReinforcementLearning
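
    On pandas 2.0 and later, where DataFrame.append no longer exists, the same row additions can be written with pd.concat. This is a sketch under that assumption, using a cut-down three-column version of df2; the names new_row, kept, and renumbered are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"col1": [1, 4], "col2": [2, 5], "col3": [3, 6]}, index=["a", "b"]
)
new_row = pd.DataFrame({"col1": 7, "col2": 8, "col3": 9}, index=["c"])

# Keep the index labels of both frames.
kept = pd.concat([df2, new_row])

# Discard the labels and renumber from 0, like ignore_index=True with append.
renumbered = pd.concat([df2, new_row], ignore_index=True)

print(kept.index.tolist())        # ['a', 'b', 'c']
print(renumbered.index.tolist())  # [0, 1, 2]
print(df2.shape)                  # (2, 3), df2 itself is unchanged
```

    As with append, pd.concat returns a new frame, so the result has to be assigned back (df2 = pd.concat([...])) to keep it.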
    

    Merging DataFrame objects

    # Merging DataFrame objects
    df_a = pd.DataFrame(['wang','jing','hui','is','a','master'],columns=['col6'],index=['a','b','c','d','e','f'])
    print(df_a)

         col6
    a    wang
    b    jing
    c     hui
    d      is
    e       a
    f  master

    # Default join (a left join): every index label of dfb is kept
    dfb = pd.DataFrame([1,2,4,5,6,7],columns=['col1'],index=['a','b','c','d','f','g'])
    print(dfb.join(df_a))

       col1    col6
    a     1    wang
    b     2    jing
    c     4     hui
    d     5      is
    f     6  master
    g     7     NaN

    # The default join only keeps index labels that already exist in dfb
    # The how parameter selects the join type
    print(dfb.join(df_a,how='inner'))   # intersection of the two indexes

       col1    col6
    a     1    wang
    b     2    jing
    c     4     hui
    d     5      is
    f     6  master

    # Union of the two indexes
    print(dfb.join(df_a,how='outer'))

       col1    col6
    a   1.0    wang
    b   2.0    jing
    c   4.0     hui
    d   5.0      is
    e   NaN       a
    f   6.0  master
    g   7.0     NaN
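
    To compare the join types side by side, the sketch below collects the index labels that survive each value of how, and also writes the default left join with pd.merge on the indexes. The smaller frames and the name index_by_how are made up for illustration.

```python
import pandas as pd

df_a = pd.DataFrame({"col6": ["wang", "jing", "hui"]}, index=["a", "b", "c"])
dfb = pd.DataFrame({"col1": [1, 2, 7]}, index=["a", "b", "g"])

# Index labels that survive each join type.
index_by_how = {
    how: dfb.join(df_a, how=how).index.tolist()
    for how in ["left", "right", "inner", "outer"]
}
for how, labels in index_by_how.items():
    print(how, labels)

# pd.merge on both indexes expresses the same left join as dfb.join(df_a).
merged = pd.merge(dfb, df_a, left_index=True, right_index=True, how="left")
print(merged)
```

    The left join keeps the labels of dfb (a, b, g), the right join keeps those of df_a (a, b, c), the inner join keeps only the shared labels (a, b), and the outer join keeps all four.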
    

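  • Adding a new column to a DataFrame in Python

    10,000+ views · Many likes · 2019-08-13 16:31:25
    While coding, I keep running into the need to add a new column to a DataFrame and end up googling it every time, so this time I am writing it down. Adding a new column to a DataFrame is actually simple: name the column and assign to it. import pandas as pd data = pd....
  • pandas basics: creating, reading and writing, inserting into, and deleting from a DataFrame

    10,000+ views · Many likes · 2016-10-26 22:52:51
    DataFrame is very popular right now: many libraries are built on it, and it is convenient to use. Reading an Excel file takes one line of code; thinking back to the days of using xlrd still gives me a headache. So for anyone doing data processing in Python, pandas is a must-know...
  • How do you change data types in a pandas DataFrame?

    10,000+ views · Many likes · 2018-07-22 20:38:16
    Data type conversion in a pandas DataFrame. When analyzing data with pandas, you sometimes need to convert object columns to numeric types (float, int). How is that done? There are three main approaches: specifying the type at creation, forcing a conversion with df.astype, and using pd....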
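
    A minimal sketch of such conversions follows. The third approach is cut off in the excerpt above; pd.to_numeric is used here as an assumption, since it is one pandas function that serves this purpose. The frame and its values are made up.

```python
import pandas as pd

df = pd.DataFrame({"price": ["1.5", "2", "bad"], "qty": ["3", "4", "5"]})

# astype: strict conversion, raises if any value cannot be converted.
df["qty"] = df["qty"].astype(int)

# pd.to_numeric: errors="coerce" turns unparseable values into NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Fixing the type at creation time with dtype=.
created = pd.DataFrame({"x": [1, 2]}, dtype="float64")

print(df.dtypes)
print(created.dtypes)
```

    astype is the right choice when every value is known to be clean; to_numeric with errors="coerce" is more forgiving but silently produces NaN, so it is worth checking for missing values afterwards.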
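  • Converting a pandas DataFrame or Series to a list

    10,000+ views · Many likes · 2019-08-12 12:25:15
    Converting a DataFrame to a list. Input DataFrame: df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9], 'b':[3,5,6,2,4,6,7,8,7,8,9]}) Convert the elements of column a to a list: # Method 1 df['a'].values.tolist() # Method 2 df['a']....
  • pandas.DataFrame: deleting/selecting rows or columns that contain specific values

    10,000+ views · Many likes · 2018-06-20 09:13:59
    1. Deleting/selecting rows where a column contains a special value import pandas as pd import numpy as np ...df1=pd.DataFrame(a,index=['row0','row1','row2'],columns=list('ABC')) print(df1) df2=df1.copy() # delete/select...
  • Converting a dict to a DataFrame in Python

    10,000+ views · Many likes · 2018-12-17 18:13:54
    I needed to convert a dict to a DataFrame, with the dict's keys and values as two separate DataFrame columns. Sample data: Converting a dict whose keys each have a single value directly into a DataFrame raises an error: The following two methods achieve the goal. 1. Convert the dict to...
  • Author: lianghc ... ...pandas provides merge, a method similar to the join operation in relational databases, which joins rows from different DataFrames on one or more keys. Syntax: merge(left, right, how='inner', on=No...
  • Iterating over a DataFrame by row in Python

    10,000+ views · 2017-09-07 13:49:21
    When building a classification model, I needed to fetch data from a DataFrame row by row for training and testing. import pandas as pd dict=[[1,2,3,4,5,6],[2,3,4,5,6,7],[3,4,5,6,7,8],[4,5,6,7,8,9],[5,6,7,8,9,10]] data=pd.DataFrame(dict) print...
  • Row and column operations on DataFrame in Python's pandas library

    10,000+ views · Many likes · 2016-11-10 01:15:19
    This repo records some Python tips, books... Selecting rows or columns with a pandas DataFrame: import numpy as np import pandas as pd from pandas import Series, DataFrame ser = Series(np.arange(3.)) data = DataFrame...
  • A detailed introduction to the append method of pandas DataFrame

    10,000+ views · Many likes · 2018-12-11 22:18:06
    Official documentation link: append method ... Purpose: adds new rows to a DataFrame; if an added column name is not in the DataFrame, it is added as a new column. other: a DataFrame, Series, dict, or list-like data structure i...
  • Adding one or more columns at a specified position in a DataFrame in Python

    10,000+ views · Many likes · 2018-08-23 11:40:21
    Many people probably find this question puzzling. This article introduces a very simple way to add a column at any specified position in a DataFrame. Before that, many readers may already know the most common way of adding a column, shown below: import pandas as pd...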
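
    The excerpt is cut off before it shows its method. One way to place a column at a chosen position, which may or may not be the approach the post describes, is DataFrame.insert. The frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# insert(loc, column, value) modifies df in place and returns None.
df.insert(1, "b", [3, 4])

print(df.columns.tolist())  # ['a', 'b', 'c']
```

    Plain assignment (df["b"] = ...) always appends the new column at the end, which is why a separate call is needed for a specific position.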
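  • Time values stored as strings in a DataFrame cannot be added or subtracted directly. Subtracting two such time columns raises a type error: TypeError: unsupported operand type(s) for -: 'str' and 'str' So you first need pandas' to_datetime() method to convert them to time...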
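
    A minimal sketch of that conversion, assuming two string columns with made-up names start and end:

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["2019-01-01 08:00:00", "2019-01-02 09:30:00"],
    "end": ["2019-01-01 10:30:00", "2019-01-03 09:30:00"],
})

# Convert the string columns to datetime64 first.
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])

# Subtraction now yields a timedelta column.
df["elapsed"] = df["end"] - df["start"]
df["hours"] = df["elapsed"].dt.total_seconds() / 3600

print(df[["elapsed", "hours"]])
```

    The hours column comes out as 2.5 and 24.0 for the two rows.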
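  • The example defines several List collections (user information, order information, and user-order information), builds a DataFrame from each List, and uses a Spark SQL query to combine the DataFrames into one. Written in Scala.
  • Basic attributes of the Pandas DataFrame in detail

    10,000+ views · Many likes · 2019-06-24 19:13:33
    Some basic attributes of the Pandas DataFrame. Basic feature list: import pandas as pd (import the library); df = pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False) (create a DataFrame); df.index df.columns df.axes df.T...
  • Several ways to iterate over a DataFrame by row and by column in pandas

    10,000+ views · Many likes · 2019-02-27 15:24:53
    iterrows(): iterates by row, yielding each row as an (index, Series) pair; elements can be accessed with row[name]. itertuples(): iterates by row, yielding each row as a tuple; elements can be accessed with getattr(row, name); compared with iterrows()...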
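
    The two access styles look like this in practice. This is a small sketch with a made-up two-column frame; totals is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11], "c2": [100, 110]})

# iterrows(): yields (index, Series); fields are read with row[name].
for idx, row in df.iterrows():
    print(idx, row["c1"], row["c2"])

# itertuples(): yields namedtuples; fields are attributes.
for row in df.itertuples():
    print(row.Index, row.c1, getattr(row, "c2"))

# index=False leaves the index out of each tuple.
totals = [row.c1 + row.c2 for row in df.itertuples(index=False)]
print(totals)  # [110, 121]
```

    For column-wise arithmetic like the totals above, a vectorized expression such as df["c1"] + df["c2"] is usually preferable to any row loop.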
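  • Spark DataFrame

    2018-11-06 23:08:46
    A DataFrame is an immutable distributed dataset organized into named columns, similar to a table in a relational database. SchemaRDD was experimental work in Apache Spark 1.0 and was renamed DataFrame in Apache Spark 1.3. For those familiar with Python ...
  • Transposing a pandas.DataFrame

    10,000+ views · Many likes · 2019-06-02 22:52:19
    Overview. Motivation: sometimes, changing how you fetch the data can make fetching faster. ... In these cases you may need a way to transpose the rows and columns of a DataFrame. Contribution: provides a method for transposing the rows and columns of a pandas.DataFrame. Experiments...
  • DataFrame operations. Create an empty DataFrame: m = pd.DataFrame() Create an array with one column of zeros: m = np.zeros((4500,1)) Merging two DataFrames: axis=1 concatenates along columns (adds columns); axis=0 concatenates along rows (adds rows). pd.concat([df1,df2],axis=1)
  • Selecting rows or columns with a pandas DataFrame

    10,000+ views · Many likes · 2017-12-24 21:59:58
    import numpy as np import pandas as pd ...from pandas import Series, DataFrame ser = Series(np.arange(3.)) data = DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list
  • Solved: AttributeError: 'DataFrame' object has no attribute 'reshape'. Contents: The problem; Approach; Solution. The problem: AttributeError: 'DataFrame' object has no attribute 'reshape' Approach...
  • Iterating over a DataFrame in pandas

    10,000+ views · Many likes · 2018-02-22 21:54:05
    Given the following pandas DataFrame: import pandas as pd inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}] df = pd.DataFrame(inp) print(df) The code above prints: c1 c2 0 10 100 1 11 110 2 12 120 Now...
  • Differences between Spark DataFrame and pandas DataFrame

    1,000+ views · 2019-01-10 16:10:14
    Differences between Spark DataFrame and pandas DataFrame. Contents: Why use a PySpark DataFrame; pandas DataFrame data structure and characteristics; Spark DataFrame structure and storage characteristics; Spark toPandas in detail; References. Why use a PySpark DataFrame: Processing data with pandas...
  • Using DataFrame.copy

    10,000+ views · 2018-10-12 22:11:26
    pandas.DataFrame.copy DataFrame.copy(deep=True) With deep=False the copy acts like a reference: when the original values change, the copied result changes too. data=DataFrame.copy(deep=False) behaves much like data=DataFrame. Suppose we have a DataFrame: data.loc[["a"...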
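
    The difference between plain assignment and a deep copy can be checked directly. This is a sketch with a made-up frame; shallow copies (deep=False) are left out because their behavior depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

alias = df                 # same object, nothing is copied
deep = df.copy(deep=True)  # independent data (deep=True is the default)

df.loc["a", "x"] = 100

print(alias.loc["a", "x"])  # 100
print(deep.loc["a", "x"])   # 1
print(alias is df)          # True
```

    The alias sees the change because it is the same object, while the deep copy keeps the original value.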
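  • Python 3 pandas DataFrame: summing a column

    10,000+ views · 2019-04-24 11:50:05
    Summing a column of a pandas DataFrame. When working with a pandas DataFrame, you often have some columns that are strings and some that are numeric; calling df_obj.apply(sum) directly often fails. The following sums a single column: dataf_test1['...
  • pandas DataFrame: extracting rows and columns

    10,000+ views · Many likes · 2019-01-11 11:27:09
    data = pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]}) Extracting columns. Single column: data['a'] Multiple columns: data[['a', 'b']] Extracting with .loc or .iloc: the first argument selects rows, the second selects columns. .loc selects by label, .iloc selects by...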
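
    A short sketch of the row/column arguments, using the same data but with string row labels x, y, z added as an assumption so that labels and positions are easy to tell apart:

```python
import pandas as pd

data = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}, index=["x", "y", "z"]
)

# First argument selects rows, second selects columns.
by_label = data.loc[["x", "z"], ["a", "c"]]
by_position = data.iloc[[0, 2], [0, 2]]

# .loc slices include the end label; .iloc slices exclude the end position.
loc_slice = data.loc["x":"y", "a"]
iloc_slice = data.iloc[0:1, 0]

print(by_label.equals(by_position))     # True
print(len(loc_slice), len(iloc_slice))  # 2 1
```

    The inclusive end of .loc slices is the detail that most often surprises people coming from plain Python slicing.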
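  • Assigning values to a DataFrame in pandas

    10,000+ views · 2018-06-25 16:12:29
    Setting values in pandas. Related code. Creating the data. Depending on our needs, we can use pandas to change values in the data, or add columns that are empty or hold values. ...df = pd.DataFrame(np.arange(24).reshape((6,4)),index...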
