  • The pandas iloc function

    1,000+ views · 2019-09-23 16:09:32

    train.iloc[0:4]

    0:4 selects rows 0, 1, 2 and 3; the range is half-open (start included, stop excluded).

    train.iloc[:, :8]

    Selects every row of the columns at positions [0, 8).

    train.iloc[0:2, 8]

    Selects rows [0, 2) of the column at position 8.

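    The three iloc calls in this post can be tried on a toy frame. This is only a sketch: the 6 x 10 frame below is a hypothetical stand-in for `train`, and its column names c0..c9 are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train`: 6 rows, 10 columns named c0..c9.
train = pd.DataFrame(np.arange(60).reshape(6, 10),
                     columns=[f"c{i}" for i in range(10)])

first_four_rows = train.iloc[0:4]     # rows at positions 0, 1, 2, 3
first_eight_cols = train.iloc[:, :8]  # all rows, columns at positions 0..7
col8_two_rows = train.iloc[0:2, 8]    # rows 0 and 1 of the column at position 8

print(first_four_rows.shape)   # (4, 10)
print(first_eight_cols.shape)  # (6, 8)
print(col8_two_rows.tolist())  # [8, 18]
```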
  • How are iloc, ix and loc different?

    Translated from: How are iloc, ix and loc different?

    Can someone explain how these three methods of slicing are different?
    I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

    For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

    df.loc[:5]
    df.ix[:5]
    df.iloc[:5]
    

    Can someone present three cases where the distinction in uses is clearer?


    Reply #1

    Reference: https://stackoom.com/question/28Ypl/iloc-ix和loc有何不同


    Reply #2

    Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


    First, here's a recap of the three methods:

    • loc gets rows (or columns) with particular labels from the index.
    • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
    • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

    It's important to note some subtleties that can make ix slightly tricky to use:

    • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

    • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.


    To illustrate the differences between the three methods, consider the following Series:

    >>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
    >>> s
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
    4    NaN
    5    NaN
    

    We'll look at slicing with the integer value 3.

    In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):

    >>> s.iloc[:3] # slice the first three rows
    49   NaN
    48   NaN
    47   NaN
    
    >>> s.loc[:3] # slice up to and including label 3
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
    
    >>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
    

    Notice that s.ix[:3] returns the same Series as s.loc[:3], since it looks for the label first rather than working on the position (and the index for s is of integer type).
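    The position-versus-label difference can be checked without ix, which current pandas no longer provides. A minimal sketch using the same Series as above:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]  # the first three rows, whatever their labels are
by_label = s.loc[:3]      # everything up to and including the label 3

print(list(by_position.index))  # [49, 48, 47]
print(list(by_label.index))     # [49, 48, 47, 46, 45, 1, 2, 3]
```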

    What if we try with an integer label that isn't in the index (say 6)?

    Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

    >>> s.iloc[:6]
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    
    >>> s.loc[:6]
    KeyError: 6
    
    >>> s.ix[:6]
    KeyError: 6
    

    As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
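    The loc half of this behaviour can be reproduced on current pandas. The sketch below also shows one possible workaround that is not part of the original answer: sorting the index first, so that pandas can work out where the missing label would fall.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

print(len(s.iloc[:6]))  # 6: positions exist regardless of the labels

try:
    s.loc[:6]           # 6 is not a label and the index is unsorted
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Assumed workaround (not from the answer): slice a sorted copy instead.
sorted_labels = list(s.sort_index().loc[:6].index)
print(sorted_labels)  # [1, 2, 3, 4, 5]
```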

    If, however, our index was of mixed type, then given an integer ix would behave like iloc immediately instead of raising a KeyError:

    >>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
    >>> s2.index.is_mixed() # index is mix of different types
    True
    >>> s2.ix[:6] # now behaves like iloc given integer
    a   NaN
    b   NaN
    c   NaN
    d   NaN
    e   NaN
    1   NaN
    

    Keep in mind that ix can still accept non-integers and behave like loc:

    >>> s2.ix[:'c'] # behaves like loc given non-integer
    a   NaN
    b   NaN
    c   NaN
    

    As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results; try not to use ix.
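    Since ix was removed entirely in pandas 1.0, the two s2 examples can only be written with the explicit indexers today. A sketch of the equivalents, reusing the mixed-type index from above:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.nan, index=["a", "b", "c", "d", "e", 1, 2, 3, 4, 5])

positional = s2.iloc[:6]  # what s2.ix[:6] returned: the first six rows
by_label = s2.loc[:"c"]   # what s2.ix[:'c'] returned: up to and including 'c'

print(list(positional.index))  # ['a', 'b', 'c', 'd', 'e', 1]
print(list(by_label.index))    # ['a', 'b', 'c']
```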


    Combining position-based and label-based indexing

    Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

    For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

    >>> df = pd.DataFrame(np.nan, 
                          index=list('abcde'),
                          columns=['x','y','z', 8, 9])
    >>> df
        x   y   z   8   9
    a NaN NaN NaN NaN NaN
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    d NaN NaN NaN NaN NaN
    e NaN NaN NaN NaN NaN
    

    In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly: we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

    >>> df.ix[:'c', :4]
        x   y   z   8
    a NaN NaN NaN NaN
    b NaN NaN NaN NaN
    c NaN NaN NaN NaN
    

    In later versions of pandas, we can achieve this result using iloc and the help of another method:

    >>> df.iloc[:df.index.get_loc('c') + 1, :4]
        x   y   z   8
    a NaN NaN NaN NaN
    b NaN NaN NaN NaN
    c NaN NaN NaN NaN
    

    get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
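    The conversion can also go the other way: keep the row labels and turn the column positions into labels. The loc variant below is an alternative sketch, not something the answer itself proposes; both forms should select the same block.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan,
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Rows by label converted to a position (+1 because iloc excludes the stop).
via_iloc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Columns by position converted to labels; loc includes the stop label 'c'.
via_loc = df.loc[:"c", df.columns[:4]]

print(list(via_loc.columns))     # ['x', 'y', 'z', 8]
print(via_iloc.equals(via_loc))  # True
```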

    There are further examples in pandas' documentation here.


    Reply #3

    iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

    df.iloc[0]
    

    or the last five rows by doing

    df.iloc[-5:]
    

    You can also use it on the columns. This retrieves the 3rd column:

    df.iloc[:, 2]    # the : in the first position indicates all rows
    

    You can combine them to get intersections of rows and columns:

    df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)
    

    On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:

    df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
    

    Then we can get the first row by

    df.loc['a']     # equivalent to df.iloc[0]
    

    and the last two rows of the 'date' column by

    df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]
    

    and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0 and in this case iloc and loc would work in the same way. This is why your three examples are equivalent. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
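    One caveat is worth checking by hand: even with the default integer index, a loc slice includes its stop label while an iloc slice excludes its stop position, so the two are close but not identical. A sketch on two made-up frames; catching both TypeError and KeyError is a precaution, since the exact exception type is not stated in the answer.

```python
import pandas as pd

default = pd.DataFrame({"v": range(10)})  # default index 0..9

print(len(default.iloc[:5]))  # 5: positions 0..4
print(len(default.loc[:5]))   # 6: labels 0..5, stop label included

labelled = pd.DataFrame({"v": range(3)}, index=["a", "b", "c"])
try:
    labelled.loc[:5]          # an integer bound on a string index
    failed = False
except (TypeError, KeyError):
    failed = True
print(failed)  # True
```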

    Also, you can do column retrieval just by using the data frame's __getitem__:

    df['time']    # equivalent to df.loc[:, 'time']
    

    Now suppose you want to mix position and named indexing, that is, indexing using names on rows and positions on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in:

    df.ix[:2, 'time']    # the first two rows of the 'time' column
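    On pandas versions without ix, the same selection has to be expressed entirely in labels or entirely in positions. A sketch of both conversions; the cell values are invented so that the result is visible.

```python
import pandas as pd

# Same row and column labels as above; the cell values are made up.
df = pd.DataFrame({"time": ["09:00", "10:30", "12:15"],
                   "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
                   "name": ["Ann", "Bob", "Cy"]},
                  index=["a", "b", "c"])

# Positions -> labels: take the first two index labels, then use loc.
by_label = df.loc[df.index[:2], "time"]

# Labels -> positions: look up the column position, then use iloc.
by_position = df.iloc[:2, df.columns.get_loc("time")]

print(by_label.tolist())             # ['09:00', '10:30']
print(by_label.equals(by_position))  # True
```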
    

    I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

     b = [True, False, True]
     df.loc[b] 
    

    This will return the 1st and 3rd rows of df. It is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

    df.loc[b, 'name'] = 'Mary', 'John'
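    A runnable version of the boolean assignment, as a sketch: it uses a list on the right-hand side rather than the bare tuple shown above, and starts from the same empty frame.

```python
import pandas as pd

df = pd.DataFrame(index=["a", "b", "c"], columns=["time", "date", "name"])
b = [True, False, True]

selected = df.loc[b]                  # rows 'a' and 'c'
df.loc[b, "name"] = ["Mary", "John"]  # one value per selected row

print(list(selected.index))  # ['a', 'c']
print(df.loc["a", "name"])   # Mary
print(df.loc["c", "name"])   # John
```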
    

    Reply #4

    In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER: .iloc needs INTEGERS.

    See my extremely detailed blog series on subset selection for more


    .ix is deprecated and ambiguous and should never be used

    Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

    Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let's take a look at a sample DataFrame:

    df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                       'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                       'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                       'height':[165, 70, 120, 80, 180, 172, 150],
                       'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                       'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                       },
                      index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
    

    All the words in bold are the labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.


    The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follow its name to make its selections.

    .loc selects data only by labels

    We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

    There are three different inputs you can use for .loc

    • A string
    • A list of strings
    • Slice notation using strings as the start and stop values

    Selecting a single row with .loc with a string

    To select a single row of data, place the index label inside of the brackets following .loc.

    df.loc['Penelope']
    

    This returns the row of data as a Series

    age           4
    color     white
    food      Apple
    height       80
    score       3.3
    state        AL
    Name: Penelope, dtype: object
    

    Selecting multiple rows with .loc with a list of strings

    df.loc[['Cornelia', 'Jane', 'Dean']]
    

    This returns a DataFrame with the rows in the order specified in the list.

    Selecting multiple rows with .loc with slice notation

    Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaults to 1.

    df.loc['Aaron':'Dean']
    

    Complex slices can be taken in the same manner as Python lists.
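    As an illustration of a more complex slice, a label slice can carry a step just like a list slice. A sketch on a trimmed copy of the sample DataFrame (only the age column is kept here):

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, age column only.
df = pd.DataFrame({"age": [30, 2, 12, 4, 32, 33, 69]},
                  index=["Jane", "Nick", "Aaron", "Penelope",
                         "Dean", "Christina", "Cornelia"])

inclusive = df.loc["Aaron":"Dean"]             # stop label is included
every_other = df.loc["Aaron":"Christina":2]    # step of 2 between the labels

print(list(inclusive.index))    # ['Aaron', 'Penelope', 'Dean']
print(list(every_other.index))  # ['Aaron', 'Dean']
```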

    .iloc selects data only by integer location

    Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

    There are three different inputs you can use for .iloc

    • An integer
    • A list of integers
    • Slice notation using integers as the start and stop values

    Selecting a single row with .iloc with an integer

    df.iloc[4]
    

    This returns the 5th row (integer location 4) as a Series

    age           32
    color       gray
    food      Cheese
    height       180
    score        1.8
    state         AK
    Name: Dean, dtype: object
    

    Selecting multiple rows with .iloc with a list of integers

    df.iloc[[2, -2]]
    

    This returns a DataFrame of the third and second-to-last rows.

    Selecting multiple rows with .iloc with slice notation

    df.iloc[:5:3]
    

    Simultaneous selection of rows and columns with .loc and .iloc

    One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

    For example, we can select rows Jane and Dean with just the columns height, score and state like this:

    df.loc[['Jane', 'Dean'], 'height':]
    

    This uses a list of labels for the rows and slice notation for the columns.

    We can naturally do similar operations with .iloc using only integers.

    df.iloc[[1,4], 2]
    Nick      Lamb
    Dean    Cheese
    Name: food, dtype: object
    

    Simultaneous selection with labels and integer location

    .ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to convert both selections to labels or both to integer locations.

    For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

    col_names = df.columns[[2, 4]]
    df.loc[['Nick', 'Cornelia'], col_names] 
    

    Or alternatively, convert the index labels to integers with the get_loc index method.

    labels = ['Nick', 'Cornelia']
    index_ints = [df.index.get_loc(label) for label in labels]
    df.iloc[index_ints, [2, 4]]
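    The list comprehension over get_loc can also be written with Index.get_indexer, which takes the whole list of labels at once (and returns -1 for labels that are missing). This is an alternative sketch rather than the author's method; it rebuilds the sample DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

labels = ["Nick", "Cornelia"]
index_ints = df.index.get_indexer(labels)  # positions of all labels at once

via_iloc = df.iloc[index_ints, [2, 4]]
via_loc = df.loc[labels, df.columns[[2, 4]]]

print(index_ints.tolist())       # [1, 6]
print(list(via_iloc.columns))    # ['food', 'score']
print(via_iloc.equals(via_loc))  # True
```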
    

    Boolean Selection

    The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

    df.loc[df['age'] > 30, ['food', 'score']] 
    

    You can replicate this with .iloc but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

    df.iloc[(df['age'] > 30).values, [2, 4]] 
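    Both boolean selections can be compared directly. In this sketch `.to_numpy()` is used in place of `.values`; that is a stylistic substitution, and either yields the plain boolean array that .iloc accepts.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

mask = df["age"] > 30  # boolean Series aligned on the index

via_loc = df.loc[mask, ["food", "score"]]
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]  # plain array, integer columns

print(list(via_loc.index))       # ['Dean', 'Christina', 'Cornelia']
print(via_loc.equals(via_iloc))  # True
```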
    

    Selecting all rows

    It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

    df.loc[:, 'color':'score':2]
    

    The indexing operator, [], can select rows and columns too but not simultaneously.

    Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

    df['food']
    
    Jane          Steak
    Nick           Lamb
    Aaron         Mango
    Penelope      Apple
    Dean         Cheese
    Christina     Melon
    Cornelia      Beans
    Name: food, dtype: object
    

    Using a list selects multiple columns

    df[['food', 'score']]
    

    What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

    df['Penelope':'Christina'] # slice rows by label
    


    df[2:6:2] # slice rows by integer location
    

    The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

    df[3:5, 'color']
    TypeError: unhashable type: 'slice'
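    What that failing expression was reaching for can be written with either explicit indexer. A sketch on a trimmed copy of the sample DataFrame (two columns only); the exact exception raised by the bare-bracket form may differ between pandas versions, so it is not reproduced here.

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, two columns.
df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Everything as positions: rows 3 and 4, column position looked up by name.
by_position = df.iloc[3:5, df.columns.get_loc("color")]

# Everything as labels: the row labels at positions 3 and 4, column by name.
by_label = df.loc[df.index[3:5], "color"]

print(by_position.tolist())          # ['white', 'gray']
print(by_position.equals(by_label))  # True
```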
    
  • Selecting with loc, iloc and direct indexing on DataFrames, Series, rows and columns

    import pandas as pd
    # Read the college dataset
    college = pd.read_csv('data/college.csv', index_col='INSTNM')

    iloc: select rows by integer position

    # Select the 61st row (integer position 60)
    pd.options.display.max_rows = 6
    college.iloc[60]
    '''
    CITY                  Anchorage
    STABBR                       AK
    HBCU                          0
                            ...    
    UG25ABV                  0.4386
    MD_EARN_WNE_P10           42500
    GRAD_DEBT_MDN_SUPP      19449.5
    Name: University of Alaska Anchorage, Length: 26, dtype: object
    '''
    
    
    # Select several non-contiguous rows
    college.iloc[[60, 99, 3]]  # positions 60, 99 and 3, i.e. the 61st, 100th and 4th rows
    
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    University of Alaska Anchorage  Anchorage  AK  0.0  0.0  0.0  0  NaN  NaN  0.0  12865.0  ...  0.0980  0.0181  0.0457  0.4539  1  0.2385  0.2647  0.4386  42500  19449.5
    International Academy of Hair Design  Tempe  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  188.0  ...  0.0160  0.0000  0.0638  0.0000  0  0.7185  0.7346  0.3905  22200  10556
    University of Alabama in Huntsville  Huntsville  AL  0.0  0.0  0.0  0  595.0  590.0  0.0  5451.0  ...  0.0172  0.0332  0.0350  0.2146  1  0.3072  0.4596  0.2640  45500  24097

    3 rows × 26 columns

    # iloc also accepts a slice to select contiguous rows
    college.iloc[99:102]  # positions 99, 100 and 101 (the stop, 102, is excluded)
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    International Academy of Hair Design  Tempe  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  188.0  ...  0.0160  0.0000  0.0638  0.0000  0  0.7185  0.7346  0.3905  22200  10556
    GateWay Community College  Phoenix  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  5211.0  ...  0.0127  0.0161  0.0702  0.7465  1  0.3270  0.2189  0.5832  29800  7283
    Mesa Community College  Mesa  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  19055.0  ...  0.0205  0.0257  0.0682  0.6457  1  0.3423  0.2207  0.4010  35200  8000

    3 rows × 26 columns

     

    loc: select rows by index label

    # Rows can also be selected by their label
    college.loc['University of Alaska Anchorage']
    '''
    CITY                  Anchorage
    STABBR                       AK
    HBCU                          0
    MENONLY                       0
    WOMENONLY                     0
                            ...    
    PCTPELL                  0.2385
    PCTFLOAN                 0.2647
    UG25ABV                  0.4386
    MD_EARN_WNE_P10           42500
    GRAD_DEBT_MDN_SUPP      19449.5
    Name: University of Alaska Anchorage, Length: 26, dtype: object
    '''

    Selecting with loc and a list of labels

    # loc also accepts a list of labels
    labels = ['University of Alaska Anchorage','International Academy of Hair Design','University of Alabama in Huntsville']
    college.loc[labels]
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    University of Alaska Anchorage  Anchorage  AK  0.0  0.0  0.0  0  NaN  NaN  0.0  12865.0  ...  0.0980  0.0181  0.0457  0.4539  1  0.2385  0.2647  0.4386  42500  19449.5
    International Academy of Hair Design  Tempe  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  188.0  ...  0.0160  0.0000  0.0638  0.0000  0  0.7185  0.7346  0.3905  22200  10556
    University of Alabama in Huntsville  Huntsville  AL  0.0  0.0  0.0  0  595.0  590.0  0.0  5451.0  ...  0.0172  0.0332  0.0350  0.2146  1  0.3072  0.4596  0.2640  45500  24097

    3 rows × 26 columns

    loc can select a contiguous range with start and stop labels

    # loc can select a contiguous range with start and stop labels (stop included)
    start = 'Amridge University'
    stop = 'Athens State University'
    college.loc[start:stop]
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    Amridge University  Montgomery  AL  0.0  0.0  0.0  1  NaN  NaN  1.0  291.0  ...  0.0000  0.0000  0.2715  0.4536  1  0.6801  0.7795  0.8540  40100  23370
    University of Alabama in Huntsville  Huntsville  AL  0.0  0.0  0.0  0  595.0  590.0  0.0  5451.0  ...  0.0172  0.0332  0.0350  0.2146  1  0.3072  0.4596  0.2640  45500  24097
    Alabama State University  Montgomery  AL  1.0  0.0  0.0  0  425.0  430.0  0.0  4811.0  ...  0.0098  0.0243  0.0137  0.0892  1  0.7347  0.7554  0.1270  26600  33118.5
    The University of Alabama  Tuscaloosa  AL  0.0  0.0  0.0  0  555.0  565.0  0.0  29851.0  ...  0.0261  0.0268  0.0026  0.0844  1  0.2040  0.4010  0.0853  41900  23750
    Central Alabama Community College  Alexander City  AL  0.0  0.0  0.0  0  NaN  NaN  0.0  1592.0  ...  0.0000  0.0000  0.0019  0.3882  1  0.5892  0.3977  0.3153  27500  16127
    Athens State University  Athens  AL  0.0  0.0  0.0  0  NaN  NaN  0.0  2991.0  ...  0.0174  0.0057  0.0334  0.5517  1  0.4088  0.6296  0.6410  39000  18595

    6 rows × 26 columns

    index.tolist() extracts the row index labels as a list

    # index.tolist() extracts the row index labels as a list; each selected row contributes one label
    college.iloc[[60, 49, 3]].index.tolist()  # three rows selected
    ['University of Alaska Anchorage',
     'Snead State Community College',
     'University of Alabama in Huntsville']

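    Selecting the first 3 rows and first 4 columns with iloc and with loc

    # Read the college dataset with the INSTNM column as the row index; select the first 3 rows and first 4 columns
    college = pd.read_csv('data/college.csv', index_col='INSTNM')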
    college.iloc[:3, :4]
    college.loc[:'Amridge University', :'MENONLY']
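    The two calls above should return the same block, because a loc slice includes its stop label while an iloc slice excludes its stop position. A sketch that checks this on a small hypothetical stand-in for `college`: the labels are taken from this post, but the numeric cell values are invented placeholders since the CSV is not available here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the college data: real labels, made-up values.
college = pd.DataFrame(
    np.arange(20).reshape(4, 5),
    index=pd.Index(["Alabama A & M University",
                    "University of Alabama at Birmingham",
                    "Amridge University",
                    "University of Alabama in Huntsville"], name="INSTNM"),
    columns=["CITY", "STABBR", "HBCU", "MENONLY", "WOMENONLY"],
)

by_position = college.iloc[:3, :4]                             # stops excluded
by_label = college.loc[:"Amridge University", :"MENONLY"]      # stops included

print(by_position.shape)             # (3, 4)
print(by_position.equals(by_label))  # True
```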

    Selecting all rows of two columns

    college.iloc[:, [4,6]].head()
    college.loc[:, ['WOMENONLY', 'SATVRMID']].head()
    INSTNM  WOMENONLY  SATVRMID
    Alabama A & M University  0.0  424.0
    University of Alabama at Birmingham  0.0  570.0
    Amridge University  0.0  NaN
    University of Alabama in Huntsville  0.0  595.0
    Alabama State University  0.0  425.0

    Selecting non-contiguous rows and columns

    # Select non-contiguous rows and columns by position
    college.iloc[[100, 200], [7, 15]]
    
    	                                  SATMTMID	UGDS_NHPI
    INSTNM		
    GateWay Community College	            NaN	     0.0029
    American Baptist Seminary of the West	NaN	     NaN
    
    # Select non-contiguous rows and columns with loc and lists of labels
    rows = ['GateWay Community College', 'American Baptist Seminary of the West']
    columns = ['SATMTMID', 'UGDS_NHPI']
    college.loc[rows, columns]
    
    	                                  SATMTMID	UGDS_NHPI
    INSTNM		
    GateWay Community College	            NaN	     0.0029
    American Baptist Seminary of the West	NaN	     NaN

     

    Slicing rows without loc or iloc

    # Read the college dataset; from row position 10 up to (not including) 20, take every second row
    college = pd.read_csv('data/college.csv', index_col='INSTNM')
    college[10:20:2]
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    Birmingham Southern College  Birmingham  AL  0.0  0.0  0.0  1  560.0  560.0  0.0  1180.0  ...  0.0051  0.0000  0.0051  0.0017  1  0.1920  0.4809  0.0152  44200  27000
    Concordia College Alabama  Selma  AL  1.0  0.0  0.0  1  420.0  400.0  0.0  322.0  ...  0.0031  0.0466  0.0000  0.1056  1  0.8667  0.9333  0.2367  19900  PrivacySuppressed
    Enterprise State Community College  Enterprise  AL  0.0  0.0  0.0  0  NaN  NaN  0.0  1729.0  ...  0.0254  0.0012  0.0069  0.3823  1  0.4895  0.2263  0.3399  24600  8273
    Faulkner University  Montgomery  AL  0.0  0.0  0.0  1  NaN  NaN  0.0  2367.0  ...  0.0173  0.0182  0.0258  0.2302  1  0.5812  0.7253  0.4589  37200  22000
    New Beginning College of Cosmetology  Albertville  AL  0.0  0.0  0.0  0  NaN  NaN  0.0  115.0  ...  0.0000  0.0000  0.0000  0.0783  1  0.8224  0.8553  0.3933  NaN  5500

    5 rows × 26 columns

    Slicing a Series: positions 10 to 19, every second value

    # A Series can be sliced in the same way
    city = college['CITY']
    city[10:20:2]
    '''
    INSTNM
    Birmingham Southern College              Birmingham
    Concordia College Alabama                     Selma
    Enterprise State Community College       Enterprise
    Faulkner University                      Montgomery
    New Beginning College of Cosmetology    Albertville
    Name: CITY, dtype: object
    '''

    Looking up the 4002nd row index label

    # Look up the 4002nd row index label (position 4001)
    college.index[4001]
    # 'Spokane Community College'

    Slicing a DataFrame by label

    # Both Series and DataFrames can be sliced by label. Here a DataFrame is sliced by label
    start = 'Mesa Community College'
    stop = 'Spokane Community College'
    college[start:stop:1500]
    INSTNM  CITY  STABBR  HBCU  MENONLY  WOMENONLY  RELAFFIL  SATVRMID  SATMTMID  DISTANCEONLY  UGDS  ...  UGDS_2MOR  UGDS_NRA  UGDS_UNKN  PPTUG_EF  CURROPER  PCTPELL  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
    Mesa Community College  Mesa  AZ  0.0  0.0  0.0  0  NaN  NaN  0.0  19055.0  ...  0.0205  0.0257  0.0682  0.6457  1  0.3423  0.2207  0.4010  35200  8000
    Hair Academy Inc-New Carrollton  New Carrollton  MD  0.0  0.0  0.0  0  NaN  NaN  0.0  504.0  ...  0.0000  0.0000  0.0000  0.4683  1  0.9756  1.0000  0.5882  15200  9666
    National College of Natural Medicine  Portland  OR  0.0  0.0  0.0  0  NaN  NaN  0.0  NaN  ...  NaN  NaN  NaN  NaN  1  NaN  NaN  NaN  NaN  PrivacySuppressed

    3 rows × 26 columns

    Slicing a Series by label

    # Here a Series is sliced by label
    city[start:stop:1500]
    '''
    INSTNM
    Mesa Community College                            Mesa
    Hair Academy Inc-New Carrollton         New Carrollton
    National College of Natural Medicine          Portland
    Name: CITY, dtype: object
    '''

    Direct slicing cannot be applied to columns; it only works on the rows of a DataFrame and on a Series, and it cannot select rows and columns at the same time.

    # Trying to select rows and two columns at once raises an error
    # college[:10, ['CITY', 'STABBR']]
    # TypeError: '(slice(None, 10, None), ['CITY', 'STABBR'])' is an invalid key
    # This selection is only possible with .loc or .iloc
    first_ten_instnm = college.index[:10]
    college.loc[first_ten_instnm, ['CITY', 'STABBR']]
    INSTNM  CITY  STABBR
    A & W Healthcare Educators  New Orleans  LA
    A T Still University of Health Sciences  Kirksville  MO
    ABC Beauty Academy  Garland  TX
    ABC Beauty College Inc  Arkadelphia  AR
    AI Miami International University of Art and Design  Miami  FL
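  • Usage of loc and iloc and how they differ

    10,000+ views · many likes · 2019-04-18 21:57:25

    loc: index rows by row label
    iloc: index rows by row number (integer position)
    ix: index rows by either row label or row number (a mix of loc and iloc)

    • A label slice such as 'a':'c' differs from a positional slice such as 0:2: the positional slice excludes the element at position 2, while the label slice includes the row with the stop label 'c'.

    • A boolean array can be used in place of labels; for an index ['a', 'b', 'c'], [True, False, True] is equivalent to ['a', 'c'].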
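    Both bullet points can be verified on a tiny frame. A minimal sketch, assuming a made-up 3 x 3 frame with index a, b, c (not the 4 x 4 frame used in the rest of this post):

```python
import numpy as np
import pandas as pd

# Made-up 3 x 3 frame with row labels a, b, c.
small = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=list("abc"), columns=list("xyz"))

label_slice = small.loc["a":"c"]     # stop label 'c' is included
position_slice = small.iloc[0:2]     # stop position 2 is excluded

mask_rows = small.loc[[True, False, True]]  # one boolean per row
named_rows = small.loc[["a", "c"]]

print(len(label_slice), len(position_slice))  # 3 2
print(mask_rows.equals(named_rows))           # True
```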

    1.loc

    import numpy as np
    import pandas as pd
    
    data = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("ABCD"), columns=list("wxyz"))
    print(data)
    #    w   x   y   z
    #A   0   1   2   3
    #B   4   5   6   7
    #C   8   9  10  11
    #D  12  13  14  15
    
    # loc
    # Selecting rows
    print(data.loc["A"])
    print(type(data.loc["A"]))
    #w    0
    #x    1
    #y    2
    #z    3
    #Name: A, dtype: int32
    #<class 'pandas.core.series.Series'>
    
    print(data.loc[["A"]])
    print(type(data.loc[["A"]]))
    #   w  x  y  z
    #A  0  1  2  3
    #<class 'pandas.core.frame.DataFrame'>
    # In summary, [] returns a Series and [[]] returns a DataFrame
    
    print(data.loc["A","w"])
    print(type(data.loc["A","w"]))
    #0
    #<class 'numpy.int32'>
    
    print(data.loc[:,"w"])
    print(type(data.loc[:,"w"]))
    #A     0
    #B     4
    #C     8
    #D    12
    #Name: w, dtype: int32
    #<class 'pandas.core.series.Series'>
    
    print(data.loc["A":"C"])
    print(type(data.loc["A":"C"]))
    #   w  x   y   z
    #A  0  1   2   3
    #B  4  5   6   7
    #C  8  9  10  11
    #<class 'pandas.core.frame.DataFrame'>
    
    print(data.loc["A":"C","w":"y"])
    print(type(data.loc["A":"C","w":"y"]))
    #   w  x   y
    #A  0  1   2
    #B  4  5   6
    #C  8  9  10
    #<class 'pandas.core.frame.DataFrame'>
    
    print(data.loc[["A","C"],["w","y"]])
    print(type(data.loc[["A","C"],["w","y"]]))
    #   w   y
    #A  0   2
    #C  8  10
    #<class 'pandas.core.frame.DataFrame'>
    
    print(data.loc[:,["w","y"]])
    print(type(data.loc[:,["w","y"]]))
    #    w   y
    #A   0   2
    #B   4   6
    #C   8  10
    #D  12  14
    #<class 'pandas.core.frame.DataFrame'>
    
    #Selecting columns
    print(data["w"])#same as print(data.loc[:,"w"])
    #A     0
    #B     4
    #C     8
    #D    12
    #Name: w, dtype: int32
    print(data.loc[:,"w"])
    #A     0
    #B     4
    #C     8
    #D    12
    #Name: w, dtype: int32
    print(data["w"].equals(data.loc[:,"w"]))#True
    
    #Selecting rows and columns by a condition
    print(data["w"]>5)
    #A    False
    #B    False
    #C     True
    #D     True
    #Name: w, dtype: bool
    
    print(data.loc[data["w"]>5])
    #    w   x   y   z
    #C   8   9  10  11
    #D  12  13  14  15
    print(data.loc[data["w"]>5,"w"])
    print(type(data.loc[data["w"]>5,"w"]))
    #C     8
    #D    12
    #Name: w, dtype: int32
    #<class 'pandas.core.series.Series'>
    print(data.loc[data["w"]>5,["w"]])
    print(type(data.loc[data["w"]>5,["w"]]))
    #    w
    #C   8
    #D  12
    #<class 'pandas.core.frame.DataFrame'>
    print(data["w"]==0)
    print(data.loc[lambda data:data["w"]==0])
    print(type(data.loc[lambda data:data["w"]==0]))
    #A     True
    #B    False
    #C    False
    #D    False
    #Name: w, dtype: bool
    #   w  x  y  z
    #A  0  1  2  3
    #<class 'pandas.core.frame.DataFrame'>
    
    #Assigning values with loc
    print(data)
    #    w   x   y   z
    #A   0   1   2   3
    #B   4   5   6   7
    #C   8   9  10  11
    #D  12  13  14  15
    data.loc[["A","C"],["w","x"]]=999
    print(data)
    #     w    x   y   z
    #A  999  999   2   3
    #B    4    5   6   7
    #C  999  999  10  11
    #D   12   13  14  15
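
The assignment above writes one value into a block chosen by label lists. The same mechanism also accepts a boolean condition or a single row and column label. The sketch below rebuilds the same 4x4 frame; the values -1 and 100 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape(4, 4), index=list("ABCD"), columns=list("wxyz")
)

# Set column "z" to -1 in every row where "w" is greater than 5 (rows C and D)
data.loc[data["w"] > 5, "z"] = -1

# Write a single cell by row label and column label
data.loc["A", "x"] = 100

print(data)
```

Writing through a single `.loc[...] = value` call, rather than chaining two indexing steps such as `data[data["w"] > 5]["z"] = -1`, avoids assigning into a temporary copy.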
    
    

    2.iloc

    data=DataFrame(np.arange(16).reshape(4,4),index=list("ABCD"),columns=list("wxyz"))
    print(data)
    #    w   x   y   z
    #A   0   1   2   3
    #B   4   5   6   7
    #C   8   9  10  11
    #D  12  13  14  15
    
    print(data.iloc[0])
    print(type(data.iloc[0]))
    #w    0
    #x    1
    #y    2
    #z    3
    #Name: A, dtype: int32
    #<class 'pandas.core.series.Series'>
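    #print(data.iloc["A"]) raises an error: iloc accepts integer positions, not labels
    
    #print(data.loc[0]) raises an error: 0 is not a label in this index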
    print(data.loc[["A"]])
    print(type(data.loc["A"]))
    #   w  x  y  z
    #A  0  1  2  3
    #<class 'pandas.core.series.Series'>
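
The iloc examples above select only a single row. As a minimal sketch on the same 4x4 frame, the lines below show that iloc also takes a column position, slices, lists of positions, and negative positions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape(4, 4), index=list("ABCD"), columns=list("wxyz")
)

cell = data.iloc[0, 1]              # row 0, column 1: a scalar
block = data.iloc[0:2, 1:3]         # rows 0-1, columns 1-2 (end positions excluded)
picked = data.iloc[[0, 2], [0, 3]]  # rows 0 and 2, columns 0 and 3
last_row = data.iloc[-1]            # negative positions count from the end
col_frame = data.iloc[:, [0]]       # a list keeps the result a DataFrame

print(cell)
print(block)
print(picked)
```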

    3. Differences between iloc and loc

    iloc selects by row position, whereas loc selects by index label.

    data=DataFrame(np.arange(16).reshape(4,4),index=list("1234"),columns=list("wxyz"))
    print(data)
    #    w   x   y   z
    #1   0   1   2   3
    #2   4   5   6   7
    #3   8   9  10  11
    #4  12  13  14  15
    print(data.iloc[0])
    #w    0
    #x    1
    #y    2
    #z    3
    #Name: 1, dtype: int32
    #print(data.loc[0]) raises a KeyError: the index labels are the strings "1" to "4", so there is no label 0
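
The difference is easiest to see when the index holds integers that do not match the row positions. The frame `df` and its values below are invented for illustration.

```python
import pandas as pd

# Integer labels that are deliberately out of positional order
df = pd.DataFrame({"value": ["p", "q", "r"]}, index=[2, 0, 1])

by_position = df.iloc[0, 0]    # first row by position, whatever its label
by_label = df.loc[0, "value"]  # the row whose label is 0 (the second row)

# With string labels, an integer is simply not a valid label for loc
str_df = pd.DataFrame({"value": [1, 2]}, index=["1", "2"])
try:
    str_df.loc[1]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(by_position, by_label, raised_key_error)
```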

    Reference: https://blog.csdn.net/boywaiter/article/details/86012620

  • import pandas as pd df = pd.read_excel("1.xls", None, header=None) # read the spreadsheet ... Probably iloc does not recognize the values of the dictionary. The first hh value printed is 4:,2:4. Could someone please help?
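  • Pandas: iloc and loc

    1,000+ views 2020-04-02 13:34:19
    iloc loc
  • Selecting rows and columns in Pandas DataFrames using iloc, loc, and ix. 78 comments / blog, data science, pandas, python, tutorials / shanelynn. Pandas data selection: there are several ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials focus...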
  • name 'iloc' is not defined

    2021-06-22 21:32:46
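    code = rawdata.at[iloc,'security_code'] After running this, it reports that iloc is not defined. What is going on? Is it because something is missing from the anaconda folder? How can I fix it? NameError: name 'iloc' is not defined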
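
One likely reading of this error: `iloc` is an attribute of a DataFrame, not a standalone name, so a bare `iloc` inside the brackets is looked up as an ordinary Python variable that was never assigned. The sketch below assumes the intent was to read one cell of the `security_code` column. The frame contents and the `row_label` variable are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `rawdata`
rawdata = pd.DataFrame(
    {"security_code": ["000001", "600000"]}, index=["r0", "r1"]
)

# .at takes a row LABEL and a column LABEL
row_label = "r1"
code_by_label = rawdata.at[row_label, "security_code"]

# .iat takes a row POSITION and a column POSITION
code_by_position = rawdata.iat[1, 0]

# .iloc is used as an attribute of the frame, followed by brackets
code_by_iloc = rawdata.iloc[1]["security_code"]

print(code_by_label, code_by_position, code_by_iloc)
```

The error comes from the missing variable rather than from the Anaconda installation, since `NameError` is raised by Python itself before pandas is involved.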
  • 'age']]) print(df.loc[1:3,['gender','age']]) # loc uses a closed interval: the end is included print(df.iloc[1:3,[1,2]]) # iloc uses a half-open interval: the end is excluded print(df.iloc[:,1:3]) # read rows 1 through 2 print(df[1:3]) print(df.iloc[1:...
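
The closed versus half-open behaviour in this snippet can be reproduced on a frame with the default integer index, where labels and positions coincide. The column names follow the snippet, and the rows are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d", "e"],
        "gender": ["F", "M", "F", "M", "F"],
        "age": [21, 22, 23, 24, 25],
    }
)

closed = df.loc[1:3, ["gender", "age"]]  # labels 1, 2, 3: end label included
half_open = df.iloc[1:3, [1, 2]]         # positions 1, 2: end position excluded
plain = df[1:3]                          # plain [] slicing is positional as well

print(len(closed), len(half_open), len(plain))  # 3 2 2
```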
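  • Preface: Although I have used pandas for a long time, my work has mostly revolved around using pandas for file operations and ... and then indexing it like an array, because indexing a DataFrame with [] has several pitfalls, and when using loc and iloc I keep forgetting where each one applies...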
  • How to use iloc

    1,000+ views 2020-12-01 15:43:39
    In iloc[ : , : ], the first colon selects rows and the second colon selects columns. Here is an example to practice with: import numpy as np import matplotlib.pyplot as plt import pandas as pd dataset = pd.read_csv('Position_Salaries.csv') ...
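  • The difference between iloc and loc

    1,000+ views 2020-02-25 10:58:52
    The difference between iloc and loc always seems ambiguous, so let's compare them with a test. import pandas as pd import os import numpy as np Set the working folder os.chdir("/Users/XXX/Documents/csv") Open the file, assign it, and add an index In [8]:df = pd.read_...
  • Adding one more number to the iloc list adds one more row to the table below city.iloc[[2]] ''' INSTNM Amridge University Montgomery Name: CITY, dtype: object ''' iloc selects multiple rows with a list of integers, and the result is a Series # iloc selects multiple rows with a list of integers,...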
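  • Usage of loc and iloc in Pandas

    10,000+ views 2021-01-04 22:18:44
    1 Meaning of loc and iloc: loc stands for location. The loc in iloc means the same thing, and the leading i stands for integer, so it takes integer positions rather than labels. 2 Usage import pandas as pd import numpy as np # np.random.randn(5, 2) returns a 5x2 matrix,...
  • Extracting data in Python: loc and iloc 1. Goal: extract the first row (blue) Method 1: axis labels (loc) # input data >>import pandas as pd >>import numpy as np >>data=pd.DataFrame(np.arange(16).reshape(4,4),index=...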
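  • iloc vs. loc

    10,000+ views  Many likes  2019-07-27 10:50:33
    iloc vs. loc Import numpy and pandas import numpy as np import pandas as pd Build an array of even numbers on the half-open interval [0, 30) data=np.arange(0,30,2) #arange(start, stop, step) data is of type ndarray print(type(data)) <class '...
  • Using pandas.DataFrame.iloc: I came across this method while studying today and am taking notes to deepen my understanding. This is the documentation for the method, which shows that five kinds of input are allowed inside the brackets. In addition, iloc can index both rows and columns. // First create...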
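  • pandas.DataFrame.iloc() Purely integer-position based indexing. Accepted inputs: a list or array of integers, e.g. [4,3,0]. A slice object with ints, e.g. 1:7. A boolean array. A callable with one argument that returns a valid indexer. Example mydict = [{...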
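
The input kinds listed in this entry can be tried side by side. This is a minimal sketch on an invented three-row frame, not the truncated `mydict` example from the snippet.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000]})

by_list = df.iloc[[2, 0]]               # list of integer positions, in the order given
by_slice = df.iloc[1:3]                 # slice of ints, end position excluded
by_mask = df.iloc[[True, False, True]]  # boolean list, one entry per row

# Callable taking the frame and returning a valid indexer (here a boolean array)
by_callable = df.iloc[lambda x: x.index % 2 == 0]

print(list(by_list.index))      # [2, 0]
print(list(by_slice.index))     # [1, 2]
print(list(by_mask.index))      # [0, 2]
print(list(by_callable.index))  # [0, 2]
```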
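  • Python indexing methods loc and iloc

    1,000+ views 2019-10-20 17:21:12
    The loc and iloc methods index the required elements (that is, the data) of a table (of type DataFrame) by row and column. loc indexing: index by the names of the rows and columns. Expression: df.loc[ [ row names ], [ column names ] ]...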
