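  • While learning data analysis in Python recently, I ran into a problem with quartiles: different sources use different formulas, so the results differ. Baidu, unhelpfully, gives only one of the methods, which easily confuses beginners, so I summarize them here. Calculation methods. Determining the three quartiles: first sort the data in ascending order, ...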

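    While learning data analysis in Python recently, I ran into a problem with quartiles: different sources use different formulas, so the results differ. Baidu, unhelpfully, gives only one of the methods, which easily confuses beginners, so I summarize them here.

    Calculation methods

    Determining the three quartiles:

    First sort the data in ascending order, then apply one of the methods below.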

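    Method 1: the n+1 method

    Position of Q1 = (n+1) × 0.25

    Position of Q2 = (n+1) × 0.5

    Position of Q3 = (n+1) × 0.75

    Here n is the number of data points.

    This is the commonly used n+1 method. There is also an n-1 method.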

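    Method 2: the n-1 method

    Position of Q1 = 1 + (n-1) × 0.25

    Position of Q2 = 1 + (n-1) × 0.5

    Position of Q3 = 1 + (n-1) × 0.75

    When the position is not a whole number, interpolate between the two neighboring values: multiply the value at the lower position by (1 - fraction) and the value at the upper position by the fraction, then add the two. For example, if the position is 6.25, the result is (6th value) × 0.75 + (7th value) × 0.25.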
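The two position rules and the interpolation step can be sketched in plain Python. This is only an illustration of the formulas above, not how any particular library implements them; the function names (`quartile_positions`, `value_at`, `quartiles`) are made up for this sketch, and clamping out-of-range positions to the first or last value is an assumption for very short inputs.

```python
import math


def quartile_positions(n, method="n+1"):
    """Return the 1-based positions of Q1, Q2, Q3 for n sorted values."""
    fractions = (0.25, 0.5, 0.75)
    if method == "n+1":
        return [(n + 1) * f for f in fractions]
    if method == "n-1":
        return [1 + (n - 1) * f for f in fractions]
    raise ValueError("method must be 'n+1' or 'n-1'")


def value_at(ordered, pos):
    """Linearly interpolate the value at a (possibly fractional) 1-based position."""
    if not ordered:
        raise ValueError("need at least one value")
    # Assumption: positions outside 1..n are clamped to the ends.
    pos = min(max(pos, 1), len(ordered))
    lower = math.floor(pos)
    frac = pos - lower
    upper = min(lower + 1, len(ordered))
    # The lower neighbor is weighted by (1 - frac), the upper neighbor by frac.
    return ordered[lower - 1] * (1 - frac) + ordered[upper - 1] * frac


def quartiles(data, method="n+1"):
    """Return [Q1, Q2, Q3] using the chosen position rule."""
    ordered = sorted(data)
    return [value_at(ordered, p) for p in quartile_positions(len(ordered), method)]


if __name__ == "__main__":
    sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
    print("n+1:", quartiles(sample, "n+1"))
    print("n-1:", quartiles(sample, "n-1"))
```

Keeping the position rule separate from the interpolation makes it easy to see that the two methods differ only in where they look, not in how they interpolate.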

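    Examples follow.

    Example 1 (odd count): take the data 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49. The data is already sorted in ascending order, so no further sorting is needed; unsorted data must be sorted first.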

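    1. Using the (n+1) formula

    First quartile (lower quartile): (11+1)/4 = 3, so it is at the 3rd position, which is 15; that is, Q1 = 15.

    Median: (11+1)/4 × 2 = 6, so it is the 6th value, 40.

    Third quartile (upper quartile): (11+1)/4 × 3 = 9, so it is the 9th value, 43.

    So Q1 = 15, Q2 = 40, Q3 = 43.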

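    2. Using the (n-1) formula

    First quartile (lower quartile): 1 + (11-1) × 0.25 = 3.5, so Q1 = 15 × 0.5 + 36 × 0.5 = 25.5

    Median: 1 + (11-1) × 0.5 = 6, a whole-number position, so Q2 is simply the 6th value: Q2 = 40

    Third quartile (upper quartile): 1 + (11-1) × 0.75 = 8.5, so Q3 = 42 × 0.5 + 43 × 0.5 = 42.5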

    Now compute it in Python.

    import pandas as pd
    s1 = pd.Series([6,7,15,36,39,40,41,42,43,47,49])
    s1.describe()

    The result:

    count    11.000000
    mean     33.181818
    std      15.873362
    min       6.000000
    25%      25.500000
    50%      40.000000
    75%      42.500000
    max      49.000000
    dtype: float64

    So the result from pandas is Q1 = 25.5, Q2 = 40, Q3 = 42.5.

    This matches the n-1 method, which shows that pandas' describe() uses that method here.

    Example 2 (even count)

    import numpy as np
    import pandas as pd
    ser_obj = pd.Series([1,2,3,4,5,6])
    ser_obj.describe()

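    1. Using the (n+1) formula

    First quartile (lower quartile): (6+1)/4 = 1.75, so it is at position 1.75, which gives 1 × 0.25 + 2 × 0.75; that is, Q1 = 1.75.

    Median: (6+1)/4 × 2 = 3.5, so it is 3 × 0.5 + 4 × 0.5 = 3.5.

    Third quartile (upper quartile): (6+1)/4 × 3 = 5.25, so it is 5 × 0.75 + 6 × 0.25 = 5.25.

    So Q1 = 1.75, Q2 = 3.5, Q3 = 5.25.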

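    2. Using the (n-1) formula

    First quartile (lower quartile): 1 + (6-1) × 0.25 = 2.25, so Q1 = 2 × 0.75 + 3 × 0.25 = 2.25

    Median: 1 + (6-1) × 0.5 = 3.5, so Q2 = 3 × 0.5 + 4 × 0.5 = 3.5

    Third quartile (upper quartile): 1 + (6-1) × 0.75 = 4.75, so Q3 = 4 × 0.25 + 5 × 0.75 = 4.75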

    Now compute it in Python. Running the describe() call for this example gives:

    count 6.000000

    mean 3.500000

    std 1.870829

    min 1.000000

    25% 2.250000

    50% 3.500000

    75% 4.750000

    max 6.000000

    So pandas uses the n-1 method, whereas people commonly use the n+1 method.
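Both worked examples can be cross-checked without pandas. As a rough sketch: the standard library's `statistics.quantiles` (Python 3.8+) has a `method` argument whose `"exclusive"` setting reproduces the n+1 results above and whose `"inclusive"` setting reproduces the n-1 results. The variable names below are only for illustration.

```python
from statistics import quantiles

example1 = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
example2 = [1, 2, 3, 4, 5, 6]

for data in (example1, example2):
    # "exclusive" places the k-th quartile at position (n+1) * k/4.
    n_plus_1 = quantiles(data, n=4, method="exclusive")
    # "inclusive" places the k-th quartile at position 1 + (n-1) * k/4.
    n_minus_1 = quantiles(data, n=4, method="inclusive")
    print(len(data), "values | n+1:", n_plus_1, "| n-1:", n_minus_1)
```

Note that `statistics.quantiles` defaults to `"exclusive"`, so the standard library and pandas' `describe()` disagree on these inputs unless a method is chosen explicitly.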

  • [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8] S = pd.Series(series) percentage_rank = S.rank(method="max", pct=True) print(percentage_rank) This gives the percentage rank of each entry in the Series. To retrieve the three percent...

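    Assuming the data is always sorted (thanks @胡安帕.阿里维拉加), you can use the rank method of the pandas Series class. rank() accepts several parameters. One of them is pct:

    pct : boolean, default False

    Computes percentage rank of data

    There are different ways to compute the percentage rank. They are controlled by the method parameter:

    method : {'average', 'min', 'max', 'first', 'dense'}

    You need the "max" method:

    max: highest rank in group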

    Let's look at the output of the rank() method with these parameters:

    import numpy as np

    import pandas as pd

    series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

    S = pd.Series(series)

    percentage_rank = S.rank(method="max", pct=True)

    print(percentage_rank)

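    This gives the percentage rank of each entry in the Series; for example, the single 1 gets 1/16 = 0.0625 and each 2 gets 11/16 = 0.6875.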

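    To retrieve the indexes of the three percentiles, look up the first element in the Series whose percentage rank is equal to or higher than the percentile you are interested in. The index of that element is the index you need.

    index25 = S.index[percentage_rank >= 0.25][0]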

    index50 = S.index[percentage_rank >= 0.50][0]

    index75 = S.index[percentage_rank >= 0.75][0]

    print("25 percentile: index {}, value {}".format(index25, S[index25]))

    print("50 percentile: index {}, value {}".format(index50, S[index50]))

    print("75 percentile: index {}, value {}".format(index75, S[index75]))

    This gives you the following output:

    25 percentile: index 1, value 2

    50 percentile: index 1, value 2

    75 percentile: index 11, value 5
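To see what the rank-based lookup is doing, here is a pandas-free sketch of the same idea. It assumes sorted input with no missing values, as the answer does; `pct_rank_max` and `first_index_at_or_above` are hypothetical helper names, intended to mimic `rank(method="max", pct=True)` and the boolean-mask lookup for this simple case only.

```python
def pct_rank_max(values):
    """Percentage rank where tied values all receive the highest rank in their group."""
    n = len(values)
    # Under the "max" rule, the rank of v equals the number of values <= v.
    return [sum(1 for other in values if other <= v) / n for v in values]


def first_index_at_or_above(ranks, threshold):
    """Index of the first entry whose percentage rank reaches the threshold."""
    return next(i for i, r in enumerate(ranks) if r >= threshold)


series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
ranks = pct_rank_max(series)

for pct in (0.25, 0.50, 0.75):
    idx = first_index_at_or_above(ranks, pct)
    print("{:.0f} percentile: index {}, value {}".format(pct * 100, idx, series[idx]))
```

Because the ten 2s share the rank 11/16, both the 25th and the 50th percentile land on index 1. This approach always returns an actual data value and never interpolates.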

  • My attempt in Python is as follows: >>> a = numpy.array([1, 2, 3, 4, 5, 6, 7]) >>> numpy.percentile(a, 25) 2.5 >>> numpy.percentile(a, 75) 5.5 >>> numpy.percentile(a, 75) - numpy.percentile(a, 25) # ...

    I have a list of numbers [1, 2, 3, 4, 5, 6, 7] and I want to have a function to return the interquartile range of this list of numbers. The interquartile range is the difference between the upper and lower quartiles. I have attempted to calculate the interquartile range using NumPy functions and using Wolfram Alpha. I find all of the answers, from my manual one, to the NumPy one, to the Wolfram Alpha one, to be different. I do not know why this is.

    My attempt in Python is as follows:

    >>> a = numpy.array([1, 2, 3, 4, 5, 6, 7])

    >>> numpy.percentile(a, 25)

    2.5

    >>> numpy.percentile(a, 75)

    5.5

    >>> numpy.percentile(a, 75) - numpy.percentile(a, 25) # IQR

    3.0

    My attempt in Wolfram Alpha is as follows:

    So, I find that the values returned by NumPy and Wolfram Alpha for what I think are the first quartile, the third quartile and the interquartile range are not consistent. Why is this? What should I be doing in Python to calculate the interquartile range correctly?

    As far as I am aware, the interquartile range of [1, 2, 3, 4, 5, 6, 7] should be the following:

    median(5, 6, 7) - median(1, 2, 3) = 4.

    Solution

    You have 7 numbers which you are attempting to split into quartiles. Because 7 is not divisible by 4 there are a couple of different ways to do this as mentioned here.

    Your way is the first given by that link; Wolfram Alpha seems to be using the third. NumPy is doing basically the same thing as Wolfram Alpha; however, it is interpolating based on percentiles (as shown here) rather than quartiles, so it gets a different answer. You can choose how NumPy handles this using the interpolation option (I tried to link to the documentation but apparently I'm only allowed two links per post).

    You'll have to choose which definition you prefer for your application.
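The disagreement is easy to reproduce with the standard library alone. The sketch below compares three definitions on the list from the question: the asker's "median of each half" rule and the two interpolating rules offered by `statistics.quantiles`. The helper names are made up, and no claim is made here about which rule Wolfram Alpha applies.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7]


def iqr_median_of_halves(values):
    """IQR as median(upper half) - median(lower half); needs at least 2 values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # for an odd count, the middle value is left out
    return median(upper) - median(lower)


def iqr_interpolated(values, method):
    """IQR from linearly interpolated quartiles ('inclusive' or 'exclusive')."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    return q3 - q1


print("median of halves:", iqr_median_of_halves(data))
print("inclusive (n-1): ", iqr_interpolated(data, "inclusive"))
print("exclusive (n+1): ", iqr_interpolated(data, "exclusive"))
```

On this input the `"inclusive"` rule gives quartiles 2.5 and 5.5, the same values NumPy returned in the question, while the other two rules give an IQR of 4.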

  • Function prototype: DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear'). Parameters: q : float or array-like, default 0.5 (the 50% quantile, i.e. the median, the second quartile)
  • 1. Quantile calculation examples with Python code. Example 1 (Ex1): Given data = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36], find Q1, Q2, Q3, IQR. Solving, steps: 1. Sort data in ascending order: data = [6, 7, 15, 36, 39, 40, 41, 42, 43, ...
  • np.percentile(nums, (25, 50, 75), interpolation='midpoint') returns a list containing all the quartiles in order: [25.5 40. 42.5]. As is easy to see, this method can compute any set of quantiles in one call. A pure-Python version is attached: def median(x): ...
  • I. Computing quartiles #!/usr/bin/python # -*- coding: UTF-8 -*- """ @author:ZSW @file:quantile_distance.py @time:2021/02/05 """ import pandas as pd import numpy as np # read the Excel file excel_data = pd.read...
  • Computing quartiles with Python NumPy

    1k+ reads 2019-05-20 16:21:00
    import numpy as np ages=[3,3,6,7,7,10,10,10,...lower_q=np.quantile(ages,0.25,interpolation='lower') # lower quartile higher_q=np.quantile(ages,0.75,interpolation='higher') # upper quartile int_r=higher_q-lower...
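  • Statistics in Python, 009: Quartiles

    1k+ reads 2020-05-20 12:10:25
    There are three quartiles: the first is called the lower quartile, the second is the median, and the third is called the upper quartile; they are denoted Q1, Q2 and Q3. Statistical explanation: there are two ways to determine the positions of the quartiles. One is the method of the Excel function QUARTILE.EXC...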
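  • Interquartile range: the interquartile range, denoted IQR, is a statistic that measures the dispersion of a data set. Its value is the difference between the third quartile and the first quartile. It is computed as IQR = Q3 − Q1, where Q1...
  • [Python 3.8] AttributeError: 'numpy.ndarray' object has no attribute 'quantile'. The error occurs because a NumPy ndarray has no quantile method; after converting the data to a DataFrame, quantile can be used. ValueError: The truth value of a Series is ambiguous...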
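  • Removing outliers with quartiles in Python

    1k+ reads 2019-06-13 12:55:44
    As we all know, the median is the number in the middle position (the 50% position) after a set of numbers is sorted in ascending order. Likewise, the first and third quartiles are the numbers at the 25% and 75% positions after sorting in ascending order. Let IQR = Q3 − Q1 ...
  • Outliers cause great trouble for modeling. Before building a model, removing outliers with the interquartile range method is a common approach; I summarize the code as follows: ## del_cols: set of column names not processed by the capping method ## df_data_1: the DataFrame to process def OutliersDeal(df_data...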
  • Quartiles and percentiles: 20 kinds of quartiles

    1k+ reads 2020-07-22 10:43:06
    Quartiles and percentiles. Quartiles: To calculate a quartile of a sample is in theory easy, and is much like calculating the median. The difficult part is the implementation; contrary to ...
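  • # compute the median def count_median(lis): if len(lis) % 2 == 0: mid = float((lis[len(lis) // 2] + lis[len(lis) // 2 - 1])) / 2 else: mid = lis[len(lis) // 2] return mid # compute the upper and lower quartiles def count_...
  • python_outliers_EllipticEnvelope method and interquartile range method # load libraries import numpy as np from sklearn.covariance import EllipticEnvelope from sklearn.datasets import make_blobs # create simulated data # in sklearn ...
  • The middle line is usually the mean or the median, and the upper and lower bounds show the interquartile range (though this figure shows 17%-83%). Since I have been using this kind of plot a lot lately, I adapted material from the web and wrote a helper function that can be used directly in Python. Helper function: def tsplot(ax, x, y, n=20, ...
  • 1. First, the definition of a boxplot, with reference to Understanding Boxplots, an excellent blog post introducing boxplots. The picture shows a boxplot... the median, or second quartile, ...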
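Several of the entries above define IQR = Q3 − Q1 and use it to drop outliers, but the excerpts are cut off before the rule itself. A minimal sketch of one common convention follows, under explicit assumptions: the multiplier k = 1.5 and the `"inclusive"` quartile rule are choices made for this illustration and are not taken from the truncated excerpts, and `iqr_fences` and `drop_outliers` are hypothetical names.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5, method="inclusive"):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5, method="inclusive"):
    """Keep only the values that lie within the IQR fences (assumed rule)."""
    low, high = iqr_fences(values, k, method)
    return [v for v in values if low <= v <= high]


data = [1, 2, 3, 4, 5, 6, 100]
print("fences:", iqr_fences(data))
print("kept:  ", drop_outliers(data))
```

Since the fences depend on Q1 and Q3, the choice between the n+1 and n-1 rules can change which points count as outliers on small samples, so it is worth fixing the method explicitly.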
