  • Quartiles (formulas for calculating the upper and lower quartiles)
    2021-07-27 01:20:27

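    The quartile deviation, also called the inner distance or interquartile range, is the difference between the upper quartile (QU, located at 75%) and the lower quartile (QL, located at 25%). The formula...

    All values are sorted by size and divided into four equal parts; the values at the three cut points are the quartiles. The smallest quartile is called the lower quartile: one quarter of all values are smaller than the lower quartile, and...

    Sort the array from smallest to largest. The median is the middle number, the lower quartile is the number at the 1/4 position, and the upper quartile is the number at the 3/4 position. To calculate with Excel (with the data in $A$1:$A$9): minimum = QUARTILE...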

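    Can some expert explain in detail exactly how to find the quartiles? Let me give an example. Here...

    Quartiles: in statistics, all values are sorted from smallest to largest and divided into four equal parts; the values at the three cut points are the quartiles. The first quartile (Q1), also called the "lower quartile", ...

    There is a function dedicated to finding quartiles: =QUARTILE(A1:A10,1)

    Quartiles and the median are the same kind of concept. After a data set is sorted by size, it is divided into four parts by the number of data points, and the values at the three cut points are called quartiles, specifically: the 1st quartile, the 2nd...

    In statistics, all values are sorted from smallest to largest and divided into four equal parts; the values at the three cut points are the quartiles. The first quartile (Q1), also called the "lower quartile", equals, with all values in the sample sorted from...

    Wow, this looks like a financial method rather than a Buddhist teaching. Does the following explanation work for you? The quartile method is an analysis method in statistics. Put simply, all the data are sorted from smallest to largest, arranged exactly...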

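    Mode = 10, median = 10.5, lower quartile = 9.25, upper quartile = 13.5, mean = 11.1667, standard deviation = 2.7579

    As the title says: is it a single number, such as 10, or a range, such as 2-12? How do I find the interquartile range?

    Quartiles divide all the data into four equal parts, each containing 25% of the data; the values at the cut points are the quartiles. As one form of quantile, quartiles play a very important ... in statistics...

    I need the calculation steps. How is it worked out?

    Sorted from smallest to largest: 17, 19, 20, 22, 23, 23, 24, 25. The lower quartile equals the value at the 25% position of all values in the sample sorted from smallest to largest, i.e. the 2nd number, 19. The upper quartile equals, with all values in the sample sorted from smallest to largest...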

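    There are two quartiles, at 25% and 75%. Sort the data by size: the former is the number that has 25% of all the data before it, and the latter is the number that has 75% of all the data before it.

    I know the 1/4 one; how do I calculate the 3/4 one?

    Use the QUARTILE function in Excel. Syntax: QUARTILE(array, quart). The array argument is the array or cell range of numeric values for which you want the quartile; quart determines which quartile is returned. If quart is 0, 1, 2, 3, or 4, the QUARTILE function returns...

    The quartile deviation is the difference between the upper and lower quartiles, also called the inner distance or interquartile range. It is mainly used to measure the dispersion of ordinal data. It can of course also be calculated for numerical data, but it is not suitable for categorical...

    Hello OP. IQR = Q3 - Q1. The interquartile range is commonly used to build box plots and to give a brief graphical summary of a probability distribution. For symmetrically distributed data (where the median necessarily equals the arithmetic ... of the third and first quartiles...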

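    75, 85, 87, 95, 99, 100, 101, 105, 113, 115, 125. The first quartile: ...

    75 85 87 | 95 99 100 101 105 | 113 115 125. Split into 4 segments, with 100 as the midpoint: Q1 = (87+95)/2 = 91, Q2 = 100, Q3 = (105+113)/2 = 109. Quartiles: all values are sorted by size and divided into four equal parts, ...

    Well, it would be best to give an example and put it plainly; explaining it in your own words would make it easier to understand.

    The English term is quartile? Are you asking about the lower quartile and the upper quartile? All samples are sorted from smallest to largest and divided into four equal parts; the values at the three cut points (each a single value) are the quartiles. The smallest...

    How do I find the lower quartile? And how do I find the upper extreme and the lower extreme? I am in the United States.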

  • Quartiles and Percentiles: 20 Varieties of Quartiles

    2020-07-22 10:43:06

    Quartiles and Percentiles

    Quartiles

    Calculating a quartile of a sample is easy in theory and much like calculating the median. The difficult part is the implementation: unlike for the median, no single method among the roughly twenty known methods for calculating a quartile stands above the rest or can be considered the "best". The "best" method is the one that fits the purpose or, in some fields, is considered a de facto standard.

    Why, how, and when to calculate quartiles, and with which method, is outside the scope of this article. Many articles and even books have been written on the subject. However, the day you face the task of calculating a quartile using a specific method, the functions here will help you.

    Methods

    It is quite hard even to obtain a list of known methods for calculating a quartile, let alone proven results from them. The best source I have located (see bottom of the article) is quite old and lists 14 methods.

    I located the additional six methods here and there. Unfortunately, those sources have vanished.

    If you are aware of any good source, please add a comment to the article.

    The methods have been collected in an enum that includes, as in-line comments, their names, applications, and sources, together with their basic calculation methods for the first and third quartiles (the second is always calculated as the median):

    ' Quartile calculation methods.
    ' Values equal those listed in the source. See function Quartile.
    '
    ' Common names of variables used in calculation formulas.
    '
    ' L: Q1, Lower quartile.
    ' H: Q3, Higher quartile.
    ' M: Q2, Median (not used here).
    ' n: Count of elements.
    ' p: Calculated position of quartile.
    ' j: Element of dataset.
    ' g: Decimal part of p to be used for interpolation between j and j+1.
    '
    Public Enum ApQuartileMethod
        [_First] = 1
        
        ' Basic calculation methods.
        
        ' Step. Mendenhall and Sincich method.
        '   SAS #3.
        '   Round up to actual element of dataset.
        '   L:  -Int(-n/4)
        '   H: n-Int(-n/4)
        apMendenhallSincich = 1
        
        ' Average step.
        '   SAS #5, Minitab (%DESCRIBE), GLIM (percentile).
        '   Add bias of one on basis of n/4.
        '   L:   CLng((n+2)/2)/2
        '   H: n-CLng((n+2)/2)/2
        '   Note:
        '       Replaces these original formulas that don't return the expected values.
        '   L:   (Int((n+1)/4)+Int(n/4))/2+1
        '   H: n-(Int((n+1)/4)+Int(n/4))/2+1
        apAverage = 2
        
        ' Nearest integer to np.
        '   SAS #2.
        '   Round to nearest integer on basis of n/4.
        '   L:   CLng(n/4)
        '   H: n-CLng(n/4)
        '   Note:
        '       Replaces these original formulas that don't return the expected values.
        '   L:   Int((n+2)/4)
        '   H: n-Int((n+2)/4)
        apNearestInteger = 3
        
        ' Parzen method.
        '   Method 1 with interpolation.
        '   SAS #1.
        '   L: n/4
        '   H: 3n/4
        apParzen = 4
        
        ' Hazen method.
        '   Values midway between method 1 steps.
        '   GLIM (interpolate).
        '   Wikipedia method 3.
        '   Add bias of 2, don't round to actual element of dataset.
        '   L: (n+2)/4
        '   H: 3(n+2)/4-1
        apHazen = 5
        
        ' Weibull method.
        '   SAS #4. Minitab (DESCRIBE), SPSS, BMDP, Excel exclusive.
        '   Add bias of 1, don't round to actual element of dataset.
        '   L: (n+1)/4
        '   H: 3(n+1)/4
        apWeibull = 6
        
        ' Freund, J. and Perles, B., Gumbell method.
        '   S-PLUS, R, Excel legacy, Excel inclusive, Star Office Calc.
        '   Add bias of 3, don't round to actual element of dataset.
        '   L: (n+3)/4
        '   H: (3n+1)/4
        apFreundPerlesGumbell = 7
        
        ' Median Position.
        '   Median unbiased.
        '   L: (3n+5)/12
        '   H: (9n+7)/12
        apMedianPosition = 8
        
        ' Bernard and Bos-Levenbach.
        '   L: (n/4)+0.4
        '   H: (3n/4)+0.6
        '   Note:
        '       Reference claims L to be (n/4)+0.31.
        apBernardBosLevenbach = 9
        
        ' Blom's Plotting Position.
        '   Better approximation when the distribution is normal.
        '   L: (4n+7)/16
        '   H: (12n+9)/16
        apBlom = 10
        
        ' Moore's first method.
        '   Add bias of one half step.
        '   L: (n+0.5)/4
        '   H: n-(n+0.5)/4
        apMooreFirst = 11
        
        ' Moore's second method.
        '   Add bias of one or two steps on basis of (n+1)/4.
        '   L:   (Int((n+1)/4)+Int(n/4))/2+1
        '   H: n-(Int((n+1)/4)+Int(n/4))/2+1
        apMooreSecond = 12
        
        ' John Tukey's method.
        '   Include median from odd dataset in dataset for quartile.
        '   Wikipedia method 2.
        '   L:   (1-Int(-n/2))/2
        '   H: n-(-1-Int(-n/2))/2
        apTukey = 13
        
        ' Moore and McCabe (M & M), variation of John Tukey's method.
        '   TI-83.
        '   Wikipedia method 1.
        '   Exclude median from odd dataset in dataset for quartile.
        '   L:   (Int(n/2)+1)/2
        '   H: n-(Int(n/2)-1)/2
        apTukeyMooreMcCabe = 14
        
        ' Additional variations between Weibull's and Hazen's methods, from
        '   (i-0.000)/(n+1.00)
        ' to
        '   (i-0.500)/(n+0.00)
        
        ' Variation of Weibull.
        '   L: n(n/4-0)/(n+1)
        '   H: n(3n/4-0)/(n+1)
        apWeibullVariation = 15
        
        ' Variation of Blom.
        '   L: n(n/4-3/8)/(n+1/4)
        '   H: n(3n/4-3/8)/(n+1/4)
        apBlomVariation = 16
        
        ' Variation of Tukey.
        '   L: n(n/4-1/3)/(n+1/3)
        '   H: n(3n/4-1/3)/(n+1/3)
        apTukeyVariation = 17
        
        ' Variation of Cunnane.
        '   L: n(n/4-2/5)/(n+1/5)
        '   H: n(3n/4-2/5)/(n+1/5)
        apCunnaneVariation = 18
        
        ' Variation of Gringorten.
        '   L: n(n/4-0.44)/(n+0.12)
        '   H: n(3n/4-0.44)/(n+0.12)
        apGringortenVariation = 19
        
        ' Variation of Hazen.
        '   L: n(n/4-1/2)/n
        '   H: n(3n/4-1/2)/n
        apHazenVariation = 20
        
        [_Last] = 20
    End Enum 
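
    The hidden members [_First] and [_Last] bracket the valid range of the enum, which makes a range check simple. Function Quartile calls IsQuartileMethod for this purpose, but that helper is not shown in this article, so the following is only a sketch of how such a check could look. The enum DemoMethod and the function IsDemoMethod are hypothetical stand-ins, not the original code.

    ```vba
    Option Explicit

    ' Hypothetical stand-in for ApQuartileMethod, trimmed to three members.
    Public Enum DemoMethod
        [_First] = 1
        dmOne = 1
        dmTwo = 2
        dmThree = 3
        [_Last] = 3
    End Enum

    ' Hypothetical range check in the spirit of IsQuartileMethod.
    ' Returns True if Method lies between the hidden bounds of the enum.
    Public Function IsDemoMethod(ByVal Method As DemoMethod) As Boolean

        Select Case Method
            Case DemoMethod.[_First] To DemoMethod.[_Last]
                IsDemoMethod = True
            Case Else
                IsDemoMethod = False
        End Select

    End Function
    ```

    Because the bounds are read from the enum itself, adding a member only requires moving [_Last]; the check does not have to be edited.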
    

    The actual calculation methods have been tweaked a little to fit VBA and to correct for odd results when a sample consists of very few elements.

    Functions

    The main function is named Quartile and is modeled on the native domain aggregate functions (DAvg etc.): it takes an Expression, a Domain, and a Criteria (filter) as arguments. The other arguments are the quartile Part to return and the Method to use:

    Expression: Name of the field or an expression to analyse.
    Domain    : Name of the source/query, or an SQL select query, to analyse.
    Criteria  : Optional. A filter expression for Domain.
    Part      : Optional. Which median/quartile or min/max value to return.
                Default is the median value.
    Method    : Optional. Method for calculation of lower/higher quartile.
                Default is the method by Freund, Perles, and Gumbell (Excel).  
    

    The function can be regarded as having four main parts:

    1. Build the SQL to retrieve the ordered samples
    2. Calculate either the minimum or maximum value, the first or third quartile, or the median
    3. Prepare for interpolation
    4. Calculate the final output

    Public Function Quartile( _
        ByVal Expression As String, _
        ByVal Domain As String, _
        Optional ByVal Criteria As String, _
        Optional ByVal Part As ApQuartilePart = ApQuartilePart.apMedian, _
        Optional ByVal Method As ApQuartileMethod = ApQuartileMethod.apFreundPerlesGumbell) _
        As Double
      
        ' SQL.
        Const SqlMask           As String = "Select {0} From {1} {2}"
        Const SqlLead           As String = "Select "
        Const SubMask           As String = "({0}) As T"
        Const FilterMask        As String = "Where {0} "
        Const OrderByMask       As String = "Order By {0} Asc"
        
        Dim Records     As DAO.Recordset
        
        Dim Sql         As String
        Dim SqlSub      As String
        Dim Filter      As String
        Dim Count       As Long     ' n.
        Dim Position    As Double   ' p.
        Dim Element     As Long     ' j.
        Dim Interpolate As Double   ' g.
        Dim ValueOne    As Double
        Dim ValueTwo    As Double
        Dim Value       As Double
        
        ' Return default quartile part if choice of part is
        ' outside the range of ApQuartilePart.
        If Not IsQuartilePart(Part) Then
            Part = ApQuartilePart.apMedian
        End If
        
        ' Use a default calculation method if choice of method is
        ' outside the range of ApQuartileMethod.
        If Not IsQuartileMethod(Method) Then
            Method = ApQuartileMethod.apFreundPerlesGumbell
        End If
        
        If Domain <> "" And Expression <> "" Then
            ' Build SQL to lookup values.
            If InStr(1, LTrim(Domain), SqlLead, vbTextCompare) = 1 Then
                ' Domain is an SQL expression.
                SqlSub = Replace(SubMask, "{0}", Domain)
            Else
                ' Domain is a table or query name.
                SqlSub = Domain
            End If
            If Trim(Criteria) <> "" Then
                ' Build Where clause.
                Filter = Replace(FilterMask, "{0}", Criteria)
            End If
            ' Build final SQL.
            Sql = Replace(Replace(Replace(SqlMask, "{0}", Expression), "{1}", SqlSub), "{2}", Filter) & _
                Replace(OrderByMask, "{0}", Expression)
            Set Records = CurrentDb.OpenRecordset(Sql, dbOpenSnapshot)
          
            With Records
                If Not .EOF = True Then
                    If Part = ApQuartilePart.apMinimum Then
                        ' No need to count records.
                        Count = 1
                    Else
                        ' Count records.
                        .MoveLast
                        Count = .RecordCount
                    End If
                    Select Case Part
                        Case ApQuartilePart.apMinimum
                            ' Current record is first record.
                            ' Read value of this record.
                        Case ApQuartilePart.apMaximum
                            ' Current record is last record.
                            ' Read value of this record.
                        Case ApQuartilePart.apMedian
                            ' Locate position of median.
                            Position = (Count + 1) / 2
                        Case ApQuartilePart.apLower
                            Select Case Method
                                Case ApQuartileMethod.apMendenhallSincich
                                    Position = -Int(-Count / 4)
                                Case ApQuartileMethod.apAverage
                                    Position = CLng((Count + 2) / 2) / 2
                                Case ApQuartileMethod.apNearestInteger
                                    Position = CLng(Count / 4)
                                Case ApQuartileMethod.apParzen
                                    Position = Count / 4
                                Case ApQuartileMethod.apHazen
                                    Position = (Count + 2) / 4
                                Case ApQuartileMethod.apWeibull
                                    Position = (Count + 1) / 4
                                Case ApQuartileMethod.apFreundPerlesGumbell
                                    Position = (Count + 3) / 4
                                Case ApQuartileMethod.apMedianPosition
                                    Position = (3 * Count + 5) / 12
                                Case ApQuartileMethod.apBernardBosLevenbach
                                    Position = (Count / 4) + 0.4
                                Case ApQuartileMethod.apBlom
                                    Position = (4 * Count + 7) / 16
                                Case ApQuartileMethod.apMooreFirst
                                    Position = (Count + 0.5) / 4
                                Case ApQuartileMethod.apMooreSecond
                                    Position = (Int((Count + 1) / 4) + Int(Count / 4)) / 2 + 1
                                Case ApQuartileMethod.apTukey
                                    Position = (1 - Int(-Count / 2)) / 2
                                Case ApQuartileMethod.apTukeyMooreMcCabe
                                    Position = (Int(Count / 2) + 1) / 2
                                Case ApQuartileMethod.apWeibullVariation
                                    Position = Count * (Count / 4) / (Count + 1)
                                Case ApQuartileMethod.apBlomVariation
                                    Position = Count * (Count / 4 - 3 / 8) / (Count + 1 / 4)
                                Case ApQuartileMethod.apTukeyVariation
                                    Position = Count * (Count / 4 - 1 / 3) / (Count + 1 / 3)
                                Case ApQuartileMethod.apCunnaneVariation
                                    Position = Count * (Count / 4 - 2 / 5) / (Count + 1 / 5)
                                Case ApQuartileMethod.apGringortenVariation
                                    Position = Count * (Count / 4 - 0.44) / (Count + 0.12)
                                Case ApQuartileMethod.apHazenVariation
                                    Position = Count * (Count / 4 - 1 / 2) / Count
                            End Select
                        Case ApQuartilePart.apUpper
                            ' Default position for very low counts for several methods
                            Position = Count
                            Select Case Method
                                Case ApQuartileMethod.apMendenhallSincich
                                    If Count > 2 Then
                                        Position = Count - (-Int(-Count / 4))
                                    End If
                                Case ApQuartileMethod.apAverage
                                    If Count > 2 Then
                                        Position = Count - CLng((Count + 2) / 2) / 2
                                    End If
                                Case ApQuartileMethod.apNearestInteger
                                    Position = Count - CLng(Count / 4)
                                Case ApQuartileMethod.apParzen
                                    Position = 3 * Count / 4
                                Case ApQuartileMethod.apHazen
                                    If Count > 1 Then
                                        Position = 3 * (Count + 2) / 4 - 1
                                    End If
                                Case ApQuartileMethod.apWeibull
                                    If Count > 2 Then
                                        Position = 3 * (Count + 1) / 4
                                    End If
                                Case ApQuartileMethod.apFreundPerlesGumbell
                                    Position = (3 * Count + 1) / 4
                                Case ApQuartileMethod.apMedianPosition
                                    If Count > 2 Then
                                        Position = (9 * Count + 7) / 12
                                    End If
                                Case ApQuartileMethod.apBernardBosLevenbach
                                    If Count > 2 Then
                                        Position = (3 * Count / 4) + 0.6
                                    End If
                                Case ApQuartileMethod.apBlom
                                    If Count > 2 Then
                                        Position = (12 * Count + 9) / 16
                                    End If
                                Case ApQuartileMethod.apMooreFirst
                                    Position = Count - (Count + 0.5) / 4
                                Case ApQuartileMethod.apMooreSecond
                                    ' Basic calculation method. Will fail for 2 or 3 elements.
                                    '   Position = Count - (Int((Count + 1) / 4) + Int(Count / 4)) / 2 + 1
                                    ' Calculation method adjusted to accept 2 or 3 elements.
                                    Position = Count - (Int((Count + Int((Count * 2) / (Count + 4))) / 4) + Int(Count / 4)) / 2 + 1
                                Case ApQuartileMethod.apTukey
                                    Position = Count - (-1 - Int(-Count / 2)) / 2
                                Case ApQuartileMethod.apTukeyMooreMcCabe
                                    If Count > 1 Then
                                        Position = Count - (Int(Count / 2) - 1) / 2
                                    End If
                                Case ApQuartileMethod.apWeibullVariation
                                    Position = Count * (3 * Count / 4) / (Count + 1)
                                Case ApQuartileMethod.apBlomVariation
                                    Position = Count * (3 * Count / 4 - 3 / 8) / (Count + 1 / 4)
                                Case ApQuartileMethod.apTukeyVariation
                                    Position = Count * (3 * Count / 4 - 1 / 3) / (Count + 1 / 3)
                                Case ApQuartileMethod.apCunnaneVariation
                                    Position = Count * (3 * Count / 4 - 2 / 5) / (Count + 1 / 5)
                                Case ApQuartileMethod.apGringortenVariation
                                    Position = Count * (3 * Count / 4 - 0.44) / (Count + 0.12)
                                Case ApQuartileMethod.apHazenVariation
                                    Position = Count * (3 * Count / 4 - 1 / 2) / Count
                            End Select
                    End Select
                    Select Case Part
                        Case ApQuartilePart.apMinimum, ApQuartilePart.apMaximum
                            ' Read current row.
                        Case Else
                            .MoveFirst
                            ' Find position of first observation to retrieve.
                            ' If Element is 0, then upper position is first record.
                            ' If Element is not 0 and position is not an integer, then
                            ' read the next observation too.
                            Element = Fix(Position)
                            Interpolate = Position - Element
                            If Count = 1 Then
                                ' Nowhere else to move.
                                If Interpolate < 0 Then
                                    ' Prevent values to be created by extrapolation beyond zero from observation one
                                    ' for these methods:
                                    '   ApQuartileMethod.apBlomVariation
                                    '   ApQuartileMethod.apTukeyVariation
                                    '   ApQuartileMethod.apCunnaneVariation
                                    '   ApQuartileMethod.apGringortenVariation
                                    '   ApQuartileMethod.apHazenVariation
                                    '
                                    ' Comment this line out, if reading by extrapolation *is* requested.
                                    Interpolate = 0
                                End If
                            ElseIf Element > 1 Then
                                ' Move to the record to read.
                                .Move Element - 1
                                ' Special case for apMooreSecond and upper quartile for 2 and 3 elements.
                                If .EOF Then
                                    .MoveLast
                                End If
                            End If
                    End Select
                    ' Retrieve value from first observation.
                    ValueOne = .Fields(0).Value
              
                    Select Case Part
                        Case ApQuartilePart.apMinimum, ApQuartilePart.apMaximum
                            Value = ValueOne
                        Case Else
                            If Interpolate = 0 Then
                                ' Only one observation to read.
                                If Element = 0 Then
                                    ' Return 0.
                                Else
                                    Value = ValueOne
                                End If
                            Else
                                If Element = 0 Or Element = Count Then
                                    ' No first/last observation to retrieve.
                                    ValueTwo = ValueOne
                                    If ValueOne > 0 Then
                                        ' Use 0 as other observation.
                                        ValueOne = 0
                                    Else
                                        ValueOne = 2 * ValueOne
                                    End If
                                Else
                                    ' Move to next observation.
                                    .MoveNext
                                    ' Retrieve value from second observation.
                                    ValueTwo = .Fields(0).Value
                                End If
                                ' For positive values interpolate between 0 and ValueOne.
                                ' For negative values interpolate between 2 * ValueOne and ValueOne.
                                ' Calculate quartile using linear interpolation.
                                Value = ValueOne + Interpolate * CDec(ValueTwo - ValueOne)
                            End If
                    End Select
                End If
                .Close
            End With
        End If
          
        Quartile = Value
    
    End Function 
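
    Parts 3 and 4 of the function, preparing for interpolation and calculating the output, can be tried without a database. The following is a minimal sketch, assuming an ascending sorted, 1-based array in place of the recordset. DemoValueAtPosition and DemoInterpolation are hypothetical names, and unlike function Quartile the sketch simply clamps the position to the array bounds and does no extrapolation.

    ```vba
    Option Explicit

    ' Hypothetical helper, not part of the original module.
    ' Reads the value at the fractional position p from an ascending
    ' sorted, 1-based array using linear interpolation.
    Public Function DemoValueAtPosition( _
        ByRef Sorted() As Double, _
        ByVal Position As Double) _
        As Double

        Dim Element     As Long     ' j.
        Dim Interpolate As Double   ' g.
        Dim ValueOne    As Double
        Dim ValueTwo    As Double

        ' Simplification: clamp to the array bounds, no extrapolation.
        If Position < LBound(Sorted) Then Position = LBound(Sorted)
        If Position > UBound(Sorted) Then Position = UBound(Sorted)

        ' Split p into the element j and the decimal part g.
        Element = Fix(Position)
        Interpolate = Position - Element

        ValueOne = Sorted(Element)
        If Interpolate = 0 Then
            ' Position hits an element exactly.
            DemoValueAtPosition = ValueOne
        Else
            ' Interpolate linearly between element j and element j+1.
            ValueTwo = Sorted(Element + 1)
            DemoValueAtPosition = ValueOne + Interpolate * (ValueTwo - ValueOne)
        End If

    End Function

    ' Usage example with eleven sample values and method 7 positions.
    Public Sub DemoInterpolation()

        Dim Sample(1 To 11) As Double
        Dim Items           As Variant
        Dim Index           As Long

        Items = Array(75, 85, 87, 95, 99, 100, 101, 105, 113, 115, 125)
        For Index = 1 To 11
            Sample(Index) = Items(LBound(Items) + Index - 1)
        Next

        Debug.Print DemoValueAtPosition(Sample, (11 + 3) / 4)       ' Position 3.5.
        Debug.Print DemoValueAtPosition(Sample, (3 * 11 + 1) / 4)   ' Position 8.5.

    End Sub
    ```

    With the default method 7, the positions for n = 11 are (11+3)/4 = 3.5 and (3*11+1)/4 = 8.5, so DemoInterpolation prints 91 (midway between 87 and 95) and 109 (midway between 105 and 113).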
    

    Two important features are that the Domain argument can be an SQL select query and that the samples in the passed records do not have to be sorted. The function itself takes care of sorting the samples.

    Thus, typical usages are as listed here; the resulting SQL is included to show how the function parses the Domain argument:

    ' Example calls and the internally generated SQL:
    '
    '   With fieldname as expression, table (or query) as domain, no filter, and default sorting:
    '       Q1 = Quartile("Data", "Observation", , apFirst, apFreundPerlesGumbell)
    '       Select Data From Observation Order By Data Asc
    '
    '   With two fieldnames as expression, table (or query) as domain, no filter, and sorting on two fields:
    '       Q1 = Quartile("Data, Step", "Observation", , apFirst, apFreundPerlesGumbell)
    '       Select Data, Step From Observation Order By Data, Step Asc
    '
    '   With fieldname as expression, SQL as domain, no filter, and default sorting:
    '       Q1 = Quartile("Data", "Select Data From Observation", , apFirst, apFreundPerlesGumbell)
    '       Select Data From (Select Data From Observation) As T Order By Data Asc
    '
    '   With fieldname as expression, SQL as domain, simple filter, and sorting on one field:
    '       Q1 = Quartile("Data", "Select Data, Step From Observation", "Step = 10", apFirst, apFreundPerlesGumbell)
    '       Select Data From (Select Data, Step From Observation) As T Where Step = 10 Order By Data Asc
    '
    '   With calculated expression, SQL as domain, extended filter, and sorting on one field:
    '       Q1 = Quartile("Data * 10", "Select Data, Step From Observation", "Step = 10 And Data <= 40", apFirst, apFreundPerlesGumbell)
    '       Select Data * 10 From (Select Data, Step From Observation) As T Where Step = 10 And Data <= 40 Order By Data * 10 Asc
    '
    '   With filtered SQL domain, additional filter, and sorting on one field:
    '       Q1 = Quartile("Data", "Select Data, Step From Observation Where Step = 10", "Data <= 40", apFirst, apFreundPerlesGumbell)
    '       Select Data From (Select Data, Step From Observation Where Step = 10) As T Where Data <= 40 Order By Data Asc
    '
    '   With filtered SQL domain, additional filter, and sorting on two fields:
    '       Q1 = Quartile("Step, Data", "Select Data, Step From Observation Where Step = 10", "Data <= 40", apFirst, apFreundPerlesGumbell)
    '       Select Step, Data From (Select Data, Step From Observation Where Step = 10) As T Where Data <= 40 Order By Step, Data Asc 
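
    To inspect the generated SQL without opening a recordset, the string assembly can be isolated. This sketch copies the masks and the Replace logic from function Quartile into a hypothetical helper, DemoQuartileSql, which is not part of the original module and only returns the string.

    ```vba
    Option Explicit

    ' Hypothetical helper, not part of the original module.
    ' Builds the SQL string the same way function Quartile does,
    ' but returns it instead of opening a recordset.
    Public Function DemoQuartileSql( _
        ByVal Expression As String, _
        ByVal Domain As String, _
        Optional ByVal Criteria As String) _
        As String

        Const SqlMask       As String = "Select {0} From {1} {2}"
        Const SqlLead       As String = "Select "
        Const SubMask       As String = "({0}) As T"
        Const FilterMask    As String = "Where {0} "
        Const OrderByMask   As String = "Order By {0} Asc"

        Dim Sql     As String
        Dim SqlSub  As String
        Dim Filter  As String

        If InStr(1, LTrim(Domain), SqlLead, vbTextCompare) = 1 Then
            ' Domain is an SQL expression. Wrap it as a subquery.
            SqlSub = Replace(SubMask, "{0}", Domain)
        Else
            ' Domain is a table or query name.
            SqlSub = Domain
        End If
        If Trim(Criteria) <> "" Then
            ' Build Where clause.
            Filter = Replace(FilterMask, "{0}", Criteria)
        End If

        ' Fill the placeholders in the same order as function Quartile.
        Sql = Replace(SqlMask, "{0}", Expression)
        Sql = Replace(Sql, "{1}", SqlSub)
        Sql = Replace(Sql, "{2}", Filter)
        Sql = Sql & Replace(OrderByMask, "{0}", Expression)

        DemoQuartileSql = Sql

    End Function
    ```

    For example, DemoQuartileSql("Data", "Select Data, Step From Observation", "Step = 10") returns the fourth SQL string listed above. Note that Expression, Domain, and Criteria are concatenated into the SQL unchanged, so they should come from trusted code, not from user input.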
    

    Please note that the function is heavily documented in-line, as the code would otherwise be hard to comprehend.

    Domain functions

    To ease use, particularly in queries, two domain functions supplement the main function:

    DMedian

    DQuartile

    These mimic the native Dxxx domain aggregate functions and take only the arguments needed, relying on default values for the rest: DMedian for the part to return, and DQuartile for the calculation method to use. That default method is the original method used by Excel (formulas QUARTILE and QUARTILE.INC):

    ' Returns the median of a field of a table/query.
    '
    ' Parameters:
    '   Expression: Name of the field or an expression to analyse.
    '   Domain    : Name of the source/query, or an SQL select query, to analyse.
    '   Criteria  : Optional. A filter expression for Domain.
    '
    ' Reference and examples: See function Quartile.
    '
    ' Data need not be pre-sorted; function Quartile orders the records itself.
    '
    ' 2019-08-15. Gustav Brock, Cactus Data ApS, CPH.
    '
    Public Function DMedian( _
        ByVal Expression As String, _
        ByVal Domain As String, _
        Optional ByVal Criteria As String) _
        As Double
        
        Dim Value       As Double
        
        Value = Quartile(Expression, Domain, Criteria)
        
        DMedian = Value
    
    End Function 
    
    ' Returns the upper or lower quartile or the median or the
    ' minimum or maximum value of a field of a table/query
    ' using the method by Freund, Perles, and Gumbell (Excel).
    '
    ' Parameters:
    '   Expression: Name of the field or an expression to analyse.
    '   Domain    : Name of the source/query, or an SQL select query, to analyse.
    '   Criteria  : Optional. A filter expression for Domain.
    '   Part      : Optional. Which median/quartile or min/max value to return.
    '               Default is the median value.
    '
    ' Reference and examples: See function Quartile.
    '
    ' 2019-08-15. Gustav Brock, Cactus Data ApS, CPH.
    '
    Public Function DQuartile( _
        ByVal Expression As String, _
        ByVal Domain As String, _
        Optional ByVal Criteria As String, _
        Optional ByVal Part As ApQuartilePart = ApQuartilePart.apMedian) _
        As Double
        
        Dim Value       As Double
        
        Value = Quartile(Expression, Domain, Criteria, Part)
        
        DQuartile = Value
    
    End Function 
    

    Results

    An example workbook with results generated by the Excel formulas is attached for reference.

    It displays like this:

    The output from the function ListExcelQuartile, found in the attached Access example file, lists identical values.

    The two methods are our methods 7 and 6, that is, the enum elements apFreundPerlesGumbell and apWeibull:

                   100           99            98            97            96            95 
    INCLUDE (LEGACY)
     7            25,75         25,50         25,25         25,00         24,75         24,50         
     7            50,50         50,00         49,50         49,00         48,50         48,00         
     7            75,25         74,50         73,75         73,00         72,25         71,50         
    
    EXCLUDE
     6            25,25         25,00         24,75         24,50         24,25         24,00         
     6            50,50         50,00         49,50         49,00         48,50         48,00         
     6            75,75         75,00         74,25         73,50         72,75         72,00  
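For readers without Excel or Access at hand, the two blocks above can be cross-checked with Python's standard library. This is only a sketch: it assumes that `statistics.quantiles` with `method="inclusive"` corresponds to the INCLUDE (legacy) block and `method="exclusive"` to the EXCLUDE block, and the helper name `excel_style_quartiles` is made up for illustration; it is not part of the article's VBA code.

```python
from statistics import quantiles


def excel_style_quartiles(n):
    """Return (inclusive, exclusive) quartile lists for the sample 1..n."""
    sample = list(range(1, n + 1))
    inclusive = quantiles(sample, n=4, method="inclusive")
    exclusive = quantiles(sample, n=4, method="exclusive")
    return inclusive, exclusive


# Print Q1, Q2, Q3 for the first three sample sets of the listing.
for size in (100, 99, 98):
    inc, exc = excel_style_quartiles(size)
    print(size, "inclusive:", inc, "exclusive:", exc)
```

For the sample 1..100 this yields 25.75, 50.5, 75.25 (inclusive) and 25.25, 50.5, 75.75 (exclusive), in line with the first column of the listing.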
    

    Likewise, the function ListFirstQuartile returns output similar to the results from the main source (Table H-4 at top), with one row per method and one column per sample set:

                   40            50            60            70 
     1            10,00         20,00         20,00         20,00         
     2            15,00         20,00         20,00         20,00         
     3            10,00         10,00         20,00         20,00         
     4            10,00         12,50         15,00         17,50         
     5            15,00         17,50         20,00         22,50         
     6            12,50         15,00         17,50         20,00         
     7            17,50         20,00         22,50         25,00         
     8            14,17         16,67         19,17         21,67         
     9            14,00         16,50         19,00         21,50         
     10           14,38         16,88         19,38         21,88         
     11           11,25         13,75         16,25         18,75         
     12           20,00         20,00         20,00         25,00         
     13           15,00         20,00         20,00         25,00         
     14           15,00         15,00         20,00         20,00         
     15           8,00          10,42         12,86         15,31         
     16           5,88          8,33          10,80         13,28         
     17           6,15          8,59          11,05         13,52         
     18           5,71          8,17          10,65         13,13         
     19           5,44          7,91          10,39         12,88         
     20           5,00          7,50          10,00         12,50         
    
                   100           99            98            97            96            95 
     1            25,00         25,00         25,00         25,00         24,00         24,00         
     2            25,50         25,00         25,00         25,00         24,50         24,00         
     3            25,00         25,00         24,00         24,00         24,00         24,00         
     4            25,00         24,75         24,50         24,25         24,00         23,75         
     5            25,50         25,25         25,00         24,75         24,50         24,25         
     6            25,25         25,00         24,75         24,50         24,25         24,00         
     7            25,75         25,50         25,25         25,00         24,75         24,50         
     8            25,42         25,17         24,92         24,67         24,42         24,17         
     9            25,40         25,15         24,90         24,65         24,40         24,15         
     10           25,44         25,19         24,94         24,69         24,44         24,19         
     11           25,13         24,88         24,63         24,38         24,13         23,88         
     12           26,00         25,50         25,00         25,00         25,00         24,50         
     13           25,50         25,50         25,00         25,00         24,50         24,50         
     14           25,50         25,00         25,00         24,50         24,50         24,00         
     15           24,75         24,50         24,25         24,00         23,75         23,50         
     16           24,56         24,31         24,06         23,81         23,56         23,31         
     17           24,58         24,33         24,08         23,83         23,58         23,33         
     18           24,55         24,30         24,05         23,80         23,55         23,30         
     19           24,53         24,28         24,03         23,78         23,53         23,28         
     20           24,50         24,25         24,00         23,75         23,50         23,25          
    

    Please note that columns 100-96 here contain the correct values, while in Table H-4 they hold the values for samples 99-95.

    The two small examples found on Wikipedia display the results of three different methods, which equal our methods 14, 13, and 5 respectively, or the enum elements apTukeyMooreMcCabe, apTukey, and apHazen:

    [Image: Wikipedia Example 1]

    [Image: Wikipedia Example 2]

    These can be reproduced by the function ListWikipediaSamples:

                  Method 1      Method 2      Method 3
                  
    Q1             15            25,5          20,25 
    Q2             40            40            40 
    Q3             43            42,5          42,75 
    
    Q1             15            15            15 
    Q2             37,5          37,5          37,5 
    Q3             40            40            40  
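The first two columns of each block can be reproduced with a "median of halves" rule. The sketch below is an illustration in Python, not the article's VBA: the function name `tukey_quartiles` is hypothetical, and the two data sets are assumed to be the ones used in the Wikipedia examples (the images are missing here). The third column (20.25 and 42.75) uses a weighted rule that is not covered by this sketch.

```python
from statistics import median


def tukey_quartiles(values, include_median):
    """Quartiles as medians of the lower and upper halves of the sorted data.

    include_median only matters for an odd count: it decides whether the
    overall median is counted in both halves (True) or in neither (False).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)


# Assumed Wikipedia sample data (odd and even count).
example_1 = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
example_2 = [7, 15, 36, 39, 40, 41]

print(tukey_quartiles(example_1, include_median=False))  # Method 1 column
print(tukey_quartiles(example_1, include_median=True))   # Method 2 column
print(tukey_quartiles(example_2, include_median=False))
```

For an even count the two variants coincide, which is why the second block shows the same values in every column.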
    

    Also, a query, FirstQuartileAllMethods, is included, which lists the lower quartile for every sample set between 1 and 100 using all 20 methods.

    Finally, a form is included that lets you select any method and then lists the results for all three quartiles for every sample set between 1 and 100.

    Implementation

    To be able to calculate quartiles, import the module QuartileCode into your application. That's all.

    The other module, QuartileDemo, is only needed for testing and for displaying the demo form (also named QuartileDemo).

    Bonus tip: Study the form's code to see how to right-align numbers in a Listbox column.

    Conclusion

    From the sparse sources that could be located, a function has been created that, for just about any practical purpose, allows the quartiles of a sample of records to be calculated by twenty different methods.

    In addition, simplified functions intended to supplement the native domain aggregate functions have been presented. A collection of functions and a query for testing and demonstration have also been included.

    Sources

    Original source (now off-line) by David A. Heiser: http://www.daheiser.info/excel/notes/NOTE%20N.pdf

    Archived source at The Internet Archive: NOTE 20

    Notes:

    1. Table H-4, p. 4, has correct data for the dataset 1-96, while the datasets listed for 1-100 to 1-97 actually are the datasets for 1-99 to 1-96 shifted one column left. Thus, the dataset for 1-100 is missing, and that for 1-96 is listed twice.
    2. Method 3b is not implemented as no one seems to use it. Nor is any example data given. Thus method 3a has here been labeled method

    Further notes on quartiles and methods can be found here:

    Wikipedia

    Math Forum

    HaiWeb

    murdoch.edu.au (archived)

    Should you be aware of any good source that can supplement or improve this article, please do not hesitate to post a link as a comment.

    Download

    The full and current code is available for download at GitHub: VBA.Quartiles

    Also, code and a demo application are here: Quartiles 1.0.1.zip

    An Excel workbook with the presented example: Quartiles.xlsx

    I hope you found this article useful. You are encouraged to ask questions, report any bugs, or make any other comments about it below.

    Translated from: https://www.experts-exchange.com/articles/33718/20-Varieties-of-Quartiles.html
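  • Quartile calculation, and computing quartiles with pandas

    2020-12-11 08:18:15

    While learning data analysis with Python recently, I ran into a problem with quartiles: the calculation formulas are not consistent, so the results differ. Baidu, annoyingly, shows only one method, which easily confuses beginners, so I have summarized the methods below: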

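    Calculation methods

    Determining the three quartiles:

    First sort the data in ascending order, then use one of the following methods.

    Method 1: the n+1 method

    Position of Q1 = (n+1) × 0.25

    Position of Q2 = (n+1) × 0.5

    Position of Q3 = (n+1) × 0.75

    Here n is the number of data points.

    This is the commonly used n+1 method. There is also an n-1 method.

    Method 2: the n-1 method

    Position of Q1 = 1 + (n-1) × 0.25

    Position of Q2 = 1 + (n-1) × 0.5

    Position of Q3 = 1 + (n-1) × 0.75

    When a position is not an integer, interpolate between the two neighbouring values: multiply the lower value by (1 - fraction) and the upper value by the fraction, then add the two products. For example, for a position of 6.25, the result is (6th value) × 0.75 + (7th value) × 0.25.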
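The two position formulas and the interpolation rule can be written out as a short Python sketch. The function names `quartile_by_position` and `quartiles` are made up for this illustration, and clamping the position to the valid range is an added assumption for very small samples; the article itself does not discuss that case.

```python
def quartile_by_position(sorted_values, position):
    """Value at a 1-based, possibly fractional position in sorted data."""
    # Assumption: clamp so that tiny samples cannot index outside the list.
    position = min(max(position, 1), len(sorted_values))
    lower_index = int(position)          # 1-based index of the lower neighbour
    fraction = position - lower_index
    lower = sorted_values[lower_index - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[lower_index]
    # Lower neighbour weighted by (1 - fraction), upper neighbour by fraction.
    return lower * (1 - fraction) + upper * fraction


def quartiles(values, method):
    """Return (Q1, Q2, Q3) using the 'n+1' or the 'n-1' position formula."""
    data = sorted(values)
    n = len(data)
    result = []
    for p in (0.25, 0.5, 0.75):
        if method == "n+1":
            position = (n + 1) * p
        elif method == "n-1":
            position = 1 + (n - 1) * p
        else:
            raise ValueError("method must be 'n+1' or 'n-1'")
        result.append(quartile_by_position(data, position))
    return tuple(result)


data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
print(quartiles(data, "n+1"))
print(quartiles(data, "n-1"))
```

Keeping the position formula separate from the interpolation step makes it easy to see that the two methods differ only in where they place the quartile, not in how they interpolate.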

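    The following examples illustrate this.

    Example 1 (odd count): take the data 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49. They are already in ascending order, so no sorting is needed; unsorted data must be sorted first.

    1. Using the (n+1) method

    First quartile (lower quartile): (11+1)/4 = 3, so it is at the third position, which holds 15: Q1 = 15.

    Median: (11+1)/4 × 2 = 6, so it is 40.

    Third quartile (upper quartile): (11+1)/4 × 3 = 9, so it is 43.

    Thus Q1 = 15, Q2 = 40, Q3 = 43.

    2. Using the (n-1) method

    First quartile (lower quartile): 1 + (11-1) × 0.25 = 3.5, so Q1 = 15 × 0.5 + 36 × 0.5 = 25.5.

    Median: 1 + (11-1) × 0.5 = 6, an integer position, so Q2 is the sixth value, 40.

    Third quartile (upper quartile): 1 + (11-1) × 0.75 = 8.5, so Q3 = 42 × 0.5 + 43 × 0.5 = 42.5.

    Now compute the same in Python.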

    import pandas as pd
    s1 = pd.Series([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49])
    s1.describe()

    The result:

    count    11.000000
    mean     33.181818
    std      15.873362
    min       6.000000
    25%      25.500000
    50%      40.000000
    75%      42.500000
    max      49.000000
    dtype: float64

    So the result computed in Python is Q1 = 25.5, Q2 = 40, Q3 = 42.5.

    This matches the n-1 method, which shows that pandas uses that method.

    Example 2 (even count)

    import pandas as pd
    ser_obj = pd.Series([1, 2, 3, 4, 5, 6])
    ser_obj.describe()

    1. Using the (n+1) method

    First quartile (lower quartile): (6+1)/4 = 1.75, so it is at position 1.75: Q1 = 1 × 0.25 + 2 × 0.75 = 1.75.

    Median: (6+1)/4 × 2 = 3.5, so Q2 = 3 × 0.5 + 4 × 0.5 = 3.5.

    Third quartile (upper quartile): (6+1)/4 × 3 = 5.25, so Q3 = 5 × 0.75 + 6 × 0.25 = 5.25.

    Thus Q1 = 1.75, Q2 = 3.5, Q3 = 5.25.

    2. Using the (n-1) method

    First quartile (lower quartile): 1 + (6-1) × 0.25 = 2.25, so Q1 = 2 × 0.75 + 3 × 0.25 = 2.25.

    Median: 1 + (6-1) × 0.5 = 3.5, so Q2 = 3 × 0.5 + 4 × 0.5 = 3.5.

    Third quartile (upper quartile): 1 + (6-1) × 0.75 = 4.75, so Q3 = 4 × 0.25 + 5 × 0.75 = 4.75.

    Running the pandas code above gives:

    count    6.000000
    mean     3.500000
    std      1.870829
    min      1.000000
    25%      2.250000
    50%      3.500000
    75%      4.750000
    max      6.000000

    So pandas uses the n-1 method, whereas the n+1 method is the one people usually apply by hand.
  • Database queries: minimum, maximum, average, first quartile, median, third quartile

    Computing the average score, first quartile, third quartile, and median per group in a database

    1. Create a score table with the fields: row number, class id, subject name, score, student id.
    2. For class id (CLASSID) 9, query the average score, first quartile, third quartile, and median of each subject.
    """Create the SUBJECT table and insert test data."""
    import random

    import pymysql

    dbs = pymysql.connect(host='localhost', user='root', password='root', db='demo', port=3306)
    db = dbs.cursor()
    # Score table: row number, class id, subject name, score, student id
    db.execute("""
            CREATE TABLE SUBJECT(
            ID INT(3) PRIMARY KEY,
            CLASSID INT(10),
            SUBJECTNAME VARCHAR(20),
            STATYHOUR float(4),
            GARDEID int(10))
            """
            )
    # Insert the data
    CLASSIDS = list(range(1, 17))
    SUBJECTNAMEs = ['python','java','c','c++','mysql','db','gauss']
    # Parameterized statement: the driver quotes the values itself
    sql = "insert into SUBJECT(ID,CLASSID,SUBJECTNAME,STATYHOUR,GARDEID) values (%s,%s,%s,%s,%s);"
    for a in range(1, 1001):  # insert 1000 rows with IDs 1..1000
        CLASSID = random.choice(CLASSIDS)
        SUBJECTNAME = random.choice(SUBJECTNAMEs)
        STATYHOUR = round(random.uniform(1,150),2)
        GARDEID = random.randint(1,100)
        db.execute(sql, (a, CLASSID, SUBJECTNAME, STATYHOUR, GARDEID))
    dbs.commit()  # commit once after all inserts
    db.close()
    dbs.close()
    

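    First, simulate a few data sets and analyse the first quartile in Excel with the function =QUARTILE(B$3:B4,1).

    The analysis shows:
    When the row count N is a multiple of 4 plus 1, the first quartile falls exactly on one row, whose row number is R = (N-1)/4+1. Checking: N=5 gives R=2, N=9 gives R=3, N=1 gives R=1.

    When N is not a multiple of 4 plus 1, apply the derived formula R = (N-1)/4*1+1:
    For N=6 this gives R=2.25, so the neighbouring rows are 2 and 3.
    In simulated data set 1, rows 2 and 3 hold the values 2 and 3;
    the first quartile is 2.25, and the two values differ by 1.
    This suggests:
    first quartile = (value in upper row - value in lower row) × (R - lower row number) + value in lower row

    1. Verification
      N=6 gives R=2.25: first quartile of simulated data set 3 = (33-12)×(2.25-2)+12 = 17.25
      N=7 gives R=2.5: first quartile of simulated data set 3 = (33-12)×(2.5-2)+12 = 22.5
      N=8 gives R=2.75: first quartile of simulated data set 3 = (33-12)×(2.75-2)+12 = 27.75
      N=10 gives R=3.25: first quartile of simulated data set 1 = … = 3.25
      N=11 gives R=3.5: first quartile of simulated data set 1 = … = 3.5
      N=12 gives R=3.75: first quartile of simulated data set 1 = … = 3.75
      … checked against the spreadsheet data, the formula holds.

    First attempt at the first-quartile SQL: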

    with 
    t1 as(	select * 
    	from SUBJECT
    	where CLASSID = 9
    	),
    t2 as (	select 
    			SUBJECTNAME as b,
    			round((case
    				count(STATYHOUR) when '1' then max(STATYHOUR)
    			else
    			(
    			(max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*1+1)-min(r))+min(STATYHOUR)
    			) end ),2)as lowerQuartile
    		from(	select 
    				*
    				from (	select *,
    							row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
    							count(*) over(partition by SUBJECTNAME) as n
    						from t1
    					) as a1
    				where r >=((n-1)/4*1+1-0.75) and r<=((n-1)/4*1+1+0.75) 
    		)as t2
    	group by SUBJECTNAME)
    select * from t2;
    
    
    /*
    c	29.8
    c++	47.72
    db	29.18
    gauss	13.55
    java	58.96
    python	59.52
    */
    
    -- Optimized SQL: dropping the case ... else ... still gives correct results and saves two lines of code
    with 
    t1 as(	select * from 
    		SUBJECT
    	where CLASSID = 9
    	),
    t2 as (	select 
    			SUBJECTNAME as b,
    			round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*1+1)-min(r))+min(STATYHOUR)),2)as lowerQuartile
    		from(	select 
    				*
    				from (	select *,
    							row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
    							count(*) over(partition by SUBJECTNAME) as n
    						from t1
    					) as a1
    				where r >=((n-1)/4*1+1-0.75) and r<=((n-1)/4*1+1+0.75) 
    		)as a2
    	group by SUBJECTNAME)
    select * from t2;
    
    
    /*
    c	29.8
    c++	47.72
    db	29.18
    gauss	13.55
    java	58.96
    python	59.52
    */
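To see why the row filter in the query works, the same steps can be replayed in plain Python. This is only an illustrative sketch: the function name `sql_style_quartile` and the sample list are made up (only the values 12 and 33 in rows 2 and 3 come from the verification of the formula), and it mirrors the query's logic rather than running against MySQL.

```python
def sql_style_quartile(values, k=1):
    """Mimic the SQL: keep the rows within 0.75 of R, then interpolate.

    k = 1, 2, 3 selects the first quartile, the median, the third quartile.
    """
    data = sorted(values)
    n = len(data)
    r_target = (n - 1) / 4 * k + 1            # fractional row number R
    # row_number() is 1-based, so enumerate from 1.
    rows = [(r, v) for r, v in enumerate(data, start=1)
            if r_target - 0.75 <= r <= r_target + 0.75]
    min_row = min(r for r, _ in rows)
    low = min(v for _, v in rows)
    high = max(v for _, v in rows)
    # Same expression as in the query: (max - min) * (R - min(r)) + min
    return (high - low) * (r_target - min_row) + low


# Hypothetical data set; rows 2 and 3 hold 12 and 33.
sample = [4, 12, 33, 41, 57, 60, 72, 88]
for n in (6, 7, 8):
    print(n, sql_style_quartile(sample[:n]))
```

The window of ±0.75 around R always catches either the single row R falls on or its two neighbours, so max and min inside the filter are exactly the two values to interpolate between.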
    

    Since R = (N-1)/4*1+1 gives the first quartile, try replacing *1 with *2 for the median and with *3 for the third quartile in the SQL:

    with 
    t1 as(	select * from SUBJECT
    	where CLASSID = 9
    	),
    t2 as (	select 
    			SUBJECTNAME as b,
    			round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*1+1)-min(r))+min(STATYHOUR)),2)as lowerQuartile
    		from(	select 
    				*
    				from (	select *,
    							row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
    							count(*) over(partition by SUBJECTNAME) as n
    						from t1
    					) as a1
    				where r >=((n-1)/4*1+1-0.75) and r<=((n-1)/4*1+1+0.75) 
    		)as a2
    	group by SUBJECTNAME),
    t3 as (	select 
    			SUBJECTNAME as b,
    			round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*2+1)-min(r))+min(STATYHOUR)),2)as median
    		from(	select 
    				*
    				from (	select *,
    							row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
    							count(*) over(partition by SUBJECTNAME) as n
    						from t1
    					) as a1
    				where (r >=((n-1)/4*2+1-0.75) and r<=((n-1)/4*2+1+0.75))
    		)as a2
    	group by SUBJECTNAME),
    t4 as (	select 
    			SUBJECTNAME as b,
    			round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*3+1)-min(r))+min(STATYHOUR)),2)as upperQuartitle
    		from(	select 
    				*
    				from (	select *,
    							row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
    							count(*) over(partition by SUBJECTNAME) as n
    						from t1
    					) as a1
    				where (r >=((n-1)/4*3+1-0.75) and r<=((n-1)/4*3+1+0.75))
    		)as a2
    	group by SUBJECTNAME)
    select t2.b as SUBJECTNAME,lowerQuartile,median,upperQuartitle
    from t2,t3,t4 where t2.b=t3.b and t3.b=t4.b;
    
    
    /*
    c	29.8	50.36	107.87
    c++	47.72	73.86	112.25
    db	29.18	60.29	137.41
    gauss	13.55	29.61	92.71
    java	58.96	107.58	127.41
    python	59.52	71.44	112.83
    */
    
    

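    The SQL runs successfully, but are the returned results correct?
    Having derived the formula from results and written the SQL from it, the results still need to be verified.
    My first thought was an Excel spreadsheet, but then I remembered that I am a tester, so why not try a Python library?

    # Create an array and check whether the matching quantiles can be found
    # pip install numpy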
    import numpy as np
    number = [1,2,3,4,5]
    z = np.percentile(number,(25,50,75), interpolation='midpoint')
    print(z)
    print(type(z))
    
    
    >> [2. 3. 4.]
    >> <class 'numpy.ndarray'>
    

    Verification implemented in Python:

    
    import numpy as np
    import pymysql
    
    calculated_value={}  # quantiles returned by the database query
    verify_calculated_value={}  # quantiles computed with numpy for verification
    class MySqlS():
        """Check whether the first quartile, median, and third quartile from the database match the numpy values."""
        def __init__(self):
            self.dbs = pymysql.connect(host='localhost', user='root', password='root', db='demo', port=3306)
            self.db = self.dbs.cursor()
        def __del__(self):
            self.db.close()
            self.dbs.close()
        def query_table_data(self):
            """Run the SQL under test; fills calculated_value as {SUBJECTNAME: {SUBJECTNAME, lowerQuartile (first quartile), median, upperQuartitle (third quartile)}}"""
            sql = """with 
                    t1 as(	select * from SUBJECT
                        where CLASSID = 9
                        ),
                    t2 as (	select 
                                SUBJECTNAME as b,
                                round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*1+1)-min(r))+min(STATYHOUR)),2)as lowerQuartile
                            from(	select 
                                    *
                                    from (	select *,
                                                row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
                                                count(*) over(partition by SUBJECTNAME) as n
                                            from t1
                                        ) as a1
                                    where r >=((n-1)/4*1+1-0.75) and r<=((n-1)/4*1+1+0.75) 
                            )as t2
                        group by SUBJECTNAME),
                    t3 as (	select 
                                SUBJECTNAME as b,
                                round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*2+1)-min(r))+min(STATYHOUR)),2)as median
                            from(	select 
                                    *
                                    from (	select *,
                                                row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
                                                count(*) over(partition by SUBJECTNAME) as n
                                            from t1
                                        ) as a1
                                    where (r >=((n-1)/4*2+1-0.75) and r<=((n-1)/4*2+1+0.75))
                            )as t2
                        group by SUBJECTNAME),
                    t4 as (	select 
                                SUBJECTNAME as b,
                                round(((max(STATYHOUR)-min(STATYHOUR))*(((max(n)-1)/4*3+1)-min(r))+min(STATYHOUR)),2)as upperQuartitle
                            from(	select 
                                    *
                                    from (	select *,
                                                row_number() over(partition by SUBJECTNAME order by STATYHOUR) as r,
                                                count(*) over(partition by SUBJECTNAME) as n
                                            from t1
                                        ) as a1
                                    where (r >=((n-1)/4*3+1-0.75) and r<=((n-1)/4*3+1+0.75))
                            )as t2
                        group by SUBJECTNAME)
                    select t2.b as SUBJECTNAME,lowerQuartile,median,upperQuartitle
                    from t2,t3,t4 where t2.b=t3.b and t3.b=t4.b;
                    """
            self.db.execute(sql)
            temp_val_1 = self.db.fetchall()
            for i in temp_val_1:
                calculated_value_temp = {i[0]:{ "SUBJECTNAME":i[0],"lowerQuartile":i[1],"median":i[2],"upperQuartitle":i[3]}}
                calculated_value.update(calculated_value_temp)
        def query_sql_data(self):
            """Compute the reference values with numpy; fills verify_calculated_value as {SUBJECTNAME: {SUBJECTNAME, lowerQuartile (first quartile), median, upperQuartitle (third quartile)}}"""
            for SUBJECTNAME in calculated_value.keys():
                sql = "select STATYHOUR from SUBJECT where CLASSID = 9 and SUBJECTNAME = '{0}'order by STATYHOUR asc;".format(SUBJECTNAME)
                self.db.execute(sql)
                temp_val_1 = self.db.fetchall()
                SUBJECTNAMES = []
                for i in temp_val_1:
                    SUBJECTNAMES +=list(i)
                z = np.percentile(SUBJECTNAMES, (25, 50, 75), interpolation='midpoint')
                lowerQuartile = round(z[0],2)
                median = round(z[1],2)
                upperQuartitle = round(z[2],2)
                verify_calculated_value_temps =  {SUBJECTNAME :{"SUBJECTNAME":SUBJECTNAME,"lowerQuartile":lowerQuartile,"median":median,"upperQuartitle":upperQuartitle}}
                verify_calculated_value.update(verify_calculated_value_temps)
        def verify_sql_data(self):
            """Compare the database results with the numpy results."""
            for SUBJECTNAME in calculated_value.keys():
                try:
                    assert calculated_value.get(SUBJECTNAME).get('lowerQuartile')==verify_calculated_value.get(SUBJECTNAME).get('lowerQuartile'),\
                        '{0}=={1}报错\t{2}'.format(calculated_value.get(SUBJECTNAME).get('lowerQuartile'),verify_calculated_value.get(SUBJECTNAME).get('lowerQuartile'),(calculated_value.get(SUBJECTNAME),verify_calculated_value.get(SUBJECTNAME)))
                    assert calculated_value.get(SUBJECTNAME).get('median')==verify_calculated_value.get(SUBJECTNAME).get('median'),\
                        '{0}=={1}报错\t{2}'.format(calculated_value.get(SUBJECTNAME).get('median'),verify_calculated_value.get(SUBJECTNAME).get('median'),(calculated_value.get(SUBJECTNAME),verify_calculated_value.get(SUBJECTNAME)))
                    assert calculated_value.get(SUBJECTNAME).get('upperQuartitle')==verify_calculated_value.get(SUBJECTNAME).get('upperQuartitle'),\
                        '{0}=={1}报错\t{2}'.format(calculated_value.get(SUBJECTNAME).get('upperQuartitle'),verify_calculated_value.get(SUBJECTNAME).get('upperQuartitle'),(calculated_value.get(SUBJECTNAME),verify_calculated_value.get(SUBJECTNAME)))
                except AssertionError as err:
                    print(err)
    if __name__ == '__main__':
        checker = MySqlS()  # one connection for all three steps
        checker.query_table_data()
        checker.query_sql_data()
        checker.verify_sql_data()
    
    
    >> 47.72==44.49报错	({'SUBJECTNAME': 'c++', 'lowerQuartile': 47.72, 'median': 73.86, 'upperQuartitle': 112.25}, {'SUBJECTNAME': 'c++', 'lowerQuartile': 44.49, 'median': 73.86, 'upperQuartitle': 112.65})
    >> 13.55==14.71报错	({'SUBJECTNAME': 'gauss', 'lowerQuartile': 13.55, 'median': 29.61, 'upperQuartitle': 92.71}, {'SUBJECTNAME': 'gauss', 'lowerQuartile': 14.71, 'median': 29.62, 'upperQuartitle': 81.44})
    >> 59.52==59.53报错	({'SUBJECTNAME': 'python', 'lowerQuartile': 59.52, 'median': 71.44, 'upperQuartitle': 112.83}, {'SUBJECTNAME': 'python', 'lowerQuartile': 59.53, 'median': 71.44, 'upperQuartitle': 112.83})
    
    

    OK, running it makes the assertions fail, as the output above shows.

    Run SQL to fetch the data for these subjects:

    select STATYHOUR from SUBJECT where CLASSID = 9 and SUBJECTNAME= 'c++' order by STATYHOUR asc;
    select STATYHOUR from SUBJECT where CLASSID = 9 and SUBJECTNAME= 'gauss' order by STATYHOUR asc;
    select STATYHOUR from SUBJECT where CLASSID = 9 and SUBJECTNAME= 'python' order by STATYHOUR asc;
    

    After pasting the data into Excel, the median there matches the median returned by my query. Next, paste the data into Python and check the first quartile:

    import numpy as np
    c = [21.86,38.03,50.95,72.3,75.42,111.86,113.43,147.58,21.86,38.03,50.95,72.3,75.42,111.86,113.43,147.58]
    gauss=[1.59,3.52,12.4,17.01,20.29,38.94,58.88,103.99,144.65,144.93]
    python = [16.74,19.82,59.21,59.84,66.57,71.44,87.05,111.84,113.82,122.19,123.31]
    c_lowerQuartile = np.percentile(c,(25), interpolation='midpoint')
    gauss_lowerQuartile =np.percentile(gauss,(25), interpolation='midpoint')
    python_lowerQuartile = np.percentile(python,(25), interpolation='midpoint')
    
    print(c_lowerQuartile)
    print(gauss_lowerQuartile)
    print(python_lowerQuartile)
    
    
    >> 44.49
    >> 14.705000000000002
    >> 59.525000000000006
    

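    The problem now is that the first quartile
    calculated by Excel differs from the one calculated by Python (numpy).

    Tentative guess:
    Excel first quartile = (value in upper row - value in lower row) × (R - lower row number) + value in lower row

    Python first quartile = (value in upper row + value in lower row) × (R - lower row number)
    The two formulas give different results.
    On reflection, I have more than these three groups of data, so why did the assertions pass for the others? Find the data that passed and test the rule on it.
    While working through the data that had passed before, a new problem appeared: on the second run of the assertion test, the subject java also failed, so the number of errors went from three to four.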

    
    
    >> 47.72==44.49报错	({'SUBJECTNAME': 'c++', 'lowerQuartile': 47.72, 'median': 73.86, 'upperQuartitle': 112.25}, {'SUBJECTNAME': 'c++', 'lowerQuartile': 44.49, 'median': 73.86, 'upperQuartitle': 112.65})
    >> 13.55==14.71报错	({'SUBJECTNAME': 'gauss', 'lowerQuartile': 13.55, 'median': 29.61, 'upperQuartitle': 92.71}, {'SUBJECTNAME': 'gauss', 'lowerQuartile': 14.71, 'median': 29.62, 'upperQuartitle': 81.44})
    >> 58.96==58.22报错	({'SUBJECTNAME': 'java', 'lowerQuartile': 58.96, 'median': 107.58, 'upperQuartitle': 127.41}, {'SUBJECTNAME': 'java', 'lowerQuartile': 58.22, 'median': 107.58, 'upperQuartitle': 128.8})
    >> 59.52==59.53报错	({'SUBJECTNAME': 'python', 'lowerQuartile': 59.52, 'median': 71.44, 'upperQuartitle': 112.83}, {'SUBJECTNAME': 'python', 'lowerQuartile': 59.53, 'median': 71.44, 'upperQuartitle': 112.83})
    
    

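    A simple input:

    import numpy as np
    z = [1,2,3,4,5,6]
    x= np.percentile(z,(25), interpolation='midpoint')
    print(x)
    
    
    >> 2.5
    

    Worked out by hand, the first quartile here should be 2.25.
    For now I therefore treat the Excel result as correct, and with it the result of the SQL I wrote.
    I will look into the logic of the numpy library later to see where this difference comes from.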
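A likely explanation for the mismatch, offered as a sketch rather than a definitive diagnosis: numpy's `interpolation='midpoint'` option returns the plain average of the two neighbouring values, while Excel's QUARTILE and the SQL above interpolate linearly according to the fractional part of R. The function `percentile` below is a hypothetical pure-Python stand-in written for illustration; it is not numpy's implementation.

```python
import math


def percentile(values, q, mode):
    """q-th percentile (0-100) using the n-1 position rule.

    mode='linear' weights the two neighbours by the fractional position;
    mode='midpoint' simply averages them.
    """
    data = sorted(values)
    position = (len(data) - 1) * q / 100      # 0-based fractional index
    lower = data[math.floor(position)]
    upper = data[math.ceil(position)]
    if mode == "linear":
        return lower + (upper - lower) * (position - math.floor(position))
    if mode == "midpoint":
        return (lower + upper) / 2
    raise ValueError("mode must be 'linear' or 'midpoint'")


# The gauss values pasted into Python earlier in this article.
gauss = [1.59, 3.52, 12.4, 17.01, 20.29, 38.94, 58.88, 103.99, 144.65, 144.93]
print(percentile([1, 2, 3, 4, 5, 6], 25, "linear"))    # the hand-calculated 2.25
print(percentile([1, 2, 3, 4, 5, 6], 25, "midpoint"))  # the 2.5 printed above
print(percentile(gauss, 25, "linear"))
print(percentile(gauss, 25, "midpoint"))
```

With the linear mode the gauss first quartile comes out at about 13.55, matching the SQL, and the midpoint mode gives about 14.71, matching the numpy run. If this explanation holds, calling np.percentile without the interpolation argument (its default is linear) should agree with Excel. The remaining 59.52 versus 59.53 difference for the subject python looks like a rounding effect at 59.525 rather than a difference in method, since both modes give the same value when R ends in .5.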
