  • scipy.stats

    2018-11-23 16:12:26

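    Parameters
    scipy.stats.describe(a, axis=0, ddof=1, bias=True, nan_policy='propagate')
    a: input data
    axis: axis along which the statistics are calculated; default 0. If None, compute over the whole array a
    ddof: delta degrees of freedom (used for the variance only); default 1
    bias: if False, the skewness and kurtosis calculations are corrected for statistical bias
    nan_policy: defines how to handle input that contains nan. 'propagate' returns nan, 'raise' throws an error, 'omit' performs the calculations ignoring nan values. The default is 'propagate'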

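    Returns
    1. nobs: number of observations (length of the data along axis). When 'omit' is chosen as nan_policy, the length along each axis slice is counted separately
    2. minmax: minimum and maximum values of the data array
    3. mean: arithmetic mean of the data along axis
    4. variance: unbiased variance of the data along axis; with the default ddof=1 the denominator is the number of observations minus one
    5. skewness: skewness based on moment calculations, with a denominator equal to the number of observations, i.e. no degrees-of-freedom correction (with the default bias=True)
    6. kurtosis: kurtosis (Fisher), normalized so that it is zero for the normal distribution. No degrees of freedom are used (with the default bias=True)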
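The sketch below uses a small made-up array to show how the six result fields line up with the defaults listed above: ddof=1 for the variance, plus biased skewness and Fisher kurtosis. The array values are purely illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative data, not from the original post
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

res = stats.describe(data)          # defaults: ddof=1, bias=True
print(res.nobs)                     # 5 observations
print(res.minmax)                   # (min, max) = (1.0, 5.0)
print(res.mean)                     # 3.0
print(res.variance)                 # sum of squared deviations / (n - 1) = 10 / 4 = 2.5
print(res.skewness)                 # 0.0, the data are symmetric
print(res.kurtosis)                 # Fisher kurtosis: m4 / m2**2 - 3 = 6.8 / 4 - 3 = -1.3

# ddof only affects the variance field
print(stats.describe(data, ddof=0).variance)  # 10 / 5 = 2.0
```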

  • Statistics (scipy.stats): Introduction; Random variables; Getting help; Common methods; Shifting and scaling; Shape parameters; Freezing a distribution; Broadcasting; Specifics of discrete distributions; Fitting distributions; Performance issues and...
  • scipy.stats.norm

    2018-04-03 16:56:07

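    Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

    Let's start with the stats module. SciPy's stats module contains random variables for many probability distributions, and these random variables come in two kinds: continuous and discrete. Every continuous random variable is an instance of a subclass of rv_continuous, and every discrete random variable is an instance of a subclass of rv_discrete.

    The following statement lists all the continuous random variables in the stats module: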

    from scipy import stats
    print([k for k,v in stats.__dict__.items() if isinstance(v,stats.rv_continuous)])
    

    Output:

    ['genpareto', 'kappa4', 'pareto', 'genexpon', 'dweibull', 'frechet_l', 'fisk', 'erlang', 'exponpow', 
    'gumbel_r', 'nakagami', 't', 'mielke', 'rdist', 'gausshyper', 'triang', 'levy_stable', 'halfnorm', 
    'skewnorm', 'cosine', 'kstwobign', 'gumbel_l', 'invgamma', 'johnsonsu', 'expon', 'norm', 
    'truncnorm', 'dgamma', 'kappa3', 'gennorm', 'foldnorm', 'halfgennorm', 'pearson3', 'exponweib', 
    'truncexpon', 'loggamma', 'tukeylambda', 'rice', 'uniform', 'powernorm', 'genlogistic', 
    'recipinvgauss', 'reciprocal', 'gengamma', 'lomax', 'alpha', 'laplace', 'hypsecant', 'ksone', 'ncf', 
    'vonmises', 'maxwell', 'fatiguelife', 'loglaplace', 'levy', 'genextreme', 'chi2', 'argus', 'burr12', 
    'johnsonsb', 'frechet_r', 'gilbrat', 'invweibull', 'ncx2', 'semicircular', 'wrapcauchy', 'gamma', 
    'levy_l', 'weibull_max', 'bradford', 'invgauss', 'gompertz', 'cauchy', 'chi', 'powerlognorm', 
    'weibull_min', 'wald', 'halfcauchy', 'powerlaw', 'exponnorm', 'beta', 'arcsine', 'f', 
    'halflogistic', 'vonmises_line', 'trapz', 'anglit', 'burr', 'lognorm', 'betaprime', 'logistic', 
    'nct', 'rayleigh', 'foldcauchy', 'genhalflogistic']
    
    

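    Every continuous random variable object has the following methods:

    • rvs: draws random values of the random variable; the size parameter sets the shape of the output array.
    • pdf: the probability density function of the random variable
    • cdf: the cumulative distribution function of the random variable, i.e. the integral of the probability density function
    • sf: the survival function of the random variable; its value is 1 - cdf(t)
    • ppf: the inverse of the cumulative distribution function
    • stats: computes the mean and variance of the random variable (other moments can be requested)
    • fit: fits a set of random samples and finds the parameters of the probability density function that best fit the sample data.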
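Here is a short sketch of the methods listed above, applied to a normal distribution. The loc=2.0 and scale=3.0 values and the sample size are arbitrary illustrative choices. random_state fixes the seed so that the run is reproducible.

```python
import numpy as np
from scipy.stats import norm

# "Frozen" normal distribution with mean 2 and standard deviation 3 (illustrative values)
rv = norm(loc=2.0, scale=3.0)

print(rv.cdf(2.0))                   # 0.5: half the mass lies below the mean
print(rv.sf(2.0))                    # 0.5: sf(x) = 1 - cdf(x)
print(rv.ppf(rv.cdf(5.0)))           # ppf inverts cdf, giving back 5.0 (up to rounding)
mean, var = rv.stats()               # default moments='mv': mean 2.0, variance 9.0

# Draw samples, then recover the parameters with a maximum-likelihood fit
samples = rv.rvs(size=10_000, random_state=0)
loc_hat, scale_hat = norm.fit(samples)
print(loc_hat, scale_hat)            # close to 2.0 and 3.0
```

norm.fit returns maximum-likelihood estimates. For the normal distribution, the estimated scale is therefore the standard deviation with denominator n, not n - 1.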

    Next, an example with scipy.stats.norm:

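    from scipy.stats import norm
    import matplotlib.pyplot as plt
    import numpy as np
    
    fig, ax = plt.subplots(1, 1)
    # First four moments of the standard normal: mean, variance, skewness, (Fisher) kurtosis
    mean, var, skew, kurt = norm.stats(moments='mvsk')
    
    # x grid from the 1% quantile to the 99% quantile
    x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)
    ax.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
    
    # A "frozen" distribution object with the default loc=0, scale=1 fixed
    rv = norm()
    ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
    
    # ppf is the inverse of cdf, so this check prints True
    vals = norm.ppf([0.001, 0.5, 0.999])
    print(np.allclose([0.001, 0.5, 0.999], norm.cdf(vals)))
    r = norm.rvs(size=1000)
    
    # `normed` was removed in newer Matplotlib versions; `density=True` normalizes the histogram
    ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
    ax.legend(loc='best', frameon=False)
    plt.show()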
    

    Output: the plot produced by the code above (image not preserved).

  • SciPy tutorial - the statistical function library scipy.stats

    2015-10-30 18:44:12

    http://blog.csdn.net/pipisorry/article/details/49515215

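    Statistical functions (scipy.stats)

    Python has a good package for statistical inference: the stats module in SciPy.

    SciPy's stats module contains random variables for many probability distributions, and these random variables come in two kinds: continuous and discrete.
    Every continuous random variable is an instance of a subclass of rv_continuous, and every discrete random variable is an instance of a subclass of rv_discrete.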

    This module contains a large number of probability distributions as well as a growing library of statistical functions.

    Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):

    rv_continuous([momtype, a, b, xtol, ...]) A generic continuous random variable class meant for subclassing.
    rv_discrete([a, b, name, badvalue, ...]) A generic discrete random variable class meant for subclassing.

    皮皮blog



    Continuous distributions and related functions

    Continuous distributions

    alpha: An alpha continuous random variable.
    anglit: An anglit continuous random variable.
    arcsine: An arcsine continuous random variable.
    beta: A beta continuous random variable.
    betaprime: A beta prime continuous random variable.
    bradford: A Bradford continuous random variable.
    burr: A Burr (Type III) continuous random variable.
    burr12: A Burr (Type XII) continuous random variable.
    cauchy: A Cauchy continuous random variable.
    chi: A chi continuous random variable.
    chi2: A chi-squared continuous random variable.
    cosine: A cosine continuous random variable.
    dgamma: A double gamma continuous random variable.
    dweibull: A double Weibull continuous random variable.
    erlang: An Erlang continuous random variable.
    expon: An exponential continuous random variable.
    exponnorm: An exponentially modified Normal continuous random variable.
    exponweib: An exponentiated Weibull continuous random variable.
    exponpow: An exponential power continuous random variable.
    f: An F continuous random variable.
    fatiguelife: A fatigue-life (Birnbaum-Saunders) continuous random variable.
    fisk: A Fisk continuous random variable.
    foldcauchy: A folded Cauchy continuous random variable.
    foldnorm: A folded normal continuous random variable.
    frechet_r: A Frechet right (or Weibull minimum) continuous random variable.
    frechet_l: A Frechet left (or Weibull maximum) continuous random variable.
    genlogistic: A generalized logistic continuous random variable.
    gennorm: A generalized normal continuous random variable.
    genpareto: A generalized Pareto continuous random variable.
    genexpon: A generalized exponential continuous random variable.
    genextreme: A generalized extreme value continuous random variable.
    gausshyper: A Gauss hypergeometric continuous random variable.
    gamma: A gamma continuous random variable.
    gengamma: A generalized gamma continuous random variable.
    genhalflogistic: A generalized half-logistic continuous random variable.
    gilbrat: A Gilbrat continuous random variable.
    gompertz: A Gompertz (or truncated Gumbel) continuous random variable.
    gumbel_r: A right-skewed Gumbel continuous random variable.
    gumbel_l: A left-skewed Gumbel continuous random variable.
    halfcauchy: A Half-Cauchy continuous random variable.
    halflogistic: A half-logistic continuous random variable.
    halfnorm: A half-normal continuous random variable.
    halfgennorm: The upper half of a generalized normal continuous random variable.
    hypsecant: A hyperbolic secant continuous random variable.
    invgamma: An inverted gamma continuous random variable.
    invgauss: An inverse Gaussian continuous random variable.
    invweibull: An inverted Weibull continuous random variable.
    johnsonsb: A Johnson SB continuous random variable.
    johnsonsu: A Johnson SU continuous random variable.
    kappa4: Kappa 4 parameter distribution.
    kappa3: Kappa 3 parameter distribution.
    ksone: General Kolmogorov-Smirnov one-sided test.
    kstwobign: Kolmogorov-Smirnov two-sided test for large N.
    laplace: A Laplace continuous random variable.
    levy: A Levy continuous random variable.
    levy_l: A left-skewed Levy continuous random variable.
    levy_stable: A Levy-stable continuous random variable.
    logistic: A logistic (or Sech-squared) continuous random variable.
    loggamma: A log gamma continuous random variable.
    loglaplace: A log-Laplace continuous random variable.
    lognorm: A lognormal continuous random variable.
    lomax: A Lomax (Pareto of the second kind) continuous random variable.
    maxwell: A Maxwell continuous random variable.
    mielke: A Mielke’s Beta-Kappa continuous random variable.
    nakagami: A Nakagami continuous random variable.
    ncx2: A non-central chi-squared continuous random variable.
    ncf: A non-central F distribution continuous random variable.
    nct: A non-central Student’s T continuous random variable.
    norm: A normal continuous random variable.
    pareto: A Pareto continuous random variable.
    pearson3: A Pearson type III continuous random variable.
    powerlaw: A power-function continuous random variable.
    powerlognorm: A power log-normal continuous random variable.
    powernorm: A power normal continuous random variable.
    rdist: An R-distributed continuous random variable.
    reciprocal: A reciprocal continuous random variable.
    rayleigh: A Rayleigh continuous random variable.
    rice: A Rice continuous random variable.
    recipinvgauss: A reciprocal inverse Gaussian continuous random variable.
    semicircular: A semicircular continuous random variable.
    skewnorm: A skew-normal random variable.
    t: A Student’s T continuous random variable.
    trapz: A trapezoidal continuous random variable.
    triang: A triangular continuous random variable.
    truncexpon: A truncated exponential continuous random variable.
    truncnorm: A truncated normal continuous random variable.
    tukeylambda: A Tukey-Lambda continuous random variable.
    uniform: A uniform continuous random variable.
    vonmises: A Von Mises continuous random variable.
    vonmises_line: A Von Mises continuous random variable.
    wald: A Wald continuous random variable.
    weibull_min: A Frechet right (or Weibull minimum) continuous random variable.
    weibull_max: A Frechet left (or Weibull maximum) continuous random variable.
    wrapcauchy: A wrapped Cauchy continuous random variable.

    Methods of continuous random variable objects

    rvs(*args, **kwds) Random variates of given type. Draws a sample from this distribution, i.e. takes random values of the random variable; the size parameter sets the shape of the output array.
    pdf(x, *args, **kwds) Probability density function at x of the given RV. Returns the density value (the y value) of this distribution at x.
    logpdf(x, *args, **kwds) Log of the probability density function at x of the given RV.
    cdf(x, *args, **kwds) Cumulative distribution function of the given RV. This is the integral of the probability density function, i.e. the probability P(X <= x) at x. Returns the value of this distribution's cumulative distribution function at x.
    logcdf(x, *args, **kwds) Log of the cumulative distribution function at x of the given RV.
    sf(x, *args, **kwds) Survival function (1 - cdf) at x of the given RV. Its value is 1 - cdf(t).
    logsf(x, *args, **kwds) Log of the survival function of the given RV.
    ppf(q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV. For q=0.01, ppf returns the x for which P(X <= x) = 0.01.
    isf(q, *args, **kwds) Inverse survival function (inverse of sf) at q of the given RV.
    moment(n, *args, **kwds) n-th order non-central moment of distribution.
    stats(*args, **kwds) Some statistics of the given RV. By default computes the mean and variance of the random variable.
    entropy(*args, **kwds) Differential entropy of the RV.
    expect([func, args, loc, scale, lb, ub, ...]) Calculate expected value of a function with respect to the distribution.
    median(*args, **kwds) Median of the distribution.
    mean(*args, **kwds) Mean of the distribution.
    std(*args, **kwds) Standard deviation of the distribution.
    var(*args, **kwds) Variance of the distribution.
    interval(alpha, *args, **kwds) Confidence interval with equal areas around the median.
    __call__(*args, **kwds) Freeze the distribution for the given arguments.
    fit(data, *args, **kwds) Return MLEs for shape, location, and scale parameters from data. Fits a set of random samples and finds the parameters of the probability density function that best fit the sample data. For example, stats.norm.fit(x) treats x as a sample from some normal distribution and returns its best-fit parameters (mean, std).
    fit_loc_scale(data, *args) Estimate loc and scale parameters from data using 1st and 2nd moments.
    nnlf(theta, x) Return negative loglikelihood function.
    [Continuous distributions]

    [scipy.stats.rv_continuous]
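You can check the less common methods in this list against known properties of the standard normal distribution. This is only a sketch. The closed-form values used for comparison, such as the entropy 0.5*log(2*pi*e), are standard results for N(0, 1).

```python
import numpy as np
from scipy.stats import norm

# Central 95% interval: equal tail areas of 2.5% on each side of the median
lo, hi = norm.interval(0.95)
print(lo, hi)                             # about -1.96 and 1.96

# isf is the inverse of sf, so isf(q) equals ppf(1 - q)
print(norm.isf(0.025), norm.ppf(0.975))

# Second non-central moment E[X**2] equals the variance 1 for N(0, 1)
print(norm.moment(2))

# expect() integrates a function against the density: E[X**2] again
print(norm.expect(lambda x: x ** 2))

# Differential entropy of N(0, 1) is 0.5 * log(2 * pi * e)
print(norm.entropy(), 0.5 * np.log(2 * np.pi * np.e))

# logpdf agrees with the log of pdf here
print(norm.logpdf(1.0), np.log(norm.pdf(1.0)))
```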

    Multivariate distributions

    multivariate_normal: A multivariate normal random variable.
    matrix_normal: A matrix normal random variable.
    dirichlet: A Dirichlet random variable.
    wishart: A Wishart random variable.
    invwishart: An inverse Wishart random variable.
    special_ortho_group: A matrix-valued SO(N) random variable.
    ortho_group: A matrix-valued O(N) random variable.
    random_correlation: A random correlation matrix.

    multivariate_normal

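    >>> import numpy as np
    >>> from scipy.stats import multivariate_normal
    >>> x, y = np.mgrid[-1:1:.01, -1:1:.01]
    >>> pos = np.dstack((x, y))   # stack the two 2-D grids into a 3-D array of (x, y) points
    >>> rv = multivariate_normal([0.5, -0.2], [[2.0, 0.3], [0.3, 0.5]])
    >>> rv.pdf(pos)  # the argument is 3-D: the last axis holds one (x, y) point, the first two axes index the grid position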
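To make the shape convention concrete, the sketch below repeats the example and compares grid values with the bivariate normal density formula written out by hand. The helper manual_pdf is hypothetical: it exists only for this check and is not part of SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.5, -0.2])
cov = np.array([[2.0, 0.3], [0.3, 0.5]])
rv = multivariate_normal(mean, cov)

x, y = np.mgrid[-1:1:.01, -1:1:.01]
pos = np.dstack((x, y))        # shape (rows, cols, 2): one (x, y) point per grid cell
z = rv.pdf(pos)                # shape (rows, cols): one density value per grid cell

def manual_pdf(point):
    """Hypothetical helper: 2-D normal density evaluated from its formula."""
    d = point - mean
    quad = d @ np.linalg.inv(cov) @ d
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

print(pos.shape, z.shape)
print(z[0, 0], manual_pdf(pos[0, 0]))   # the two values agree
```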




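    Discrete distributions and related functions

    When the values a random variable can take are discrete, the distribution is called a discrete probability distribution. For example, a six-sided die can only show the integers 1 to 6, so the resulting probability distribution is discrete.

    A discrete distribution is usually described by its probability mass function (PMF). In the stats library, every random variable that describes a discrete distribution inherits from the rv_discrete class.

    Defining a custom discrete distribution directly with the rv_discrete class

    In stats.rv_discrete(values=(x,p)), the arguments are the values x of the random variable and their corresponding probabilities p.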

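    Suppose we have a loaded die whose faces do not come up with equal probability. The array x holds all possible values of the die, and the array p holds the probability of each value:
    >>> from scipy import stats
    >>> x = range(1,7)
    >>> p = (0.4, 0.2, 0.1, 0.1, 0.1, 0.1)
    The statement below defines a random variable that represents this loaded die. Its rvs() method then rolls the die 20 times and returns random numbers distributed according to p. The output varies from run to run:
    >>> dice = stats.rv_discrete(values=(x,p))
    >>> dice.rvs(size=20)
    array([2, 5, 1, 2, 1, 1, 2, 4, 1, 3, 1, 1, 4, 3, 1, 1, 1, 2, 6, 4])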
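The same dice object also supports the discrete-distribution methods, such as pmf, cdf and mean. The sketch below reuses x and p from above. It adds random_state, which the original does not use, so the rolls are reproducible. The long-run frequency check with 100000 rolls is only illustrative.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 7)
p = (0.4, 0.2, 0.1, 0.1, 0.1, 0.1)
dice = stats.rv_discrete(name='dice', values=(x, p))

print(dice.pmf(1))        # 0.4, the probability of rolling a 1
print(dice.pmf(7))        # 0.0, 7 is not a face of the die
print(dice.cdf(2))        # P(X <= 2) = 0.4 + 0.2 = 0.6
print(dice.mean())        # 1*0.4 + 2*0.2 + (3+4+5+6)*0.1 = 2.6

# Reproducible rolls; the share of 1s should be close to 0.4
rolls = dice.rvs(size=100_000, random_state=42)
print(np.mean(rolls == 1))
```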

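    from scipy import stats
    import numpy as np
    import matplotlib.pyplot as plt

    # 30 uniform random numbers used as the support points of an empirical distribution
    fs_meetsig = np.random.random(30)
    fs_xk = np.sort(fs_meetsig)
    # Equal probability 1/30 for every support point
    fs_pk = np.ones_like(fs_xk) / len(fs_xk)
    # Note: the rv_discrete docs describe xk as integers; the float support points follow the original example
    fs_rv_dist = stats.rv_discrete(name='fs_rv_dist', values=(fs_xk, fs_pk))

    # Empirical CDF drawn as a step function (ms/mec marker options dropped: no markers are drawn)
    plt.plot(fs_xk, fs_rv_dist.cdf(fs_xk), 'b-', drawstyle='steps-post', label='friend')
    plt.legend()
    plt.show()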
    

    [rv_discrete Examples]

    Discrete distributions

    bernoulli: A Bernoulli discrete random variable.
    binom: A binomial discrete random variable.
    boltzmann: A Boltzmann (Truncated Discrete Exponential) random variable.
    dlaplace: A Laplacian discrete random variable.
    geom: A geometric discrete random variable.
    hypergeom: A hypergeometric discrete random variable.
    logser: A Logarithmic (Log-Series, Series) discrete random variable.
    nbinom: A negative binomial discrete random variable.
    planck: A Planck discrete exponential random variable.
    poisson: A Poisson discrete random variable.
    randint: A uniform discrete random variable.
    skellam: A Skellam discrete random variable.
    zipf: A Zipf discrete random variable.

    Methods of discrete distributions

    rvs(*args, **kwargs) Random variates of given type.
    pmf(k, *args, **kwds) Probability mass function at k of the given RV.
    logpmf(k, *args, **kwds) Log of the probability mass function at k of the given RV.
    cdf(k, *args, **kwds) Cumulative distribution function of the given RV.
    logcdf(k, *args, **kwds) Log of the cumulative distribution function at k of the given RV.
    sf(k, *args, **kwds) Survival function (1 - cdf) at k of the given RV.
    logsf(k, *args, **kwds) Log of the survival function of the given RV.
    ppf(q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV.
    isf(q, *args, **kwds) Inverse survival function (inverse of sf) at q of the given RV.
    moment(n, *args, **kwds) n-th order non-central moment of distribution.
    stats(*args, **kwds) Some statistics of the given RV.
    entropy(*args, **kwds) Differential entropy of the RV.
    expect([func, args, loc, lb, ub, ...]) Calculate expected value of a function with respect to the distribution for discrete distribution.
    median(*args, **kwds) Median of the distribution.
    mean(*args, **kwds) Mean of the distribution.
    std(*args, **kwds) Standard deviation of the distribution.
    var(*args, **kwds) Variance of the distribution.
    interval(alpha, *args, **kwds) Confidence interval with equal areas around the median.
    __call__(*args, **kwds) Freeze the distribution for the given arguments.
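As a quick check of these discrete methods, here is a sketch that uses binom with the illustrative parameters n=10, p=0.3. It compares the pmf value with the binomial formula computed via math.comb, which needs Python 3.8 or later.

```python
import math
from scipy import stats

rv = stats.binom(n=10, p=0.3)     # frozen binomial distribution (illustrative parameters)

k = 3
print(rv.pmf(k), math.comb(10, k) * 0.3**k * 0.7**(10 - k))  # pmf matches the formula
print(rv.cdf(k) + rv.sf(k))       # 1.0: sf(k) = 1 - cdf(k)
print(rv.mean(), rv.var())        # n*p = 3.0 and n*p*(1-p) = 2.1
print(rv.ppf(0.5))                # 3.0: smallest k with cdf(k) >= 0.5
print(rv.expect(lambda k: k**2))  # E[X**2] = var + mean**2 = 11.1
```

For a discrete distribution, ppf returns the smallest support value whose cdf reaches q. That is why the result is always one of the integers 0..10 here.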




    Statistical functions

    {Top-level scipy.stats functions, applicable to data from many distributions}

    Several of these functions have a similar version in scipy.stats.mstats that works for masked arrays.

    describe(a[, axis, ddof, bias, nan_policy]) Computes several descriptive statistics of the passed array.
    gmean(a[, axis, dtype]) Compute the geometric mean along the specified axis.
    hmean(a[, axis, dtype]) Calculates the harmonic mean along the specified axis.
    kurtosis(a[, axis, fisher, bias, nan_policy]) Computes the kurtosis (Fisher or Pearson) of a dataset.
    kurtosistest(a[, axis, nan_policy]) Tests whether a dataset has normal kurtosis.
    mode(a[, axis, nan_policy]) Returns an array of the modal (most common) value in the passed array.
    moment(a[, moment, axis, nan_policy]) Calculates the nth moment about the mean for a sample.
    normaltest(a[, axis, nan_policy]) Tests whether a sample differs from a normal distribution.
    skew(a[, axis, bias, nan_policy]) Computes the skewness of a data set.
    skewtest(a[, axis, nan_policy]) Tests whether the skew is different from the normal distribution.
    kstat(data[, n]) Return the nth k-statistic (1<=n<=4 so far).
    kstatvar(data[, n]) Returns an unbiased estimator of the variance of the k-statistic.
    tmean(a[, limits, inclusive, axis]) Compute the trimmed mean.
    tvar(a[, limits, inclusive, axis, ddof]) Compute the trimmed variance.
    tmin(a[, lowerlimit, axis, inclusive, ...]) Compute the trimmed minimum.
    tmax(a[, upperlimit, axis, inclusive, ...]) Compute the trimmed maximum.
    tstd(a[, limits, inclusive, axis, ddof]) Compute the trimmed sample standard deviation.
    tsem(a[, limits, inclusive, axis, ddof]) Compute the trimmed standard error of the mean.
    variation(a[, axis, nan_policy]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
    find_repeats(arr) Find repeats and repeat counts.
    trim_mean(a, proportiontocut[, axis]) Return mean of array after trimming distribution from both tails.
    cumfreq(a[, numbins, defaultreallimits, weights]) Returns a cumulative frequency histogram, using the histogram function.
    histogram2(*args, **kwds) histogram2 is deprecated!
    histogram(*args, **kwds) histogram is deprecated!
    itemfreq(a) Returns a 2-D array of item frequencies.
    percentileofscore(a, score[, kind]) The percentile rank of a score relative to a list of scores.
    scoreatpercentile(a, per[, limit, ...]) Calculate the score at a given percentile of the input sequence.
    relfreq(a[, numbins, defaultreallimits, weights]) Returns a relative frequency histogram, using the histogram function.
    binned_statistic(x, values[, statistic, ...]) Compute a binned statistic for one or more sets of data.
    binned_statistic_2d(x, y, values[, ...]) Compute a bidimensional binned statistic for one or more sets of data.
    binned_statistic_dd(sample, values[, ...]) Compute a multidimensional binned statistic for a set of data.
    obrientransform(*args) Computes the O’Brien transform on input data (any number of arrays).
    signaltonoise(*args, **kwds) signaltonoise is deprecated!
    bayes_mvs(data[, alpha]) Bayesian confidence intervals for the mean, var, and std.
    mvsdist(data) ‘Frozen’ distributions for mean, variance, and standard deviation of data.
    sem(a[, axis, ddof, nan_policy]) Calculates the standard error of the mean (or standard error of measurement) of the values in the input array.
    zmap(scores, compare[, axis, ddof]) Calculates the relative z-scores.
    zscore(a[, axis, ddof]) Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
    iqr(x[, axis, rng, scale, nan_policy, ...]) Compute the interquartile range of the data along the specified axis.
    sigmaclip(a[, low, high]) Iterative sigma-clipping of array elements.
    threshold(*args, **kwds) threshold is deprecated!
    trimboth(a, proportiontocut[, axis]) Slices off a proportion of items from both ends of an array.
    trim1(a, proportiontocut[, tail, axis]) Slices off a proportion from ONE end of the passed array distribution.
    f_oneway(*args) Performs a 1-way ANOVA.
    pearsonr(x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
    spearmanr(a[, b, axis, nan_policy]) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
    pointbiserialr(x, y) Calculates a point biserial correlation coefficient and its p-value.
    kendalltau(x, y[, initial_lexsort, nan_policy]) Calculates Kendall’s tau, a correlation measure for ordinal data.
    linregress(x[, y]) Calculate a linear least-squares regression for two sets of measurements.
    theilslopes(y[, x, alpha]) Computes the Theil-Sen estimator for a set of points (x, y).
    f_value(*args, **kwds) f_value is deprecated!
    ttest_1samp(a, popmean[, axis, nan_policy]) Calculates the T-test for the mean of ONE group of scores.
    ttest_ind(a, b[, axis, equal_var, nan_policy]) Calculates the T-test for the means of two independent samples of scores.
    ttest_ind_from_stats(mean1, std1, nobs1, ...) T-test for means of two independent samples from descriptive statistics.
    ttest_rel(a, b[, axis, nan_policy]) Calculates the T-test on TWO RELATED samples of scores, a and b.
    kstest(rvs, cdf[, args, N, alternative, mode]) Perform the Kolmogorov-Smirnov test for goodness of fit.
    chisquare(f_obs[, f_exp, ddof, axis]) Calculates a one-way chi square test.
    power_divergence(f_obs[, f_exp, ddof, axis, ...]) Cressie-Read power divergence statistic and goodness of fit test.
    ks_2samp(data1, data2) Computes the Kolmogorov-Smirnov statistic on 2 samples.
    mannwhitneyu(x, y[, use_continuity, alternative]) Computes the Mann-Whitney rank test on samples x and y.
    tiecorrect(rankvals) Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests.
    rankdata(a[, method]) Assign ranks to data, dealing with ties appropriately.
    ranksums(x, y) Compute the Wilcoxon rank-sum statistic for two samples.
    wilcoxon(x[, y, zero_method, correction]) Calculate the Wilcoxon signed-rank test.
    kruskal(*args, **kwargs) Compute the Kruskal-Wallis H-test for independent samples.
    friedmanchisquare(*args) Computes the Friedman test for repeated measurements.
    combine_pvalues(pvalues[, method, weights]) Methods for combining the p-values of independent tests bearing upon the same hypothesis.
    ss(*args, **kwds) ss is deprecated!
    square_of_sums(*args, **kwds) square_of_sums is deprecated!
    jarque_bera(x) Perform the Jarque-Bera goodness of fit test on sample data.
    ansari(x, y) Perform the Ansari-Bradley test for equal scale parameters.
    bartlett(*args) Perform Bartlett’s test for equal variances.
    levene(*args, **kwds) Perform Levene test for equal variances.
    shapiro(x[, a, reta]) Perform the Shapiro-Wilk test for normality.
    anderson(x[, dist]) Anderson-Darling test for data coming from a particular distribution.
    anderson_ksamp(samples[, midrank]) The Anderson-Darling test for k-samples.
    binom_test(x[, n, p, alternative]) Perform a test that the probability of success is p.
    fligner(*args, **kwds) Perform Fligner-Killeen test for equality of variance.
    median_test(*args, **kwds) Mood’s median test.
    mood(x, y[, axis]) Perform Mood’s test for equal scale parameters.
    boxcox(x[, lmbda, alpha]) Return a positive dataset transformed by a Box-Cox power transformation.
    boxcox_normmax(x[, brack, method]) Compute optimal Box-Cox transform parameter for input data.
    boxcox_llf(lmb, data) The boxcox log-likelihood function.
    entropy(pk[, qk, base]) Calculate the entropy of a distribution for given probability values.
    chisqprob(*args, **kwds) chisqprob is deprecated!
    betai(*args, **kwds) betai is deprecated!
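Here are a few of the descriptive functions above applied to small arrays that are easy to check by hand. The arrays are made up for illustration, and the expected values in the comments follow directly from the definitions.

```python
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, 4.0, 8.0])
print(stats.gmean(a))   # (1*2*4*8) ** (1/4) = 64 ** 0.25, about 2.828
print(stats.hmean(a))   # 4 / (1 + 1/2 + 1/4 + 1/8) = 4 / 1.875, about 2.133

# trim_mean cuts 20% from each end before averaging: mean of [2, 3, 4]
b = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(stats.trim_mean(b, 0.2))   # 3.0, the outlier 100 is removed

# Interquartile range: 75th minus 25th percentile (linear interpolation by default)
c = np.arange(1, 9)
print(stats.iqr(c))     # 6.25 - 2.75 = 3.5

# z-scores use the population standard deviation (ddof=0) by default
z = stats.zscore(np.array([1.0, 2.0, 3.0]))
print(z)                # mean 0 and standard deviation 1
```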

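    The describe function

    The default output of this function is hard to read!

    import numpy as np
    from scipy import stats

    age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
    fat_percent = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]
    age = np.array(age)
    fat_percent = np.array(fat_percent)
    # One row per person, with columns (age, fat_percent).
    # The original post used np.vstack([age, fat_percent]).reshape([-1, 2]), which pairs the
    # values incorrectly (rows [23, 23], [27, 27], ...); stacking the arrays as columns fixes this.
    data = np.column_stack([age, fat_percent])

    print(stats.describe(data))

    Both outputs in this section came from the reshape-based version, so each column mixes ages and fat percentages. That is why the column minima are 7.8 and 17.8. With the corrected column stacking, nobs is still 18, minmax becomes (array([23., 7.8]), array([61., 42.5])), and the column means are about 46.44 and 28.78.

    DescribeResult(nobs=18, minmax=(array([  7.8,  17.8]), array([ 60.,  61.])), mean=array([ 37.36111111,  37.86666667]), variance=array([ 236.58604575,  188.78588235]), skewness=array([-0.30733374,  0.40999364]), kurtosis=array([-0.65245849, -1.26315357]))

    The output can be printed in a more readable form:

    for key, value in stats.describe(data)._asdict().items():
        print(key, ':', value)
    nobs : 18
    minmax : (array([  7.8,  17.8]), array([ 60.,  61.]))
    mean : [ 37.36111111  37.86666667]
    variance : [ 236.58604575  188.78588235]
    skewness : [-0.30733374  0.40999364]
    kurtosis : [-0.65245849 -1.26315357]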

    Alternatively, you can use pandas functions instead, which produce more readable output [Python data processing library pandas]
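Here is a sketch of the pandas route, using the same age and fat_percent values as above. Note one difference: DataFrame.skew and DataFrame.kurt return bias-corrected estimates. These correspond to scipy.stats.skew and scipy.stats.kurtosis with bias=False, not to the describe() defaults.

```python
import numpy as np
import pandas as pd
from scipy import stats

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat_percent = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
               42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

# One column per variable, one row per person
df = pd.DataFrame({'age': age, 'fat_percent': fat_percent})

# count, mean, std (ddof=1), min, quartiles and max, laid out as a table
print(df.describe())

# Bias-corrected skewness and (Fisher) excess kurtosis per column
print(df.skew())
print(df.kurt())
```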

    Computing the entropy and KL divergence of probability distributions: scipy.stats.entropy

     scipy.stats.entropy(pk, qk=None, base=None)
        Calculate the entropy of a distribution for given probability values.
        If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).
        If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=0).
        This routine will normalize pk and qk if they don’t sum to 1.

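    Computing Shannon entropy with entropy

    import numpy as np
    from scipy import stats

    ij = np.array([10, 20, 0, 30, 40])  # hypothetical counts, for illustration only
    shannon_entropy = stats.entropy(ij / np.sum(ij), base=None)  # base=None: natural logarithm
    print(shannon_entropy)

    A direct Python implementation of entropy

    # Natural-log version: normalize first, then drop zero entries (0 * log 0 is taken as 0)
    pij = ij[np.nonzero(ij)] / np.sum(ij)
    shannon_entropy_func = lambda pij: -np.sum(pij * np.log(pij))
    shannon_entropy = shannon_entropy_func(pij)
    print(shannon_entropy)

    def entropy(counts):
        '''Compute entropy in bits (base-2 logarithm).'''
        counts = np.asarray(counts, dtype=float)
        ps = counts / counts.sum()      # normalize to probabilities
        ps = ps[np.nonzero(ps)]         # toss out zeros
        H = -np.sum(ps * np.log2(ps))   # compute entropy
        return H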

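    Computing the KL divergence between two distributions

    # entropy expects probability (or count) arrays over the same support points,
    # not rv_discrete objects such as fs_rv_dist; pk and qk stand for those two arrays
    kl = stats.entropy(pk, qk)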
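Here is a minimal, self-contained sketch with two made-up probability vectors. It checks stats.entropy(pk, qk) against the formula sum(pk * log(pk / qk)) quoted above. It also shows that the divergence is not symmetric and that entropy normalizes raw counts.

```python
import numpy as np
from scipy import stats

# Two illustrative probability vectors over the same three outcomes
pk = np.array([0.5, 0.3, 0.2])
qk = np.array([0.4, 0.4, 0.2])

kl_pq = stats.entropy(pk, qk)                  # D(P || Q), natural log
kl_manual = np.sum(pk * np.log(pk / qk))       # same quantity from the formula
kl_qp = stats.entropy(qk, pk)                  # D(Q || P): KL divergence is not symmetric

print(kl_pq, kl_manual, kl_qp)
print(stats.entropy(pk, qk, base=2))           # the same divergence measured in bits

# Unnormalized counts give the same result, because entropy normalizes pk and qk
print(stats.entropy([5, 3, 2], [4, 4, 2]))
```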

    Other implementations of KL divergence [Distance and similarity measures]

    [scipy.stats.entropy]

    Hypothesis tests

    ttest_1samp(a, popmean[, axis]) Calculates the T-test for the mean of ONE group of scores.
    ttest_ind(a, b[, axis, equal_var]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
    ttest_rel(a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
    kstest(rvs, cdf[, args, N, alternative, mode]) Perform the Kolmogorov-Smirnov test for goodness of fit.
    chisquare(f_obs[, f_exp, ddof, axis]) Calculates a one-way chi square test.
    power_divergence(f_obs[, f_exp, ddof, axis, ...]) Cressie-Read power divergence statistic and goodness of fit test.
    ks_2samp(data1, data2) Computes the Kolmogorov-Smirnov statistic on 2 samples.
    mannwhitneyu(x, y[, use_continuity]) Computes the Mann-Whitney rank test on samples x and y.
    tiecorrect(rankvals) Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests.
    rankdata(a[, method]) Assign ranks to data, dealing with ties appropriately.
    ranksums(x, y) Compute the Wilcoxon rank-sum statistic for two samples.
    wilcoxon(x[, y, zero_method, correction]) Calculate the Wilcoxon signed-rank test.
    kruskal(*args) Compute the Kruskal-Wallis H-test for independent samples
    friedmanchisquare(*args) Computes the Friedman test for repeated measurements
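Here is a sketch of checking one of these tests by hand: chisquare applied to made-up die-roll counts. When f_exp is omitted, the expected frequencies are taken to be uniform.

```python
import numpy as np
from scipy import stats

# Illustrative observed counts for the six faces of a die (88 rolls in total)
f_obs = np.array([16, 18, 16, 14, 12, 12])

res = stats.chisquare(f_obs)              # H0: all faces equally likely
f_exp = np.full(6, f_obs.sum() / 6)       # uniform expected counts
chi2_manual = np.sum((f_obs - f_exp) ** 2 / f_exp)
p_manual = stats.chi2.sf(chi2_manual, df=len(f_obs) - 1)

print(res.statistic, chi2_manual)
print(res.pvalue, p_manual)
```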

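    ttest_1samp implements the one-sample t-test. For example, suppose we want to test the mean rice yield in the Abra column of the data. Under the null hypothesis, we assume the population mean rice yield is 15000: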

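    from scipy import stats as ss
    # df: the pandas DataFrame from the original post, with one column of rice yields per region (e.g. 'Abra')
    # Perform a one-sample t-test using 15000 as the hypothesized population mean
    print(ss.ttest_1samp(a=df.loc[:, 'Abra'], popmean=15000))

    # OUTPUT (older SciPy versions print a plain tuple; newer ones print a named result object)
    (-1.1281738488299586, 0.26270472069109496)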

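    It returns a tuple made up of the following values:

    • t : float or array
      the t statistic
    • prob : float or array
      the two-tailed p-value

    In the output above, the p-value is 0.263, far above α = 0.05. There is therefore not enough evidence to conclude that the mean rice yield differs from 15000. Applying this test to all the variables, again with a hypothesized mean of 15000, gives: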

    print(ss.ttest_1samp(a=df, popmean=15000))

    # OUTPUT
    (array([ -1.12817385,   1.07053437, -65.81425599,  -4.564575  ,   6.17156198]),
     array([  2.62704721e-01,   2.87680340e-01,   4.15643528e-70,
              1.83764399e-05,   2.82461897e-08]))

    The first array holds the t statistics, and the second holds the corresponding p-values.
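The dataset behind df is not included here, so the sketch below uses a small hypothetical sample to show what ttest_1samp computes. The statistic is t = (sample mean - popmean) / (s / sqrt(n)), where s is the sample standard deviation (ddof=1). The two-sided p-value comes from the t distribution with n - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical yields, for illustration only
a = np.array([14500.0, 15200.0, 14800.0, 15100.0, 14900.0])
popmean = 15000

t_stat, p_value = stats.ttest_1samp(a, popmean)

n = len(a)
t_manual = (a.mean() - popmean) / (a.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided

print(t_stat, t_manual)
print(p_value, p_manual)
```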




    Contingency table functions

    chi2_contingency(observed[, correction, lambda_]) Chi-square test of independence of variables in a contingency table.
    contingency.expected_freq(observed) Compute the expected frequencies from a contingency table.
    contingency.margins(a) Return a list of the marginal sums of the array a.
    fisher_exact(table[, alternative]) Performs a Fisher exact test on a 2x2 contingency table.
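Here is a sketch that applies these functions to one made-up 2x2 table. The expected frequencies are row total * column total / grand total. fisher_exact also returns the sample odds ratio (a*d)/(b*c).

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

# Illustrative 2x2 table: rows = groups, columns = outcome yes / no
observed = np.array([[8, 2],
                     [1, 5]])

chi2, p, dof, expected = stats.chi2_contingency(observed)  # Yates correction applies when dof == 1
print(chi2, p, dof)
print(expected)                       # row total * column total / grand total

print(expected_freq(observed))        # same expected table

oddsratio, p_fisher = stats.fisher_exact(observed)
print(oddsratio, p_fisher)            # sample odds ratio (8*5)/(2*1) = 20.0
```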

    Plot tests

    ppcc_max(x[, brack, dist]) Returns the shape parameter that maximizes the probability plot correlation coefficient for ...
    ppcc_plot(x, a, b[, dist, plot, N]) Returns (shape, ppcc), and optionally plots shape vs. ...
    probplot(x[, sparams, dist, fit, plot]) Calculate quantiles for a probability plot, and optionally show the plot.
    boxcox_normplot(x, la, lb[, plot, N]) Compute parameters for a Box-Cox normality plot, optionally show it.
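Here is a sketch of probplot without drawing anything. It returns the theoretical and ordered sample quantiles, together with a least-squares line through them. The normal sample with location 10 and scale 2 is an illustrative choice. np.random.default_rng requires NumPy 1.17 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=500)   # illustrative normal sample

# Without plot=..., probplot only computes the quantile pairs and the fitted line
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist='norm')

print(slope, intercept)   # roughly the sample scale (2) and location (10)
print(r)                  # close to 1 when the data look normal
```

To draw the probability plot as well, pass plot=matplotlib.pyplot (or an Axes object) as the plot argument.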

    Statistical functions for masked arrays (scipy.stats.mstats)

    Masked statistics functions

    argstoarray(*args) Constructs a 2D array from a group of sequences.
    betai(a, b, x) Returns the incomplete beta function.
    chisquare(f_obs[, f_exp, ddof, axis]) Calculates a one-way chi square test.
    count_tied_groups(x[, use_missing]) Counts the number of tied values.
    describe(a[, axis]) Computes several descriptive statistics of the passed array.
    f_oneway(*args) Performs a 1-way ANOVA, returning an F-value and probability given any f_value_wilks_lambda(ER, EF, dfnum, dfden, a, b) Calculation of Wilks lambda F-statistic for multivariate data, per Maxwell find_repeats(arr) Find repeats in arr and return a tuple (repeats, repeat_count).
    friedmanchisquare(*args) Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA.
    kendalltau(x, y[, use_ties, use_missing]) Computes Kendall’s rank correlation tau on two variables x and y.
    kendalltau_seasonal(x) Computes a multivariate Kendall’s rank correlation tau, for seasonal data.
    kruskalwallis(*args) Compute the Kruskal-Wallis H-test for independent samples
    kruskalwallis(*args) Compute the Kruskal-Wallis H-test for independent samples
    ks_twosamp(data1, data2[, alternative]) Computes the Kolmogorov-Smirnov test on two samples.
    ks_twosamp(data1, data2[, alternative]) Computes the Kolmogorov-Smirnov test on two samples.
    kurtosis(a[, axis, fisher, bias]) Computes the kurtosis (Fisher or Pearson) of a dataset.
    kurtosistest(a[, axis]) Tests whether a dataset has normal kurtosis
    linregress(*args) Calculate a regression line
    mannwhitneyu(x, y[, use_continuity]) Computes the Mann-Whitney statistic
    plotting_positions(data[, alpha, beta]) Returns plotting positions (or empirical percentile points) for the data.
    mode(a[, axis]) Returns an array of the modal (most common) value in the passed array.
    moment(a[, moment, axis]) Calculates the nth moment about the mean for a sample.
    mquantiles(a[, prob, alphap, betap, axis, limit]) Computes empirical quantiles for a data array.

    msign(x) Returns the sign of x, or 0 if x is masked.
    normaltest(a[, axis]) Tests whether a sample differs from a normal distribution.
    obrientransform(*args) Computes a transform on input data (any number of columns).
    pearsonr(x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-plotting_positions(data[, alpha, beta]) Returns plotting positions (or empirical percentile points) for the data.
    pointbiserialr(x, y) Calculates a point biserial correlation coefficient and the associated p-value.
    rankdata(data[, axis, use_missing]) Returns the rank (also known as order statistics) of each data point along scoreatpercentile(data, per[, limit, ...]) Calculate the score at the given ‘per’ percentile of the sequence a.
    sem(a[, axis, ddof]) Calculates the standard error of the mean (or standard error of measurement) of the values in the input array.
    signaltonoise(data[, axis]) Calculates the signal-to-noise ratio, as the ratio of the mean over standard deviation along the given axis.
    skew(a[, axis, bias]) Computes the skewness of a data set.
    skewtest(a[, axis]) Tests whether the skew is different from the normal distribution.
    spearmanr(x, y[, use_ties]) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
    theilslopes(y[, x, alpha]) Computes the Theil slope as the median of all slopes between paired values.
    threshold(a[, threshmin, threshmax, newval]) Clip array to a given value.
    tmax(a, upperlimit[, axis, inclusive]) Compute the trimmed maximum
    tmean(a[, limits, inclusive]) Compute the trimmed mean.
    tmin(a[, lowerlimit, axis, inclusive]) Compute the trimmed minimum
    trim(a[, limits, inclusive, relative, axis]) Trims an array by masking the data outside some given limits.
    trima(a[, limits, inclusive]) Trims an array by masking the data outside some given limits.
    trimboth(data[, proportiontocut, inclusive, ...]) Trims the smallest and largest data values.
    trimmed_stde(a[, limits, inclusive, axis]) Returns the standard error of the trimmed mean along the given axis.
    trimr(a[, limits, inclusive, axis]) Trims an array by masking some proportion of the data on each end.
    trimtail(data[, proportiontocut, tail, ...]) Trims the data by masking values from one tail.
    tsem(a[, limits, inclusive]) Compute the trimmed standard error of the mean.
    ttest_onesamp(a, popmean[, axis]) Calculates the T-test for the mean of ONE group of scores.
    ttest_ind(a, b[, axis]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
    ttest_rel(a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
    tvar(a[, limits, inclusive]) Compute the trimmed variance
    variation(a[, axis]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
    winsorize(a[, limits, inclusive, inplace, axis]) Returns a Winsorized version of the input array.
    zmap(scores, compare[, axis, ddof]) Calculates the relative z-scores.
    zscore(a[, axis, ddof]) Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
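    Two of the trimming functions listed above can be tried directly from scipy.stats.mstats, which works on masked arrays. A minimal sketch, assuming a recent SciPy; the data (0 to 9) and the limits are illustrative choices, not from the original post:

```python
import numpy as np
from scipy.stats import mstats

data = np.arange(10.0)  # illustrative data: 0.0, 1.0, ..., 9.0

# tmean: mean of the values inside the closed interval [2, 8]
trimmed_mean = mstats.tmean(data, limits=(2, 8))

# trimboth: mask the lowest and highest 20% of the values (two on each end here)
trimmed = mstats.trimboth(data, proportiontocut=0.2)

print(trimmed)                    # masked array; 0, 1, 8 and 9 are masked
print(trimmed.count())            # number of values left unmasked
print(trimmed_mean, trimmed.mean())
```

    The trimmed values are masked rather than deleted, so the result keeps the shape of the input.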

    Univariate and multivariate kernel density estimation (scipy.stats.kde)

    gaussian_kde(dataset[, bw_method]) Representation of a kernel-density estimate using Gaussian kernels.
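    A minimal sketch of gaussian_kde on a 1-D sample, assuming a recent SciPy (the default bandwidth rule is Scott's). The sample, seed and evaluation grid are hypothetical choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # illustrative 1-D sample

kde = stats.gaussian_kde(data)      # bandwidth chosen by Scott's rule by default
grid = np.linspace(-4, 4, 9)
density = kde(grid)                 # evaluate the estimated density on the grid

# A density estimate should integrate to 1 over the whole real line
total = kde.integrate_box_1d(-np.inf, np.inf)
print(density.round(3))
print(total)
```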

    Pipi blog



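    Examples of using the statistical functions

    Continuous distribution: norm (Gaussian distribution)

    {Gaussian (normal) random variable: A normal continuous random variable.}

    Generating a random vector that follows a Gaussian distribution (sampling from the normal distribution): stats.norm.rvs(loc, scale, size)

    Parameters: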

    The location (loc) keyword specifies the mean.

    The scale (scale) keyword specifies the standard deviation.

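    Through the loc and scale parameters, norm lets you specify the shift and scaling of the random variable. For a normally distributed random variable, these two parameters are its expected value (mean) and its standard deviation.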

    Random deviations drawn from the Gaussian N(0, 0.01) (variance 0.01, i.e. scale = 0.1):
    y = stats.norm.rvs(loc=0, scale=0.1, size=10)
    Output: array([ 0.05419826,  0.04151471, -0.10784729,  0.18283546,  0.02348312, -0.04611974,  0.0069336 ,  0.03840133, -0.05015316,  0.23315205])
    

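    stats.norm(loc=0, scale=0.1).stats()  # mean and variance of the distribution; y itself is an ndarray and has no stats() method
    returns the mean 0.0 and the variance 0.01 (= 0.1²)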
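    Calling norm with loc and scale creates a "frozen" distribution whose methods no longer need those arguments. A small sketch, assuming a recent SciPy that accepts a NumPy Generator as random_state; the seed is arbitrary:

```python
import numpy as np
from scipy import stats

rv = stats.norm(loc=0, scale=0.1)   # frozen N(0, 0.1**2) distribution

mean, var = rv.stats()              # moments='mv' by default: mean and variance
print(mean, var, rv.std())          # about 0.0, 0.01 and 0.1

# random_state makes the draw reproducible; the values themselves are arbitrary
y = rv.rvs(size=10, random_state=np.random.default_rng(42))
print(y.shape)
```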

    Note: you can also generate Gaussian random numbers with numpy.random.normal [NumPy library: the random-number module numpy.random].
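    For comparison, a sketch of the NumPy route using the Generator API (numpy.random.default_rng); the seed and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)                       # NumPy's Generator API
samples = rng.normal(loc=0.0, scale=0.1, size=10)    # same distribution as stats.norm.rvs(loc=0, scale=0.1, size=10)
print(samples)

# With many draws the sample standard deviation approaches scale = 0.1
big = rng.normal(loc=0.0, scale=0.1, size=100_000)
print(big.std())
```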

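    Finding the best-fit parameters of a normal distribution: stats.norm.fit(x)

    >>> x = stats.norm.rvs(loc=1.0, scale=2.0, size=100)
    The fit() method fits the random sample x and returns the parameters of the distribution that best match the sampled values (the exact numbers vary because x is random):
    >>> stats.norm.fit(x)  # estimated mean and standard deviation of the sample
    (1.01810091, 2.00046946)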
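    For the normal distribution, fit() returns the maximum-likelihood estimates, which are the sample mean and the biased (ddof=0) standard deviation. A reproducible sketch, assuming a recent SciPy; the seed and the larger sample size are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = stats.norm.rvs(loc=1.0, scale=2.0, size=10_000, random_state=rng)

loc_hat, scale_hat = stats.norm.fit(x)   # maximum-likelihood estimates
print(loc_hat, scale_hat)                # close to 1.0 and 2.0

# The MLEs coincide with the sample mean and the ddof=0 standard deviation
print(np.isclose(loc_hat, x.mean()), np.isclose(scale_hat, x.std()))
```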


    Evaluating the probability density function of the normal distribution N(1, 1) at a given x

    lambda x: norm.pdf(x, 1, 1)
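    Note: in general norm.pdf(x, loc, scale) is not the same as norm.pdf(x - loc); the two are equal only when the standard deviation (scale) is 1, as it is for N(1, 1). In general, norm.pdf(x, loc, scale) = norm.pdf((x - loc) / scale) / scale.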
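    A quick numerical check of the shift-and-scale rule in the note above; the grid x and the values loc = 1, scale = 2 are illustrative:

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 5, 9)
loc, scale = 1.0, 2.0

direct = stats.norm.pdf(x, loc, scale)
# Standardize, evaluate the standard normal pdf, then divide by scale
via_standard = stats.norm.pdf((x - loc) / scale) / scale
# Shifting alone is not enough when scale != 1
shift_only = stats.norm.pdf(x - loc)

print(np.allclose(direct, via_standard))   # True
print(np.allclose(direct, shift_only))     # False: scale is 2, not 1
# With scale = 1, shifting alone does match
print(np.allclose(stats.norm.pdf(x, 1, 1), stats.norm.pdf(x - 1)))  # True
```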

    Evaluating the cumulative distribution function of N(1, 1) at a given x

    lambda x: norm.cdf(x, 1, 1)
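    The CDF can be inverted with ppf() (the percent point function). A short sketch; the test points are illustrative:

```python
import numpy as np
from scipy import stats

cdf = lambda x: stats.norm.cdf(x, 1, 1)   # CDF of N(1, 1)

print(cdf(1.0))            # 0.5: half of the probability mass lies below the mean
p = cdf(np.array([-1.0, 0.0, 2.5]))
# ppf is the inverse of cdf, so it recovers the original x values
print(stats.norm.ppf(p, 1, 1))
```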

    Plotting one- and two-dimensional normal probability density functions

    [Probability theory: the Gaussian distribution]

    [scipy.stats.norm]

    Uniform distribution

    mu = uniform.rvs(size=N)  # sample from the uniform distribution on [0, 1]
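    As with norm, loc and scale shift and stretch the uniform distribution to [loc, loc + scale]. A sketch with illustrative values (N = 1000, loc = 2, scale = 3, an arbitrary seed):

```python
import numpy as np
from scipy import stats

N = 1000
rng = np.random.default_rng(2)

mu = stats.uniform.rvs(size=N, random_state=rng)                      # U[0, 1]: loc=0, scale=1
w = stats.uniform.rvs(loc=2.0, scale=3.0, size=N, random_state=rng)   # U[2, 5] = [loc, loc + scale]

print(mu.min() >= 0.0, mu.max() <= 1.0)
print(w.min() >= 2.0, w.max() <= 5.0)
print(stats.uniform(loc=2.0, scale=3.0).stats())  # mean 3.5, variance 3**2 / 12 = 0.75
```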

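    Gamma distribution

    The gamma distribution requires an additional shape parameter. It can describe the time needed to wait for k independent random events to occur; k is the shape parameter of the gamma distribution.
    The scale parameter theta of the gamma distribution is related to the rate at which the random events occur (theta = 1/rate), and is specified by the scale argument.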
    >>> stats.gamma.stats(2.0,scale=2) 
    (array(4.0), array(8.0))
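    From the mathematical definition of the gamma distribution, its expected value is k*theta and its variance is k*theta^2; with k = 2 and theta = 2 the program above confirms both formulas (4 and 8). When a distribution has extra shape parameters, its rvs(), pdf(), and similar methods take additional arguments to receive them.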
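    A sketch checking these moments and the gamma density in its shape/scale form, x^(k-1) e^(-x/theta) / (Γ(k) theta^k); the evaluation points and seed are illustrative:

```python
import math
import numpy as np
from scipy import stats

k, theta = 2.0, 2.0
rv = stats.gamma(k, scale=theta)        # shape k is passed positionally (SciPy calls it a)

print(rv.mean(), rv.var())              # k*theta = 4.0, k*theta**2 = 8.0

x = np.array([0.5, 1.0, 3.0, 7.0])
# Gamma density with shape k and scale theta
formula = x**(k - 1) * np.exp(-x / theta) / (math.gamma(k) * theta**k)
print(np.allclose(rv.pdf(x), formula))  # True

# The shape parameter is also the first extra argument of the generic methods
samples = stats.gamma.rvs(k, scale=theta, size=5, random_state=np.random.default_rng(3))
```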

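    Discrete distribution: the binomial distribution

    Suppose a trial has only two possible outcomes and succeeds with probability P. The binomial distribution describes the probability of exactly k successes in n such independent trials.
    The probability mass function of the binomial distribution is:

    $$f(k; n, P) = \binom{n}{k} P^k (1-P)^{n-k}, \quad k = 0, 1, \ldots, n$$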


    With the binomial probability mass function pmf(), it is easy to compute the probability of rolling a 6 exactly k times.

    pmf()

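    The first argument of pmf() is the value of the random variable; the remaining arguments are the parameters that describe the distribution. For the binomial distribution these are n and P, and the values range over the integers from 0 to n.

    The program uses the binomial probability mass function to compute the probabilities of rolling a 6 exactly 0 through 5 times in 5 throws of a die: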

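    >>> stats.binom.pmf(range(6), 5, 1/6.0)
    array([0.401878, 0.401878, 0.160751, 0.032150, 0.003215, 0.000129])

    The results show that the probability of rolling a 6 zero times, or exactly once, is 40.2% in each case, while the probability of exactly three 6s is 3.215%.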
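    The same probabilities can be recomputed from the binomial formula C(n, k) P^k (1-P)^(n-k) with math.comb (Python 3.8+) as a cross-check; for example, P(two 6s) = 10 × 5³ / 6⁵ = 1250/7776:

```python
import math
import numpy as np
from scipy import stats

n, p = 5, 1 / 6.0
k = np.arange(n + 1)                     # 0, 1, ..., 5 sixes

pmf = stats.binom.pmf(k, n, p)
# Same probabilities from C(n, k) * p**k * (1 - p)**(n - k)
manual = np.array([math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)])

print(pmf.round(6))
print(np.allclose(pmf, manual), np.isclose(pmf.sum(), 1.0))
```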

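    Poisson distribution

    In the binomial distribution, if the number of trials n is large and the success probability p of each trial is small, so that the product np is moderate, then the probability of the number of successes can be approximated by the Poisson distribution.
    The Poisson distribution uses lambda to describe the average rate at which random events occur per unit time (or per unit area). If the number of trials n in the binomial distribution is viewed as the number of trials per unit time, then its product with the event probability P is the average rate of events, i.e. lambda = nP.
    The probability mass function of the Poisson distribution is:

    $$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$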
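    A sketch comparing stats.poisson.pmf with the formula λ^k e^(-λ) / k!, using lambda = 10 as in the examples that follow:

```python
import math
import numpy as np
from scipy import stats

lam = 10.0
k = np.arange(20)

pmf = stats.poisson.pmf(k, lam)
manual = np.array([lam**i * math.exp(-lam) / math.factorial(i) for i in range(20)])

print(np.allclose(pmf, manual))                               # True
print(stats.poisson(lam).mean(), stats.poisson(lam).var())    # both equal lambda
```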

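    Approximating the binomial distribution

    The program computes the probability mass functions of the binomial and Poisson distributions; when n is large enough, the two are very close.
    The average event rate lambda is fixed at 10, and the probability of an event on each trial is computed from the number of binomial trials as p = lambda/n.
    >>> _lambda = 10.0
    >>> k = np.arange(20)
    >>> poisson = stats.poisson.pmf(k, _lambda)  # Poisson distribution
    >>> binom100 = stats.binom.pmf(k, 100, _lambda/100)  # binomial distribution, n = 100
    >>> binom1000 = stats.binom.pmf(k, 1000, _lambda/1000)  # binomial distribution, n = 1000
    >>> np.max(np.abs(binom100 - poisson))  # maximum error
    0.006755311103353312
    >>> np.max(np.abs(binom1000 - poisson))  # with n = 1000 the error is smaller
    0.00063017540509099912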

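    Simulating the Poisson distribution

    The Poisson distribution is well suited to describing how many random events occur per unit time, e.g. the number of times a facility is used within a given period, the number of machine failures, the number of natural disasters, and so on.

    Below, random numbers are used to simulate the Poisson distribution, and the result is compared with its probability mass function; events occur on average lambda = 10 times per second. The original comparison used observation times of 1000 seconds and 50000 seconds: the longer the observation time, the more closely the number of events per second follows the Poisson distribution. The session below uses 10000 seconds.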

    >>> _lambda = 10
    >>> time = 10000
    >>> t = np.random.rand(_lambda*time)*time
    >>> count, time_edges = np.histogram(t, bins=time, range=(0, time))
    >>> count
    array([10,  9,  8, ..., 11, 10, 18])
    >>> dist, count_edges = np.histogram(count, bins=20, range=(0, 20), density=True)
    >>> x = count_edges[:-1]
    >>> poisson = stats.poisson.pmf(x, _lambda)
    >>> np.max(np.abs(dist - poisson))  # the maximum error is small: consistent with a Poisson distribution
    0.0088356241037075706


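    Note: rand() generates the times of _lambda*time events, uniformly distributed between 0 and time.
    histogram() counts how many events in array t occur within each second, giving the array count.
    By the definition of the Poisson distribution, the values in count should follow a Poisson distribution. The session then computes the probability distribution of the event counts over the range 0 to 20. When histogram()'s density parameter (called normed in old NumPy versions, where it has since been removed) is True and every bin has width 1, the result equals the probability mass function.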
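    A reproducible, self-contained variant of the simulation above, assuming NumPy's Generator API (the seed is arbitrary). It also checks a property of the Poisson distribution not shown in the session: the mean and the variance of the per-second counts should both be close to lambda:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
_lambda, time = 10, 10_000

t = rng.random(_lambda * time) * time                    # event times, uniform on [0, time)
count, _ = np.histogram(t, bins=time, range=(0, time))   # events per 1-second bin

# For a Poisson distribution the mean and the variance both equal lambda
print(count.mean(), count.var())

# Empirical frequencies of 0..19 events per second vs. the Poisson PMF
dist, edges = np.histogram(count, bins=20, range=(0, 20), density=True)
err = np.max(np.abs(dist - stats.poisson.pmf(edges[:-1], _lambda)))
print(err)
```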

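    Time intervals between Poisson events: the gamma distribution

    You can also look at the distribution of random events from another angle: the distribution of the time interval between two adjacent events, or between events k apart. According to probability theory, the intervals between events follow a gamma distribution; since an interval can take any value, the gamma distribution is a continuous probability distribution. Its probability density function, which describes the distribution of the waiting time until the k-th event occurs, is:

    $$f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0$$

    where λ is the average event rate and Γ(k) is the gamma function; when k is a positive integer, Γ(k) = (k-1)!.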
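    A sketch checking this density against stats.gamma.pdf with scale = 1/λ, and checking that k = 1 reduces to the exponential distribution; the rate and grid are illustrative:

```python
import math
import numpy as np
from scipy import stats

lam = 10.0                                # event rate (events per second)
x = np.linspace(0.01, 1.0, 50)            # waiting times in seconds

for k in (1, 2, 3):
    # Density of the waiting time until the k-th event at rate lam
    formula = lam**k * x**(k - 1) * np.exp(-lam * x) / math.gamma(k)
    assert np.allclose(stats.gamma.pdf(x, k, scale=1.0 / lam), formula)
    # For a positive integer k, Gamma(k) = (k - 1)!
    assert math.isclose(math.gamma(k), math.factorial(k - 1))

# k = 1 is the exponential distribution of the gap between adjacent events
print(np.allclose(stats.gamma.pdf(x, 1, scale=1 / lam), stats.expon.pdf(x, scale=1 / lam)))
```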


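    The program below simulates the gamma distribution of the intervals between events, with an observation time of 10,000 seconds and an average of 10 events per second.
    In the figure, "k=1" is the distribution of the interval between two adjacent events, and "k=2" is the distribution of the interval between two events separated by one other event; both follow gamma distributions.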


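    >>> _lambda = 10
    >>> time = 10000
    >>> t = np.random.rand(_lambda*time)*time
    >>> t.sort()  # sort the random times before computing the intervals between events
    >>> s1 = t[1:] - t[:-1]  # interval between adjacent events
    >>> s2 = t[2:] - t[:-2]  # interval between two events separated by one other event
    >>> dist1, x1 = np.histogram(s1, bins=100, density=True)
    >>> dist2, x2 = np.histogram(s2, bins=100, density=True)
    >>> gamma1 = stats.gamma.pdf((x1[:-1]+x1[1:])/2, 1, scale=1.0/_lambda)
    >>> gamma2 = stats.gamma.pdf((x2[:-1]+x2[1:])/2, 2, scale=1.0/_lambda)
    >>> np.max(np.abs(gamma1 - dist1))
    0.13557317865888141
    >>> np.max(np.abs(gamma2 - dist2))
    0.087375030861794656
    >>> np.max(gamma1), np.max(gamma2)  # the pdf values themselves are large, so the errors above are small
    (9.3483221580498537, 3.6767953241013656)
    Note: simulating the gamma distribution:
    First, generate the times at which 100,000 random events occur within 10,000 seconds, so the average rate is 10 events per second;
    To compute the intervals between consecutive events, the random times must first be sorted;
    The second value returned by histogram() holds the bin edges; when computing the gamma density with gamma.pdf(), the midpoint of each bin is used. The second argument of pdf() is k, and the scale parameter is 1/λ.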
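    A reproducible sketch of the same simulation, assuming NumPy's Generator API (the seed is arbitrary). Besides the histogram comparison it checks that the mean gaps are close to the gamma means k/λ (0.1 s for k = 1, 0.2 s for k = 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
_lambda, time = 10, 10_000

t = np.sort(rng.random(_lambda * time) * time)   # sorted event times
s1 = t[1:] - t[:-1]        # gaps between adjacent events (k=1)
s2 = t[2:] - t[:-2]        # gaps spanning one intermediate event (k=2)

# Gamma(k, scale=1/lambda) has mean k/lambda
print(s1.mean(), stats.gamma.mean(1, scale=1.0 / _lambda))   # both about 0.1
print(s2.mean(), stats.gamma.mean(2, scale=1.0 / _lambda))   # both about 0.2

# Histogram densities vs. the gamma pdf evaluated at the bin midpoints
dist1, x1 = np.histogram(s1, bins=100, density=True)
mid1 = (x1[:-1] + x1[1:]) / 2
gamma1 = stats.gamma.pdf(mid1, 1, scale=1.0 / _lambda)
print(np.max(np.abs(gamma1 - dist1)), gamma1.max())
```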

    from:http://blog.csdn.net/pipisorry/article/details/49515215

    ref:Statistical functions (scipy.stats)

    Random distribution functions in the Python standard library


  • scipy.stats: a simple index

    2021-08-12 15:53:54
    Q-Q plot: scipy.stats.probplot
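  • The scipy.stats.norm function

    2019-05-03 23:14:29
    scipy.stats.norm implements the normal (i.e. Gaussian) distribution. pdf: the probability density function. Its standard form is norm.pdf(x, loc, scale), which is equivalent to norm.pdf(y) / scale, where y = (x - loc) / scale. There are two ways to call it; see the code: import ...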
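  • scipy.stats.exponweib: the function in the scipy package for computing the (exponentiated) Weibull distribution. from scipy.stats import exponweib. Format of the density function: exponweib.pdf(x, a, c) = a * c * (1 - exp(-x**c))**(a-1) * exp(-x**c) * x**(c-1), a rather odd form. In the official...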
  • scipy.stats.chi2: scipy.stats.chi2(*args, **kwds) = <scipy.stats._continuous_distns.chi2_gen object> [source] A chi-squared continuous random variable. As an instance of the rv_continuous class, chi2 inherits from it a collection of generic methods (see...
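  • scipy.stats statistical library functions

    2021-04-13 19:46:44
    Common generic functions: pdf: probability density function; cdf: cumulative distribution function, gives the cumulative probability at a known x; ppf: percent point function, gives the x position for a known cumulative probability (the inverse of cdf) ...sem=scipy.stats.sem(arr). For details on SEM, see: SEM. Example: finding the critical points of a confidence interval. Prior knowledge...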
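  • m, b, r_value, p_value, std_err = scipy.stats.linregress(x0, y0) When I use this function, the only standard error it returns is the slope's. I would like the standard errors of both the slope and the intercept; is there a way to do this?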
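  • Contents: Statistics (scipy.stats), Introduction, Random variables, Getting help, Common methods, Shifting and scaling, Shape parameters, Broadcasting, Specific points for discrete distributions, <not translated below> Fitting distributions, Performance issues and cautionary remarks, Remaining issues, Building specific distributions, Making a continuous distribution, i.e., subclassing rv_continuous, Subclassing...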
  • scipy.stats.multivariate_normal(mean=None, cov=1, allow_singular=False, seed=None) = <scipy.stats._multivariate.multivariate_normal_gen object> Description: A multivariate normal random variable. The mean keyword specifies...
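  • SciPy's statistics module is scipy.stats; one class in it implements continuous distributions and another implements discrete distributions. ...generated = stats.norm.rvs(size=900) # 2. Fit the generated data with a normal distribution to obtain its mean and standard deviation print("Mean", "Std", sta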
  • Edit: Basically solved I think. I am using spearmanr from scipy.stats to find the correlations between variables across a number of different samples. I have around 2500 variables and 36 samples (or 'o...
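  • [Python notes] An analysis of the scipy.stats.norm function

    2021-01-02 20:41:08
    scipy.stats.norm implements the normal (i.e. Gaussian) distribution. pdf: the probability density function. Its standard form is norm.pdf(x, loc, scale), which is equivalent to norm.pdf(y) / scale, where y = (x - loc) / scale. There are two ways to call it; see the code: from ...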
  • scipy.stats.boxcox

    2019-03-07 15:17:33
    scipy.stats.boxcox ValueError: Data must be positive. When using stats.boxcox, I ran into the following: stats.boxcox(data[col].dropna()+1), where data is a DataFrame and col is a column name; the +1 was rather puzzling. The official documentation...
  • scipy.stats.multivariate_normal: the multivariate Gaussian distribution

    2018-06-19 10:21:48
    Reference: ...scipy.stats.multivariate_normal Parameters: x: array_like Quantiles, with the...
  • In Python, how can I reproduce the truncated-normal data drawn at random by scipy.stats.truncnorm.rvs? That is, how do I get the seed behind that random data, and where do I set the seed?
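  • Python Scipy.stats usage | how to use rvs, pdf, pmf

    2019-06-28 16:09:36
    scipy.stats.poisson.rvs(mu=expected value, size=number of random numbers to generate) # draw the given number of random numbers from a Poisson distribution. Common methods of stats continuous random variables (name: remark): rvs: generate random numbers following the given distribution; pdf: probability density function ...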
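  • Reference: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.multivariate_normal.html A multivariate normal random variable. The mean keyword specifies the mean, and the cov keyword specifies the covariance matrix. New in version 0.14.0. Supplement: ...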
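  • Statistical analysis in Python generally uses stats from scipy. NumPy can also generate random numbers that follow certain probability distributions, but if you need more specific things such as probability densities or cumulative probabilities, you need scipy.stats; it feels similar to the ssj package in Java. Below is a brief summary of some of its...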
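  • Python 3 data analysis and mining/modeling in practice: study contents; code example downloads; SciPy tutorial: the statistical function library scipy.stats; scipy.stats examples; contents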
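  • The link to the original blog post is here; I only reorganized the original post. Usage of the Python statistics library scipy.stats (1/3). If this infringes any rights, please contact me and it will be removed.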
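  • scipy.stats: scipy's dedicated library of statistical functions. All statistical functions live in the subpackage scipy.stats, and scipy.info(scipy.stats) gives the complete list of them. The module contains a large number of probability distributions and a growing library of statistical functions. Each...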
