• Most copies online lack the answers at the back; this one is complete: (Solution Manual) Probability and Statistics, 4th Edition by Morris H. DeGroot
• Statistics

2019-06-10 11:14:35
P-Value
When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that's on trial, in essence, is called the null hypothesis. The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be untrue. The evidence in the trial is your data and the statistics that go along with it. All hypothesis tests ultimately use a p-value to weigh the strength of the evidence. The p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. It is a number between 0 and 1 and is interpreted in the following way:

A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
p-values very close to the cutoff (0.05) are considered to be marginal (could go either way). Always report the p-value so your readers can draw their own conclusions.
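In Python, the p-value of a t-test can be computed with scipy.stats.ttest_1samp (one sample against a hypothesized mean) or scipy.stats.ttest_ind (two independent samples); both return the t statistic and the p-value.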
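The following is a minimal sketch of both SciPy calls. It assumes SciPy is installed. The two samples and the hypothesized mean of 5.0 are made-up illustration data, not from the original post. It applies the 0.05 cutoff from the list above and checks the one-sample t statistic by hand.

```python
import math
import statistics

from scipy import stats

# Made-up measurements, for illustration only.
sample = [5.1, 4.9, 5.3, 5.5, 5.0, 5.2, 5.4, 5.6]
group_b = [4.6, 4.8, 4.7, 5.0, 4.5, 4.9, 4.4, 4.8]
ALPHA = 0.05  # conventional cutoff from the list above; fix it before looking at the data


def decide(p, alpha=ALPHA):
    """Map a p-value to the reject / fail-to-reject wording used above."""
    return "reject H0" if p <= alpha else "fail to reject H0"


# One-sample t-test. H0: the population mean equals 5.0.
t1, p1 = stats.ttest_1samp(sample, popmean=5.0)

# Two independent samples (Student's t-test, equal variances by default).
# H0: both populations have the same mean.
t2, p2 = stats.ttest_ind(sample, group_b)

# Hand check of the one-sample statistic: t = (mean - mu0) / (s / sqrt(n)).
t1_by_hand = (statistics.mean(sample) - 5.0) / (statistics.stdev(sample) / math.sqrt(len(sample)))

print(f"one-sample: t={t1:.3f}, p={p1:.4f} -> {decide(p1)}")
print(f"two-sample: t={t2:.3f}, p={p2:.4f} -> {decide(p2)}")
```

Report the p-value itself alongside the decision, as the text recommends. The binary verdict hides how close a result was to the cutoff.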


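1. When statistics are created:
1. When an index is created, statistics are created for it automatically; the histogram is built on the first column of the index definition.
2. With AUTO_CREATE_STATISTICS ON, the optimizer automatically creates statistics on columns used in query predicates.
3. Manually (the where clause is optional and creates filtered statistics):
create statistics [stats_name] on [table]([column]) where [condition]
create statistics [stats_name] on [table]([column]) where [condition] with SAMPLE [percent] PERCENT --build the statistics from a percentage sample of rows.
create statistics [stats_name] on [table]([column]) where [condition] with SAMPLE [row_count] ROWS --build the statistics from a sample of the given number of rows.
Reference: http://technet.microsoft.com/zh-cn/library/ms188038.aspx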

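2. When statistics are updated:
1. With AUTO_UPDATE_STATISTICS ON, statistics are updated automatically once enough rows have changed.
2. Manually:
update statistics [table_name] with FULLSCAN --scan every row of the table.
update statistics [table_name] with SAMPLE [percent] PERCENT --update from a percentage sample of rows.
update statistics [table_name] with SAMPLE [row_count] ROWS --update from a sample of the given number of rows.
Reference: http://technet.microsoft.com/zh-cn/library/ms187348.aspx
3. Viewing statistics:
dbcc show_statistics ('[table_name]','[stats_name]')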
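4. When to consider updating statistics manually:
1. In a data warehouse, after a large bulk load, when there are otherwise no routine DML changes.
2. When the execution plan shows a large gap between the actual Rows and EstimateRows.
3. When rowmodctr (the number of rows modified since the statistics were last updated), as returned by the query below, is large.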
select object_name(a.object_id) 'table_name',a.name 'stats_name', c.name 'col_name',e.rowcnt,e.rowmodctr,
case when auto_created=1 then 'auto_created' when user_created=1 then 'user_created' when e.rowcnt>0 then 'index_col' end stype
from sys.stats a
inner join sys.stats_columns b on a.object_id=b.object_id and a.stats_id=b.stats_id and b.stats_column_id=1
inner join sys.columns c on b.object_id=c.object_id and b.column_id=c.column_id
inner join sysobjects d on a.object_id=d.id and d.xtype='U'
inner join sysindexes e on a.object_id=e.id and a.name=e.name and e.name is not null
order by a.object_id
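The query reports rowmodctr but leaves open what counts as "large." The following is a minimal Python sketch, assuming you have exported the query's rows. It flags statistics whose rowmodctr has reached 500 + 20% of rowcnt, a cutoff modeled on the classic SQL Server auto-update threshold. Newer SQL Server versions use a lower threshold for large tables. The StatsRow type, the threshold parameters and the sample rows are all hypothetical; tune them for your own workload.

```python
from typing import List, NamedTuple


class StatsRow(NamedTuple):
    """Hypothetical container for one row of the query above (only the columns used here)."""
    table_name: str
    stats_name: str
    rowcnt: int
    rowmodctr: int


def modification_threshold(rowcnt: int, floor: int = 500, fraction: float = 0.20) -> float:
    """Hypothetical rule of thumb: 500 modified rows plus 20% of the table."""
    return floor + fraction * rowcnt


def stale_stats(rows: List[StatsRow], floor: int = 500, fraction: float = 0.20) -> List[StatsRow]:
    """Return rows whose rowmodctr has reached the threshold, most out of date first."""
    flagged = [r for r in rows if r.rowmodctr >= modification_threshold(r.rowcnt, floor, fraction)]
    # Sort by the share of rows modified; max(..., 1) guards against empty tables.
    return sorted(flagged, key=lambda r: r.rowmodctr / max(r.rowcnt, 1), reverse=True)


# Made-up rows standing in for the query output.
sample = [
    StatsRow("orders", "IX_orders_date", 100_000, 35_000),
    StatsRow("orders", "_WA_Sys_00000003", 100_000, 1_200),
    StatsRow("customers", "PK_customers", 2_000, 1_500),
]

for r in stale_stats(sample):
    print(f"update statistics {r.table_name} -- {r.stats_name}: {r.rowmodctr} of {r.rowcnt} rows changed")
```

A fixed percentage is deliberately crude. On very large tables, the execution-plan check (Rows vs EstimateRows) is usually the better signal.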

• NumPy tutorial: statistics functions (Statistics)

2015-09-27 23:23:27
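Source: http://blog.csdn.net/pipisorry/article/details/48770785

General statistics functions summarized by the author:

np.unique() returns all distinct values in its argument array, sorted in ascending order. It has two optional parameters:
return_index: if True, also return the indices of the first occurrences in the original array.
return_inverse: if True, also return the index array that reconstructs the original array.
>>> a = np.array([1, 1, 9, 5, 2, 6, 7, 6, 2, 9])
>>> np.unique(a)
array([1, 2, 5, 6, 7, 9])
>>> x, idx = np.unique(a, return_index=True)
>>> idx
array([0, 4, 3, 5, 6, 2])
>>> a[idx]
array([1, 2, 5, 6, 7, 9])
>>> x, ridx = np.unique(a, return_inverse=True)
>>> ridx
array([0, 0, 5, 2, 1, 3, 4, 3, 1, 5])
>>> all(x[ridx] == a)  # x[ridx] is identical to the original array a
True

Statistics
Order statistics
Averages and variances
median(a[, axis, out, overwrite_input, keepdims]): Compute the median along the specified axis.
average(a[, axis, weights, returned]): Compute the weighted average along the specified axis.
mean(a[, axis, dtype, out, keepdims]): Compute the arithmetic mean along the specified axis.
std(a[, axis, dtype, out, ddof, keepdims]): Compute the standard deviation along the specified axis.
var(a[, axis, dtype, out, ddof, keepdims]): Compute the variance along the specified axis.
nanmedian(a[, axis, out, overwrite_input, ...]): Compute the median along the specified axis, while ignoring NaNs.
nanmean(a[, axis, dtype, out, keepdims]): Compute the arithmetic mean along the specified axis, ignoring NaNs.
nanstd(a[, axis, dtype, out, ddof, keepdims]): Compute the standard deviation along the specified axis, while ignoring NaNs.
nanvar(a[, axis, dtype, out, ddof, keepdims]): Compute the variance along the specified axis, while ignoring NaNs.
[NumPy tutorial: function library and ufunc functions]

Correlating
Note: by default np.var divides by N, while np.cov divides by N - 1 (the unbiased estimator). The variance of a single vector computed with np.var therefore does not equal the corresponding diagonal element of the covariance matrix. Scaling each back by its divisor gives the same sum of squared deviations (2 here):
>>> a = [0, 1, 2]
>>> print(np.var(a) * 3 == np.cov(a) * 2)
True
np.cov's bias parameter changes this: "Default normalization is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is 1, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5." (Current NumPy documents bias as a bool, i.e. bias=True.) In the other direction, np.var(a, ddof=1) divides by N - 1.

Histograms
np.histogram() computes a histogram of a 1-D array. Its parameter list is:
histogram(a, bins=10, range=None, normed=False, weights=None)
- a is the array holding the data.
- bins is the number of bins, i.e. how many equal parts the range is split into.
- range is a length-2 tuple giving the minimum and maximum of the range. The default, None, means the range is taken from the data, i.e. (a.min(), a.max()).
- When normed is False, the function returns the number of values of a in each bin. Otherwise the counts are normalized so that each bin holds the probability density. (normed has since been removed from NumPy; use density=True instead.)
- weights works as in bincount().
Applied to an array, histogram returns a pair of 1-D arrays, hist and bin_edges. hist holds the count for each bin; bin_edges has length len(hist)+1, and each pair of adjacent values bounds one bin.
Note: matplotlib also has a histogram function (hist, as in MATLAB). The main difference is that pylab.hist draws the histogram automatically, while numpy.histogram only computes the data.
>>> a = np.random.rand(100)
>>> np.histogram(a, bins=5, range=(0, 1))
(array([20, 26, 20, 16, 18]), array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ]))
(The counts depend on the random data.)
For bins of unequal width, pass the array of bin edges to bins:
>>> np.histogram(a, bins=[0, 0.4, 0.8, 1.0], range=(0, 1))
(array([46, 36, 18]), array([ 0. ,  0.4,  0.8,  1. ]))
If weights gives a weight for each element of a, histogram() sums the weights of the values in each bin instead of counting them.
Example with the ages and heights of male adolescents. Here d is a 2-D array, not shown in the post, whose column 0 is age and column 1 is height. sums is the total height in each age group and cnts the number of records in each age group, so the mean height per age group follows directly:
>>> sums = np.histogram(d[:,0], bins=range(7,21), range=(7,20), weights=d[:,1])[0]
>>> cnts = np.histogram(d[:,0], bins=range(7,21), range=(7,20))[0]
>>> sums / cnts
array([ 125.96, 132.06666667, 137.82857143, 143.8, 148.14, 153.44, 162.15555556, 166.86666667, 172.83636364, 173.3, 175.275, 174.19166667, 175.075])

Converting a histogram into a line plot
plt.hist draws the data directly as a histogram:
plt.hist(z, bins=500, density=True)
To draw it as a line plot instead, compute the histogram and plot the values against the bin centers:
cnts, bins = np.histogram(z, bins=500, density=True)
bins = (bins[:-1] + bins[1:]) / 2  # bin centers
plt.plot(bins, cnts)
[Matplotlib plotting examples: the pyplot and pylab modules and plot parameters: hist]
Note: the author recommends seaborn.distplot(), which newer seaborn releases deprecate in favor of histplot and displot.

np.bincount() counts how many times each value occurs in an array of integers; all elements must be non-negative. Element i of the result is the number of times integer i occurs in the input. Using a = np.array([1, 1, 9, 5, 2, 6, 7, 6, 2, 9]) from the np.unique example:
>>> np.bincount(a)
array([0, 2, 2, 0, 0, 1, 2, 1, 0, 2])
So a contains two 1s, two 2s, one 5, two 6s, one 7 and two 9s, while 0, 3, 4 and 8 do not occur.
With weights, bincount(x, weights=w) returns, for each integer in x, the sum of the corresponding weights in w:
>>> x = np.array([0, 1, 2, 2, 1, 1, 0])
>>> w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])
>>> np.bincount(x, w)
array([ 1.3,  1.6,  0.6])
To get the mean weight per integer:
>>> np.bincount(x, w) / np.bincount(x)
array([ 0.65      ,  0.53333333,  0.3       ])
How do you count the occurrences of an element in an np.ndarray? list.count(element) works only after converting the array to a Python list with arr.tolist(); then call the list's count method:
>>> a = ['a', 'b', 'c', 3, '4', '2', '2', 2, 2]
>>> a.count(2)
2
Alternatively, np.unique(arr, return_counts=True) returns every distinct value together with its count in one call.
[Python beginner tutorial: basics, and operations on and conversions between basic types] [numpy-ref-1.8.1: 3.30 Statistics p1256] [numpy/reference/routines.statistics]

Example: computing a covariance matrix with cov
Consider three points in the plane. The three points are observations, while the coordinates x and y (the dimensions) are the random variables. In general, with N points of K dimensions each, the N points are the observations and K is the number of random variables.
Consider two variables, x0 and x1, which correlate perfectly, but in opposite directions:
>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
>>> x
array([[0, 1, 2],
       [2, 1, 0]])
Note how x0 increases while x1 decreases. The covariance matrix shows this clearly:
>>> np.cov(x)
array([[ 1., -1.],
       [-1.,  1.]])
Note that element C[0, 1], which shows the correlation between x0 and x1, is negative.
Further, note how x and y are combined:
>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = np.vstack((x, y))
>>> print(np.cov(X))
[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]
>>> print(np.cov(x, y))
[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]
>>> print(np.cov(x))
11.71
Notes:
1. X above is equivalent to np.array([[-2.1, -1, 4.3], [3, 1.1, 0.12]]).
2. The input to cov can be a matrix (2-D array). cov computes the covariance matrix between its row vectors (each row is one random variable), and the diagonal holds the variance of each row. If the original data [[0, 2], [1, 1], [2, 0]] lists one observation per row, transpose it before computing the covariance (or pass rowvar=False).
3. The covariance matrix of a matrix equals what you get by passing its random-variable rows to cov as separate arguments, as in np.cov(x, y).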
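The following small NumPy sketch checks three claims from the post against current NumPy:

- np.var divides by N unless ddof=1, and np.cov divides by N - 1 unless bias=True.
- np.unique(..., return_counts=True) gives the same per-value counts as np.bincount.
- Dividing a weighted histogram by a plain one gives a per-group mean.

The ages and heights are made-up numbers, not the post's d array.

```python
import numpy as np

# 1. Variance vs. covariance normalization.
a = np.array([0.0, 1.0, 2.0])
pop_var = np.var(a)                 # divides by N (ddof=0 by default)
sample_var = np.var(a, ddof=1)      # divides by N - 1, like np.cov
cov_default = np.cov(a)             # 0-d array, divides by N - 1
cov_biased = np.cov(a, bias=True)   # divides by N

# 2. Counting occurrences in a non-negative integer array.
arr = np.array([1, 1, 9, 5, 2, 6, 7, 6, 2, 9])
values, counts = np.unique(arr, return_counts=True)
by_bincount = np.bincount(arr)      # index i holds the count of integer i

# 3. Per-group mean via weighted histograms (made-up ages and heights).
ages = np.array([7, 7, 8, 8, 8, 9])
heights = np.array([120.0, 124.0, 128.0, 130.0, 132.0, 135.0])
edges = np.arange(7, 11)            # bins [7, 8), [8, 9), [9, 10]
height_sums, _ = np.histogram(ages, bins=edges, weights=heights)
group_sizes, _ = np.histogram(ages, bins=edges)
mean_height = height_sums / group_sizes

print(pop_var, sample_var, float(cov_default), float(cov_biased))
print(dict(zip(values.tolist(), counts.tolist())))
print(mean_height)
```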
The statistics module

Author: Shawn
python3.7
https://docs.python.org/3/library/statistics.html

Overview of functions

statistics.mean(data)
statistics.harmonic_mean(data)
statistics.median(data)
statistics.median_low(data)
statistics.median_high(data)
statistics.median_grouped(data, interval=1)
statistics.mode(data)
statistics.pstdev(data, mu=None)
statistics.pvariance(data, mu=None)
statistics.stdev(data, xbar=None)
statistics.variance(data, xbar=None)

Function details

statistics.mean(data)

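The arithmetic mean.
It accepts many kinds of numeric input, including Fraction (fractions module) and Decimal (decimal module); Fraction and Decimal data give a Fraction or Decimal result.

>>> from statistics import mean
>>> mean([1, 2, 3, 4, 4])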
2.8
>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625

>>> from fractions import Fraction as F
>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
Fraction(13, 21)

>>> from decimal import Decimal as D
>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
Decimal('0.5625')
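The following short standard-library sketch (Python 3.7 or later) shows why the exact-arithmetic design matters. The 0.1 example is an illustration chosen here, not from the original post. statistics.mean sums the values exactly before dividing, whereas a naive sum()/len() accumulates floating-point rounding error.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Float data: mean() adds the values exactly before dividing,
# while sum() rounds after every addition.
data = [0.1] * 10
exact_mean = mean(data)
naive_mean = sum(data) / len(data)

# The result keeps the input type, so Fraction and Decimal data stay exact.
frac_mean = mean([Fraction(1, 3), Fraction(2, 3)])
dec_mean = mean([Decimal("1.1"), Decimal("2.2")])

print(exact_mean, naive_mean)
print(repr(frac_mean), repr(dec_mean))
```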

statistics.harmonic_mean(data)

Harmonic mean: the number of values divided by the sum of their reciprocals.

>>> from statistics import *
>>> harmonic_mean([1,2,3])
1.6363636363636365
>>> 1/sum([1./1,1./2,1./3])*3
1.6363636363636367
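A typical use of the harmonic mean is averaging rates. Here is a sketch with a made-up trip covering the same distance at 40 km/h and then at 60 km/h. The true average speed (total distance over total time) equals the harmonic mean of the two speeds, not their arithmetic mean.

```python
from math import isclose
from statistics import harmonic_mean, mean

# Illustrative trip: the same distance at 40 km/h, then at 60 km/h.
speeds = [40, 60]
leg_km = 120  # any positive distance gives the same average speed

total_hours = sum(leg_km / v for v in speeds)      # 3.0 h + 2.0 h
true_average = len(speeds) * leg_km / total_hours  # total distance / total time

# The harmonic mean reproduces the true average; the arithmetic mean overstates it.
print(harmonic_mean(speeds), mean(speeds), true_average)
print(isclose(harmonic_mean(speeds), true_average))
```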

statistics.median(data)

Median. With an even number of data points, it returns the mean of the two middle values.

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

statistics.median_low(data)

Low median
When the number of data points is even, returns the smaller of the two middle values.

>>> median_low([1, 3, 5])
3
>>> median_low([1, 3, 5, 7])
3

statistics.median_high(data)

High median
When the number of data points is even, returns the larger of the two middle values.

>>> median_high([1, 3, 5])
3
>>> median_high([1, 3, 5, 7])
5
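The following sketch shows how the three median functions relate. For an odd count all three agree. For an even count, median() is the midpoint of median_low() and median_high(). Because the low and high medians always return an actual data element, they also work on ordered non-numeric data. The word list is an illustration chosen here.

```python
from statistics import median, median_high, median_low


def medians(data):
    """Return (low, ordinary, high) medians and check how they relate."""
    lo, mid, hi = median_low(data), median(data), median_high(data)
    if len(data) % 2 == 0:
        # Even count: the ordinary median averages the two middle values.
        assert mid == (lo + hi) / 2
    else:
        # Odd count: all three pick the same middle element.
        assert lo == mid == hi
    return lo, mid, hi


print(medians([1, 3, 5]))
print(medians([1, 3, 5, 7]))

# median_low / median_high return real elements, so ordered strings work too;
# median() would fail here because it cannot average two strings.
words = ["b", "a", "d", "c"]
print(median_low(words), median_high(words))
```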

statistics.median_grouped(data, interval=1)

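Median of grouped continuous data, interpolated within the median class
Formula: median = L + ((n/2 - cf) / f) * w, where L is the lower limit of the median class, n is the total number of data points, cf is the number of data points below L, f is the number of data points in the median class, and w is the class width (interval).
Parameters:
interval: the class width. For example:
with interval=1, the value 1 belongs to the class 0.5 to 1.5;
with interval=2, the value 3 belongs to the class 2 to 4.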

>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
3.7
>>> median_grouped([1, 3, 3, 5, 7], interval=1)
3.25
>>> median_grouped([1, 3, 3, 5, 7], interval=2)
3.5

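Worked examples:
For [1, 2, 2, 3, 4, 4, 4, 4, 4, 5], the median falls in the class of 4.
The default interval is 1,
so the lower limit of that class is 3.5.
There are 10 data points in total.
The class of 4 contains 5 values.
4 values lie below 3.5.
So the median is 3.5 + (10/2 - 4)/5 * 1 = 3.5 + 1/5 = 3.7.
For [1, 3, 3, 5, 7] with interval=2, the median falls in the class of 3.
The interval is 2,
so the lower limit of that class is 2.
There are 5 data points in total.
The class of 3 contains 2 values.
1 value lies below 2.
So the median is 2 + (5/2 - 1)/2 * 2 = 2 + 1.5 = 3.5.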
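The following sketch implements the formula by hand and compares it with statistics.median_grouped on the three examples above. It follows CPython's implementation in one respect: the median class is taken to be the values equal to the middle value, so cf counts the values below that value. For these examples, that is the same as counting the values below L.

```python
from bisect import bisect_left, bisect_right
from math import isclose
from statistics import median_grouped


def grouped_median(data, interval=1):
    """median = L + (n/2 - cf) / f * interval, following the formula above."""
    values = sorted(data)
    n = len(values)
    x = values[n // 2]                # middle value; its class is the median class
    lower = x - interval / 2          # L: lower limit of the median class
    cf = bisect_left(values, x)       # cumulative frequency below the median class
    f = bisect_right(values, x) - cf  # frequency of the median class
    return lower + interval * (n / 2 - cf) / f


cases = [
    ([1, 2, 2, 3, 4, 4, 4, 4, 4, 5], 1),
    ([1, 3, 3, 5, 7], 1),
    ([1, 3, 3, 5, 7], 2),
]
for data, width in cases:
    ours = grouped_median(data, width)
    print(data, width, ours, isclose(ours, median_grouped(data, width)))
```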

statistics.mode(data)

Mode (the most common value)

>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
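On Python 3.7, the version this post targets, mode() raises StatisticsError when several values tie for most common. From Python 3.8 on, it returns the first such value encountered, and statistics.multimode returns all of them. The following is a standard-library sketch of the multimode behavior that also runs on 3.7. The helper name all_modes is hypothetical.

```python
from collections import Counter


def all_modes(data):
    """Return every value tied for the highest count, in first-seen order
    (the behavior of statistics.multimode in Python 3.8+)."""
    counts = Counter(data)  # keeps first-insertion order on Python 3.7+
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]


print(all_modes([1, 1, 2, 3, 3, 3, 3, 4]))
print(all_modes(["red", "blue", "blue", "red", "green"]))
```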

statistics.pstdev(data, mu=None)

Population standard deviation
If the mean is already known, pass it as mu to avoid recomputing it.

>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
0.986893273527251

statistics.pvariance(data, mu=None)

Population variance
If the mean is already known, pass it as mu to avoid recomputing it.

>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
>>> pvariance(data)
1.25

statistics.stdev(data, xbar=None)

Sample standard deviation

>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
1.0810874155219827

statistics.variance(data, xbar=None)

Sample variance

>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> variance(data)
1.3720238095238095
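The following sketch relates the four spread functions, using the variance example's data:

- The sample variance divides by n - 1 and the population variance by n, so they differ by the factor n / (n - 1) (Bessel's correction).
- Each standard deviation is the square root of the matching variance.
- Passing a precomputed mean (xbar for samples, mu for populations) gives the same result. The functions do not check that value, so it must really be the mean of the data.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(data)

s2 = variance(data)        # sample variance, divides by n - 1
sigma2 = pvariance(data)   # population variance, divides by n

# Reuse a mean computed once instead of letting each call recompute it.
m = mean(data)
s2_with_xbar = variance(data, xbar=m)
sigma2_with_mu = pvariance(data, mu=m)

print(s2, sigma2 * n / (n - 1))       # Bessel's correction links the two
print(stdev(data), sqrt(s2))          # standard deviation = sqrt(variance)
print(pstdev(data), sqrt(sigma2))
```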
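• I'm not entirely sure of some statistical terms and worry about mistranslations, so I kept the original English wherever I was unsure; if you spot a translation error, please point it out. Thanks! 1. mean() computes the average ...>>> statistics.mean(range(1,10))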
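• This section introduces another commonly used Python module, the statistics module, which provides functions for computing mathematical statistics of numeric data. It contains many functions, listed in the table below: mean(data): the mean(data) function computes the average of a set of numbers; the parameter ...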
• A proven and accurate book, PROBABILITY AND STATISTICS FOR ENGINEERING AND THE SCIENCES, 9th also includes graphics and screen shots from SAS, MINITAB, and Java™ Applets to give you a solid ...
• Probability and Statistics Fourth Edition (Solution): answers to Probability and Statistics, 4th edition
• Probability and Statistics (4th Edition) by Morris H. DeGroot: answers, English text version
• This is the answer key for Probability and Statistics for Engineers and Scientists, 9th edition
• Ross This updated text provides a superior introduction to applied probability and statistics for engineering or science majors. Ross emphasizes the manner in which probability yields insight into ...
• Classic textbook on probability and mathematical statistics: Probability And Statistics For Engineering And The Sciences (Jay L. Devore) 5th Ed. - Solution Manual
• FOURTH EDITION Anthony Hayter University of Denver
• Probability and Statistics for Engineering and the Sciences by Jay L. Devore
• Probability and Statistics by DeGroot & Schervish
• Probability and statistics are fascinating subjects on the interface between mathematics and applied sciences that help us understand and solve practical problems. We believe that you, by learning how...
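• Probability theory: UCLA graduate course lecture notes, very practical; interested undergraduates with time to spare can also use them to study ahead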
• probability and statistics for engineering and the sciences (eighth edition) JAY L. DEVORE
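• Probability and Statistics, DeGroot (3rd edition): complete answers, English version
• This textbook targets second- or third-year computer science undergraduates and provides a comprehensive background in qualitative and quantitative data analysis, probability, random variables and statistical methods (including machine learning). By carefully covering the topics the course requires, Probability and Statistics for Computer Science has the following...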
• Probability and Statistics for Engineering and the Sciences 9th(1).pdf
• Probability_and_Statistical_Inference-2
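• Probability and Statistics (science and engineering): Probability and Statistics for Engineering and the Sciences, Jay L. Devore, 8th ed. From a selection of outstanding textbooks used at universities abroad; original PDF edition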
• Probability and Statistics.pdf
• Statistics falls into two categories: descriptive statistics and inferential statistics (描述统计 and 推断统计 in Chinese).
• 1.3 Experiments and Events: Probability will be the way that we quantify how likely something is to occur. An experiment is any process, real or hypothetical, in which the ...
• Schaum's Outline of Probability and Statistics, Third Edition 2009.pdf
• Jay Devore's original English textbook Probability and Statistics (probability theory and mathematical statistics); hope it helps.
• [New book, 2018] Probability and Statistics for Computer Science

...