Paired-sample difference testing with Python
2021-02-11 05:30:57

The use case is simple: given paired data, test whether there is a difference between the two groups.

The analysis takes two steps:

1. Test for normality

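from scipy import stats

# Check normality with the Shapiro-Wilk test
def norm_test(data, alpha=0.05):
    # stats.shapiro returns (W statistic, p-value); p >= alpha means
    # normality is not rejected at significance level alpha
    w, p = stats.shapiro(data)
    return p >= alpha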

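2. Choose the test based on the normality result. If both samples look normal, use the paired t-test (ttest_rel). Otherwise, use the Wilcoxon signed-rank test (wilcoxon). The goal is to obtain the test statistic and the p-value. For guidance on choosing between these methods, see https://segmentfault.com/a/1190000007626742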

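from scipy.stats import ttest_rel, wilcoxon

# data_b and data_p are the two paired samples (same length, same order).
# Note: strictly, the paired t-test assumes the differences data_b - data_p
# are normal; this version checks each sample separately.
if norm_test(data_b) and norm_test(data_p):
    print('yes')  # normality not rejected: paired t-test
    t, p = ttest_rel(list(data_b), list(data_p))
else:
    print('no')  # normality rejected: Wilcoxon signed-rank test
    t, p = wilcoxon(list(data_b), list(data_p), zero_method='wilcox', correction=False)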
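The statistic that ttest_rel reports can also be reproduced by hand. The paired t-test is a one-sample t-test on the differences d_i = b_i - p_i, with t = mean(d) / (s_d / sqrt(n)). The sketch below uses only the standard library. The function name paired_t_statistic is hypothetical. It computes only the statistic, not the p-value, which needs the t distribution with n - 1 degrees of freedom (for example from SciPy).

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Hypothetical helper: paired t statistic = mean(d) / (stdev(d) / sqrt(n)).

    Intended to match the statistic (not the p-value) that
    scipy.stats.ttest_rel(before, after) reports.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    d = [b - a for b, a in zip(before, after)]  # per-pair differences
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    se = statistics.stdev(d) / math.sqrt(n)
    return statistics.mean(d) / se


before = [12.1, 11.4, 13.0, 12.7, 11.9]
after = [11.2, 11.0, 12.1, 12.5, 11.0]
print(round(paired_t_statistic(before, after), 3))
```

Swapping the two samples only flips the sign of t, which is why the argument order of ttest_rel matters for the sign but not for the two-sided p-value.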

There is one pitfall to watch out for here.

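The wilcoxon function that ships with scipy does not return the z statistic and the p-value. Instead it returns T, the smaller of the positive-rank and negative-rank sums, together with the p-value. To get z, you need to find the wilcoxon source, located at: Lib\site-packages\scipy\stats\morestats.py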
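For reference, here is the normal approximation that the modified function below computes. The symbols are defined as follows:

- n is the number of differences used (for zero_method="wilcox", only the non-zero ones).
- R^+ and R^- are the sums of ranks of the positive and negative differences.
- t_j is the size of each group of tied ranks.
- c is the optional continuity correction, equal to 0.5 * sign(T - mu_T) when correction=True and 0 otherwise.

$$
T = \min(R^+, R^-), \qquad
\mu_T = \frac{n(n+1)}{4}, \qquad
\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{1}{48}\sum_j t_j\,(t_j^2 - 1)}
$$

$$
z = \frac{T - \mu_T - c}{\sigma_T}, \qquad p = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$

Because T is the smaller rank sum, T <= mu_T, so z is never positive. Only |z| enters the two-sided p-value.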

Open morestats.py and change the function so that it returns z and the p-value, as follows:

# Patched copy of wilcoxon() inside scipy's morestats.py. It relies on names
# that module already defines (np, asarray, compress, sqrt, warnings, stats,
# distributions, find_repeats, WilcoxonResult).
def wilcoxon(x, y=None, zero_method="wilcox", correction=False):
    """
    Calculate the Wilcoxon signed-rank test.

    The Wilcoxon signed-rank test tests the null hypothesis that two
    related paired samples come from the same distribution. In particular,
    it tests whether the distribution of the differences x - y is symmetric
    about zero. It is a non-parametric version of the paired T-test.

    Parameters
    ----------
    x : array_like
        The first set of measurements.
    y : array_like, optional
        The second set of measurements. If y is not given, then the x
        array is considered to be the differences between the two sets of
        measurements.
    zero_method : string, {"pratt", "wilcox", "zsplit"}, optional
        "pratt":
            Pratt treatment: includes zero-differences in the ranking process
            (more conservative)
        "wilcox":
            Wilcox treatment: discards all zero-differences
        "zsplit":
            Zero rank split: just like Pratt, but splitting the zero rank
            between positive and negative ones
    correction : bool, optional
        If True, apply continuity correction by adjusting the Wilcoxon rank
        statistic by 0.5 towards the mean value when computing the
        z-statistic. Default is False.

    Returns
    -------
    statistic : float
        (Modified) The z-statistic of the normal approximation, computed from
        T, the sum of the ranks of the differences above or below zero,
        whichever is smaller. The unmodified function returns T itself.
    pvalue : float
        The two-sided p-value for the test.

    Notes
    -----
    Because the normal approximation is used for the calculations, the
    samples used should be large. A typical rule is to require that
    n > 20.

    References
    ----------
    .. [1] http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
    """
    if zero_method not in ["wilcox", "pratt", "zsplit"]:
        raise ValueError("Zero method should be either 'wilcox' "
                         "or 'pratt' or 'zsplit'")
    if y is None:
        d = asarray(x)
    else:
        x, y = map(asarray, (x, y))
        if len(x) != len(y):
            raise ValueError('Unequal N in wilcoxon. Aborting.')
        d = x - y
    if zero_method == "wilcox":
        # Keep all non-zero differences
        d = compress(np.not_equal(d, 0), d, axis=-1)
    count = len(d)
    if count < 10:
        warnings.warn("Warning: sample size too small for normal approximation.")
    r = stats.rankdata(abs(d))
    r_plus = np.sum((d > 0) * r, axis=0)
    r_minus = np.sum((d < 0) * r, axis=0)
    if zero_method == "zsplit":
        r_zero = np.sum((d == 0) * r, axis=0)
        r_plus += r_zero / 2.
        r_minus += r_zero / 2.
    T = min(r_plus, r_minus)
    mn = count * (count + 1.) * 0.25
    se = count * (count + 1.) * (2. * count + 1.)
    if zero_method == "pratt":
        r = r[d != 0]
    replist, repnum = find_repeats(r)
    if repnum.size != 0:
        # Correction for repeated elements.
        se -= 0.5 * (repnum * (repnum * repnum - 1)).sum()
    se = sqrt(se / 24)
    correction = 0.5 * int(bool(correction)) * np.sign(T - mn)
    z = (T - mn - correction) / se
    prob = 2. * distributions.norm.sf(abs(z))
    # Modified: return z instead of T (the original returns WilcoxonResult(T, prob))
    return WilcoxonResult(z, prob)
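Editing files under site-packages is fragile. The change is lost whenever SciPy is upgraded or reinstalled, and it silently alters every script that calls scipy.stats.wilcoxon.

An alternative is to compute z and p in a separate helper and leave the library untouched. The following is a minimal sketch of that approach, with these assumptions:

- It follows the same normal approximation as the modified function, with zero_method="wilcox" only (zero differences dropped) and average ranks for ties.
- The names wilcoxon_z and average_ranks are hypothetical.
- Only the standard library is used; the two-sided p-value comes from math.erfc.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_z(x, y, correction=False):
    """Hypothetical helper: Wilcoxon signed-rank z and two-sided p-value.

    Mirrors the modified function with zero_method="wilcox": zero
    differences are dropped and the normal approximation is used.
    """
    if len(x) != len(y):
        raise ValueError("Unequal N in wilcoxon.")
    d = [v for v in (a - b for a, b in zip(x, y)) if v != 0]  # drop zeros
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    r = average_ranks([abs(v) for v in d])
    r_plus = sum(rk for rk, v in zip(r, d) if v > 0)
    r_minus = sum(rk for rk, v in zip(r, d) if v < 0)
    T = min(r_plus, r_minus)
    mn = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1)
    # tie correction: subtract 0.5 * t * (t^2 - 1) for each group of t tied ranks
    var -= 0.5 * sum(t * (t * t - 1) for t in Counter(r).values() if t > 1)
    se = math.sqrt(var / 24)
    # continuity correction of 0.5 toward the mean (0 when T equals the mean)
    cc = 0.5 * math.copysign(1, T - mn) if (correction and T != mn) else 0.0
    z = (T - mn - cc) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wilcoxon_z([8, 7, 10, 5, 9, 6], [5, 8, 4, 5, 3, 4])
print(round(z, 3), round(p, 3))  # z is about -1.761
```

With real data you would call wilcoxon_z(list(data_b), list(data_p)) in place of the patched wilcoxon. As with the patched version, the normal approximation is only reasonable for larger samples.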

After this change, wilcoxon returns z and the p-value, and the tool is ready to use.

...