File: comparing_two_samples.rst

package info (click to toggle)

scipy 1.16.3-3

links: PTS, VCS
area: main
in suites: forky
size: 236,088 kB
sloc: cpp: 503,720; python: 345,302; ansic: 195,677; javascript: 89,566; fortran: 56,210; cs: 3,081; f90: 1,150; sh: 857; makefile: 771; pascal: 284; csh: 135; lisp: 134; xml: 56; perl: 51

file content (39 lines) | stat: -rw-r--r-- 1,388 bytes

parent folder | download | duplicates (3)

Comparing two samples
---------------------

In the following, we are given two samples, which can come either from the
same or from different distribution, and we want to test whether these
samples have the same statistical properties.


Comparing means
^^^^^^^^^^^^^^^

Test with sample with identical means:

    >>> import scipy.stats as stats
    >>> rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
    >>> rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
    >>> stats.ttest_ind(rvs1, rvs2)
    Ttest_indResult(statistic=-0.5489036175088705, pvalue=0.5831943748663959)  # random

Test with sample with different means:

    >>> rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)
    >>> stats.ttest_ind(rvs1, rvs3)
    Ttest_indResult(statistic=-4.533414290175026, pvalue=6.507128186389019e-06)  # random

Kolmogorov-Smirnov test for two samples ks_2samp
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the example, where both samples are drawn from the same distribution,
we cannot reject the null hypothesis, since the pvalue is high

    >>> stats.ks_2samp(rvs1, rvs2)
    KstestResult(statistic=0.026, pvalue=0.9959527565364388)  # random

In the second example, with different location, i.e., means, we can
reject the null hypothesis, since the pvalue is below 1%

    >>> stats.ks_2samp(rvs1, rvs3)
    KstestResult(statistic=0.114, pvalue=0.00299005061044668)  # random