.. module:: statsmodels.stats
   :synopsis: Statistical methods and tests

.. currentmodule:: statsmodels.stats

.. _stats:


Statistics :mod:`stats`
=======================

This section collects various statistical tests and tools.
Some can be used independently of any model; others are intended as extensions
to the models and model results.

API Warning: The functions and objects in this category are spread out over
various modules and might still be moved around. We expect that in the future
the statistical tests will return class instances with more informative
reporting instead of only the raw numbers.


.. _stattools:


Residual Diagnostics and Specification Tests
--------------------------------------------

.. module:: statsmodels.stats.stattools
   :synopsis: Statistical methods and tests that do not fit into other categories

.. currentmodule:: statsmodels.stats.stattools

.. autosummary::
   :toctree: generated/

   durbin_watson
   jarque_bera
   omni_normtest
   medcouple
   robust_skewness
   robust_kurtosis
   expected_robust_kurtosis

.. module:: statsmodels.stats.diagnostic
   :synopsis: Statistical methods and tests to diagnose model fit problems

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
   :toctree: generated/

   acorr_breusch_godfrey
   acorr_ljungbox
   acorr_lm

   breaks_cusumolsresid
   breaks_hansen
   recursive_olsresiduals

   compare_cox
   compare_encompassing
   compare_j

   het_arch
   het_breuschpagan
   het_goldfeldquandt
   het_white
   spec_white

   linear_harvey_collier
   linear_lm
   linear_rainbow
   linear_reset
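
As a minimal sketch of how some of these diagnostics can be applied to a fitted
OLS model: the simulated data and the variable names below are assumptions made
only for illustration.

.. code-block:: python

   import numpy as np
   import statsmodels.api as sm
   from statsmodels.stats.diagnostic import het_breuschpagan
   from statsmodels.stats.stattools import durbin_watson, jarque_bera

   # Simulated data (illustrative assumption): linear model with iid noise
   rng = np.random.default_rng(0)
   x = rng.normal(size=200)
   y = 1.0 + 2.0 * x + rng.normal(size=200)

   exog = sm.add_constant(x)
   results = sm.OLS(y, exog).fit()

   # Durbin-Watson statistic lies in [0, 4]; values near 2 suggest no
   # first-order autocorrelation in the residuals
   dw = durbin_watson(results.resid)

   # Jarque-Bera normality test: statistic, p-value, skewness, kurtosis
   jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(results.resid)

   # Breusch-Pagan test: LM statistic, LM p-value, F statistic, F p-value
   lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, exog)

The tests currently return plain tuples of numbers, so the order of the
returned values has to be taken from the individual docstrings.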


Outliers and influence measures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: statsmodels.stats.outliers_influence
   :synopsis: Statistical methods and measures for outliers and influence

.. currentmodule:: statsmodels.stats.outliers_influence

.. autosummary::
   :toctree: generated/

   OLSInfluence
   GLMInfluence
   MLEInfluence
   variance_inflation_factor
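
The following sketch compares ``variance_inflation_factor`` with a manual
computation of ``1 / (1 - R^2)``. The design matrix is simulated and includes
an explicit constant column; both are assumptions of this example.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.outliers_influence import variance_inflation_factor

   # Simulated design matrix (illustrative): constant plus two correlated columns
   rng = np.random.default_rng(0)
   n = 300
   x1 = rng.normal(size=n)
   x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
   exog = np.column_stack([np.ones(n), x1, x2])

   # VIF of column 1 (x1), based on regressing it on the remaining columns
   vif_x1 = variance_inflation_factor(exog, 1)

   # Manual check: centered R^2 of the same auxiliary regression
   others = np.delete(exog, 1, axis=1)
   coef, *_ = np.linalg.lstsq(others, x1, rcond=None)
   resid = x1 - others @ coef
   r_squared = 1.0 - resid @ resid / np.sum((x1 - x1.mean()) ** 2)
   vif_manual = 1.0 / (1.0 - r_squared)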

See also the :ref:`notes on regression diagnostics <diagnostics>`.

Sandwich Robust Covariances
---------------------------

The following functions calculate covariance matrices and standard errors for
the parameter estimates that are robust to heteroscedasticity and
autocorrelation in the errors. Similar to the methods that are available
for the LinearModelResults, these methods are designed for use with OLS.

.. currentmodule:: statsmodels.stats

.. autosummary::
   :toctree: generated/

   sandwich_covariance.cov_hac
   sandwich_covariance.cov_nw_panel
   sandwich_covariance.cov_nw_groupsum
   sandwich_covariance.cov_cluster
   sandwich_covariance.cov_cluster_2groups
   sandwich_covariance.cov_white_simple

The following are standalone versions of the heteroscedasticity-robust
standard errors attached to LinearModelResults.

.. autosummary::
   :toctree: generated/

   sandwich_covariance.cov_hc0
   sandwich_covariance.cov_hc1
   sandwich_covariance.cov_hc2
   sandwich_covariance.cov_hc3

   sandwich_covariance.se_cov
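
A short sketch of the standalone functions applied to an OLS results instance.
The heteroscedastic data are simulated, and the choice of ``nlags=2`` is an
arbitrary assumption for illustration.

.. code-block:: python

   import numpy as np
   import statsmodels.api as sm
   from statsmodels.stats.sandwich_covariance import cov_hac, cov_hc0, se_cov

   # Simulated data (illustrative): error variance grows with |x|
   rng = np.random.default_rng(0)
   x = rng.normal(size=200)
   y = 1.0 + 2.0 * x + (1.0 + np.abs(x)) * rng.normal(size=200)
   results = sm.OLS(y, sm.add_constant(x)).fit()

   # White (HC0) covariance of the parameter estimates and its standard errors
   cov_white = cov_hc0(results)
   se_white = se_cov(cov_white)

   # HAC (Newey-West type) covariance with two lags
   cov_nw = cov_hac(results, nlags=2)
   se_nw = se_cov(cov_nw)

In this example ``se_white`` should agree with the ``HC0_se`` attribute of the
results instance.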


Goodness of Fit Tests and Measures
----------------------------------

Some tests for goodness of fit for univariate distributions.

.. module:: statsmodels.stats.gof
   :synopsis: Goodness of fit measures and tests

.. currentmodule:: statsmodels.stats.gof

.. autosummary::
   :toctree: generated/

   powerdiscrepancy
   gof_chisquare_discrete
   gof_binning_discrete
   chisquare_effectsize

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
   :toctree: generated/

   anderson_statistic
   normal_ad
   kstest_exponential
   kstest_fit
   kstest_normal
   lilliefors
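
As an illustration, the normality tests can be applied to a clearly non-normal
sample. The exponential sample below is an assumption chosen so that both tests
are expected to reject.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.diagnostic import lilliefors, normal_ad

   # Simulated, strongly skewed sample (illustrative)
   rng = np.random.default_rng(0)
   skewed = rng.exponential(size=500)

   # Anderson-Darling test for normality with estimated mean and variance
   ad_stat, ad_pvalue = normal_ad(skewed)

   # Lilliefors (Kolmogorov-Smirnov with estimated parameters) test
   ks_stat, ks_pvalue = lilliefors(skewed, dist="norm")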

Non-Parametric Tests
--------------------

.. module:: statsmodels.sandbox.stats.runs
   :synopsis: Experimental statistical methods and tests to analyze runs

.. currentmodule:: statsmodels.sandbox.stats.runs

.. autosummary::
   :toctree: generated/

   mcnemar
   symmetry_bowker
   median_test_ksample
   runstest_1samp
   runstest_2samp
   cochrans_q
   Runs

.. currentmodule:: statsmodels.stats.descriptivestats

.. autosummary::
   :toctree: generated/

   sign_test

.. currentmodule:: statsmodels.stats.nonparametric

.. autosummary::
   :toctree: generated/

   rank_compare_2indep
   rank_compare_2ordinal
   RankCompareResult
   cohensd2problarger
   prob_larger_continuous
   rankdata_2samp
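
A minimal sketch of a sign test and a rank comparison of two independent
samples. The small data arrays are made up for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.descriptivestats import sign_test
   from statsmodels.stats.nonparametric import rank_compare_2indep

   # Sign test against a median of zero: four positive and one negative value,
   # so M = (4 - 1) / 2
   sample = np.array([1.0, 2.0, 3.0, 4.0, -1.0])
   m_stat, sign_pvalue = sign_test(sample, mu0=0)

   # Rank-based comparison of two independent samples
   x1 = np.array([1.1, 2.3, 2.9, 3.8, 4.2, 5.0])
   x2 = np.array([2.0, 3.1, 3.5, 4.9, 5.6, 6.3])
   res = rank_compare_2indep(x1, x2)

   # prob1 and prob2 are the estimated probabilities of stochastic
   # superiority for the two samples; they add up to one
   prob_sum = res.prob1 + res.prob2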


Descriptive Statistics
----------------------

.. module:: statsmodels.stats.descriptivestats
   :synopsis: Descriptive statistics

.. currentmodule:: statsmodels.stats.descriptivestats

.. autosummary::
   :toctree: generated/

   describe
   Description

.. _interrater:

Interrater Reliability and Agreement
------------------------------------

The main function that statsmodels currently provides for interrater
agreement measures and tests is Cohen's Kappa. Fleiss' Kappa is currently
implemented only as a measure, without associated results statistics.

.. module:: statsmodels.stats.inter_rater
.. currentmodule:: statsmodels.stats.inter_rater

.. autosummary::
   :toctree: generated/

   cohens_kappa
   fleiss_kappa
   to_table
   aggregate_raters
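
The following sketch computes Cohen's Kappa from a 2x2 contingency table and
Fleiss' Kappa from raw ratings. Both data sets are invented for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.inter_rater import (
       aggregate_raters,
       cohens_kappa,
       fleiss_kappa,
   )

   # Two raters, two categories: rows are rater 1, columns are rater 2.
   # Observed agreement is 35/50 = 0.7, chance agreement is 0.5,
   # so kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4
   table = np.array([[20, 5], [10, 15]])
   res = cohens_kappa(table)
   kappa = res.kappa

   # Raw ratings: five subjects (rows) rated by three raters (columns)
   ratings = np.array([[0, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0], [1, 1, 0]])

   # Convert to a subjects-by-categories table of counts, then compute kappa
   counts, categories = aggregate_raters(ratings)
   kappa_fleiss = fleiss_kappa(counts)

Printing ``res`` shows the additional statistics that are available for
Cohen's Kappa, while ``fleiss_kappa`` returns only the point estimate.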

Multiple Tests and Multiple Comparison Procedures
-------------------------------------------------

`multipletests` is a function for p-value correction; it also covers the
fdr-based corrections that are available separately in `fdrcorrection`.
`tukeyhsd` performs simultaneous testing for the comparison of (independent) means.
These three functions are verified.
GroupsStats and MultiComparison are convenience classes for multiple comparisons,
similar to one-way ANOVA, but are still in development.

.. module:: statsmodels.sandbox.stats.multicomp
   :synopsis: Experimental methods for controlling size while performing multiple comparisons


.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
   :toctree: generated/

   multipletests
   fdrcorrection
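
A minimal sketch of p-value correction. The p-values are made up, and the two
methods shown are only examples of the options accepted by ``method``.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.multitest import fdrcorrection, multipletests

   # Illustrative raw p-values from six hypothetical tests
   pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.27, 0.6])

   # Bonferroni: each p-value is multiplied by the number of tests (capped at 1)
   reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

   # Benjamini-Hochberg false discovery rate correction
   reject_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

   # The same FDR correction through the dedicated function
   reject_fdr, p_fdr = fdrcorrection(pvals, alpha=0.05, method="indep")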

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
   :toctree: generated/

   GroupsStats
   MultiComparison
   TukeyHSDResults

.. module:: statsmodels.stats.multicomp
   :synopsis: Methods for controlling size while performing multiple comparisons

.. currentmodule:: statsmodels.stats.multicomp

.. autosummary::
   :toctree: generated/

   pairwise_tukeyhsd
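
As a sketch, ``pairwise_tukeyhsd`` can be called with the observations and
their group labels. The three simulated groups are an assumption for
illustration; group ``c`` is shifted so that it should differ from the others.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.multicomp import pairwise_tukeyhsd

   # Simulated data (illustrative): groups a and b share a mean, c is shifted
   rng = np.random.default_rng(0)
   values = np.concatenate([
       rng.normal(loc=0.0, size=30),
       rng.normal(loc=0.0, size=30),
       rng.normal(loc=5.0, size=30),
   ])
   labels = np.repeat(["a", "b", "c"], 30)

   res = pairwise_tukeyhsd(values, labels, alpha=0.05)

   # One entry per pair of groups, in the order (a, b), (a, c), (b, c)
   reject = res.reject
   meandiffs = res.meandiffs

``res.summary()`` returns a table with the pairwise mean differences,
confidence intervals and rejection decisions.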

.. module:: statsmodels.stats.multitest
   :synopsis: Multiple testing p-value and FDR adjustments

.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
   :toctree: generated/

   local_fdr
   fdrcorrection_twostage
   NullDistribution
   RegressionFDR

.. module:: statsmodels.stats.knockoff_regeffects
   :synopsis: Regression Knock-Off Effects

.. currentmodule:: statsmodels.stats.knockoff_regeffects

.. autosummary::
   :toctree: generated/

   CorrelationEffects
   OLSEffects
   ForwardEffects
   RegModelEffects

The following functions are not (yet) public.

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
   :toctree: generated/

   varcorrection_pairs_unbalanced
   varcorrection_pairs_unequal
   varcorrection_unbalanced
   varcorrection_unequal

   StepDown
   catstack
   ccols
   compare_ordered
   distance_st_range
   ecdf
   get_tukeyQcrit
   homogeneous_subsets
   maxzero
   maxzerodown
   mcfdr
   qcrit
   randmvn
   rankdata
   rejectionline
   set_partition
   set_remove_subs
   tiecorrect

.. _tost:

Basic Statistics and t-Tests with frequency weights
---------------------------------------------------

Besides basic statistics, like mean, variance, covariance and correlation for
data with case weights, the classes here provide one and two sample tests
for means. The t-tests have more options than those in scipy.stats, but are
more restrictive in the shape of the arrays. Confidence intervals for means
are provided based on the same assumptions as the t-tests.

Additionally, tests for equivalence of means are available for one sample and
for two, either paired or independent, samples. These tests are based on TOST,
two one-sided tests, which have as null hypothesis that the means are not
"close" to each other.

.. module:: statsmodels.stats.weightstats
   :synopsis: Weighted statistics

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
   :toctree: generated/

   DescrStatsW
   CompareMeans
   ttest_ind
   ttost_ind
   ttost_paired
   ztest
   ztost
   zconfint
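
The sketch below shows weighted descriptive statistics with frequency weights
and a TOST equivalence test for two independent samples. The data and the
equivalence margins ``low`` and ``upp`` are assumptions chosen for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.weightstats import DescrStatsW, ttost_ind

   # Frequency weights: the value 2.0 is observed twice,
   # so the data are equivalent to [1.0, 2.0, 2.0, 3.0]
   values = np.array([1.0, 2.0, 3.0])
   freq = np.array([1, 2, 1])
   d = DescrStatsW(values, weights=freq)
   weighted_mean = d.mean  # 2.0

   # One-sample t-test of the weighted mean against 2.5
   tstat, pvalue, df = d.ttest_mean(2.5)

   # TOST: the null hypothesis is that the mean difference lies outside
   # the interval (low, upp)
   rng = np.random.default_rng(0)
   x1 = rng.normal(loc=0.0, size=100)
   x2 = rng.normal(loc=0.1, size=100)
   p_tost, lower_test, upper_test = ttost_ind(x1, x2, low=-0.5, upp=0.5)

The overall TOST p-value is the larger of the p-values of the two one-sided
tests, which are returned as the second and third elements.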

weightstats also contains tests and confidence intervals based on summary
data.

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
   :toctree: generated/

   _tconfint_generic
   _tstat_generic
   _zconfint_generic
   _zstat_generic
   _zstat_generic2


Power and Sample Size Calculations
----------------------------------

The :mod:`power` module currently implements power and sample size calculations
for the t-tests, normal-based tests, F-tests and the chi-square goodness-of-fit
test. The implementation is class based, but the module also provides
three shortcut functions, ``tt_solve_power``, ``tt_ind_solve_power`` and
``zt_ind_solve_power`` to solve for any one of the parameters of the power
equations.


.. module:: statsmodels.stats.power
   :synopsis: Power and size calculations for common tests

.. currentmodule:: statsmodels.stats.power

.. autosummary::
   :toctree: generated/

   TTestIndPower
   TTestPower
   GofChisquarePower
   NormalIndPower
   FTestAnovaPower
   FTestPower
   normal_power_het
   normal_sample_size_one_tail
   tt_solve_power
   tt_ind_solve_power
   zt_ind_solve_power
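
A minimal sketch of solving for the sample size of a two-sample t-test. The
effect size, significance level and target power are assumptions chosen for
illustration; the parameter that is set to ``None`` is the one solved for.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.power import TTestIndPower, tt_ind_solve_power

   analysis = TTestIndPower()

   # Solve for the number of observations in the first sample (about 64)
   n_class = analysis.solve_power(
       effect_size=0.5, nobs1=None, alpha=0.05, power=0.8,
       ratio=1.0, alternative="two-sided",
   )

   # The shortcut function takes the same arguments
   n_shortcut = tt_ind_solve_power(
       effect_size=0.5, nobs1=None, alpha=0.05, power=0.8,
       ratio=1.0, alternative="two-sided",
   )

   # Power that is achieved with 64 observations per group
   achieved = analysis.power(
       effect_size=0.5, nobs1=64, alpha=0.05, ratio=1.0,
       alternative="two-sided",
   )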


.. _proportion_stats:

Proportion
----------

Also available are hypothesis tests, confidence intervals and effect sizes for
proportions that can be used with NormalIndPower.

.. module:: statsmodels.stats.proportion
   :synopsis: Tests for proportions

.. currentmodule:: statsmodels.stats.proportion

.. autosummary::
   :toctree: generated

   proportion_confint
   proportion_effectsize

   binom_test
   binom_test_reject_interval
   binom_tost
   binom_tost_reject_interval

   multinomial_proportions_confint

   proportions_ztest
   proportions_ztost
   proportions_chisquare
   proportions_chisquare_allpairs
   proportions_chisquare_pairscontrol

   power_binom_tost
   power_ztost_prop
   samplesize_confint_proportion
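
The following sketch shows a confidence interval and a z-test for a single
proportion, and how an effect size for two proportions can be passed to
``NormalIndPower``. The counts and proportions are invented for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.power import NormalIndPower
   from statsmodels.stats.proportion import (
       proportion_confint,
       proportion_effectsize,
       proportions_ztest,
   )

   # 45 successes in 100 trials
   count, nobs = 45, 100

   # Wilson score confidence interval for the proportion
   low, upp = proportion_confint(count, nobs, alpha=0.05, method="wilson")

   # z-test of the null hypothesis that the proportion equals 0.5
   zstat, pvalue = proportions_ztest(count, nobs, value=0.5)

   # Effect size for comparing proportions 0.6 and 0.5
   # (difference of arcsine-transformed proportions)
   es = proportion_effectsize(0.6, 0.5)

   # Sample size per group for 80% power in a two-sample z-test
   n_per_group = NormalIndPower().solve_power(
       effect_size=es, nobs1=None, alpha=0.05, power=0.8, ratio=1.0,
   )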

Statistics for two independent samples.

Status: experimental, API might change, added in 0.12

.. autosummary::
   :toctree: generated

   test_proportions_2indep
   confint_proportions_2indep
   power_proportions_2indep
   tost_proportions_2indep
   samplesize_proportions_2indep_onetail
   score_test_proportions_2indep
   _score_confint_inversion


Rates
-----

Statistical functions for rates. This currently includes hypothesis tests for
two independent samples.
See also the example notebook
`Poisson Rates <examples/notebooks/generated/stats_poisson.ipynb>`_
for an overview.

Status: experimental, API might change, added in 0.12, refactored and enhanced
in 0.14

.. module:: statsmodels.stats.rates
   :synopsis: Tests for Poisson rates

.. currentmodule:: statsmodels.stats.rates

Statistical functions for one sample

.. autosummary::
   :toctree: generated

   test_poisson
   confint_poisson
   confint_quantile_poisson
   tolerance_int_poisson

Statistical functions for two independent samples

.. autosummary::
   :toctree: generated

   test_poisson_2indep
   etest_poisson_2indep
   confint_poisson_2indep
   tost_poisson_2indep
   nonequivalence_poisson_2indep
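
A minimal sketch of comparing two Poisson rates. The counts and exposures are
illustrative numbers, and only arguments that are not expected to depend on the
statsmodels version are used, since the API is still marked as experimental.

.. code-block:: python

   from statsmodels.stats.rates import test_poisson_2indep

   # Events and total exposure (for example person-years) in the two samples
   count1, exposure1 = 60, 514.775
   count2, exposure2 = 30, 543.087

   # Observed rates, computed by hand for reference
   rate1 = count1 / exposure1
   rate2 = count2 / exposure2

   # Score test of the null hypothesis that the two rates are equal
   res = test_poisson_2indep(count1, exposure1, count2, exposure2, method="score")
   statistic, pvalue = res.statistic, res.pvalue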

Functions for statistical power

.. autosummary::
   :toctree: generated

   power_poisson_ratio_2indep
   power_equivalence_poisson_2indep
   power_poisson_diff_2indep
   power_negbin_ratio_2indep
   power_equivalence_neginb_2indep


Multivariate
------------

Statistical functions for multivariate samples.

This includes hypothesis tests and confidence intervals for the mean of a
sample of multivariate observations, and hypothesis tests for the structure
of a covariance matrix.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.multivariate
   :synopsis: Statistical functions for multivariate samples.

.. currentmodule:: statsmodels.stats.multivariate

.. autosummary::
   :toctree: generated

   test_mvmean
   confint_mvmean
   confint_mvmean_fromstats
   test_mvmean_2indep
   test_cov
   test_cov_blockdiagonal
   test_cov_diagonal
   test_cov_oneway
   test_cov_spherical


.. _oneway_stats:

Oneway Anova
------------

Hypothesis tests, confidence intervals and effect sizes for oneway analysis of
k samples.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.oneway
   :synopsis: Statistical functions for oneway analysis, Anova.

.. currentmodule:: statsmodels.stats.oneway

.. autosummary::
   :toctree: generated


   anova_oneway
   anova_generic
   equivalence_oneway
   equivalence_oneway_generic
   power_equivalence_oneway
   _power_equivalence_oneway_emp

   test_scale_oneway
   equivalence_scale_oneway

   confint_effectsize_oneway
   confint_noncentrality
   convert_effectsize_fsqu
   effectsize_oneway
   f2_to_wellek
   fstat_to_wellek
   wellek_to_f2
   _fstat2effectsize

   scale_transform
   simulate_power_equivalence_oneway
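
As a sketch, ``anova_oneway`` can be called with a list of samples. With
``use_var="equal"`` it should reproduce the classical F-test, which is checked
here against ``scipy.stats.f_oneway``. The simulated groups are an assumption
for illustration.

.. code-block:: python

   import numpy as np
   from scipy import stats
   from statsmodels.stats.oneway import anova_oneway

   # Three simulated groups with different means (illustrative)
   rng = np.random.default_rng(0)
   groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.2, 1.0)]

   # Classical ANOVA that assumes equal variances across groups
   res_equal = anova_oneway(groups, use_var="equal")

   # Welch-type ANOVA that allows unequal variances
   res_welch = anova_oneway(groups, use_var="unequal")

   # Reference values from scipy for the equal variance case
   f_scipy, p_scipy = stats.f_oneway(*groups)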


.. _robust_stats:

Robust, Trimmed Statistics
--------------------------

Statistics for samples that are trimmed at a fixed fraction. This includes
the class TrimmedMean for one-sample statistics. It is used in `stats.oneway`
for trimmed "Yuen" Anova.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.robust_compare
   :synopsis: Trimmed sample statistics.

.. currentmodule:: statsmodels.stats.robust_compare

.. autosummary::
   :toctree: generated

   TrimmedMean
   scale_transform
   trim_mean
   trimboth
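
A minimal sketch of ``TrimmedMean`` on a small sample with one outlier. The
data and the trimming fraction of 0.1 are assumptions for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.robust_compare import TrimmedMean

   # Ten observations, the last one is an outlier
   data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

   # Trim 10% of the observations in each tail (one observation per tail)
   tm = TrimmedMean(data, 0.1)

   plain_mean = data.mean()          # 14.5, pulled up by the outlier
   trimmed = tm.mean_trimmed         # mean of 2, ..., 9
   winsorized = tm.mean_winsorized   # tails replaced by 2 and 9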


Moment Helpers
--------------

When there are missing values, a correlation or covariance matrix may fail to
be positive semi-definite. The following functions can be used to find a
correlation or covariance matrix that is positive definite and close to the
original matrix.
Additional functions estimate a spatial covariance matrix and a regularized
inverse covariance (precision) matrix.

.. module:: statsmodels.stats.correlation_tools
   :synopsis: Procedures for ensuring correlations are positive semi-definite

.. currentmodule:: statsmodels.stats.correlation_tools

.. autosummary::
   :toctree: generated/

   corr_clipped
   corr_nearest
   corr_nearest_factor
   corr_thresholded
   cov_nearest
   cov_nearest_factor_homog
   FactoredPSDMatrix
   kernel_covariance

.. currentmodule:: statsmodels.stats.regularized_covariance

.. autosummary::
   :toctree: generated/

   RegularizedInvCovariance

These are utility functions to convert between central and non-central moments, skew,
kurtosis and cumulants.

.. module:: statsmodels.stats.moment_helpers
   :synopsis: Tools for converting moments

.. currentmodule:: statsmodels.stats.moment_helpers

.. autosummary::
   :toctree: generated/

   cum2mc
   mc2mnc
   mc2mvsk
   mnc2cum
   mnc2mc
   mnc2mvsk
   mvsk2mc
   mvsk2mnc
   cov2corr
   corr2cov
   se_cov
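
A short sketch of converting between a covariance matrix and a correlation
matrix. The 2x2 covariance matrix is made up for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.moment_helpers import corr2cov, cov2corr

   # Variances 4 and 9, covariance 2
   cov = np.array([[4.0, 2.0], [2.0, 9.0]])

   # Correlation matrix and standard deviations (2 and 3)
   corr, std = cov2corr(cov, return_std=True)

   # Round trip back to the covariance matrix
   cov_back = corr2cov(corr, std)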


Mediation Analysis
------------------

Mediation analysis focuses on the relationships among three key variables:
an 'outcome', a 'treatment', and a 'mediator'. Since mediation analysis is a
form of causal inference, there are several assumptions involved that are
difficult or impossible to verify. Ideally, mediation analysis is conducted in
the context of an experiment in which the treatment is randomly assigned.
It is also common for people to conduct mediation analyses
using observational data in which the treatment may be thought of as an
'exposure'. The assumptions behind mediation analysis are even more difficult
to verify in an observational setting.

.. module:: statsmodels.stats.mediation
   :synopsis: Mediation analysis

.. currentmodule:: statsmodels.stats.mediation

.. autosummary::
   :toctree: generated/

   Mediation
   MediationResults


Oaxaca-Blinder Decomposition
----------------------------

The Oaxaca-Blinder decomposition, also called Blinder-Oaxaca, attempts to explain
gaps in the means of groups. It uses the linear models of two given regression equations to
show which part of the gap is explained by the regression coefficients and the known data,
and which part remains unexplained using the same data. There are two types of
Oaxaca-Blinder decompositions, the two-fold and the three-fold; both are used in the
economics literature to discuss differences between groups. This method helps to classify
discrimination or unobserved effects.
The implementation attempts to port the functionality of the oaxaca command in Stata to Python.
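
As a sketch of the underlying algebra, written in generic textbook notation
that is an assumption here and not necessarily the convention used by
``OaxacaBlinder``: let :math:`\bar{X}_A` and :math:`\bar{X}_B` be the mean
regressors (including the constant) of groups A and B, and
:math:`\hat\beta_A` and :math:`\hat\beta_B` the OLS coefficients of the two
group regressions. The three-fold decomposition splits the gap into an
endowment, a coefficient and an interaction term:

.. math::

   \bar{Y}_A - \bar{Y}_B
   = (\bar{X}_A - \bar{X}_B)^\top \hat\beta_B
   + \bar{X}_B^\top (\hat\beta_A - \hat\beta_B)
   + (\bar{X}_A - \bar{X}_B)^\top (\hat\beta_A - \hat\beta_B)

The two-fold decomposition uses a reference coefficient vector
:math:`\beta^*`, whose choice is left open in this sketch, and splits the gap
into an explained and an unexplained part:

.. math::

   \bar{Y}_A - \bar{Y}_B
   = (\bar{X}_A - \bar{X}_B)^\top \beta^*
   + \left[ \bar{X}_A^\top (\hat\beta_A - \beta^*)
   + \bar{X}_B^\top (\beta^* - \hat\beta_B) \right]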

.. module:: statsmodels.stats.oaxaca
   :synopsis: Oaxaca-Blinder Decomposition

.. currentmodule:: statsmodels.stats.oaxaca

.. autosummary::
   :toctree: generated/

   OaxacaBlinder
   OaxacaResults


Distance Dependence Measures
----------------------------

Distance dependence measures and the Distance Covariance (dCov) test.

.. module:: statsmodels.stats.dist_dependence_measures
   :synopsis: Distance Dependence Measures

.. currentmodule:: statsmodels.stats.dist_dependence_measures

.. autosummary::
   :toctree: generated/

   distance_covariance_test
   distance_statistics
   distance_correlation
   distance_covariance
   distance_variance
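
The following sketch contrasts the distance correlation with the Pearson
correlation for a nonlinear relationship. The simulated data are an assumption
chosen so that the Pearson correlation is close to zero.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.dist_dependence_measures import distance_correlation

   # y depends on x, but not linearly (illustrative)
   rng = np.random.default_rng(0)
   x = rng.uniform(-1.0, 1.0, size=200)
   y = x ** 2

   # Pearson correlation misses the symmetric, nonlinear dependence
   pearson = np.corrcoef(x, y)[0, 1]

   # Distance correlation lies in [0, 1] and is positive under dependence
   dcor = distance_correlation(x, y)

   # A variable is perfectly distance-correlated with itself
   dcor_self = distance_correlation(x, x)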


Meta-Analysis
-------------

Functions for basic meta-analysis of a collection of sample statistics.

Examples can be found in the notebook

 * `Meta-Analysis <examples/notebooks/generated/metaanalysis1.ipynb>`_

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.meta_analysis
   :synopsis: Meta-Analysis

.. currentmodule:: statsmodels.stats.meta_analysis

.. autosummary::
   :toctree: generated/

   combine_effects
   effectsize_2proportions
   effectsize_smd
   CombineResults
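
A minimal sketch of combining effect sizes from several studies. The effect
sizes and their variances are invented, and the fixed effects estimate is
compared with a manual inverse-variance weighted mean.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.meta_analysis import combine_effects

   # Effect size estimates and their variances from four hypothetical studies
   eff = np.array([0.30, 0.10, 0.45, 0.20])
   var_eff = np.array([0.04, 0.02, 0.09, 0.03])

   res = combine_effects(eff, var_eff)

   # Fixed effects estimate: inverse-variance weighted mean of the effects
   manual_fe = np.sum(eff / var_eff) / np.sum(1.0 / var_eff)
   mean_fe = res.mean_effect_fe

   # Table with the individual studies and the combined estimates
   frame = res.summary_frame()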

The module also includes internal functions to compute random effects
variance.


.. autosummary::
   :toctree: generated/

   _fit_tau_iter_mm
   _fit_tau_iterative
   _fit_tau_mm