File: stats_poisson.py

package info (click to toggle)
statsmodels 0.14.6%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 49,956 kB
  • sloc: python: 254,365; f90: 612; sh: 560; javascript: 337; asm: 156; makefile: 145; ansic: 32; xml: 9
file content (367 lines) | stat: -rw-r--r-- 14,062 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
#!/usr/bin/env python
# coding: utf-8

# DO NOT EDIT
# Autogenerated from the notebook stats_poisson.ipynb.
# Edit the notebook and then sync the output with this file.
#
# flake8: noqa
# DO NOT EDIT

# # Statistics and inference for one and two sample Poisson rates
#
# Author: Josef Perktold
#
# This notebook provides a brief overview of hypothesis tests, confidence
# intervals and other statistics for Poisson rates in one and two sample
# case. See docstrings for more options and additional details.
#
# All functions in `statsmodels.stats.rates` take summary statistics of
# the data as arguments. Those are counts of events and number of
# observations or total exposure. Some functions for Poisson have an option
# for excess dispersion. Functions for negative binomial, NB2, require the
# dispersion parameter. Excess dispersion and dispersion parameter need to
# be provided by the user and can be estimated from the original data with
# GLM-Poisson and discrete NegativeBinomial model, respectively.
#
# Note, some parts are still experimental and will likely change, some
# features are still missing and will be added in future versions.
#
# [One sample functions](#One-sample-functions)
# [Two sample functions](#Two-sample-functions)

import numpy as np
from numpy.testing import assert_allclose
import statsmodels.stats.rates as smr
from statsmodels.stats.rates import (
    # functions for 1 sample
    test_poisson,
    confint_poisson,
    tolerance_int_poisson,
    confint_quantile_poisson,

    # functions for 2 sample
    test_poisson_2indep,
    etest_poisson_2indep,
    confint_poisson_2indep,
    tost_poisson_2indep,
    nonequivalence_poisson_2indep,

    # power functions
    power_poisson_ratio_2indep,
    power_poisson_diff_2indep,
    power_equivalence_poisson_2indep,
    power_negbin_ratio_2indep,
    power_equivalence_neginb_2indep,

    # list of statistical methods
    method_names_poisson_1samp,
    method_names_poisson_2indep,
)

# ## One sample functions
#
# The main functions for one sample Poisson rates currently are
# test_poisson and confint_poisson. Both have several methods available,
# most of them are consistent between hypothesis test and confidence
# interval. Two additional functions are available for tolerance intervals
# and for confidence intervals of quantiles.
# See docstrings for details.

count1, n1 = 60, 514.775
count1 / n1

test_poisson(count1, n1, value=0.1, method="midp-c")

confint_poisson(count1, n1, method="midp-c")

# The available methods for hypothesis tests and confidence interval are
# available in the dictionary `method_names_poisson_1samp`.  See docstring
# for details.

method_names_poisson_1samp

for meth in method_names_poisson_1samp["test"]:
    tst = test_poisson(count1,
                       n1,
                       method=meth,
                       value=0.1,
                       alternative='two-sided')
    print("%-12s" % meth, tst.pvalue)

for meth in method_names_poisson_1samp["confint"]:
    tst = confint_poisson(count1, n1, method=meth)
    print("%-12s" % meth, tst)

# Two additional functions are currently available for one sample poisson
# rates, `tolerance_int_poisson` for tolerance intervals and
# `confint_quantile_poisson` for confidence intervals of Poisson quantiles.
#
# Tolerance intervals are similar to prediction intervals that combine the
# randomness of a new observation and uncertainty about the estimated
# Poisson rate. If the rate were known, then we can compute a Poisson
# interval for a new observation using the inverse cdf at the given rate.
# The tolerance interval adds uncertainty about the rate by using the
# confidence interval for the rate estimate.
#
# A tolerance interval is specified by two probabilities, `prob` is the
# coverage of the Poisson interval, `alpha` is the confidence level for the
# confidence interval of the rate estimate.
# Note, that probabilities cannot be exactly equal to the nominal
# probabilites because counts are discrete random variables. The properties
# of the intervals are specified in term of inequalities, coverage is at
# least `prob`, coverage of the confidence interval of the estimated rate is
# at least `1 - alpha`. However, most methods will not guarantee that the
# coverage inequalities hold in small samples even if the distribution is
# correctly specified.
#
# In the following example, we can expect to observe between 4 and 23
# events if the total exposure or number of observations is 100, at given
# coverage `prob` and confidence level `alpha`. The tolerance interval is
# larger than the Poisson interval at the observed rate, (5, 19), because
# the tolerance interval takes uncertainty about the parameter estimate into
# account.

exposure_new = 100
tolerance_int_poisson(count1,
                      n1,
                      prob=0.95,
                      exposure_new=exposure_new,
                      method="score",
                      alpha=0.05,
                      alternative='two-sided')

from scipy import stats

stats.poisson.interval(0.95, count1 / n1 * exposure_new)

# Aside: We can force the tolerance interval to ignore parameter
# uncertainty by specifying `alpha=1`.

tolerance_int_poisson(count1,
                      n1,
                      prob=0.95,
                      exposure_new=exposure_new,
                      method="score",
                      alpha=1,
                      alternative='two-sided')

# The last function returns a confidence interval for a Poisson quantile.
# A quantile is the inverse of the cdf function, named `ppf` in scipy.stats
# distributions.
#
# The following example shows the confidence interval for the upper bound
# of the Poisson interval at cdf probability 0.975. The upper confidence
# limit using the one-tail coverage probability is the same as the upper
# limit of the tolerance interval.

confint_quantile_poisson(count1,
                         n1,
                         prob=0.975,
                         exposure_new=100,
                         method="score",
                         alpha=0.05,
                         alternative='two-sided')

# ## Two sample functions
#
# Statistical function for two samples can compare the rates by either the
# ratio or the difference. Default is comparing the rates ratio.
#
# The `etest` functions can be directly accessed through
# `test_poisson_2indep`.

count1, n1, count2, n2 = 60, 514.775, 30, 543.087

test_poisson_2indep(count1, n1, count2, n2, method='etest-score')

confint_poisson_2indep(count1, n1, count2, n2, method='score', compare="ratio")

confint_poisson_2indep(count1, n1, count2, n2, method='score', compare="diff")

# The two sample test function, `test_poisson_2indep`, has a `value`
# option to specify null hypothesis that do not specify equality. This is
# useful for superiority and noninferiority testing with one-sided
# alternatives.
#
# As an example, the following test tests the two-sided null hypothesis
# that the rates ratio is 2. The pvalue for this hypothesis is 0.81 and we
# cannot reject that the first rate is twice the second rate.

test_poisson_2indep(count1, n1, count2, n2, value=2, method='etest-score')

# The `method_names_poisson_2indep` dictionary shows which methods are
# available when comparing two samples by either rates ratio or rates
# difference.
#
# We can use the dictionary to compute p-values and confidence intervals
# using all available methods.

method_names_poisson_2indep

for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["test"][compare]:
        tst = test_poisson_2indep(count1,
                                  n1,
                                  count2,
                                  n2,
                                  value=None,
                                  method=meth,
                                  compare=compare,
                                  alternative='two-sided')
        print("   %-12s" % meth, tst.pvalue)

# In a similar way we can compute confidence intervals for the rate ratio
# and rate difference for all currently available methods.

for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["confint"][compare]:
        ci = confint_poisson_2indep(count1,
                                    n1,
                                    count2,
                                    n2,
                                    method=meth,
                                    compare=compare)
        print("   %-12s" % meth, ci)

# We have two additional functions for hypothesis tests that specify
# interval hypothesis,
# `tost_poisson_2indep` and `nonequivalence_poisson_2indep`.
#
# The `TOST` function implements equivalence tests where the alternative
# hypothesis specifies that the two rates are within an interval of each
# other.
#
# The `nonequivalence` tests implements a test where the alternative
# specifies that the two rates differ by at least a given nonzero value.
# This is also often called a minimum effect test. This test uses two one-
# sided tests similar to TOST however with null and alternative hypothesis
# reversed compared to the equivalence test.
#
# Both functions delegate to `test_poisson_2indep` and, therefore, the
# same `method` options are available.

# The following equivalence test specifies the alternative hypothesis that
# the rate ratio is between 0.8 and 1/0.8. The observed rate ratio is 0.89.
# The pvalue is 0.107 and we cannot reject the null hypothesis in favor of
# the alternative hypothesis that the two rates are equivalent at the given
# margins. Thus the hypothesis test does not provide evidence that the two
# rates are equivalent.
#
# In the second example we test equivalence in the rate difference, where
# equivalence is defined by margins (-0.04, 0.04). The pvalue is around 0.2
# and the test does not support that the two rates are equivalent.

low = 0.8
upp = 1 / low

count1, n1, count2, n2 = 200, 1000, 450, 2000

tost_poisson_2indep(count1,
                    n1,
                    count2,
                    n2,
                    low,
                    upp,
                    method='score',
                    compare='ratio')

upp = 0.04
low = -upp
tost_poisson_2indep(count1,
                    n1,
                    count2,
                    n2,
                    low,
                    upp,
                    method='score',
                    compare='diff')

# The function `nonequivalence_poisson_2indep` tests the alternative
# hypothesis that the two rates differ by a non-neglibile amount.
#
# In the following example, the alternative hypothesis specifies that the
# rate ratio is outside the interval (0.95, 1/0.95). The null hypothesis is
# that the ratio ratio is in the interval. If the test rejects the null
# hypothesis, then it provides evidence that the rate ratio differ by more
# than the unimportant amount specified by the interval limits.
#
# A note on the relationship between point hypothesis test and interval
# hypothesis test in large samples. The point null hypothesis of
# test_poisson_2indep will reject any small deviation from the null
# hypothesis if the null hypothesis does not hold exactly and the sample
# size is large enough. The nonequivalence or minimum effect test will not
# reject the null hypothesis in large samples (sample approaches infinite)
# if rates differ by not more than the specified neglibible amount.
#
# In the example neither the point nor the interval null hypothesis are
# rejected. We do not have enough evidence to say that the rates are
# statistically different.
# Following that, we increase the sample size 20 times while keeping
# observed rates constant. In this case, the point null hypothesis test is
# rejected, the pvalue is 0.01, while the interval null hypothesis is not
# rejected, the pvalue is equal to 1.
#
# Note: The nonequivalence test is in general conservative, its size is
# bounded by alpha, but in the large sample limit with fixed nonequivalence
# margins the size approaches alpha / 2. If the nonequivalence interval
# shrinks to a single point, then the nonequivalence test is the same as the
# point hypothesis test. (see docstring)

count1, n1, count2, n2 = 200, 1000, 420, 2000
low = 0.95
upp = 1 / low
nf = 1
nonequivalence_poisson_2indep(count1 * nf,
                              n1 * nf,
                              count2 * nf,
                              n2 * nf,
                              low,
                              upp,
                              method='score',
                              compare='ratio')

test_poisson_2indep(count1 * nf,
                    n1 * nf,
                    count2 * nf,
                    n2 * nf,
                    method='score',
                    compare='ratio')

nf = 20
nonequivalence_poisson_2indep(count1 * nf,
                              n1 * nf,
                              count2 * nf,
                              n2 * nf,
                              low,
                              upp,
                              method='score',
                              compare='ratio').pvalue

test_poisson_2indep(count1 * nf,
                    n1 * nf,
                    count2 * nf,
                    n2 * nf,
                    method='score',
                    compare='ratio').pvalue

# ## Power
#
# Statsmodels has limited support for computing statistical power for the
# comparison of 2 sample Poisson and Negative Binomial rates. Those are
# based on Zhu and Lakkis and Zhu for ratio comparisons for both
# distributions, and basic normal based comparison for the Poisson rate
# difference. Other methods that correspond more closely to the available
# methods in the hypothesis test function, especially Gu, are not yet
# available.
#
# The available functions are

power_poisson_ratio_2indep
power_equivalence_poisson_2indep
power_negbin_ratio_2indep
power_equivalence_neginb_2indep

power_poisson_diff_2indep