[section:st_eg Student's t Distribution Examples]

[section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]

Let's say you have a sample mean; you may wish to know what confidence interval
you can place on that mean.  Colloquially: "I want an interval that I can be
P% sure contains the true mean".  (On a technical point, note that
the interval either contains the true mean or it does not: the
meaning of the confidence level is subtly
different from this colloquialism.  More background information can be found on the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).

The formula for the interval can be expressed as:

[equation dist_tutorial4]

where ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
/N/ is the sample size, /[alpha]/ is the desired significance level, and
['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
distribution with /N-1/ degrees of freedom.

[note
The quantity [alpha] is the maximum acceptable risk of falsely rejecting
the null-hypothesis.  The smaller the value of [alpha] the greater the
strength of the test.

The confidence level of the test is defined as 1 - [alpha], and often expressed
as a percentage.  So, for example, a significance level of 0.05 is equivalent
to a 95% confidence level.  Refer to
[@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
"What are confidence intervals?"] in __handbook for more information.
] [/ Note]

[note
The usual assumptions of
[@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
of course apply here, as they do in other examples.
]

From the formula, it should be clear that:

* The width of the confidence interval decreases as the sample size increases.
* The width increases as the standard deviation increases.
* The width increases as the ['confidence level increases] (from 0.5 towards 0.99999: a stronger requirement).
* The width increases as the ['significance level decreases] (from 0.5 towards 0.00000...01: a stronger requirement).

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to calculate intervals for
various confidence levels; the procedure will print these out
as a table:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void confidence_limits_on_mean(
      double Sm,           // Sm = Sample Mean.
      double Sd,           // Sd = Sample Standard Deviation.
      unsigned Sn)         // Sn = Sample Size.
   {
      using namespace std;
      using namespace boost::math;

      // Print out general info:
      cout <<
         "__________________________________\n"
         "2-Sided Confidence Limits For Mean\n"
         "__________________________________\n\n";
      cout << setprecision(7);
      cout << setw(40) << left << "Number of Observations" << "=  " << Sn << "\n";
      cout << setw(40) << left << "Mean" << "=  " << Sm << "\n";
      cout << setw(40) << left << "Standard Deviation" << "=  " << Sd << "\n";

We'll define a table of significance/risk levels for which we'll compute intervals:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Note that these are the complements of the confidence/probability levels 0.5, 0.75, 0.9, ..., 0.99999.

Next we'll declare the distribution object we'll need; note that
the /degrees of freedom/ parameter is the sample size less one:

   students_t dist(Sn - 1);

Most of what follows in the program is pretty printing, so let's focus
on the calculation of the interval. First we need the t-statistic,
computed using the /quantile/ function and our significance level.  Note
that since the significance levels are the complement of the probability,
we have to wrap the arguments in a call to /complement(...)/:

   double T = quantile(complement(dist, alpha[i] / 2));

Note that alpha was divided by two, since we'll be calculating
both the upper and lower bounds: had we been interested in a one-sided
interval then we would have omitted this step.
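
For a one-sided interval the significance level is used directly; a minimal
sketch, using the same `dist` object and `alpha` array as above:

   // One-sided bound: no halving of the significance level:
   double T_single_sided = quantile(complement(dist, alpha[i]));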

Now to complete the picture, we'll get the (one-sided) width of the
interval from the t-statistic
by multiplying by the standard deviation, and dividing by the square
root of the sample size:

   double w = T * Sd / sqrt(double(Sn));

The two-sided interval is then the sample mean plus and minus this width.
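
In code, using the variables defined above, that's simply:

   double lower_limit = Sm - w;   // lower confidence limit
   double upper_limit = Sm + w;   // upper confidence limit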

And, apart from some more pretty-printing, that completes the procedure.

Let's take a look at some sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site.  The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis.
The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.
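
The table that follows would be produced by a call along these lines (the
argument values are those printed in the output header):

   confidence_limits_on_mean(9.26146, 0.02278881, 195);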

[pre'''
   __________________________________
   2-Sided Confidence Limits For Mean
   __________________________________

   Number of Observations                  =  195
   Mean                                    =  9.26146
   Standard Deviation                      =  0.02278881


   ___________________________________________________________________
   Confidence       T           Interval          Lower          Upper
    Value (%)     Value          Width            Limit          Limit
   ___________________________________________________________________
       50.000     0.676       1.103e-003        9.26036        9.26256
       75.000     1.154       1.883e-003        9.25958        9.26334
       90.000     1.653       2.697e-003        9.25876        9.26416
       95.000     1.972       3.219e-003        9.25824        9.26468
       99.000     2.601       4.245e-003        9.25721        9.26571
       99.900     3.341       5.453e-003        9.25601        9.26691
       99.990     3.973       6.484e-003        9.25498        9.26794
       99.999     4.537       7.404e-003        9.25406        9.26886
''']

As you can see, the large sample size (195) and small standard deviation (0.023)
have combined to give very narrow intervals: indeed, we can be
very confident that the true mean is close to 9.26.

For comparison the next example data output is taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre'''
   __________________________________
   2-Sided Confidence Limits For Mean
   __________________________________

   Number of Observations                  =  3
   Mean                                    =  37.8000000
   Standard Deviation                      =  0.9643650


   ___________________________________________________________________
   Confidence       T           Interval          Lower          Upper
    Value (%)     Value          Width            Limit          Limit
   ___________________________________________________________________
       50.000     0.816            0.455       37.34539       38.25461
       75.000     1.604            0.893       36.90717       38.69283
       90.000     2.920            1.626       36.17422       39.42578
       95.000     4.303            2.396       35.40438       40.19562
       99.000     9.925            5.526       32.27408       43.32592
       99.900    31.599           17.594       20.20639       55.39361
       99.990    99.992           55.673      -17.87346       93.47346
       99.999   316.225          176.067     -138.26683      213.86683
''']

This time the fact that there are only three measurements leads to
much wider intervals: indeed, the intervals are so large that it's hard
to be at all confident in the location of the mean.

[endsect] [/section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]

[section:tut_mean_test Testing a sample mean for difference from a "true" mean]

When calibrating or comparing a scientific instrument or measurement method of some kind,
we want to answer the question "Does an observed sample mean differ from the
'true' mean in any significant way?".  If it does, then we have evidence of
a systematic difference.  This question can be answered with a Students-t test:
more information can be found
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
on the NIST site].

Of course, the assignment of "true" to one mean may be quite arbitrary:
often it is simply the "traditional" method of measurement.

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to determine which of the
possible hypotheses are rejected or not-rejected
at a given significance level:

[note
Non-statisticians might say that 'not-rejected' means 'accepted'
(often of the null-hypothesis), implying, wrongly, that there really *IS* no difference,
but statisticians eschew this to avoid implying that there is positive evidence of 'no difference'.
'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
[@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
] [/ note]


   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
   {
      //
      // M = true mean.
      // Sm = Sample Mean.
      // Sd = Sample Standard Deviation.
      // Sn = Sample Size.
      // alpha = Significance Level.

Most of the procedure is pretty-printing, so let's just focus on the
calculation; we begin by calculating the t-statistic:

   // Difference in means:
   double diff = Sm - M;
   // Degrees of freedom:
   unsigned v = Sn - 1;
   // t-statistic:
   double t_stat = diff * sqrt(double(Sn)) / Sd;

Finally calculate the probability from the t-statistic. If we're interested
in simply whether there is a difference (either less or greater) or not,
we don't care about the sign of the t-statistic,
and we take the complement of the probability for comparison
to the significance level:

   students_t dist(v);
   double q = cdf(complement(dist, fabs(t_stat)));

The procedure then prints out the results of the various tests
that can be done; these can be summarised in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there
*is difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: the sample mean *is less* than
the true mean.]
[Reject if complement of CDF of t < significance level:

`cdf(complement(dist, t)) < alpha`]]

[[The Alternative-hypothesis: the sample mean *is greater* than
the true mean.]
[Reject if CDF of t < significance level:

`cdf(dist, t) < alpha`]]
]

[note
Notice that the comparisons are against `alpha / 2` for a two-sided test
and against `alpha` for a one-sided test.]
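
Translated into code, the comparisons in the table might look like this
(a sketch only: the actual example program wraps the same logic in
pretty-printing):

   // Decision logic for the three alternative hypotheses,
   // using M, dist, t_stat, q and alpha as defined above:
   cout << "Mean != " << M << "  "
      << ((q > alpha / 2) ? "REJECTED" : "NOT REJECTED") << "\n";
   cout << "Mean  < " << M << "  "
      << ((cdf(complement(dist, t_stat)) < alpha) ? "REJECTED" : "NOT REJECTED") << "\n";
   cout << "Mean  > " << M << "  "
      << ((cdf(dist, t_stat) < alpha) ? "REJECTED" : "NOT REJECTED") << "\n";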

Now that we have all the parts in place, let's take a look at some
sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site.  The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis.  The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  195
Sample Mean                                            =  9.26146
Sample Standard Deviation                              =  0.02279
Expected True Mean                                     =  5.00000

Sample Mean - Expected Test Mean                       =  4.26146
Degrees of Freedom                                     =  194
T Statistic                                            =  2611.28380
Probability that difference is due to chance           =  0.000e+000

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 5.000            NOT REJECTED
Mean  < 5.000            REJECTED
Mean  > 5.000            NOT REJECTED
]

You will note the line that says the probability that the difference is
due to chance is zero.  From a philosophical point of view, of course,
the probability can never reach zero.  However, in this case the calculated
probability is smaller than the smallest representable double precision number,
hence the appearance of a zero here.  Whatever its "true" value is, we know it
must be extraordinarily small, so the alternative hypothesis - that there is
a difference in means - is not rejected.

For comparison the next example data output is taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            NOT REJECTED
]

As you can see the small number of measurements (3) has led to a large uncertainty
in the location of the true mean.  So even though there appears to be a difference
between the sample mean and the expected true mean, we conclude that there
is no significant difference, and are unable to reject the null hypothesis.
However, if we were to lower the bar for acceptance to alpha = 0.1
(a 90% confidence level) we see a different output:

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.1000

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            REJECTED
]

In this case we really have a borderline result,
and more data (and/or more accurate data)
is needed for a more convincing conclusion.

[endsect] [/section:tut_mean_test Testing a sample mean for difference from a "true" mean]


[section:tut_mean_size Estimating how large a sample size would have to become
in order to give a significant Students-t test result with a single sample test]

Imagine you have conducted a Students-t test on a single sample in order
to check for systematic errors in your measurements.  Imagine that the
result is borderline.  At this point one might go off and collect more data,
but it might be prudent to first ask the question "How much more?".
The parameter estimators of the students_t_distribution class
can provide this information.

This section is based on the example code in
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
and we begin by defining a procedure that will print out a table of
estimated sample sizes for various confidence levels:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_find_df(
      double M,          // M = true mean.
      double Sm,         // Sm = Sample Mean.
      double Sd)         // Sd = Sample Standard Deviation.
   {

Next we define a table of significance levels:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Printing out the table of sample sizes required for various confidence levels
begins with the table header:

      cout << "\n\n"
              "_______________________________________________________________\n"
              "Confidence       Estimated          Estimated\n"
              " Value (%)      Sample Size        Sample Size\n"
              "              (one sided test)    (two sided test)\n"
              "_______________________________________________________________\n";


And now the important part: the sample sizes required.  Class
`students_t_distribution` has a static member function
`find_degrees_of_freedom` that will calculate how large
a sample size needs to be in order to give a definitive result.

The first argument is the difference between the means that you
wish to be able to detect: here it's the absolute value of the
difference between the sample mean and the true mean.

Then come two probability values: alpha and beta.  Alpha is the
maximum acceptable risk of rejecting the null-hypothesis when it is
in fact true.  Beta is the maximum acceptable risk of failing to reject
the null-hypothesis when in fact it is false.
Also note that for a two-sided test, alpha must be divided by 2.

The final parameter of the function is the standard deviation of the sample.

In this example, we assume that alpha and beta are the same, and call
`find_degrees_of_freedom` twice: once with alpha for a one-sided test,
and once with alpha/2 for a two-sided test.

      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
      {
         // Confidence value:
         cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
         // calculate df for single sided test:
         double df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i], alpha[i], Sd);
         // convert to sample size:
         double size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size;
         // calculate df for two sided test:
         df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
         // convert to sample size:
         size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size << endl;
      }
      cout << endl;
   }

Let's now look at some sample output using data taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

Only three measurements were made, and the Students-t test above
gave a borderline result, so this example
will show us how many samples would need to be collected:

[pre'''
_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________

True Mean                               =  38.90000
Sample Mean                             =  37.80000
Sample Standard Deviation               =  0.96437


_______________________________________________________________
Confidence       Estimated          Estimated
 Value (%)      Sample Size        Sample Size
              (one sided test)    (two sided test)
_______________________________________________________________
    75.000               3               4
    90.000               7               9
    95.000              11              13
    99.000              20              22
    99.900              35              37
    99.990              50              53
    99.999              66              68
''']

So in this case many more measurements would have had to be made:
for example, at the 95% level, 13 measurements in total for a two-sided test.
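
For reference, the table above corresponds to a call along these lines (the
argument values are those printed in the header):

   single_sample_find_df(38.9, 37.8, 0.96437);   // true mean, sample mean, sample sd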

[endsect] [/section:tut_mean_size Estimating how large a sample size would have to become in order to give a significant Students-t test result with a single sample test]

[section:two_sample_students_t Comparing the means of two samples with the Students-t test]

Imagine that we have two samples, and we wish to determine whether
their means are different or not.  This situation often arises when
determining whether a new process or treatment is better than an old one.

In this example, we'll be using the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
Car Mileage sample data] from the
[@http://www.itl.nist.gov NIST website].  The data compares
miles per gallon of US cars with miles per gallon of Japanese cars.

The sample code is in
[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].

There are two ways in which this test can be conducted: we can either
assume that the true standard deviations of the two samples are equal, or not.
If the standard deviations are assumed to be equal, then the calculation
of the t-statistic is greatly simplified, so we'll examine that case first.
In real life we should verify whether this assumption is valid with an
F-test for equal variances.

We begin by defining a procedure that will conduct our test assuming equal
variances:

   // Needed headers:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Simplify usage:
   using namespace boost::math;
   using namespace std;

   void two_samples_t_test_equal_sd(
           double Sm1,       // Sm1 = Sample 1 Mean.
           double Sd1,       // Sd1 = Sample 1 Standard Deviation.
           unsigned Sn1,     // Sn1 = Sample 1 Size.
           double Sm2,       // Sm2 = Sample 2 Mean.
           double Sd2,       // Sd2 = Sample 2 Standard Deviation.
           unsigned Sn2,     // Sn2 = Sample 2 Size.
           double alpha)     // alpha = Significance Level.
   {


Our procedure will begin by calculating the t-statistic; assuming
equal variances, the needed formulae are:

[equation dist_tutorial1]

where Sp is the "pooled" standard deviation of the two samples,
and /v/ is the number of degrees of freedom of the two combined
samples.  We can now write the code to calculate the t-statistic:

   // Degrees of freedom:
   double v = Sn1 + Sn2 - 2;
   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
   // Pooled standard deviation:
   double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
   cout << setw(55) << left << "Pooled Standard Deviation" << "=  " << sp << "\n";
   // t-statistic:
   double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";

The next step is to define our distribution object, and calculate the
complement of the probability:

   students_t dist(v);
   double q = cdf(complement(dist, fabs(t_stat)));
   cout << setw(55) << left << "Probability that difference is due to chance" << "=  "
      << setprecision(3) << scientific << 2 * q << "\n\n";

Here we've used the absolute value of the t-statistic, because we initially
want to know simply whether there is a difference or not (a two-sided test).
However, we can also test whether the mean of the second sample is greater
than, or less than, that of the first (a one-sided test):
all the possible tests are summarised in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there is a
*difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: Sample 1 Mean is *less* than
Sample 2 Mean.]
[Reject if CDF of t > significance level:

`cdf(dist, t) > alpha`]]

[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
Sample 2 Mean.]

[Reject if complement of CDF of t > significance level:

`cdf(complement(dist, t)) > alpha`]]
]

[note
For a two-sided test we must compare against alpha / 2 and not alpha.]
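
As before, the comparisons translate into code along these lines (a sketch
only, using `dist`, `t_stat`, `q` and `alpha` as defined above):

   // Decision logic for the three alternative hypotheses:
   cout << "Sample 1 Mean != Sample 2 Mean  "
      << ((q > alpha / 2) ? "REJECTED" : "NOT REJECTED") << "\n";
   cout << "Sample 1 Mean <  Sample 2 Mean  "
      << ((cdf(dist, t_stat) > alpha) ? "REJECTED" : "NOT REJECTED") << "\n";
   cout << "Sample 1 Mean >  Sample 2 Mean  "
      << ((cdf(complement(dist, t_stat)) > alpha) ? "REJECTED" : "NOT REJECTED") << "\n";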

Most of the rest of the sample program is pretty-printing, so we'll
skip over that, and take a look at the sample output for alpha=0.05
(a 95% confidence level).  For comparison the dataplot output
for the same data is in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
section 1.3.5.3] of the __handbook.

[pre'''
   ________________________________________________
   Student t test for two samples (equal variances)
   ________________________________________________

   Number of Observations (Sample 1)                      =  249
   Sample 1 Mean                                          =  20.145
   Sample 1 Standard Deviation                            =  6.4147
   Number of Observations (Sample 2)                      =  79
   Sample 2 Mean                                          =  30.481
   Sample 2 Standard Deviation                            =  6.1077
   Degrees of Freedom                                     =  326
   Pooled Standard Deviation                              =  6.3426
   T Statistic                                            =  -12.621
   Probability that difference is due to chance           =  5.273e-030

   Results for Alternative Hypothesis and alpha           =  0.0500'''

   Alternative Hypothesis              Conclusion
   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
   Sample 1 Mean >  Sample 2 Mean       REJECTED
]

So with a probability that the difference is due to chance of just
5.273e-030, we can safely conclude that there is indeed a difference.

The tests on the alternative hypothesis show that we must
also reject the hypothesis that Sample 1 Mean is
greater than that for Sample 2: in this case Sample 1 represents the
miles per gallon for US cars, and Sample 2 the miles per gallon for
Japanese cars, so we conclude that Japanese cars are on average more
fuel efficient.

Now that we have the simple case out of the way, let's look for a moment
at the more complex one: that the standard deviations of the two samples
are not equal.  In this case the formula for the t-statistic becomes:

[equation dist_tutorial2]

And for the combined degrees of freedom we use the
[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
approximation:

[equation dist_tutorial3]

Note that this is one of the rare situations where the degrees-of-freedom
parameter to the Student's t distribution is a real number, and not an
integer value.

[note
Some statistical packages truncate the effective degrees of freedom to
an integer value: this may be necessary if you are relying on lookup tables,
but since our code fully supports non-integer degrees of freedom there is no
need to truncate in this case.  Also note that when the degrees of freedom
is small then the Welch-Satterthwaite approximation may be a significant
source of error.]

Putting these formulae into code we get:

   // Degrees of freedom:
   double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
   v *= v;
   double t1 = Sd1 * Sd1 / Sn1;
   t1 *= t1;
   t1 /=  (Sn1 - 1);
   double t2 = Sd2 * Sd2 / Sn2;
   t2 *= t2;
   t2 /= (Sn2 - 1);
   v /= (t1 + t2);
   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
   // t-statistic:
   double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";
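
The distribution object can then be constructed directly from this
real-valued result, just as in the equal-variance case; a sketch:

   students_t dist(v);   // v need not be an integer here, e.g. v == 136.87
   double q = cdf(complement(dist, fabs(t_stat)));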

Thereafter the code and the tests are performed the same as before.  Using
our car mileage data again, here's what the output looks like:

[pre'''
   __________________________________________________
   Student t test for two samples (unequal variances)
   __________________________________________________

   Number of Observations (Sample 1)                      =  249
   Sample 1 Mean                                          =  20.145
   Sample 1 Standard Deviation                            =  6.4147
   Number of Observations (Sample 2)                      =  79
   Sample 2 Mean                                          =  30.481
   Sample 2 Standard Deviation                            =  6.1077
   Degrees of Freedom                                     =  136.87
   T Statistic                                            =  -12.946
   Probability that difference is due to chance           =  1.571e-025

   Results for Alternative Hypothesis and alpha           =  0.0500'''

   Alternative Hypothesis              Conclusion
   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
   Sample 1 Mean >  Sample 2 Mean       REJECTED
]

This time allowing the variances in the two samples to differ has yielded
a higher likelihood that the observed difference is down to chance alone
(1.571e-025 compared to 5.273e-030 when equal variances were assumed).
However, the conclusion remains the same: US cars are less fuel efficient
than Japanese models.

[endsect] [/section:two_sample_students_t Comparing the means of two samples with the Students-t test]

[section:paired_st Comparing two paired samples with the Student's t distribution]

Imagine that we have a before and after reading for each item in the sample:
for example we might have measured blood pressure before and after administration
of a new drug.  We can't pool the results and compare the means before and after
the change, because each patient will have a different baseline reading.
Instead we calculate the difference between before and after measurements
in each patient, and calculate the mean and standard deviation of the differences.
To test whether a significant change has taken place, we can then test
the null-hypothesis that the true mean is zero using the same procedure
we used in the single sample cases previously discussed.
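
A minimal sketch of this reduction (the data below are invented purely for
illustration, and `single_sample_t_test` is the procedure defined in the
earlier section on testing a sample mean):

   // Hypothetical before/after blood pressure readings:
   double before[] = { 148, 142, 136, 145, 151, 139 };
   double after[]  = { 141, 143, 131, 141, 145, 137 };
   unsigned n = sizeof(before) / sizeof(before[0]);
   // Reduce the pairs to a single sample of differences:
   double sum = 0, sum_sq = 0;
   for(unsigned i = 0; i < n; ++i)
   {
      double d = after[i] - before[i];
      sum += d;
      sum_sq += d * d;
   }
   double mean_diff = sum / n;
   double sd_diff = sqrt((sum_sq - n * mean_diff * mean_diff) / (n - 1));
   // Test the null-hypothesis that the true mean difference is zero:
   single_sample_t_test(0.0, mean_diff, sd_diff, n, 0.05);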

That means we can:

* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean].
If the endpoints of the interval differ in sign then we are unable to reject
the null-hypothesis that there is no change.
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the
result is consistent with a true mean of zero, then we are unable to reject the
null-hypothesis that there is no change.
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need
in order to obtain a significant result].

[endsect] [/section:paired_st Comparing two paired samples with the Student's t distribution]


[endsect] [/section:st_eg Student's t]

[/
  Copyright 2006, 2012 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]