File: statistics.texi

@node Statistics
@chapter Statistics

This chapter documents the statistical procedures that PSPP supports so
far.

@menu
* DESCRIPTIVES::                Descriptive statistics.
* FREQUENCIES::                 Frequency tables.
* EXAMINE::                     Testing data for normality.
* CROSSTABS::                   Crosstabulation tables.
* NPAR TESTS::                  Nonparametric tests.
* T-TEST::                      Test hypotheses about means.
* ONEWAY::                      One way analysis of variance.
* RANK::                        Compute rank scores.
* REGRESSION::                  Linear regression.
@end menu

@node DESCRIPTIVES
@section DESCRIPTIVES

@vindex DESCRIPTIVES
@display
DESCRIPTIVES
        /VARIABLES=var_list
        /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
        /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
        /SAVE
        /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
                     SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
                     SESKEWNESS,SEKURTOSIS@}
        /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
               RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
              @{A,D@}
@end display

The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
descriptive
statistics requested by the user.  In addition, it can optionally
compute Z-scores.

The VARIABLES subcommand, which is required, specifies the list of
variables to be analyzed.  Keyword VARIABLES is optional.

All other subcommands are optional:

The MISSING subcommand determines the handling of missing values.  If
INCLUDE is set, then user-missing values are included in the
calculations.  If NOINCLUDE is set, which is the default, user-missing
values are excluded.  If VARIABLE is set, then missing values are
excluded on a variable by variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case has a
system-missing or, unless INCLUDE is set, user-missing value.

The FORMAT subcommand affects the output format.  Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
set, both the valid and the missing number of cases are listed in the
output; when LINE is set, only valid cases are listed.

The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
the specified variables.  The Z scores are saved to new variables.
Variable names are generated by trying first the original variable name
with Z prepended and truncated to a maximum of 8 characters, then the
names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
variable names can be specified explicitly on VARIABLES in the variable
list by enclosing them in parentheses after each variable.

The STATISTICS subcommand specifies the statistics to be displayed:

@table @code
@item ALL
All of the statistics below.
@item MEAN
Arithmetic mean.
@item SEMEAN
Standard error of the mean.
@item STDDEV
Standard deviation.
@item VARIANCE
Variance.
@item KURTOSIS
Kurtosis and standard error of the kurtosis.
@item SKEWNESS
Skewness and standard error of the skewness.
@item RANGE
Range.
@item MINIMUM
Minimum value.
@item MAXIMUM
Maximum value.
@item SUM
Sum.
@item DEFAULT
Mean, standard deviation, minimum, maximum.
@item SEKURTOSIS
Standard error of the kurtosis.
@item SESKEWNESS
Standard error of the skewness.
@end table

The SORT subcommand specifies how the statistics should be sorted.  Most
of the possible values should be self-explanatory.  NAME causes the
statistics to be sorted by variable name.  By default, the statistics are
listed in the order that the variables are specified on the VARIABLES
subcommand.  The A
and D settings request an ascending or descending sort order,
respectively.
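
As a minimal sketch, assume an active file that contains the
hypothetical numeric variables @code{height} and @code{weight}.  The
following command requests the mean and standard deviation of both
variables, sorts the output by mean, and saves Z scores.  The Z score
variable for @code{height} is given the explicit (and equally
hypothetical) name @code{zh}; the name for @code{weight} is generated
automatically as described above.

@example
* height, weight, and zh are hypothetical variable names.
DESCRIPTIVES
        /VARIABLES=height (zh) weight
        /STATISTICS=MEAN STDDEV
        /SORT=MEAN
        /SAVE.
@end example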

@node FREQUENCIES
@section FREQUENCIES

@vindex FREQUENCIES
@display
FREQUENCIES
        /VARIABLES=var_list
        /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
                @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
                @{LABELS,NOLABELS@}
                @{AVALUE,DVALUE,AFREQ,DFREQ@}
                @{SINGLE,DOUBLE@}
                @{OLDPAGE,NEWPAGE@}
        /MISSING=@{EXCLUDE,INCLUDE@}
        /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
                     KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
                     SESKEWNESS,SEKURTOSIS,ALL,NONE@}
        /NTILES=ntiles
        /PERCENTILES=percent@dots{}
        /HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)] 
                   [@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
        /PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}

(These options are not currently implemented.)
        /BARCHART=@dots{}
        /HBAR=@dots{}
        /GROUPED=@dots{}
@end display

The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
variables.
@cmd{FREQUENCIES} can also calculate and display descriptive statistics
(including median and mode) and percentiles.

@cmd{FREQUENCIES} also supports graphical output in the form of
histograms and pie charts.  In the future, it will be able to produce
bar charts and output percentiles for grouped data.

The VARIABLES subcommand is the only required subcommand.  Specify the
variables to be analyzed.

The FORMAT subcommand controls the output format.  It has several
possible settings:  

@itemize @bullet
@item
TABLE, the default, causes a frequency table to be output for every
variable specified.  NOTABLE prevents them from being output.  LIMIT
with a numeric argument causes them to be output except when there are
more than the specified number of values in the table.

@item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page.  CONDENSE frequency tables are
less informative but take up less space.  ONEPAGE with a numeric
argument will output standard frequency tables if there are the
specified number of values or fewer, condensed tables otherwise.  ONEPAGE
without an argument defaults to a threshold of 50 values.

@item
LABELS causes value labels to be displayed in STANDARD frequency
tables.  NOLABELS prevents this.

@item
Normally frequency tables are sorted in ascending order by value.  This
is AVALUE.  DVALUE tables are sorted in descending order by value.
AFREQ and DFREQ tables are sorted in ascending and descending order,
respectively, by frequency count.

@item
SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
frequency tables have wider spacing.

@item
OLDPAGE and NEWPAGE are not currently used.
@end itemize

The MISSING subcommand controls the handling of user-missing values.
When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics.  When INCLUDE is set, user-missing
values are included.  System-missing values are never included in statistics,
but are listed in frequency tables.

The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode.  (If there are multiple modes, the smallest
value is reported.)  By default, the mean, standard deviation, minimum,
and maximum are reported for each variable.

@cindex percentiles
PERCENTILES causes the specified percentiles to be reported.
The percentiles should be presented as a list of numbers between 0
and 100 inclusive.
The NTILES subcommand causes the percentiles to be reported at the
boundaries of the data set divided into the specified number of ranges.
For instance, @code{/NTILES=4} would cause quartiles to be reported.

The HISTOGRAM subcommand causes the output to include a histogram for
each specified variable.  The X axis by default ranges from the
minimum to the maximum value observed in the data, but the MINIMUM and
MAXIMUM keywords can set an explicit range.  The Y axis by default is
labeled in frequencies; use the PERCENT keyword to cause it to be
labeled in percent of the total observed count.  Specify NORMAL to
superimpose a normal curve on the histogram.

The PIECHART subcommand adds a pie chart for each variable to the output.  Each
slice represents one value, with the size of the slice proportional to
the value's frequency.  By default, all non-missing values are given
slices.  The MINIMUM and MAXIMUM keywords can be used to limit the
displayed slices to a given range of values.  The MISSING keyword adds
slices for missing values.
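
The following sketch combines several of the subcommands described
above.  It assumes a hypothetical numeric variable @code{income}.  The
frequency table itself is suppressed; the mean, median, and mode are
reported along with the quartiles and the 10th and 90th percentiles,
and a histogram with a superimposed normal curve is produced.

@example
* income is a hypothetical variable name.
FREQUENCIES
        /VARIABLES=income
        /FORMAT=NOTABLE
        /STATISTICS=MEAN MEDIAN MODE
        /NTILES=4
        /PERCENTILES=10 90
        /HISTOGRAM=NORMAL.
@end example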

@node EXAMINE
@comment  node-name,  next,  previous,  up
@section EXAMINE
@vindex EXAMINE

@cindex Normality, testing for

@display
EXAMINE
        VARIABLES=var_list [BY factor_list ]
        /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
        /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
        /CINTERVAL n
        /COMPARE=@{GROUPS,VARIABLES@}
        /ID=@{case_number, var_name@}
        /@{TOTAL,NOTOTAL@}
        /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
        /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}] 
		[@{NOREPORT,REPORT@}]

@end display

The @cmd{EXAMINE} command is used to test how closely a distribution follows a
normal distribution.  It also shows you outliers and extreme values.

The VARIABLES subcommand specifies the dependent variables and the
independent variables to use as factors for the analysis.   Variables
listed before the first BY keyword are the dependent variables.
The dependent variables may optionally be followed by a list of
factors which tell PSPP how to break down the analysis for each
dependent variable.  The format for each factor is 
@display
var [BY var].
@end display


The STATISTICS subcommand specifies the analysis to be done.  
DESCRIPTIVES will produce a table showing some parametric and
nonparametric statistics.  EXTREME produces a table showing extreme
values of the dependent variable.  A number in parentheses determines
how many upper and lower extremes to show.  The default number is 5.


The PLOT subcommand specifies which plots are to be produced if any.

The COMPARE subcommand is only relevant if producing boxplots, and it is only
useful if there is more than one dependent variable and at least one factor.   If
/COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
containing boxplots for all the factors.
If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
each containing one boxplot per dependent variable.
If the /COMPARE subcommand is omitted, then PSPP uses the default value of
/COMPARE=GROUPS.

The CINTERVAL subcommand specifies the confidence interval to use in
calculating the descriptive statistics.  The default is 95%.

@cindex percentiles
The PERCENTILES subcommand specifies which percentiles are to be calculated, 
and which algorithm to use for calculating them.  The default is to
calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
HAVERAGE algorithm.

The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
is given and factors have been specified in the VARIABLES subcommand,
then statistics for the unfactored dependent variables are
produced in addition to the factored variables.  If there are no
factors specified then TOTAL and NOTOTAL have no effect.

@strong{Warning!}
If many dependent variables are given, or factors are given for which
there are many distinct values, then @cmd{EXAMINE} will produce a very
large quantity of output.
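
As a small illustration, assume a hypothetical dependent variable
@code{score} and a hypothetical factor @code{gender}.  The following
command requests descriptive statistics and the three highest and three
lowest values of @code{score}, broken down by @code{gender}, together
with boxplots.

@example
* score and gender are hypothetical variable names.
EXAMINE
        VARIABLES=score BY gender
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /PLOT=BOXPLOT.
@end example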


@node CROSSTABS
@section CROSSTABS

@vindex CROSSTABS
@display
CROSSTABS
        /TABLES=var_list BY var_list [BY var_list]@dots{}
        /MISSING=@{TABLE,INCLUDE,REPORT@}
        /WRITE=@{NONE,CELLS,ALL@}
        /FORMAT=@{TABLES,NOTABLES@}
                @{LABELS,NOLABELS,NOVALLABS@}
                @{PIVOT,NOPIVOT@}
                @{AVALUE,DVALUE@}
                @{NOINDEX,INDEX@}
                @{BOX,NOBOX@}
        /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
                ASRESIDUAL,ALL,NONE@}
        /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
                     KAPPA,ETA,CORR,ALL,NONE@}
        
(Integer mode.)
        /VARIABLES=var_list (low,high)@dots{}
@end display

The @cmd{CROSSTABS} procedure displays crosstabulation
tables requested by the user.  It can calculate several statistics for
each cell in the crosstabulation tables.  In addition, a number of
statistics can be calculated for each table itself.

The TABLES subcommand is used to specify the tables to be reported.  Any
number of dimensions is permitted, and any number of variables per
dimension is allowed.  The TABLES subcommand may be repeated as many
times as needed.  This is the only required subcommand in @dfn{general
mode}.  

Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP automatically determines
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assumes.  To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand.  Data values inside
the range are truncated to the nearest integer, then assigned to that
value.  If values occur outside this range, they are discarded.  When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.

In general mode, numeric and string variables may be specified on
TABLES.  Although long string variables are allowed, only their
initial short-string parts are used.  In integer mode, only numeric
variables are allowed.
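
The two modes can be sketched as follows, assuming hypothetical numeric
variables @code{sex} (coded 1 and 2) and @code{agegrp} (coded 1 through
5).  The variable names and ranges are illustrative only.

@example
* sex and agegrp are hypothetical numeric variables.
* General mode: PSPP determines the values that occur in the data.
CROSSTABS
        /TABLES=sex BY agegrp.

* Integer mode: VARIABLES gives the range of each variable and precedes TABLES.
CROSSTABS
        /VARIABLES=sex (1,2) agegrp (1,5)
        /TABLES=sex BY agegrp.
@end example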

The MISSING subcommand determines the handling of user-missing values.
When set to TABLE, the default, missing values are dropped on a table by
table basis.  When set to INCLUDE, user-missing values are included in
tables and statistics.  When set to REPORT, which is allowed only in
integer mode, user-missing values are included in tables but marked with
an @samp{M} (for ``missing'') and excluded from statistical
calculations.

Currently the WRITE subcommand is ignored.

The FORMAT subcommand controls the characteristics of the
crosstabulation tables to be displayed.  It has a number of possible
settings:

@itemize @bullet
@item
TABLES, the default, causes crosstabulation tables to be output.
NOTABLES suppresses them.

@item
LABELS, the default, allows variable labels and value labels to appear
in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
labels but suppresses value labels.

@item
PIVOT, the default, causes each TABLES subcommand to be displayed in a
pivot table format.  NOPIVOT causes the old-style crosstabulation format
to be used.

@item
AVALUE, the default, causes values to be sorted in ascending order.
DVALUE asserts a descending sort order.

@item
INDEX/NOINDEX is currently ignored.

@item
BOX/NOBOX is currently ignored.
@end itemize

The CELLS subcommand controls the contents of each cell in the displayed
crosstabulation table.  The possible settings are:

@table @asis
@item COUNT
Frequency count.
@item ROW
Row percent.
@item COLUMN
Column percent.
@item TOTAL
Table percent.
@item EXPECTED
Expected value.
@item RESIDUAL 
Residual.
@item SRESIDUAL
Standardized residual.
@item ASRESIDUAL
Adjusted standardized residual.
@item ALL
All of the above.
@item NONE
Suppress cells entirely.
@end table

@samp{/CELLS} without any settings specified requests COUNT, ROW,
COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
will be selected.

The STATISTICS subcommand selects statistics for computation:

@table @asis
@item CHISQ
@cindex chisquare
@cindex chi-square

Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
correction, linear-by-linear association.
@item PHI
Phi.
@item CC
Contingency coefficient.
@item LAMBDA
Lambda.
@item UC
Uncertainty coefficient.
@item BTAU
Tau-b.
@item CTAU
Tau-c.
@item RISK
Risk estimate.
@item GAMMA
Gamma.
@item D
Somers' D.
@item KAPPA
Cohen's Kappa.
@item ETA
Eta.
@item CORR
Spearman correlation, Pearson's r.
@item ALL
All of the above.
@item NONE
No statistics.
@end table

Selected statistics are only calculated when appropriate for the
statistic.  Certain statistics require tables of a particular size, and
some statistics are calculated only in integer mode.

@samp{/STATISTICS} without any settings selects CHISQ.  If the
STATISTICS subcommand is not given, no statistics are calculated.

@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:

@itemize @bullet
@item
Pearson's R (but not Spearman's) is off a little.
@item
T values for Spearman's R and Pearson's R are wrong.
@item
Significance of symmetric and directional measures is not calculated.
@item
Asymmetric ASEs and T values for lambda are wrong.
@item
ASE of Goodman and Kruskal's tau is not calculated.
@item
ASE of symmetric Somers' d is wrong.
@item
Approximate T of uncertainty coefficient is wrong.
@end itemize

Fixes for any of these deficiencies would be welcomed.

@node NPAR TESTS
@section NPAR TESTS

@vindex NPAR TESTS
@cindex nonparametric tests

@display 
NPAR TESTS
     
     nonparametric test subcommands
     .
     .
     .
     
     [ /STATISTICS=@{DESCRIPTIVES@} ]

     [ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
@end display

NPAR TESTS performs nonparametric tests. 
Nonparametric tests make very few assumptions about the distribution of the
data.
One or more tests may be specified by using the corresponding subcommand.
If the /STATISTICS subcommand is also specified, then summary statistics are
produced for each variable that is the subject of any test.


@menu
* BINOMIAL::                Binomial Test
* CHISQUARE::               Chisquare Test
@end menu


@node    BINOMIAL
@subsection Binomial test
@vindex BINOMIAL
@cindex binomial test

@display 
     [ /BINOMIAL[(p)]=var_list[(value1[, value2])] ]
@end display 

The binomial test compares the observed distribution of a dichotomous 
variable with that of a binomial distribution.
The variable @var{p} specifies the test proportion of the binomial 
distribution.  
The default value of 0.5 is assumed if @var{p} is omitted.

If a single value appears after the variable list, then that value is
used as the threshold to partition the observed values. Values less
than or equal to the threshold value form the first category.  Values
greater than the threshold form the second category. 

If two values appear after the variable list, then they will be used
as the values which a variable must take to be in the respective
category. 
Cases for which a variable takes a value equal to neither of the specified
values take no part in the test for that variable.

If no values appear, then the variable must assume dichotomous
values.
If more than two distinct, non-missing values for a variable
under test are encountered then an error occurs.

If the test proportion is equal to 0.5, then a two-tailed test is
reported.   For any other test proportion, a one-tailed test is
reported.
For one-tailed tests, if the test proportion is less than
or equal to the observed proportion, then the significance of
observing the observed proportion or more is reported.
If the test proportion is more than the observed proportion, then the
significance of observing the observed proportion or less is reported.
That is to say, the test is always performed in the observed
direction. 

PSPP uses a very precise approximation to the gamma function to
compute the binomial significance.  Thus, exact results are reported
even for very large sample sizes.
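
The two ways of defining the categories can be sketched as follows.
The variables @code{coin} (taking the values 0 and 1) and @code{score}
are hypothetical, as are the test proportions and the threshold.

@example
* coin and score are hypothetical variable names.
* Test the two values of coin against a proportion of 0.5.
NPAR TESTS
        /BINOMIAL(0.5)=coin (0, 1).

* Split score at the threshold 50 and test against a proportion of 0.3.
NPAR TESTS
        /BINOMIAL(0.3)=score (50)
        /STATISTICS=DESCRIPTIVES.
@end example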



@node    CHISQUARE
@subsection Chisquare test
@vindex CHISQUARE
@cindex chisquare test


@display
     [ /CHISQUARE=var_list[(lo,hi)] [/EXPECTED=@{EQUAL|f1, f2 @dots{} fn@}] ]
@end display 


The chisquare test produces a chi-square statistic for the differences 
between the expected and observed frequencies of the categories of a variable. 
Optionally, a range of values may appear after the variable list.  
If a range is given, then non-integer values are truncated, and values
outside the specified range are excluded from the analysis.

The /EXPECTED subcommand specifies the expected values of each
category.  
There must be exactly one non-zero expected value for each observed
category, or the EQUAL keyword must be specified.
You may use the notation @var{n}*@var{f} to specify @var{n}
consecutive expected categories all taking a frequency of @var{f}.
The frequencies given are proportions, not absolute frequencies.  The
sum of the frequencies need not be 1.
If no /EXPECTED subcommand is given, then equal frequencies
are expected.
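
For instance, assume a hypothetical variable @code{grade} that takes
the integer values 1 through 4.  In the sketch below, the
@var{n}*@var{f} notation assigns an expected frequency of 1 to each of
the first two categories and 3 to each of the last two.  Because the
frequencies are treated as proportions, this is equivalent to expecting
the categories in the ratio 1:1:3:3.

@example
* grade is a hypothetical variable taking the values 1 through 4.
NPAR TESTS
        /CHISQUARE=grade (1,4)
        /EXPECTED=2*1, 2*3.
@end example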


@node T-TEST
@comment  node-name,  next,  previous,  up
@section T-TEST

@vindex T-TEST

@display
T-TEST
        /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
        /CRITERIA=CIN(confidence)


(One Sample mode.)
        TESTVAL=test_value
        /VARIABLES=var_list


(Independent Samples mode.)
        GROUPS=var(value1 [, value2])
        /VARIABLES=var_list


(Paired Samples mode.)
        PAIRS=var_list [WITH var_list [(PAIRED)] ]

@end display


The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about 
means.  
It operates in one of three modes:
@itemize
@item One Sample mode.
@item Independent Samples mode.
@item Paired Samples mode.
@end itemize

@noindent
Each of these modes is described in more detail below.
There are two optional subcommands which are common to all modes.

The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
in the tests.  The default value is 0.95.


The @cmd{MISSING} subcommand determines the handling of missing
values.
If INCLUDE is set, then user-missing values are included in the
calculations, but system-missing values are not.
If EXCLUDE is set, which is the default, user-missing
values are excluded as well as system-missing values.

If LISTWISE is set, then the entire case is excluded from analysis
whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or 
@cmd{/GROUPS} subcommands contains a missing value.   
If ANALYSIS is set, then missing values are excluded only in the analysis for
which they would be needed. This is the default.


@menu
* One Sample Mode::             Testing against a hypothesised mean
* Independent Samples Mode::    Testing two independent groups for equal mean
* Paired Samples Mode::         Testing two interdependent groups for equal mean
@end menu

@node One Sample Mode
@subsection One Sample Mode

The @cmd{TESTVAL} subcommand invokes the One Sample mode.
This mode is used to test a population mean against a hypothesised
mean. 
The value given to the @cmd{TESTVAL} subcommand is the value against
which you wish to test.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP which variables you wish to test.
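
As a minimal sketch, the following command tests whether the mean of a
hypothetical variable @code{iq} differs from the (equally hypothetical)
value 100.

@example
* iq is a hypothetical variable name.
T-TEST
        TESTVAL=100
        /VARIABLES=iq.
@end example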

@node Independent Samples Mode
@comment  node-name,  next,  previous,  up
@subsection Independent Samples Mode

The @cmd{GROUPS} subcommand invokes Independent Samples mode or
`Groups' mode. 
This mode is used to test whether two groups of values have the
same population mean.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP the dependent variables you wish to test.

The variable given in the @cmd{GROUPS} subcommand is the independent
variable which determines to which group the samples belong.
The values in parentheses are the specific values of the independent
variable for each group.
If the parentheses are omitted and no values are given, the default values 
of 1.0 and 2.0 are assumed.

If the independent variable is numeric, 
it is acceptable to specify only one value inside the parentheses.
If you do this, cases where the independent variable is
greater than or equal to this value belong to the first group, and cases
less than this value belong to the second group.
When using this form of the @cmd{GROUPS} subcommand, missing values in
the independent variable are excluded on a listwise basis, regardless
of whether @cmd{/MISSING=LISTWISE} was specified.
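
Both forms of the @cmd{GROUPS} subcommand can be sketched as follows.
The dependent variable @code{score} and the independent variables
@code{group} and @code{age} are hypothetical, as are the values in
parentheses.

@example
* score, group, and age are hypothetical variable names.
* Compare cases where group is 1 with cases where group is 2.
T-TEST
        GROUPS=group(1, 2)
        /VARIABLES=score.

* Single cut point: cases with age >= 50 against cases with age < 50.
T-TEST
        GROUPS=age(50)
        /VARIABLES=score.
@end example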


@node Paired Samples Mode
@comment  node-name,  next,  previous,  up
@subsection Paired Samples Mode

The @cmd{PAIRS} subcommand introduces Paired Samples mode.
Use this mode when repeated measures have been taken from the same
samples.
If the @code{WITH} keyword is omitted, then tables for all
combinations of variables given in the @cmd{PAIRS} subcommand are
generated. 
If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
is also given, then the number of variables preceding @code{WITH}
must be the same as the number following it.
In this case, tables for each respective pair of variables are
generated.
In the event that the @code{WITH} keyword is given, but the
@code{(PAIRED)} keyword is omitted, then tables for each combination
of variable preceding @code{WITH} against variable following
@code{WITH} are generated.
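
The effect of the @code{(PAIRED)} keyword can be illustrated with four
hypothetical variables, @code{pre1}, @code{pre2}, @code{post1}, and
@code{post2}.

@example
* pre1, pre2, post1, and post2 are hypothetical variable names.
* Two tables: pre1 with post1, and pre2 with post2.
T-TEST
        PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).

* Four tables: each of pre1 and pre2 against each of post1 and post2.
T-TEST
        PAIRS=pre1 pre2 WITH post1 post2.
@end example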


@node ONEWAY
@comment  node-name,  next,  previous,  up
@section ONEWAY

@vindex ONEWAY
@cindex analysis of variance
@cindex ANOVA

@display
ONEWAY
        [/VARIABLES = ] var_list BY var
        /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
        /CONTRAST= value1 [, value2] ... [,valueN]
        /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}

@end display

The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
variables factored by a single independent variable.
It is used to compare the means of a population
divided into more than two groups. 

The  variables to be analysed should be given in the @code{VARIABLES}
subcommand.  
The list of variables must be followed by the @code{BY} keyword and
the name of the independent (or factor) variable.

You can use the @code{STATISTICS} subcommand to tell PSPP to display
ancillary information.  The options accepted are:
@itemize
@item DESCRIPTIVES
Displays descriptive statistics about the groups factored by the independent
variable.
@item HOMOGENEITY
Displays the Levene test of Homogeneity of Variance for the
variables and their groups.
@end itemize

The @code{CONTRAST} subcommand is used when you anticipate certain
differences between the groups.
The subcommand must be followed by a list of numerals which are the
coefficients of the groups to be tested.
The number of coefficients must correspond to the number of distinct
groups (or values of the independent variable).
If the total sum of the coefficients is not zero, then PSPP will
display a warning, but will proceed with the analysis.
The @code{CONTRAST} subcommand may be given up to 10 times in order
to specify different contrast tests.
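
As a sketch, assume a hypothetical dependent variable @code{yield} and
a hypothetical factor @code{fert} that takes exactly three distinct
values.  The command below requests both kinds of ancillary output and
two contrast tests.  Each contrast has three coefficients, one per
group, and each set of coefficients sums to zero.

@example
* yield and fert are hypothetical; fert is assumed to take three values.
ONEWAY
        /VARIABLES=yield BY fert
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRAST=-1, 0, 1
        /CONTRAST=1, -2, 1.
@end example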
@setfilename ignored

@node RANK
@comment  node-name,  next,  previous,  up
@section RANK

@vindex RANK
@display
RANK
        [VARIABLES=] var_list [@{A,D@}] [BY var_list]
        /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
        /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
        /PRINT[=@{YES,NO@}]
        /MISSING=@{EXCLUDE,INCLUDE@}

        /RANK [INTO var_list]
        /NTILES(k) [INTO var_list]
        /NORMAL [INTO var_list]
        /PERCENT [INTO var_list]
        /RFRACTION [INTO var_list]
        /PROPORTION [INTO var_list]
        /N [INTO var_list]
        /SAVAGE [INTO var_list]
@end display

The @cmd{RANK} command ranks variables and stores the results into new
variables. 

The VARIABLES subcommand, which is mandatory, specifies one or
more variables whose values are to be ranked.  
After each variable, @samp{A} or @samp{D} may appear, indicating that
the variable is to be ranked in ascending or descending order.
Ascending is the default.
If a BY keyword appears, it should be followed by a list of variables
which are to serve as group variables.  
In this case, the cases are gathered into groups, and ranks calculated
for each group.

The TIES subcommand specifies how tied values are to be treated.  The
default is to take the mean value of all the tied cases.

The FRACTION subcommand specifies how proportional ranks are to be
calculated.  This only has any effect if NORMAL or PROPORTION rank
functions are requested.

The PRINT subcommand may be used to specify that a summary of the rank
variables created should appear in the output.

The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
If none are given, then the default is RANK.
The NTILES subcommand must take an integer specifying the number of
partitions into which values should be ranked.
Each subcommand may be followed by the INTO keyword and a list of
variables which are the variables to be created and receive the rank
scores.  There may be as many variables specified as there are
variables named on the VARIABLES subcommand.  If fewer are specified,
then the variable names are automatically created.

The MISSING subcommand determines how user-missing values are to be
treated. A setting of EXCLUDE means that variables whose values are
user-missing are to be excluded from the rank scores. A setting of
INCLUDE means they are to be included.  The default is EXCLUDE.
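
As a minimal sketch, assume hypothetical variables @code{score} (the
variable to be ranked) and @code{class} (a group variable).  The
command below ranks @code{score} separately within each value of
@code{class}, stores the ranks in @code{rscore} and the quartile
numbers in @code{qscore}, and prints a summary of the variables
created.  All four variable names are illustrative only.

@example
* score, class, rscore, and qscore are hypothetical variable names.
RANK
        VARIABLES=score BY class
        /RANK INTO rscore
        /NTILES(4) INTO qscore
        /PRINT=YES.
@end example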

@include regression.texi