@node Statistics
@chapter Statistics
This chapter documents the statistical procedures that PSPP supports so
far.
@menu
* DESCRIPTIVES:: Descriptive statistics.
* FREQUENCIES:: Frequency tables.
* EXAMINE:: Testing data for normality.
* CROSSTABS:: Crosstabulation tables.
* NPAR TESTS:: Nonparametric tests.
* T-TEST:: Test hypotheses about means.
* ONEWAY:: One way analysis of variance.
* RANK:: Compute rank scores.
* REGRESSION:: Linear regression.
@end menu
@node DESCRIPTIVES
@section DESCRIPTIVES
@vindex DESCRIPTIVES
@display
DESCRIPTIVES
/VARIABLES=var_list
/MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
/FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
/SAVE
/STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
SESKEWNESS,SEKURTOSIS@}
/SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
@{A,D@}
@end display
The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
descriptive
statistics requested by the user. In addition, it can optionally
compute Z-scores.
The VARIABLES subcommand, which is required, specifies the list of
variables to be analyzed. Keyword VARIABLES is optional.
All other subcommands are optional:
The MISSING subcommand determines the handling of missing values. If
INCLUDE is set, then user-missing values are included in the
calculations. If NOINCLUDE is set, which is the default, user-missing
values are excluded. If VARIABLE is set, then missing values are
excluded on a variable by variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case has a
system-missing or, unless INCLUDE is set, user-missing value.
The FORMAT subcommand affects the output format. Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used. When SERIAL is
set, both valid and missing number of cases are listed in the output;
when NOSERIAL is set, only valid cases are listed.
The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
the specified variables. The Z scores are saved to new variables.
Variable names are generated by trying first the original variable name
with Z prepended and truncated to a maximum of 8 characters, then the
names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score
variable names can be specified explicitly on VARIABLES in the variable
list by enclosing them in parentheses after each variable.
The STATISTICS subcommand specifies the statistics to be displayed:
@table @code
@item ALL
All of the statistics below.
@item MEAN
Arithmetic mean.
@item SEMEAN
Standard error of the mean.
@item STDDEV
Standard deviation.
@item VARIANCE
Variance.
@item KURTOSIS
Kurtosis and standard error of the kurtosis.
@item SKEWNESS
Skewness and standard error of the skewness.
@item RANGE
Range.
@item MINIMUM
Minimum value.
@item MAXIMUM
Maximum value.
@item SUM
Sum.
@item DEFAULT
Mean, standard deviation, minimum, maximum.
@item SEKURTOSIS
Standard error of the kurtosis.
@item SESKEWNESS
Standard error of the skewness.
@end table
The SORT subcommand specifies how the statistics should be sorted. Most
of the possible values should be self-explanatory. NAME causes the
statistics to be sorted by name. By default, the statistics are listed
in the order that they are specified on the VARIABLES subcommand. The A
and D settings request an ascending or descending sort order,
respectively.
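As an illustration, the following sketch requests a few statistics and
saves Z scores. The variable names @code{height} and @code{weight} and
the Z score name @code{zweight} are hypothetical and assumed to exist
in (or be valid for) the active file.
@example
DESCRIPTIVES
  /VARIABLES=height weight (zweight)
  /MISSING=LISTWISE
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
  /SAVE
  /SORT=NAME.
@end example
Under the naming rules described above, the Z scores for @code{weight}
would be stored in @code{zweight}, while @code{height} would receive a
generated name, the first candidate being the original name with Z
prepended.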
@node FREQUENCIES
@section FREQUENCIES
@vindex FREQUENCIES
@display
FREQUENCIES
/VARIABLES=var_list
/FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
@{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
@{LABELS,NOLABELS@}
@{AVALUE,DVALUE,AFREQ,DFREQ@}
@{SINGLE,DOUBLE@}
@{OLDPAGE,NEWPAGE@}
/MISSING=@{EXCLUDE,INCLUDE@}
/STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
SESKEWNESS,SEKURTOSIS,ALL,NONE@}
/NTILES=ntiles
/PERCENTILES=percent@dots{}
/HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)]
[@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
/PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}
(These options are not currently implemented.)
/BARCHART=@dots{}
/HBAR=@dots{}
/GROUPED=@dots{}
@end display
The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
variables.
@cmd{FREQUENCIES} can also calculate and display descriptive statistics
(including median and mode) and percentiles.
@cmd{FREQUENCIES} also supports graphical output in the form of
histograms and pie charts. In the future, it will be able to produce
bar charts and output percentiles for grouped data.
The VARIABLES subcommand is the only required subcommand. Specify the
variables to be analyzed.
The FORMAT subcommand controls the output format. It has several
possible settings:
@itemize @bullet
@item
TABLE, the default, causes a frequency table to be output for every
variable specified. NOTABLE prevents them from being output. LIMIT
with a numeric argument causes them to be output except when there are
more than the specified number of values in the table.
@item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page. CONDENSE frequency tables are
less informative but take up less space. ONEPAGE with a numeric
argument will output standard frequency tables if there are the
specified number of values or less, condensed tables otherwise. ONEPAGE
without an argument defaults to a threshold of 50 values.
@item
LABELS causes value labels to be displayed in STANDARD frequency
tables. NOLABELS prevents this.
@item
Normally frequency tables are sorted in ascending order by value. This
is AVALUE. DVALUE tables are sorted in descending order by value.
AFREQ and DFREQ tables are sorted in ascending and descending order,
respectively, by frequency count.
@item
SINGLE spaced frequency tables are closely spaced. DOUBLE spaced
frequency tables have wider spacing.
@item
OLDPAGE and NEWPAGE are not currently used.
@end itemize
The MISSING subcommand controls the handling of user-missing values.
When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics. When INCLUDE is set, user-missing
values are included. System-missing values are never included in statistics,
but are listed in frequency tables.
The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode. (If there are multiple modes, the smallest
value is reported.) By default, the mean, standard deviation,
minimum, and maximum are reported for each variable.
@cindex percentiles
PERCENTILES causes the specified percentiles to be reported.
The percentiles should be presented as a list of numbers between 0
and 100 inclusive.
The NTILES subcommand causes the percentiles to be reported at the
boundaries of the data set divided into the specified number of ranges.
For instance, @code{/NTILES=4} would cause quartiles to be reported.
The HISTOGRAM subcommand causes the output to include a histogram for
each specified variable. The X axis by default ranges from the
minimum to the maximum value observed in the data, but the MINIMUM and
MAXIMUM keywords can set an explicit range. The Y axis by default is
labeled in frequencies; use the PERCENT keyword to cause it to be
labeled in percent of the total observed count. Specify NORMAL to
superimpose a normal curve on the histogram.
The PIECHART subcommand adds a pie chart for each variable to the output. Each
slice represents one value, with the size of the slice proportional to
the value's frequency. By default, all non-missing values are given
slices. The MINIMUM and MAXIMUM keywords can be used to limit the
displayed slices to a given range of values. The MISSING keyword adds
slices for missing values.
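A minimal sketch combining several of these subcommands follows. The
variable names @code{age} and @code{income} are hypothetical.
@example
FREQUENCIES
  /VARIABLES=age income
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE
  /NTILES=4
  /HISTOGRAM=NORMAL.
@end example
This would suppress the frequency tables themselves, report the mean,
median, and mode, report quartiles, and draw a histogram with a normal
curve superimposed for each variable.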
@node EXAMINE
@comment node-name, next, previous, up
@section EXAMINE
@vindex EXAMINE
@cindex Normality, testing for
@display
EXAMINE
VARIABLES=var_list [BY factor_list ]
/STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
/PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
/CINTERVAL n
/COMPARE=@{GROUPS,VARIABLES@}
/ID=@{case_number, var_name@}
/@{TOTAL,NOTOTAL@}
/PERCENTILE=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
/MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
[@{NOREPORT,REPORT@}]
@end display
The @cmd{EXAMINE} command is used to test how closely a distribution follows a
normal distribution. It also shows you outliers and extreme values.
The VARIABLES subcommand specifies the dependent variables and the
independent variables to use as factors for the analysis. Variables
listed before the first BY keyword are the dependent variables.
The dependent variables may optionally be followed by a list of
factors which tell PSPP how to break down the analysis for each
dependent variable. The format for each factor is
@display
var [BY var].
@end display
The STATISTICS subcommand specifies the analysis to be done.
DESCRIPTIVES will produce a table showing some parametric and
non-parametric statistics. EXTREME produces a table showing extreme
values of the dependent variable. A number in parentheses determines
how many upper and lower extremes to show. The default number is 5.
The PLOT subcommand specifies which plots are to be produced if any.
The COMPARE subcommand is only relevant if producing boxplots, and it is only
useful if there is more than one dependent variable and at least one factor. If
/COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
containing boxplots for all the factors.
If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
each containing one boxplot per dependent variable.
If the /COMPARE subcommand is omitted, then PSPP uses the default value of
/COMPARE=GROUPS.
The CINTERVAL subcommand specifies the confidence interval to use in
calculating the descriptive statistics. The default is 95%.
@cindex percentiles
The PERCENTILES subcommand specifies which percentiles are to be calculated,
and which algorithm to use for calculating them. The default is to
calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
HAVERAGE algorithm.
The TOTAL and NOTOTAL subcommands are mutually exclusive. If TOTAL
is given and factors have been specified in the VARIABLES subcommand,
then statistics for the unfactored dependent variables are
produced in addition to the factored variables. If there are no
factors specified then TOTAL and NOTOTAL have no effect.
@strong{Warning!}
If many dependent variables are given, or factors are given for which
there are many distinct values, then @cmd{EXAMINE} will produce a very
large quantity of output.
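The following sketch shows one way the subcommands above might be
combined. The variable names @code{score} and @code{group} are
hypothetical.
@example
EXAMINE
  VARIABLES=score BY group
  /STATISTICS=DESCRIPTIVES EXTREME(3)
  /PLOT=BOXPLOT
  /COMPARE=GROUPS.
@end example
Here @code{score} is the dependent variable and @code{group} the
factor. The three highest and three lowest values of @code{score}
would be listed, and a single plot containing one boxplot per value of
@code{group} would be produced.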
@node CROSSTABS
@section CROSSTABS
@vindex CROSSTABS
@display
CROSSTABS
/TABLES=var_list BY var_list [BY var_list]@dots{}
/MISSING=@{TABLE,INCLUDE,REPORT@}
/WRITE=@{NONE,CELLS,ALL@}
/FORMAT=@{TABLES,NOTABLES@}
@{LABELS,NOLABELS,NOVALLABS@}
@{PIVOT,NOPIVOT@}
@{AVALUE,DVALUE@}
@{NOINDEX,INDEX@}
@{BOX,NOBOX@}
/CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
ASRESIDUAL,ALL,NONE@}
/STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
KAPPA,ETA,CORR,ALL,NONE@}
(Integer mode.)
/VARIABLES=var_list (low,high)@dots{}
@end display
The @cmd{CROSSTABS} procedure displays crosstabulation
tables requested by the user. It can calculate several statistics for
each cell in the crosstabulation tables. In addition, a number of
statistics can be calculated for each table itself.
The TABLES subcommand is used to specify the tables to be reported. Any
number of dimensions is permitted, and any number of variables per
dimension is allowed. The TABLES subcommand may be repeated as many
times as needed. This is the only required subcommand in @dfn{general
mode}.
Occasionally, one may want to invoke a special mode called @dfn{integer
mode}. Normally, in general mode, PSPP automatically determines
what values occur in the data. In integer mode, the user specifies the
range of values that the data assumes. To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand. Data values inside
the range are truncated to the nearest integer, then assigned to that
value. If values occur outside this range, they are discarded. When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.
In general mode, numeric and string variables may be specified on
TABLES. Although long string variables are allowed, only their
initial short-string parts are used. In integer mode, only numeric
variables are allowed.
The MISSING subcommand determines the handling of user-missing values.
When set to TABLE, the default, missing values are dropped on a table by
table basis. When set to INCLUDE, user-missing values are included in
tables and statistics. When set to REPORT, which is allowed only in
integer mode, user-missing values are included in tables but marked with
an @samp{M} (for ``missing'') and excluded from statistical
calculations.
Currently the WRITE subcommand is ignored.
The FORMAT subcommand controls the characteristics of the
crosstabulation tables to be displayed. It has a number of possible
settings:
@itemize @bullet
@item
TABLES, the default, causes crosstabulation tables to be output.
NOTABLES suppresses them.
@item
LABELS, the default, allows variable labels and value labels to appear
in the output. NOLABELS suppresses them. NOVALLABS displays variable
labels but suppresses value labels.
@item
PIVOT, the default, causes each TABLES subcommand to be displayed in a
pivot table format. NOPIVOT causes the old-style crosstabulation format
to be used.
@item
AVALUE, the default, causes values to be sorted in ascending order.
DVALUE asserts a descending sort order.
@item
INDEX/NOINDEX is currently ignored.
@item
BOX/NOBOX is currently ignored.
@end itemize
The CELLS subcommand controls the contents of each cell in the displayed
crosstabulation table. The possible settings are:
@table @asis
@item COUNT
Frequency count.
@item ROW
Row percent.
@item COLUMN
Column percent.
@item TOTAL
Table percent.
@item EXPECTED
Expected value.
@item RESIDUAL
Residual.
@item SRESIDUAL
Standardized residual.
@item ASRESIDUAL
Adjusted standardized residual.
@item ALL
All of the above.
@item NONE
Suppress cells entirely.
@end table
@samp{/CELLS} without any settings specified requests COUNT, ROW,
COLUMN, and TOTAL. If CELLS is not specified at all then only COUNT
will be selected.
The STATISTICS subcommand selects statistics for computation:
@table @asis
@item CHISQ
@cindex chisquare
@cindex chi-square
Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
correction, linear-by-linear association.
@item PHI
Phi.
@item CC
Contingency coefficient.
@item LAMBDA
Lambda.
@item UC
Uncertainty coefficient.
@item BTAU
Tau-b.
@item CTAU
Tau-c.
@item RISK
Risk estimate.
@item GAMMA
Gamma.
@item D
Somers' D.
@item KAPPA
Cohen's Kappa.
@item ETA
Eta.
@item CORR
Spearman correlation, Pearson's r.
@item ALL
All of the above.
@item NONE
No statistics.
@end table
Selected statistics are only calculated when appropriate for the
statistic. Certain statistics require tables of a particular size, and
some statistics are calculated only in integer mode.
@samp{/STATISTICS} without any settings selects CHISQ. If the
STATISTICS subcommand is not given, no statistics are calculated.
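The two modes can be sketched as follows, assuming hypothetical numeric
variables @code{sex}, coded 1 to 2, and @code{region}, coded 1 to 4.
The first command uses general mode; the second invokes integer mode by
giving VARIABLES before TABLES.
@example
CROSSTABS
  /TABLES=sex BY region
  /CELLS=COUNT ROW COLUMN
  /STATISTICS=CHISQ PHI.

CROSSTABS
  /VARIABLES=sex (1,2) region (1,4)
  /TABLES=sex BY region
  /MISSING=REPORT.
@end example
The second command uses @code{/MISSING=REPORT}, which is allowed only
in integer mode, and omits CELLS, so only counts would be displayed.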
@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:
@itemize @bullet
@item
Pearson's R (but not Spearman) is off a little.
@item
T values for Spearman's R and Pearson's R are wrong.
@item
Significance of symmetric and directional measures is not calculated.
@item
Asymmetric ASEs and T values for lambda are wrong.
@item
ASE of Goodman and Kruskal's tau is not calculated.
@item
ASE of symmetric Somers' D is wrong.
@item
Approximate T of uncertainty coefficient is wrong.
@end itemize
Fixes for any of these deficiencies would be welcomed.
@node NPAR TESTS
@section NPAR TESTS
@vindex NPAR TESTS
@cindex nonparametric tests
@display
NPAR TESTS
nonparametric test subcommands
.
.
.
[ /STATISTICS=@{DESCRIPTIVES@} ]
[ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
@end display
NPAR TESTS performs nonparametric tests.
Nonparametric tests make very few assumptions about the distribution of the
data.
One or more tests may be specified by using the corresponding subcommand.
If the /STATISTICS subcommand is also specified, then summary statistics are
produced for each variable that is the subject of any test.
@menu
* BINOMIAL:: Binomial Test
* CHISQUARE:: Chisquare Test
@end menu
@node BINOMIAL
@subsection Binomial test
@vindex BINOMIAL
@cindex binomial test
@display
[ /BINOMIAL[(p)]=var_list[(value1[, value2])] ]
@end display
The binomial test compares the observed distribution of a dichotomous
variable with that of a binomial distribution.
The variable @var{p} specifies the test proportion of the binomial
distribution.
The default value of 0.5 is assumed if @var{p} is omitted.
If a single value appears after the variable list, then that value is
used as the threshold to partition the observed values. Values less
than or equal to the threshold value form the first category. Values
greater than the threshold form the second category.
If two values appear after the variable list, then they will be used
as the values which a variable must take to be in the respective
category.
Cases for which a variable takes a value equal to neither of the specified
values, take no part in the test for that variable.
If no values appear, then the variable must assume dichotomous
values.
If more than two distinct, non-missing values for a variable
under test are encountered then an error occurs.
If the test proportion is equal to 0.5, then a two tailed test is
reported. For any other test proportion, a one tailed test is
reported.
For one tailed tests, if the test proportion is less than
or equal to the observed proportion, then the significance of
observing the observed proportion or more is reported.
If the test proportion is more than the observed proportion, then the
significance of observing the observed proportion or less is reported.
That is to say, the test is always performed in the observed
direction.
PSPP uses a very precise approximation to the gamma function to
compute the binomial significance. Thus, exact results are reported
even for very large sample sizes.
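Two forms of the subcommand are sketched below. The variable names
@code{passed} and @code{score} are hypothetical.
@example
NPAR TESTS
  /BINOMIAL(0.3)=passed(0, 1).

NPAR TESTS
  /BINOMIAL=score(50).
@end example
The first command tests @code{passed} against a proportion of 0.3,
using the values 0 and 1 as the two categories, so a one tailed test
would be reported. The second uses the default proportion of 0.5 and
the threshold 50: values of @code{score} less than or equal to 50 form
the first category and larger values the second, and a two tailed test
would be reported.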
@node CHISQUARE
@subsection Chisquare test
@vindex CHISQUARE
@cindex chisquare test
@display
[ /CHISQUARE=var_list[(lo,hi)] [/EXPECTED=@{EQUAL|f1, f2 @dots{} fn@}] ]
@end display
The chisquare test produces a chi-square statistic for the differences
between the expected and observed frequencies of the categories of a variable.
Optionally, a range of values may appear after the variable list.
If a range is given, then non integer values are truncated, and values
outside the specified range are excluded from the analysis.
The /EXPECTED subcommand specifies the expected values of each
category.
There must be exactly one non-zero expected value for each observed
category, or the EQUAL keyword must be specified.
You may use the notation @var{n}*@var{f} to specify @var{n}
consecutive expected categories all taking a frequency of @var{f}.
The frequencies given are proportions, not absolute frequencies. The
sum of the frequencies need not be 1.
If no /EXPECTED subcommand is given, then equal frequencies
are expected.
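As a sketch, assume a hypothetical variable @code{colour} taking the
integer values 1 through 4:
@example
NPAR TESTS
  /CHISQUARE=colour(1,4)
  /EXPECTED=2*1, 3, 4.
@end example
The notation @code{2*1} stands for two consecutive categories each with
a frequency of 1, so the four categories have expected frequencies 1,
1, 3, and 4. Because the frequencies are proportions, this corresponds
to expected shares of 1/9, 1/9, 3/9, and 4/9 of the observed cases.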
@node T-TEST
@comment node-name, next, previous, up
@section T-TEST
@vindex T-TEST
@display
T-TEST
/MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
/CRITERIA=CIN(confidence)
(One Sample mode.)
TESTVAL=test_value
/VARIABLES=var_list
(Independent Samples mode.)
GROUPS=var(value1 [, value2])
/VARIABLES=var_list
(Paired Samples mode.)
PAIRS=var_list [WITH var_list [(PAIRED)] ]
@end display
The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
means.
It operates in one of three modes:
@itemize
@item One Sample mode.
@item Independent Samples mode.
@item Paired Samples mode.
@end itemize
@noindent
Each of these modes is described in more detail below.
There are two optional subcommands which are common to all modes.
The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
in the tests. The default value is 0.95.
The @cmd{MISSING} subcommand determines the handling of missing
values.
If INCLUDE is set, then user-missing values are included in the
calculations, but system-missing values are not.
If EXCLUDE is set, which is the default, user-missing
values are excluded as well as system-missing values.
If LISTWISE is set, then the entire case is excluded from analysis
whenever any variable specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
@cmd{/GROUPS} subcommands contains a missing value.
If ANALYSIS is set, then missing values are excluded only in the analysis for
which they would be needed. This is the default.
@menu
* One Sample Mode:: Testing against a hypothesised mean
* Independent Samples Mode:: Testing two independent groups for equal mean
* Paired Samples Mode:: Testing two interdependent groups for equal mean
@end menu
@node One Sample Mode
@subsection One Sample Mode
The @cmd{TESTVAL} subcommand invokes the One Sample mode.
This mode is used to test a population mean against a hypothesised
mean.
The value given to the @cmd{TESTVAL} subcommand is the value against
which you wish to test.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP which variables you wish to test.
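For instance, assuming a hypothetical variable @code{iq}, the following
sketch would test whether its population mean differs from 100:
@example
T-TEST
  TESTVAL=100
  /VARIABLES=iq.
@end example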
@node Independent Samples Mode
@comment node-name, next, previous, up
@subsection Independent Samples Mode
The @cmd{GROUPS} subcommand invokes Independent Samples mode or
`Groups' mode.
This mode is used to test whether two groups of values have the
same population mean.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP the dependent variables you wish to test.
The variable given in the @cmd{GROUPS} subcommand is the independent
variable which determines to which group the samples belong.
The values in parentheses are the specific values of the independent
variable for each group.
If the parentheses are omitted and no values are given, the default values
of 1.0 and 2.0 are assumed.
If the independent variable is numeric,
it is acceptable to specify only one value inside the parentheses.
If you do this, cases where the independent variable is
greater than or equal to this value belong to the first group, and cases
less than this value belong to the second group.
When using this form of the @cmd{GROUPS} subcommand, missing values in
the independent variable are excluded on a listwise basis, regardless
of whether @cmd{/MISSING=LISTWISE} was specified.
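Both forms of the @cmd{GROUPS} subcommand are sketched below. The
variable names @code{treat}, @code{age}, and @code{recovery} are
hypothetical.
@example
T-TEST
  GROUPS=treat(1, 2)
  /VARIABLES=recovery.

T-TEST
  GROUPS=age(40)
  /VARIABLES=recovery.
@end example
The first command compares @code{recovery} between cases where
@code{treat} is 1 and cases where it is 2. The second gives a single
value, so cases with @code{age} of 40 or more form the first group and
cases with @code{age} below 40 form the second.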
@node Paired Samples Mode
@comment node-name, next, previous, up
@subsection Paired Samples Mode
The @cmd{PAIRS} subcommand introduces Paired Samples mode.
Use this mode when repeated measures have been taken from the same
samples.
If the @code{WITH} keyword is omitted, then tables for all
combinations of variables given in the @cmd{PAIRS} subcommand are
generated.
If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
is also given, then the number of variables preceding @code{WITH}
must be the same as the number following it.
In this case, tables for each respective pair of variables are
generated.
In the event that the @code{WITH} keyword is given, but the
@code{(PAIRED)} keyword is omitted, then tables for each combination
of variable preceding @code{WITH} against variable following
@code{WITH} are generated.
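As a sketch, assume four hypothetical variables @code{pre1},
@code{pre2}, @code{post1}, and @code{post2}:
@example
T-TEST
  PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example
With @code{(PAIRED)}, tables would be generated for @code{pre1} against
@code{post1} and for @code{pre2} against @code{post2}. Without it, each
variable before @code{WITH} would be tested against each variable after
it, giving four pairs.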
@node ONEWAY
@comment node-name, next, previous, up
@section ONEWAY
@vindex ONEWAY
@cindex analysis of variance
@cindex ANOVA
@display
ONEWAY
[/VARIABLES = ] var_list BY var
/MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
/CONTRAST= value1 [, value2] ... [,valueN]
/STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
@end display
The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
variables factored by a single independent variable.
It is used to compare the means of a population
divided into more than two groups.
The variables to be analysed should be given in the @code{VARIABLES}
subcommand.
The list of variables must be followed by the @code{BY} keyword and
the name of the independent (or factor) variable.
You can use the @code{STATISTICS} subcommand to tell PSPP to display
ancillary information. The options accepted are:
@itemize
@item DESCRIPTIVES
Displays descriptive statistics about the groups factored by the independent
variable.
@item HOMOGENEITY
Displays the Levene test of Homogeneity of Variance for the
variables and their groups.
@end itemize
The @code{CONTRAST} subcommand is used when you anticipate certain
differences between the groups.
The subcommand must be followed by a list of numerals which are the
coefficients of the groups to be tested.
The number of coefficients must correspond to the number of distinct
groups (or values of the independent variable).
If the total sum of the coefficients is not zero, then PSPP will
display a warning, but will proceed with the analysis.
The @code{CONTRAST} subcommand may be given up to 10 times in order
to specify different contrast tests.
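The following sketch assumes a hypothetical dependent variable
@code{yield} and a factor @code{fert} that takes exactly three distinct
values:
@example
ONEWAY
  /VARIABLES=yield BY fert
  /STATISTICS=DESCRIPTIVES HOMOGENEITY
  /CONTRAST=-2, 1, 1
  /CONTRAST=0, -1, 1.
@end example
Each @code{CONTRAST} subcommand gives one coefficient per group, and
the coefficients in each list sum to zero, so no warning would be
expected.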
@setfilename ignored
@node RANK
@comment node-name, next, previous, up
@section RANK
@vindex RANK
@display
RANK
[VARIABLES=] var_list [@{A,D@}] [BY var_list]
/TIES=@{MEAN,LOW,HIGH,CONDENSE@}
/FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
/PRINT[=@{YES,NO@}]
/MISSING=@{EXCLUDE,INCLUDE@}
/RANK [INTO var_list]
/NTILES(k) [INTO var_list]
/NORMAL [INTO var_list]
/PERCENT [INTO var_list]
/RFRACTION [INTO var_list]
/PROPORTION [INTO var_list]
/N [INTO var_list]
/SAVAGE [INTO var_list]
@end display
The @cmd{RANK} command ranks variables and stores the results into new
variables.
The VARIABLES subcommand, which is mandatory, specifies one or
more variables whose values are to be ranked.
After each variable, @samp{A} or @samp{D} may appear, indicating that
the variable is to be ranked in ascending or descending order.
Ascending is the default.
If a BY keyword appears, it should be followed by a list of variables
which are to serve as group variables.
In this case, the cases are gathered into groups, and ranks calculated
for each group.
The TIES subcommand specifies how tied values are to be treated. The
default is to take the mean value of all the tied cases.
The FRACTION subcommand specifies how proportional ranks are to be
calculated. This only has any effect if NORMAL or PROPORTION rank
functions are requested.
The PRINT subcommand may be used to specify that a summary of the rank
variables created should appear in the output.
The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
PROPORTION and SAVAGE. Any number of function subcommands may appear.
If none are given, then the default is RANK.
The NTILES subcommand must take an integer specifying the number of
partitions into which values should be ranked.
Each subcommand may be followed by the INTO keyword and a list of
variables which are the variables to be created and receive the rank
scores. There may be as many variables specified as there are
variables named on the VARIABLES subcommand. If fewer are specified,
then the variable names are automatically created.
The MISSING subcommand determines how user-missing values are to be
treated. A setting of EXCLUDE means that variables whose values are
user-missing are to be excluded from the rank scores. A setting of
INCLUDE means they are to be included. The default is EXCLUDE.
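A minimal sketch follows. The variable names @code{score},
@code{class}, @code{rscore}, and @code{qscore} are hypothetical.
@example
RANK
  VARIABLES=score BY class
  /TIES=LOW
  /RANK INTO rscore
  /NTILES(4) INTO qscore.
@end example
This would rank @code{score} in ascending order within each value of
@code{class}, storing the ranks in @code{rscore} and the quartile
numbers in @code{qscore}.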
@include regression.texi