@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node Language
@chapter The @pspp{} language
@cindex language, @pspp{}
@cindex @pspp{}, language
This chapter discusses elements common to many @pspp{} commands.
Later chapters describe individual commands in detail.
@menu
* Tokens:: Characters combine to form tokens.
* Commands:: Tokens combine to form commands.
* Syntax Variants:: Batch vs. Interactive mode
* Types of Commands:: Commands come in several flavors.
* Order of Commands:: Commands combine to form syntax files.
* Missing Observations:: Handling missing observations.
* Datasets:: Data organization.
* Files:: Files used by @pspp{}.
* File Handles:: How files are named.
* BNF:: How command syntax is described.
@end menu
@node Tokens
@section Tokens
@cindex language, lexical analysis
@cindex language, tokens
@cindex tokens
@cindex lexical analysis
@pspp{} divides most syntax file lines into series of short chunks
called @dfn{tokens}.
Tokens are then grouped to form commands, each of which tells
@pspp{} to take some action---read in data, write out data, perform
a statistical procedure, etc. Each type of token is
described below.
@table @strong
@cindex identifiers
@item Identifiers
Identifiers are names that typically specify variables, commands, or
subcommands. The first character in an identifier must be a letter,
@samp{#}, or @samp{@@}. The remaining characters in the identifier
must be letters, digits, or one of the following special characters:
@example
@center @. _ $ # @@
@end example
@cindex case-sensitivity
Identifiers may be any length, but only the first 64 bytes are
significant. Identifiers are not case-sensitive: @code{foobar},
@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
different representations of the same identifier.
@cindex identifiers, reserved
@cindex reserved identifiers
Some identifiers are reserved. Reserved identifiers may not be used
in any context besides those explicitly described in this manual. The
reserved identifiers are:
@example
@center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
@end example
@item Keywords
Keywords are a subclass of identifiers that form a fixed part of
command syntax. For example, command and subcommand names are
keywords. Keywords may be abbreviated to their first 3 characters if
this abbreviation is unambiguous. (Unique abbreviations of 3 or more
characters are also accepted: @samp{FRE}, @samp{FREQ}, and
@samp{FREQUENCIES} are equivalent when the last is a keyword.)
Reserved identifiers are always used as keywords. Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.
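
As a brief sketch of keyword abbreviation, assuming a hypothetical
numeric variable @code{age} in the active dataset, the two commands
below should be interpreted identically, because each abbreviated
keyword keeps at least its first three characters and remains
unambiguous:
@example
FREQUENCIES /VARIABLES=age /STATISTICS=MEAN.
FREQ /VAR=age /STAT=MEAN.
@end example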
@item Numbers
@cindex numbers
@cindex integers
@cindex reals
Numbers are expressed in decimal. A decimal point is optional.
Numbers may be expressed in scientific notation by adding @samp{e} and
a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
are some more examples of valid numbers:
@example
-5 3.14159265359 1e100 -.707 8945.
@end example
Negative numbers are expressed with a @samp{-} prefix. However, in
situations where a literal @samp{-} token is expected, what appears to
be a negative number is treated as @samp{-} followed by a positive
number.
No white space is allowed within a number token, except for horizontal
white space between @samp{-} and the rest of the number.
The last example above, @samp{8945.}, is interpreted as two
tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
@xref{Commands, , Forming commands of tokens}.
@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
Strings are literal sequences of characters enclosed in pairs of
single quotes (@samp{'}) or double quotes (@samp{"}). To include the
character used for quoting in the string, double it, @i{e.g.}@:
@samp{'it''s an apostrophe'}. White space and case of letters are
significant inside strings.
Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
'c'} is equivalent to @samp{'abc'}. So that a long string may be
broken across lines, a line break may precede or follow, or both
precede and follow, the @samp{+}. (However, an entirely blank line
preceding or following the @samp{+} is interpreted as ending the
current command.)
Strings may also be expressed as hexadecimal character values by
prefixing the initial quote character by @samp{x} or @samp{X}.
Regardless of the syntax file or active dataset's encoding, the
hexadecimal digits in the string are interpreted as Unicode characters
in UTF-8 encoding.
Individual Unicode code points may also be expressed by specifying the
hexadecimal code point number in single or double quotes preceded by
@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the
musical G clef character, could be expressed as @code{U'1D11E'}.
Invalid Unicode code points (above U+10FFFF or in between U+D800 and
U+DFFF) are not allowed.
When strings are concatenated with @samp{+}, each segment's prefix is
considered individually. For example, @code{'The G clef symbol is:' +
u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
plain text string.
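
The string forms above can be tried with a command that takes a
single string argument. The sketch below assumes @cmd{ECHO}, which
writes its argument to the output; any other command that accepts a
string should treat these literals the same way:
@example
* A doubled quote character stands for one quote inside the string.
ECHO 'it''s an apostrophe'.
* Concatenation, with a line break following one of the plus signs.
ECHO "a" + 'b' +
     'c'.
* Hexadecimal bytes 41, 42, and 43 are the UTF-8 encoding of ABC.
ECHO X'414243'.
* Each segment of a concatenation carries its own prefix.
ECHO 'The G clef symbol is: ' + U'1D11E' + '.'.
@end example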
@item Punctuators and Operators
@cindex punctuators
@cindex operators
These tokens are the punctuators and operators:
@example
@center , / = ( ) + - * / ** < <= <> > >= ~= & | .
@end example
Most of these appear within the syntax of commands, but the period
(@samp{.}) punctuator is used only at the end of a command. It is a
punctuator only as the last character on a line (except white space).
When it is the last non-space character on a line, a period is not
treated as part of another token, even if it would otherwise be part
of, @i{e.g.}@:, an identifier or a floating-point number.
@end table
@node Commands
@section Forming commands of tokens
@cindex @pspp{}, command structure
@cindex language, command structure
@cindex commands, structure
Most @pspp{} commands share a common structure. A command begins with a
command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
CASES}. The command name may be abbreviated to its first word, and
each word in the command name may be abbreviated to its first three
or more characters, where these abbreviations are unambiguous.
The command name may be followed by one or more @dfn{subcommands}.
Each subcommand begins with a subcommand name, which may be
abbreviated to its first three letters. Some subcommands accept a
series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign
(@samp{=}). Specifications may be separated from each other
by commas or spaces. Each subcommand must be separated from the next (if any)
by a forward slash (@samp{/}).
There are multiple ways to mark the end of a command. The most common
way is to end the last line of the command with a period (@samp{.}) as
described in the previous section (@pxref{Tokens}). A blank line, or
one that consists only of white space or comments, also ends a command.
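
As an illustration of this structure, the following sketch assumes
hypothetical numeric variables @code{age} and @code{income}. The
command name is followed by three subcommands, each introduced by a
forward slash and separated from its specifications by an equals
sign, and the command ends with a period:
@example
FREQUENCIES
  /VARIABLES=age income
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV.
@end example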
@node Syntax Variants
@section Syntax Variants
@cindex Batch syntax
@cindex Interactive syntax
There are three variants of command syntax, which vary only in how
they detect the end of one command and the start of the next.
In @dfn{interactive mode}, which is the default for syntax typed at a
command prompt, a period as the last non-blank character on a line
ends a command. A blank line also ends a command.
In @dfn{batch mode}, an end-of-line period or a blank line also ends a
command. Additionally, it treats any line that has a non-blank
character in the leftmost column as beginning a new command. Thus, in
batch mode the second and subsequent lines in a command must be
indented.
Regardless of the syntax mode, a plus sign, minus sign, or period in
the leftmost column of a line is ignored and causes that line to begin
a new command. This is most useful in batch mode, in which the first
line of a new command could not otherwise be indented, but it is
accepted regardless of syntax mode.
The default mode for reading commands from a file is @dfn{auto mode}.
It is the same as batch mode, except that a line with a non-blank character in
the leftmost column only starts a new command if that line begins with
the name of a @pspp{} command. This correctly interprets most valid @pspp{}
syntax files regardless of the syntax mode for which they are
intended.
The @option{--interactive} (or @option{-i}) or @option{--batch} (or
@option{-b}) options set the syntax mode for files listed on the @pspp{}
command line. @xref{Main Options}, for more details.
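
The following sketch shows how a syntax file written for batch mode
might look, assuming a hypothetical numeric variable @code{x}. No
command ends with a period: each new command is recognized because it
starts in the leftmost column, the plus signs allow nested commands to
appear indented, and the continuation line of @cmd{FREQUENCIES} is
indented so that it is not taken as a new command:
@example
DO IF x > 0
+  COMPUTE positive = 1
ELSE
+  COMPUTE positive = 0
END IF
FREQUENCIES /VARIABLES=positive
  /FORMAT=NOTABLE
@end example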
@node Types of Commands
@section Types of Commands
Commands in @pspp{} are divided roughly into six categories:
@table @strong
@item Utility commands
@cindex utility commands
Set or display various global options that affect @pspp{} operations.
May appear anywhere in a syntax file. @xref{Utilities, , Utility
commands}.
@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''. Most of these commands replace any previous
data or variables with new data or
variables. At least one file definition command must appear before the first command in any of
the categories below. @xref{Data Input and Output}.
@item Input program commands
@cindex input program commands
Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
@item Transformations
@cindex transformations
Perform operations on data and write data to output files. Transformations
are not carried out until a procedure is executed.
@item Restricted transformations
@cindex restricted transformations
Transformations that cannot appear in certain contexts. @xref{Order
of Commands}, for details.
@item Procedures
@cindex procedures
Analyze data, writing results of analyses to the listing file. Cause
transformations specified earlier in the file to be performed. In a
more general sense, a @dfn{procedure} is any command that causes the
active dataset (the data) to be read.
@end table
@node Order of Commands
@section Order of Commands
@cindex commands, ordering
@cindex order of commands
@pspp{} does not place many restrictions on ordering of commands. The
main restriction is that variables must be defined before they are otherwise
referenced. This section describes the details of command ordering,
but most users will have no need to refer to them.
@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program},
@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states. (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)
@pspp{} starts in the initial state. Each successful completion
of a command may cause a state transition. Each type of command has its
own rules for state transitions:
@table @strong
@item Utility commands
@itemize @bullet
@item
Valid in any state.
@item
Do not cause state transitions. Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
transformation state.
@end itemize
@item @cmd{DATA LIST}
@itemize @bullet
@item
Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.
@item
Clears the active dataset if executed in the procedure or transformation
state.
@end itemize
@item @cmd{INPUT PROGRAM}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the input-program state.
@item
Clears the active dataset.
@end itemize
@item @cmd{FILE TYPE}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the file-type state.
@item
Clears the active dataset.
@end itemize
@item Other file definition commands
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Cause a transition to the transformation state.
@item
Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
and @cmd{UPDATE}.
@end itemize
@item Transformations
@itemize @bullet
@item
Invalid in initial and file-type states.
@item
Cause a transition to the transformation state.
@end itemize
@item Restricted transformations
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the transformation state.
@end itemize
@item Procedures
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the procedure state.
@end itemize
@end table
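
The rules above can be traced through a short syntax file. This is
only a sketch, with hypothetical variable names and inline data; the
comments note the state transitions that the rules predict:
@example
* DATA LIST is valid in the initial state and leads to the transformation state.
DATA LIST LIST /x y.
BEGIN DATA.
1 2
3 4
END DATA.
* COMPUTE is a transformation, so the state afterward is transformation.
COMPUTE total = x + y.
* DESCRIPTIVES is a procedure, so the state afterward is procedure.
DESCRIPTIVES /VARIABLES=total.
@end example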
@node Missing Observations
@section Handling missing observations
@cindex missing values
@cindex values, missing
@pspp{} includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}. This ``value'' actually indicates the
absence of a value; it means that the actual value is unknown. Procedures
automatically exclude from analyses those observations or cases that
have missing values. Details of missing value exclusion depend on the
procedure and can often be controlled by the user; refer to
descriptions of individual procedures for details.
The system-missing value exists only for numeric variables. String
variables always have a defined value, even if it is only a string of
spaces.
Variables, whether numeric or string, can have designated
@dfn{user-missing values}. Every user-missing value is an actual value
for that variable. However, most of the time user-missing values are
treated in the same way as the system-missing value.
For more information on missing values, see the following sections:
@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
documentation on individual procedures for information on how they
handle missing values.
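
As a minimal sketch, assuming a hypothetical numeric variable
@code{income} in which the code -9 was recorded for respondents who
declined to answer, the following declares -9 as user-missing so that
a later procedure excludes those observations along with any
system-missing ones:
@example
MISSING VALUES income (-9).
DESCRIPTIVES /VARIABLES=income.
@end example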
@node Datasets
@section Datasets
@cindex dataset
@cindex variable
@cindex dictionary
@pspp{} works with data organized into @dfn{datasets}. A dataset
consists of a set of @dfn{variables}, which taken together are said to
form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
has one value for each variable.
At any given time @pspp{} has exactly one distinguished dataset, called
the @dfn{active dataset}. Most @pspp{} commands work only with the
active dataset. In addition to the active dataset, @pspp{} also supports
any number of additional open datasets. The @cmd{DATASET} commands
can choose a new active dataset from among those that are open, as
well as create and destroy datasets (@pxref{DATASET}).
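
For instance, the following sketch, which uses the hypothetical
dataset names @code{survey} and @code{backup}, names the active
dataset, copies it, makes the copy active, and then closes the
original:
@example
DATASET NAME survey.
DATASET COPY backup.
DATASET ACTIVATE backup.
DATASET CLOSE survey.
@end example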
The sections below describe variables in more detail.
@menu
* Attributes:: Attributes of variables.
* System Variables:: Variables automatically defined by @pspp{}.
* Sets of Variables:: Lists of variable names.
* Input and Output Formats:: Input and output formats.
* Scratch Variables:: Variables deleted by procedures.
@end menu
@node Attributes
@subsection Attributes of Variables
@cindex variables, attributes of
@cindex attributes of variables
Each variable has a number of attributes, including:
@table @strong
@item Name
An identifier, up to 64 bytes long. Each variable must have a different name.
@xref{Tokens}.
Some system variable names begin with @samp{$}, but user-defined
variables' names may not begin with @samp{$}.
@cindex @samp{.}
@cindex period
@cindex variable names, ending with period
The final character in a variable name should not be @samp{.}, because
such an identifier will be misinterpreted when it is the final token
on a line: @code{FOO.} is divided into two separate tokens,
@samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
@cindex @samp{_}
The final character in a variable name should not be @samp{_}, because
some such identifiers are used for special purposes by @pspp{}
procedures.
As with all @pspp{} identifiers, variable names are not case-sensitive.
@pspp{} capitalizes variable names on output the same way they were
capitalized at their point of definition in the input.
@cindex variables, type
@cindex type of variables
@item Type
Numeric or string.
@cindex variables, width
@cindex width of variables
@item Width
(string variables only) String variables with a width of 8 characters or
fewer are called @dfn{short string variables}. Short string variables
may be used in a few contexts where @dfn{long string variables} (those
with widths greater than 8) are not allowed.
@item Position
Variables in the dictionary are arranged in a specific order.
@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@item Initialization
Either reinitialized to 0 or spaces for each case, or left at its
existing value. @xref{LEAVE}.
@cindex missing values
@cindex values, missing
@item Missing values
Optionally, up to three values, or a range of values, or a specific
value plus a range, can be specified as @dfn{user-missing values}.
There is also a @dfn{system-missing value} that is assigned to an
observation when there is no other obvious value for that observation.
Observations with missing values are automatically excluded from
analyses. User-missing values are actual data values, while the
system-missing value is not a value at all. @xref{Missing Observations}.
@cindex variable labels
@cindex labels, variable
@item Variable label
A string that describes the variable. @xref{VARIABLE LABELS}.
@cindex value labels
@cindex labels, value
@item Value label
Optionally, these associate each possible value of the variable with a
string. @xref{VALUE LABELS}.
@cindex print format
@item Print format
Display width, format, and (for numeric variables) number of decimal
places. This attribute does not affect how data are stored, just how
they are displayed. Example: a width of 8, with 2 decimal places.
@xref{Input and Output Formats}.
@cindex write format
@item Write format
Similar to print format, but used by the @cmd{WRITE} command
(@pxref{WRITE}).
@cindex measurement level
@item Measurement level
@anchor{Measurement Level}
One of the following:
@table @asis
@item Nominal
Each value of a nominal variable represents a distinct category. The
possible categories are finite and often have value labels. The order
of categories is not significant. Political parties, US states, and
yes/no choices are nominal. Numeric and string variables can be
nominal.
@item Ordinal
Ordinal variables also represent distinct categories, but their values
are arranged according to some natural order. Likert scales, e.g.@:
from strongly disagree to strongly agree, are ordinal. Data grouped
into ranges, e.g.@: age groups or income groups, are ordinal. Both
numeric and string variables can be ordinal. String values are
ordered alphabetically, so letter grades from A to F will work as
expected, but @code{poor}, @code{satisfactory}, @code{excellent} will
not.
@item Scale
Scale variables are ones for which differences and ratios are
meaningful. These are often values which have a natural unit
attached, such as age in years, income in dollars, or distance in
miles. Only numeric variables can be scale variables.
@end table
Variables created by @cmd{COMPUTE} and similar transformations,
obtained from external sources, etc., initially have an unknown
measurement level. Any procedure that reads the data will then assign
a default measurement level. @pspp{} can assign some defaults without
reading the data:
@itemize @bullet
@item
Nominal, if it's a string variable.
@item
Nominal, if the variable has a WKDAY or MONTH print format.
@item
Scale, if the variable has a DOLLAR, CCA through CCE, or time or date
print format.
@end itemize
Otherwise, @pspp{} reads the data and decides based on its
distribution:
@itemize @bullet
@item
Nominal, if all observations are missing.
@item
Scale, if one or more valid observations are noninteger or negative.
@item
Scale, if no valid observation is less than 10.
@item
Scale, if the variable has 24 or more unique valid values. The value
24 is the default and can be adjusted (@pxref{SET SCALEMIN}).
@end itemize
Finally, if none of the above is true, @pspp{} assigns the variable a
nominal measurement level.
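
When the default is not what is wanted, the measurement level can be
set explicitly. The sketch below assumes the @cmd{VARIABLE LEVEL}
command, which is not described in this section, and three
hypothetical variables:
@example
VARIABLE LEVEL party (NOMINAL).
VARIABLE LEVEL satisfaction (ORDINAL).
VARIABLE LEVEL income (SCALE).
@end example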
@cindex custom attributes
@item Custom attributes
User-defined associations between names and values. @xref{VARIABLE
ATTRIBUTE}.
@cindex variable role
@item Role
The intended role of a variable for use in dialog boxes in graphical
user interfaces. @xref{VARIABLE ROLE}.
@end table
@node System Variables
@subsection Variables Automatically Defined by @pspp{}
@cindex system variables
@cindex variables, system
There are eight system variables. Unlike ordinary variables, system
variables are not always stored, and they can be used only
in expressions. These system variables, whose values and output formats
cannot be modified, are described below.
@table @code
@cindex @code{$CASENUM}
@item $CASENUM
Case number of the current case. This changes as cases are
shuffled around.
@cindex @code{$DATE}
@item $DATE
Date the @pspp{} process was started, in format A9, following the
pattern @code{DD-MMM-YY}.
@cindex @code{$DATE11}
@item $DATE11
Date the @pspp{} process was started, in format A11, following the
pattern @code{DD-MMM-YYYY}.
@cindex @code{$JDATE}
@item $JDATE
Number of days between 15 Oct 1582 and the time the @pspp{} process
was started.
@cindex @code{$LENGTH}
@item $LENGTH
Page length, in lines, in format F11.
@cindex @code{$SYSMIS}
@item $SYSMIS
System missing value, in format F1.
@cindex @code{$TIME}
@item $TIME
Number of seconds between midnight 14 Oct 1582 and the time the active dataset
was read, in format F20.
@cindex @code{$WIDTH}
@item $WIDTH
Page width, in characters, in format F3.
@end table
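
Because system variables can appear only in expressions, a typical
use is on the right-hand side of a transformation. The sketch below
assumes a hypothetical numeric variable @code{score}; it stores each
case's number in a new variable and replaces negative scores with the
system-missing value:
@example
COMPUTE id = $CASENUM.
IF (score < 0) score = $SYSMIS.
@end example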
@node Sets of Variables
@subsection Lists of variable names
@cindex @code{TO} convention
@cindex convention, @code{TO}
To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas. To include a
range of variables from the dictionary in the list, write the name of
the first and last variable in the range, separated by @code{TO}. For
instance, if the dictionary contains six variables with the names
@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.
Commands that define variables, such as @cmd{DATA LIST}, give
@code{TO} an alternate meaning. With these commands, @code{TO} defines
sequences of variables whose names end in consecutive integers. The
syntax is two identifiers that begin with the same root and end with
numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
@code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
After a set of variables has been defined with @cmd{DATA LIST} or
another command with this method, the same set can be referenced on
later commands using the same syntax.
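
Both meanings of @code{TO} appear in the following sketch, which uses
hypothetical variable names and inline data. On @cmd{DATA LIST},
@code{x1 TO x5} defines five variables; on @cmd{DESCRIPTIVES},
@code{x2 TO x4} selects the variables from @code{x2} through
@code{x4} in dictionary order:
@example
DATA LIST LIST /id x1 TO x5.
BEGIN DATA.
1 10 20 30 40 50
2 11 21 31 41 51
END DATA.
DESCRIPTIVES /VARIABLES=x2 TO x4.
@end example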
@node Input and Output Formats
@subsection Input and Output Formats
@cindex formats
An @dfn{input format} describes how to interpret the contents of an
input field as a number or a string. It might specify that the field
contains an ordinary decimal number, a time or date, a number in binary
or hexadecimal notation, or one of several other notations. Input
formats are used by commands such as @cmd{DATA LIST} that read data or
syntax files into the @pspp{} active dataset.
Every input format corresponds to a default @dfn{output format} that
specifies the formatting used when the value is output later. It is
always possible to explicitly specify an output format that resembles
the input format. Usually, this is the default, but in cases where the
input format is unfriendly to human readability, such as binary or
hexadecimal formats, the default output format is an easier-to-read
decimal format.
Every variable has two output formats, called its @dfn{print format} and
@dfn{write format}. Print formats are used in most output contexts;
write formats are used only by @cmd{WRITE} (@pxref{WRITE}). Newly
created variables have identical print and write formats, and
@cmd{FORMATS}, the most commonly used command for changing formats
(@pxref{FORMATS}), sets both of them to the same value as well. Thus,
most of the time, the distinction between print and write formats is
unimportant.
Input and output formats are specified to @pspp{} with
a @dfn{format specification} of the
form @subcmd{@var{TYPE}@var{w}} or @subcmd{@var{TYPE}@var{w}.@var{d}}, where
@var{TYPE} is one of the format types described later, @var{w} is a
field width measured in columns, and @var{d} is an optional number of
decimal places. If @var{d} is omitted, a value of 0 is assumed. Some
formats do not allow a nonzero @var{d} to be specified.
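
For illustration, the sketch below attaches a format specification to
each variable as it is defined. The variable names and data are
hypothetical. @code{F4} omits @var{d}, so 0 decimal places are assumed;
@code{DOLLAR10.2} requests a width of 10 with 2 decimal places; and
@code{A12} is a string format, which takes no decimal places. The data
line avoids grouping commas on the assumption that list-style input
treats commas as field delimiters:

@example
@group
DATA LIST LIST /id (F4) price (DOLLAR10.2) name (A12).
BEGIN DATA.
1 $3141.59 widget
END DATA.
LIST.
@end group
@end example
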
The following sections describe the input and output formats supported
by @pspp{}.
@menu
* Basic Numeric Formats::
* Custom Currency Formats::
* Legacy Numeric Formats::
* Binary and Hexadecimal Numeric Formats::
* Time and Date Formats::
* Date Component Formats::
* String Formats::
@end menu
@node Basic Numeric Formats
@subsubsection Basic Numeric Formats
@cindex numeric formats
The basic numeric formats are used for input and output of real numbers
in standard or scientific notation. The following table shows an
example of how each format displays positive and negative numbers with
the default decimal point setting:
@float
@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
@headitem Format @tab @code{@tie{}3141.59} @tab @code{-3141.59}
@item F8.2 @tab @code{@tie{}3141.59} @tab @code{-3141.59}
@item COMMA9.2 @tab @code{@tie{}3,141.59} @tab @code{-3,141.59}
@item DOT9.2 @tab @code{@tie{}3.141,59} @tab @code{-3.141,59}
@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
@item PCT9.2 @tab @code{@tie{}3141.59%} @tab @code{-3141.59%}
@item E8.1 @tab @code{@tie{}3.1E+003} @tab @code{-3.1E+003}
@end multitable
@end float
On output, numbers in F format are expressed in standard decimal
notation with the requested number of decimal places. The other formats
output some variation on this style:
@itemize @bullet
@item
Numbers in COMMA format are additionally grouped every three digits by
inserting a grouping character. The grouping character is ordinarily a
comma, but it can be changed to a period (@pxref{SET DECIMAL}).
@item
DOT format is like COMMA format, but it interchanges the role of the
decimal point and grouping characters. That is, the current grouping
character is used as a decimal point and vice versa.
@item
DOLLAR format is like COMMA format, but it prefixes the number with
@samp{$}.
@item
PCT format is like F format, but adds @samp{%} after the number.
@item
The E format always produces output in scientific notation.
@end itemize
On input, the basic numeric formats accept positive and negative numbers in
standard decimal notation or scientific notation. Leading and trailing
spaces are allowed. An empty or all-spaces field, or one that contains
only a single period, is treated as the system missing value.
In scientific notation, the exponent may be introduced by a sign
(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
(in uppercase or lowercase), or by a letter followed by a sign. A
single space may follow the letter or the sign or both.
On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
other contexts, decimals are implied when the field does not contain a
decimal point. In F6.5 format, for example, the field @code{314159} is
taken as the value 3.14159 with implied decimals. Decimals are never
implied if an explicit decimal point is present or if scientific
notation is used.
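
The following sketch shows implied decimals on fixed-format input. It
assumes that a bare @code{(5)} after the column range requests F format
with 5 decimal places, that is, F6.5 for a six-column field. The
variable name @code{x} is arbitrary:

@example
@group
DATA LIST FIXED /x 1-6 (5).
BEGIN DATA.
314159
3.1415
END DATA.
LIST.
@end group
@end example

@noindent
The first field has no decimal point, so it should be read as 3.14159.
The second contains an explicit decimal point, so no decimals are
implied and it should be read as 3.1415.
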
E and F formats accept the basic syntax already described. The other
formats allow some additional variations:
@itemize @bullet
@item
COMMA, DOLLAR, and DOT formats ignore grouping characters within the
integer part of the input field. The identity of the grouping
character depends on the format.
@item
DOLLAR format allows a dollar sign to precede the number. In a negative
number, the dollar sign may precede or follow the minus sign.
@item
PCT format allows a percent sign to follow the number.
@end itemize
All of the basic numeric formats have a maximum field width of 40 and
accept no more than 16 decimal places, on both input and output. Some
additional restrictions apply:
@itemize @bullet
@item
As input formats, the basic numeric formats allow no more decimal places
than the field width. As output formats, the field width must be
greater than the number of decimal places; that is, large enough to
allow for a decimal point and the number of requested decimal places.
DOLLAR and PCT formats must allow an additional column for @samp{$} or
@samp{%}.
@item
The default output format for a given input format increases the field
width enough to make room for optional input characters. If an input
format calls for decimal places, the width is increased by 1 to make
room for an implied decimal point. COMMA, DOT, and DOLLAR formats also
increase the output width to make room for grouping characters. DOLLAR
and PCT further increase the output field width by 1 to make room for
@samp{$} or @samp{%}. The increased output width is capped at 40, the
maximum field width.
@item
The E format is exceptional. For output, E format has a minimum width
of 7 plus the number of decimal places. The default output format for
an E input format is an E format with at least 3 decimal places and
thus a minimum width of 10.
@end itemize
More details of basic numeric output formatting are given below:
@itemize @bullet
@item
Output rounds to nearest, with ties rounded away from zero. Thus, 2.5
is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.2
format.
@item
The system-missing value is output as a period in a field of spaces,
placed in the decimal point's position, or in the rightmost column if no
decimal places are requested. A period is used even if the decimal
point character is a comma.
@item
A number that does not fill its field is right-justified within the
field.
@item
A number that is too large for its field causes decimal places to be dropped
to make room. If dropping decimals does not make enough room,
scientific notation is used if the field is wide enough. If a number
does not fit in the field, even in scientific notation, the overflow is
indicated by filling the field with asterisks (@samp{*}).
@item
COMMA, DOT, and DOLLAR formats insert grouping characters only if space
is available for all of them. Grouping characters are never inserted
when all decimal places must be dropped. Thus, 1234.56 in COMMA5.2
format is output as @samp{@tie{}1235} without a comma, even though there
is room for one, because all decimal places were dropped.
@item
DOLLAR or PCT format drops the @samp{$} or @samp{%} only if the number
would not fit at all without it. Scientific notation with @samp{$} or
@samp{%} is preferred to ordinary decimal notation without it.
@item
Except in scientific notation, a decimal point is included only when
it is followed by a digit. If the integer part of the number being
output is 0, and a decimal point is included, then @pspp{} ordinarily
drops the zero before the decimal point. However, in @code{F},
@code{COMMA}, or @code{DOT} formats, @pspp{} keeps the zero if
@code{SET LEADZERO} is set to @code{ON} (@pxref{SET LEADZERO}).
In scientific notation, the number always includes a decimal point,
even if it is not followed by a digit.
@item
A negative number includes a minus sign only in the presence of a
nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
@samp{@tie{}@tie{}.0} in F4.1 format. Thus, a ``negative zero'' never
includes a minus sign.
@item
In negative numbers output in DOLLAR format, the dollar sign follows the
negative sign. Thus, -9.99 in DOLLAR6.2 format is output as
@code{-$9.99}.
@item
In scientific notation, the exponent is output as @samp{E} followed by
@samp{+} or @samp{-} and exactly three digits. Numbers with magnitude
less than 10**-999 or larger than 10**999 are not supported by most
computers, but if they are supported then their output is considered
to overflow the field and they are output as asterisks.
@item
On most computers, no more than 15 decimal digits are significant in
output, even if more are printed. In any case, output precision cannot
be any higher than input precision; few data sets are accurate to 15
digits of precision. Unavoidable loss of precision in intermediate
calculations may also reduce precision of output.
@item
Special values such as infinities and ``not a number'' values are
usually converted to the system-missing value before printing. In a few
circumstances, these values are output directly. In fields of width 3
or greater, special values are output as however many characters
fit from @code{+Infinity} or @code{-Infinity} for infinities, from
@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
(if any are supported by the system). In fields under 3 columns wide,
special values are output as asterisks.
@end itemize
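
A few of the cases described above can be tried with a short sketch
like the following. The variable names @code{a}, @code{b}, and @code{c}
are hypothetical, and the values are taken from the examples in the
list:

@example
@group
DATA LIST LIST /a b c.
BEGIN DATA.
2.5 1234.56 -9.99
END DATA.
FORMATS a (F1.0) b (COMMA5.2) c (DOLLAR6.2).
LIST.
@end group
@end example

@noindent
According to the rules above, @code{a} should appear as @code{3},
@code{b} as @code{1235} with no grouping comma, and @code{c} as
@code{-$9.99}.
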
@node Custom Currency Formats
@subsubsection Custom Currency Formats
@cindex currency formats
The custom currency formats are closely related to the basic numeric
formats, but they allow users to customize the output format. The
SET command configures custom currency formats, using the syntax
@display
SET CC@var{x}=@t{"}@var{string}@t{"}.
@end display
@noindent
where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
characters long.
@var{string} must contain exactly three commas or exactly three periods
(but not both), except that a single quote character may be used to
``escape'' a following comma, period, or single quote. If three commas
are used, commas are used for grouping in output, and a period
is used as the decimal point. Using periods reverses these roles.
The commas or periods divide @var{string} into four fields, called the
@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
suffix}, respectively. The prefix and suffix are added to output
whenever space is available. The negative prefix and negative suffix
are always added to a negative number when the output includes a nonzero
digit.
The following syntax shows how custom currency formats could be used to
reproduce basic numeric formats:
@example
@group
SET CCA="-,,,". /* Same as COMMA.
SET CCB="-...". /* Same as DOT.
SET CCC="-,$,,". /* Same as DOLLAR.
SET CCD="-,,%,". /* Like PCT, but groups with commas.
@end group
@end example
Here are some more examples of custom currency formats. The final
example shows how to use a single quote to escape a delimiter:
@example
@group
SET CCA=",EUR,,-". /* Euro.
SET CCB="(,USD ,,)". /* US dollar.
SET CCC="-.R$..". /* Brazilian real.
SET CCD="-,, NIS,". /* Israeli shekel.
SET CCE="-.Rp'. ..". /* Indonesian rupiah.
@end group
@end example
@noindent These formats would yield the following output:
@float
@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
@headitem Format @tab @code{@tie{}3145.59} @tab @code{-3145.59}
@item CCA12.2 @tab @code{@tie{}EUR3,145.59} @tab @code{EUR3,145.59-}
@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
@item CCC11.2 @tab @code{@tie{}R$3.145,59} @tab @code{-R$3.145,59}
@item CCD13.2 @tab @code{@tie{}3,145.59 NIS} @tab @code{-3,145.59 NIS}
@item CCE10.0 @tab @code{@tie{}Rp. 3.146} @tab @code{-Rp. 3.146}
@end multitable
@end float
The default for all the custom currency formats is @samp{-,,,},
equivalent to COMMA format.
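
To put a custom currency format to use, assign it to a variable like
any other output format. The sketch below reuses the Euro definition
from the examples above; the variable name @code{amount} is
hypothetical:

@example
@group
SET CCA=",EUR,,-".
DATA LIST LIST /amount.
BEGIN DATA.
3145.59
-3145.59
END DATA.
FORMATS amount (CCA12.2).
LIST.
@end group
@end example

@noindent
Going by the table above, the two cases should be listed as
@code{EUR3,145.59} and @code{EUR3,145.59-}.
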
@node Legacy Numeric Formats
@subsubsection Legacy Numeric Formats
The N and Z numeric formats provide compatibility with legacy file
formats. They have much in common:
@itemize @bullet
@item
Output is rounded to the nearest representable value, with ties rounded
away from zero.
@item
Numbers too large to display are output as a field filled with asterisks
(@samp{*}).
@item
The decimal point is always implicitly the specified number of digits
from the right edge of the field, except that Z format input allows an
explicit decimal point.
@item
Scientific notation may not be used.
@item
The system-missing value is output as a period in a field of spaces.
The period is placed just to the right of the implied decimal point in
Z format, or at the right end in N format or in Z format if no decimal
places are requested. A period is used even if the decimal point
character is a comma.
@item
Field width may range from 1 to 40. Decimal places may range from 0 up
to the field width, to a maximum of 16.
@item
When a legacy numeric format used for input is converted to an output
format, it is changed into the equivalent F format. The field width is
increased by 1 if any decimal places are specified, to make room for a
decimal point. For Z format, the field width is increased by 1 more
column, to make room for a negative sign. The output field width is
capped at 40 columns.
@end itemize
@subsubheading N Format
The N format supports input and output of fields that contain only
digits. On input, leading or trailing spaces, a decimal point, or any
other non-digit character causes the field to be read as the
system-missing value. As a special exception, an N format used on
@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
equivalent F format.
On output, N pads the field on the left with zeros. Negative numbers
are output like the system-missing value.
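
As a small sketch of N format output, with a hypothetical variable
@code{id}:

@example
@group
DATA LIST LIST /id.
BEGIN DATA.
42
-42
END DATA.
FORMATS id (N5.0).
LIST.
@end group
@end example

@noindent
Following the rules above, the first case should be padded with zeros
to @code{00042}, and the second, being negative, should be output like
the system-missing value.
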
@subsubheading Z Format
The Z format is a ``zoned decimal'' format used on IBM mainframes. Z
format encodes the sign as part of the final digit, which must be one of
the following:
@example
0123456789
@{ABCDEFGHI
@}JKLMNOPQR
@end example
@noindent
where the characters in each row represent digits 0 through 9 in order.
Characters in the first two rows indicate a positive sign; those in the
third indicate a negative sign.
On output, Z fields are padded on the left with spaces. On input,
leading and trailing spaces are ignored. Any character in an input
field other than spaces, the digit characters above, and @samp{.} causes
the field to be read as system-missing.
The decimal point character for input and output is always @samp{.},
even if the decimal point character is a comma (@pxref{SET DECIMAL}).
Nonzero, negative values output in Z format are marked as negative even
when no nonzero digits are output. For example, -0.2 is output in Z1.0
format as @samp{J}. The ``negative zero'' value supported by most
machines is output as positive.
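
The sign encoding can be observed by assigning a Z format to an
ordinary numeric variable, as in this sketch. The variable name
@code{amount} is hypothetical, and the values are chosen to fill the
field:

@example
@group
DATA LIST LIST /amount.
BEGIN DATA.
1234
-1234
END DATA.
FORMATS amount (Z4.0).
LIST.
@end group
@end example

@noindent
Reading from the third row of the table above, the negative case would
be expected to end in @samp{M}, the character that stands for a
negative final digit 4.
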
@node Binary and Hexadecimal Numeric Formats
@subsubsection Binary and Hexadecimal Numeric Formats
@cindex binary formats
@cindex hexadecimal formats
The binary and hexadecimal formats are primarily designed for
compatibility with existing machine formats, not for human readability.
All of them therefore have an F format as default output format. Some of
these formats are only portable between machines with compatible byte
ordering (endianness) or floating-point format.
Binary formats use byte values that in text files are interpreted as
special control functions, such as carriage return and line feed. Thus,
data in binary formats should not be included in syntax files or read
from data files with variable-length records, such as ordinary text
files. They may be read from or written to data files with fixed-length
records. @xref{FILE HANDLE}, for information on working with
fixed-length records.
@subsubheading P and PK Formats
These are binary-coded decimal formats, in which every byte (except the
last, in P format) represents two decimal digits. The most-significant
4 bits of the first byte is the most-significant decimal digit, the
least-significant 4 bits of the first byte is the next decimal digit,
and so on.
In P format, the most-significant 4 bits of the last byte are the
least-significant decimal digit. The least-significant 4 bits represent
the sign: decimal 13 indicates a negative value, decimal 15 indicates a
positive value.
Numbers are rounded downward on output. The system-missing value and
numbers outside representable range are output as zero.
The maximum field width is 16. Decimal places may range from 0 up to
the number of decimal digits represented by the field.
The default output format is an F format with twice the input field
width, plus one column for a decimal point (if decimal places were
requested).
@subsubheading IB and PIB Formats
These are integer binary formats. IB reads and writes 2's complement
binary integers, and PIB reads and writes unsigned binary integers. The
byte ordering is by default the host machine's, but SET RIB may be used
to select a specific byte ordering for reading (@pxref{SET RIB}) and
SET WIB, similarly, for writing (@pxref{SET WIB}).
The maximum field width is 8. Decimal places may range from 0 up to the
number of decimal digits in the largest value representable in the field
width.
The default output format is an F format whose width is the number of
decimal digits in the largest value representable in the field width,
plus 1 if the format has decimal places.
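
The following sketch reads binary fields from a file with fixed-length
records, as recommended earlier in this section. The file name
@file{ledger.bin}, the handle name @code{ledger}, the 8-byte record
layout, and the variable names are all assumptions made for
illustration:

@example
@group
FILE HANDLE ledger /NAME='ledger.bin' /MODE=IMAGE /LRECL=8.
DATA LIST FIXED FILE=ledger /amount 1-4 (P,2) units 5-8 (IB).
LIST.
@end group
@end example

@noindent
Here @code{amount} occupies 4 bytes in P format with 2 decimal places,
and @code{units} is a 4-byte integer in IB format.
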
@subsubheading RB Format
This is a binary format for real numbers. By default it reads and
writes the host machine's floating-point format, but SET RRB may be
used to select an alternate floating-point format for reading
(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
WRB}).
The recommended field width depends on the floating-point format.
NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
a field width of 8. ISL, ISB, VF, and ZS formats should use a field
width of 4. Other field widths do not produce useful results. The
maximum field width is 8. No decimal places may be specified.
The default output format is F8.2.
@subsubheading PIBHEX and RBHEX Formats
These are hexadecimal formats, for reading and writing binary formats
where each byte has been recoded as a pair of hexadecimal digits.
A hexadecimal field consists solely of hexadecimal digits
@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}. Uppercase and
lowercase are accepted on input; output is in uppercase.
Other than the hexadecimal representation, these formats are equivalent
to PIB and RB formats, respectively. However, bytes in PIBHEX format
are always ordered with the most-significant byte first (big-endian
order), regardless of the host machine's native byte order or @pspp{}
settings.
Field widths must be even and between 2 and 16. RBHEX format allows no
decimal places; PIBHEX allows as many decimal places as a PIB format
with half the given width.
@node Time and Date Formats
@subsubsection Time and Date Formats
@cindex time formats
@cindex date formats
In @pspp{}, a @dfn{time} is an interval. The time formats translate
between human-friendly descriptions of time intervals and @pspp{}'s
internal representation of time intervals, which is simply the number of
seconds in the interval. @pspp{} has three time formats:
@float
@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
@headitem Time Format @tab Template @tab Example
@item MTIME @tab @code{MM:SS.ss} @tab @code{91:17.01}
@item TIME @tab @code{hh:MM:SS.ss} @tab @code{01:31:17.01}
@item DTIME @tab @code{DD HH:MM:SS.ss} @tab @code{00 04:31:17.01}
@end multitable
@end float
A @dfn{date} is a moment in the past or the future. Internally, @pspp{}
represents a date as the number of seconds since the @dfn{epoch},
midnight, Oct. 14, 1582. The date formats translate between
human-readable dates and @pspp{}'s numeric representation of dates and
times. @pspp{} has several date formats:
@float
@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
@headitem Date Format @tab Template @tab Example
@item DATE @tab @code{dd-mmm-yyyy} @tab @code{01-OCT-1978}
@item ADATE @tab @code{mm/dd/yyyy} @tab @code{10/01/1978}
@item EDATE @tab @code{dd.mm.yyyy} @tab @code{01.10.1978}
@item JDATE @tab @code{yyyyjjj} @tab @code{1978274}
@item SDATE @tab @code{yyyy/mm/dd} @tab @code{1978/10/01}
@item QYR @tab @code{q Q yyyy} @tab @code{3 Q 1978}
@item MOYR @tab @code{mmm yyyy} @tab @code{OCT 1978}
@item WKYR @tab @code{ww WK yyyy} @tab @code{40 WK 1978}
@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
@item YMDHMS @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-10-01 04:31:17.01}
@end multitable
@end float
The templates in the preceding tables describe how the time and date
formats are input and output:
@table @code
@item dd
Day of month, from 1 to 31. Always output as two digits.
@item mm
@itemx mmm
Month. In output, @code{mm} is output as two digits, @code{mmm} as the
first three letters of an English month name (January, February,
@dots{}). In input, both of these formats, plus Roman numerals, are
accepted.
@item yyyy
Year. In output, DATETIME and YMDHMS always produce 4-digit years;
other formats can produce a 2- or 4-digit year. The century assumed
for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
In output, a year outside the epoch causes the whole field to be
filled with asterisks (@samp{*}).
@item jjj
Day of year (Julian day), from 1 to 366. This is exactly three digits
giving the count of days from the start of the year. January 1 is
considered day 1.
@item q
Quarter of year, from 1 to 4. Quarters start on January 1, April 1,
July 1, and October 1.
@item ww
Week of year, from 1 to 53. Output as exactly two digits. January 1 is
the first day of week 1.
@item DD
Count of days, which may be positive or negative. Output as at least
two digits.
@item hh
Count of hours, which may be positive or negative. Output as at least
two digits.
@item HH
Hour of day, from 0 to 23. Output as exactly two digits.
@item MM
In MTIME, count of minutes, which may be positive or negative. Output
as at least two digits.
In other formats, minute of hour, from 0 to 59. Output as exactly two
digits.
@item SS.ss
Seconds within minute, from 0 to 59. The integer part is output as
exactly two digits. On output, seconds and fractional seconds may or
may not be included, depending on field width and decimal places. On
input, seconds and fractional seconds are optional. The DECIMAL setting
controls the character accepted and displayed as the decimal point
(@pxref{SET DECIMAL}).
@end table
For output, the date and time formats use the delimiters indicated in
the table. For input, date components may be separated by spaces or by
one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
time components may be separated by spaces or @samp{:}. On
input, the @samp{Q} separating quarter from year and the @samp{WK}
separating week from year may be uppercase or lowercase, and the spaces
around them are optional.
On input, all time and date formats accept any amount of leading and
trailing white space.
The maximum width for time and date formats is 40 columns. Minimum
input and output width for each of the time and date formats is shown
below:
@float
@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
@item DATE @tab 8 @tab 9 @tab 4-digit year
@item ADATE @tab 8 @tab 8 @tab 4-digit year
@item EDATE @tab 8 @tab 8 @tab 4-digit year
@item JDATE @tab 5 @tab 5 @tab 4-digit year
@item SDATE @tab 8 @tab 8 @tab 4-digit year
@item QYR @tab 4 @tab 6 @tab 4-digit year
@item MOYR @tab 6 @tab 6 @tab 4-digit year
@item WKYR @tab 6 @tab 8 @tab 4-digit year
@item DATETIME @tab 17 @tab 17 @tab seconds
@item YMDHMS @tab 12 @tab 16 @tab seconds
@item MTIME @tab 4 @tab 5
@item TIME @tab 5 @tab 5 @tab seconds
@item DTIME @tab 8 @tab 8 @tab seconds
@end multitable
@end float
@noindent
In the table, ``Option'' describes what increased output width enables:
@table @asis
@item 4-digit year
A field 2 columns wider than the minimum includes a 4-digit year.
(DATETIME and YMDHMS formats always include a 4-digit year.)
@item seconds
A field 3 columns wider than the minimum includes seconds as well as
minutes. A field 5 columns wider than minimum, or more, can also
include a decimal point and fractional seconds (but no more than allowed
by the format's decimal places).
@end table
For the time and date formats, the default output format is the same as
the input format, except that @pspp{} increases the field width, if
necessary, to the minimum allowed for output.
Times or dates narrower than the field width are right-justified within
the field.
When a time or date exceeds the field width, characters are trimmed from
the end until it fits. This can occur in an unusual situation, @i{e.g.}@:
with a year greater than 9999 (which adds an extra digit), or for a
negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).
@c What about out-of-range values?
The system-missing value is output as a period at the right end of the
field.
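
As a sketch of how these formats are used together, the syntax below
reads a date and a time interval, lists them, and then switches the
date to a different layout. The variable names @code{started} and
@code{elapsed} are hypothetical:

@example
@group
DATA LIST LIST /started (DATE11) elapsed (TIME8).
BEGIN DATA.
01-OCT-1978 01:31:17
END DATA.
LIST.

* Show the same date in another layout.
FORMATS started (SDATE10).
LIST.
@end group
@end example

@noindent
Both widths are wide enough for the options in the table above: DATE11
and SDATE10 leave room for a 4-digit year, and TIME8 leaves room for
seconds. The second listing should show the date in the
@code{yyyy/mm/dd} layout.
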
@node Date Component Formats
@subsubsection Date Component Formats
The WKDAY and MONTH formats provide input and output for the names of
weekdays and months, respectively.
On output, these formats convert a number between 1 and 7, for WKDAY, or
between 1 and 12, for MONTH, into the English name of a day or month,
respectively. If the name is longer than the field, it is trimmed to
fit. If the name is shorter than the field, it is padded on the right
with spaces. Values outside the valid range, and the system-missing
value, are output as all spaces.
On input, English weekday or month names (in uppercase or lowercase) are
converted back to their corresponding numbers. Weekday and month names
may be abbreviated to their first 2 or 3 letters, respectively.
The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
MONTH. No decimal places are allowed.
The default output format is the same as the input format.
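
A short sketch, using hypothetical variables @code{d} and @code{m},
shows numbers being displayed as names:

@example
@group
DATA LIST LIST /d m.
BEGIN DATA.
3 10
END DATA.
FORMATS d (WKDAY9) m (MONTH3).
LIST.
@end group
@end example

@noindent
With a width of 3, the month name for 10 would be expected to be
trimmed to its first three letters.
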
@node String Formats
@subsubsection String Formats
@cindex string formats
The A and AHEX formats are the only ones that may be assigned to string
variables. Neither format allows any decimal places.
In A format, the entire field is treated as a string value. The field
width may range from 1 to 32,767, the maximum string width. The default
output format is the same as the input format.
In AHEX format, the field is composed of characters in a string encoded
as hex digit pairs. On output, hex digits are output in uppercase; on
input, uppercase and lowercase are both accepted. The default output
format is A format with half the input width.
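
The sketch below reads a six-column AHEX field into a hypothetical
string variable @code{s}. It assumes an ASCII-compatible character
encoding, in which the hex pairs @code{61}, @code{62}, and @code{63}
stand for the letters @samp{a}, @samp{b}, and @samp{c}:

@example
@group
DATA LIST FIXED /s 1-6 (AHEX).
BEGIN DATA.
616263
END DATA.
LIST.
@end group
@end example

@noindent
Because the default output format is A format with half the input
width, @code{s} should be a string of width 3.
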
@node Scratch Variables
@subsection Scratch Variables
@cindex scratch variables
Most of the time, variables don't retain their values between cases.
Instead, either they're being read from a data file or the active dataset,
in which case they assume the value read, or, if created with
@cmd{COMPUTE} or
another transformation, they're initialized to the system-missing value
or to blanks, depending on type.
However, sometimes it's useful to have a variable that keeps its value
between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
use a @dfn{scratch variable}. Scratch variables are variables whose
names begin with an octothorpe (@samp{#}).
Scratch variables have the same properties as variables left with
@cmd{LEAVE}: they retain their values between cases, and for the first
case they are initialized to 0 or blanks. They have the additional
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis. To use
a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
to copy its value into an ordinary variable, then use that ordinary
variable in the analysis.
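
For example, a running total can be kept in a scratch variable and
copied into an ordinary variable for each case. This is a minimal
sketch; the names @code{#total}, @code{x}, and @code{running} are
hypothetical:

@example
@group
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
COMPUTE #total = #total + x.
COMPUTE running = #total.
LIST.
@end group
@end example

@noindent
Because @code{#total} starts at 0 and keeps its value between cases,
@code{running} should hold 1, 3, and 6. The listing includes
@code{running} but not @code{#total}, which is deleted before the
procedure executes.
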
@node Files
@section Files Used by @pspp{}
@pspp{} makes use of many files each time it runs. Some of these it
reads, some it writes, some it creates. Here is a table listing the
most important of these files:
@table @strong
@cindex file, command
@cindex file, syntax file
@cindex command file
@cindex syntax file
@item command file
@itemx syntax file
These names (synonyms) refer to the file that contains instructions
that tell @pspp{} what to do. The syntax file's name is specified on
the @pspp{} command line. Syntax files can also be read with
@cmd{INCLUDE} (@pxref{INCLUDE}).
@cindex file, data
@cindex data file
@item data file
Data files contain raw data in text or binary format. Data can also
be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
@cindex file, output
@cindex output file
@item listing file
One or more output files are created by @pspp{} each time it is
run. The output files receive the tables and charts produced by
statistical procedures. The output files may be in any number of formats,
depending on how @pspp{} is configured.
@cindex system file
@cindex file, system
@item system file
System files are binary files that store a dictionary and a set of
cases. @cmd{GET} and @cmd{SAVE} read and write system files.
@cindex portable file
@cindex file, portable
@item portable file
Portable files are files in a text-based format that store a dictionary
and a set of cases. @cmd{IMPORT} and @cmd{EXPORT} read and write
portable files.
@end table
@node File Handles
@section File Handles
@cindex file handles
A @dfn{file handle} is a reference to a data file, system file, or
portable file. Most often, a file handle is specified as the
name of a file as a string, that is, enclosed within @samp{'} or
@samp{"}.
A file name string that begins or ends with @samp{|} is treated as the
name of a command to pipe data to or from. You can use this feature
to read data over the network using a program such as @samp{curl}
(@i{e.g.}@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
read compressed data from a file using a program such as @samp{zcat}
(@i{e.g.}@: @code{GET '|zcat mydata.sav.gz'}), and for many other
purposes.
@pspp{} also supports declaring named file handles with the @cmd{FILE
HANDLE} command. This command associates an identifier of your choice
(the file handle's name) with a file. Later, the file handle name can
be substituted for the name of the file. When @pspp{} syntax accesses a
file multiple times, declaring a named file handle simplifies updating
the syntax later to use a different file. Use of @cmd{FILE HANDLE} is
also required to read data files in binary formats. @xref{FILE HANDLE},
for more information.
In some circumstances, @pspp{} must distinguish whether a file handle
refers to a system file or a portable file. When this is necessary to
read a file, @i{e.g.}@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
@pspp{} uses the file's contents to decide. In the context of writing a
file, @i{e.g.}@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
decides based on the file's name: if it ends in @samp{.por} (with any
capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
writes a system file.
INLINE is reserved as a file handle name. It refers to the ``data
file'' embedded into the syntax file between @cmd{BEGIN DATA} and
@cmd{END DATA}. @xref{BEGIN DATA}, for more information.
The file to which a file handle refers may be reassigned on a later
@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
HANDLE}. @xref{CLOSE FILE HANDLE}, for
more information.
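
The sketch below pulls these points together. The handle name
@code{scores} and all of the file names are hypothetical:

@example
@group
FILE HANDLE scores /NAME='scores.txt'.
DATA LIST LIST FILE=scores /id score.

* The first name selects a system file, the second a portable file.
SAVE OUTFILE='scores.sav'.
SAVE OUTFILE='scores.por'.

* Close the handle before pointing it at a different file.
CLOSE FILE HANDLE scores.
FILE HANDLE scores /NAME='scores-2.txt'.
@end group
@end example

@noindent
Only the @cmd{FILE HANDLE} commands mention the data file names, so
switching to a different input file means editing just those lines.
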
@node BNF
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@cindex command syntax, description of
@cindex description of command syntax
The syntax of some parts of the @pspp{} language is presented in this
manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
following list describes BNF:
@itemize @bullet
@cindex keywords
@cindex terminals
@item
Words in all-uppercase are @pspp{} keyword tokens. In BNF, these are
often called @dfn{terminals}. There are some special terminals, which
are written in lowercase for clarity:
@table @asis
@cindex @code{number}
@item @code{number}
A real number.
@cindex @code{integer}
@item @code{integer}
An integer number.
@cindex @code{string}
@item @code{string}
A string.
@cindex @code{var-name}
@item @code{var-name}
A single variable name.
@cindex operators
@cindex punctuators
@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
Operators and punctuators.
@cindex @code{.}
@item @code{.}
The end of the command. This is not necessarily an actual dot in the
syntax file (@pxref{Commands}).
@end table
@item
@cindex productions
@cindex nonterminals
Other words in all lowercase refer to BNF definitions, called
@dfn{productions}. These productions are also known as
@dfn{nonterminals}. Some nonterminals are very common, so they are
defined here in English for clarity:
@table @code
@cindex @code{var-list}
@item var-list
A list of one or more variable names or the keyword @code{ALL}.
@cindex @code{expression}
@item expression
An expression. @xref{Expressions}, for details.
@end table
@item
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''. The left side of @samp{::=} gives
the name of the nonterminal being defined. The right side of @samp{::=}
gives the definition of that nonterminal. If the right side is empty,
then one possible expansion of that nonterminal is nothing. A BNF
definition is called a @dfn{production}.
@item
@cindex terminals and nonterminals, differences
So, the key difference between a terminal and a nonterminal is that a
terminal cannot be broken into smaller parts---in fact, every terminal
is a single token (@pxref{Tokens}). On the other hand, nonterminals are
composed of a (possibly empty) sequence of terminals and nonterminals.
Thus, terminals indicate the deepest level of syntax description. (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)
@item
@cindex start symbol
@cindex symbol, start
The first nonterminal defined in a set of productions is called the
@dfn{start symbol}. The start symbol defines the entire syntax for
that command.
@end itemize