File: language.texi

@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node Language
@chapter The @pspp{} language
@cindex language, @pspp{}
@cindex @pspp{}, language

This chapter discusses elements common to many @pspp{} commands.
Later chapters describe individual commands in detail.

@menu
* Tokens::                      Characters combine to form tokens.
* Commands::                    Tokens combine to form commands.
* Syntax Variants::             Batch vs. Interactive mode
* Types of Commands::           Commands come in several flavors.
* Order of Commands::           Commands combine to form syntax files.
* Missing Observations::        Handling missing observations.
* Datasets::                    Data organization.
* Files::                       Files used by @pspp{}.
* File Handles::                How files are named.
* BNF::                         How command syntax is described.
@end menu


@node Tokens
@section Tokens
@cindex language, lexical analysis
@cindex language, tokens
@cindex tokens
@cindex lexical analysis

@pspp{} divides most syntax file lines into series of short chunks
called @dfn{tokens}.
Tokens are then grouped to form commands, each of which tells
@pspp{} to take some action---read in data, write out data, perform
a statistical procedure, etc.  Each type of token is
described below.

@table @strong
@cindex identifiers
@item Identifiers
Identifiers are names that typically specify variables, commands, or
subcommands.  The first character in an identifier must be a letter,
@samp{#}, or @samp{@@}.  The remaining characters in the identifier
must be letters, digits, or one of the following special characters:

@example
@center @.  _  $  #  @@
@end example

@cindex case-sensitivity
Identifiers may be any length, but only the first 64 bytes are
significant.  Identifiers are not case-sensitive: @code{foobar},
@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
different representations of the same identifier.

@cindex identifiers, reserved
@cindex reserved identifiers
Some identifiers are reserved.  Reserved identifiers may not be used
in any context besides those explicitly described in this manual.  The
reserved identifiers are:

@example
@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
@end example

@item Keywords
Keywords are a subclass of identifiers that form a fixed part of
command syntax.  For example, command and subcommand names are
keywords.  Keywords may be abbreviated to their first 3 characters if
this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
characters are also accepted: @samp{FRE}, @samp{FREQ}, and
@samp{FREQUENCIES} are equivalent when the last is a keyword.)

Reserved identifiers are always used as keywords.  Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.

@item Numbers
@cindex numbers
@cindex integers
@cindex reals
Numbers are expressed in decimal.  A decimal point is optional.
Numbers may be expressed in scientific notation by adding @samp{e} and
a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
are some more examples of valid numbers:

@example
-5  3.14159265359  1e100  -.707  8945.
@end example

Negative numbers are expressed with a @samp{-} prefix.  However, in
situations where a literal @samp{-} token is expected, what appears to
be a negative number is treated as @samp{-} followed by a positive
number.

No white space is allowed within a number token, except for horizontal
white space between @samp{-} and the rest of the number.

The last example above, @samp{8945.}, is interpreted as two
tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
@xref{Commands, , Forming commands of tokens}.

@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
Strings are literal sequences of characters enclosed in pairs of
single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
character used for quoting in the string, double it, @i{e.g.}@:
@samp{'it''s an apostrophe'}.  White space and case of letters are
significant inside strings.

Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
'c'} is equivalent to @samp{'abc'}.  So that a long string may be
broken across lines, a line break may precede or follow, or both
precede and follow, the @samp{+}.  (However, an entirely blank line
preceding or following the @samp{+} is interpreted as ending the
current command.)

Strings may also be expressed as hexadecimal character values by
prefixing the initial quote character by @samp{x} or @samp{X}.
Regardless of the syntax file or active dataset's encoding, the
hexadecimal digits in the string are interpreted as Unicode characters
in UTF-8 encoding.

Individual Unicode code points may also be expressed by specifying the
hexadecimal code point number in single or double quotes preceded by
@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
musical G clef character, could be expressed as @code{U'1D11E'}.
Invalid Unicode code points (above U+10FFFF or in between U+D800 and
U+DFFF) are not allowed.

When strings are concatenated with @samp{+}, each segment's prefix is
considered individually.  For example, @code{'The G clef symbol is:' +
u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
plain text string.

@item Punctuators and Operators
@cindex punctuators
@cindex operators
These tokens are the punctuators and operators:

@example
@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
@end example

Most of these appear within the syntax of commands, but the period
(@samp{.}) punctuator is used only at the end of a command.  It is a
punctuator only as the last character on a line (except white space).
When it is the last non-space character on a line, a period is not
treated as part of another token, even if it would otherwise be part
of, @i{e.g.}@:, an identifier or a floating-point number.
@end table
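
As a minimal sketch of the string forms described above, the following
commands use @cmd{ECHO} to print a few strings.  The text of each
string is invented for illustration; only the quoting, concatenation,
and prefix rules are taken from this section.

@example
* A doubled quote character stands for one literal quote.
ECHO 'it''s an apostrophe'.
* Segments joined with + form a single string.
ECHO "a" + 'b' + 'c'.
* Hexadecimal digits are interpreted as UTF-8, here the text ABC.
ECHO X'414243'.
* Each segment's prefix is considered individually.
ECHO 'The G clef symbol is: ' + U'1D11E' + '.'.
@end example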

@node Commands
@section Forming commands of tokens

@cindex @pspp{}, command structure
@cindex language, command structure
@cindex commands, structure

Most @pspp{} commands share a common structure.  A command begins with a
command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
CASES}.  The command name may be abbreviated to its first word, and
each word in the command name may be abbreviated to its first three
or more characters, where these abbreviations are unambiguous.

The command name may be followed by one or more @dfn{subcommands}.
Each subcommand begins with a subcommand name, which may be
abbreviated to its first three letters.  Some subcommands accept a
series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign (@samp{=}).
Specifications may be separated from each other by commas or spaces.
Each subcommand must be separated from the next (if any) by a forward
slash (@samp{/}).

There are multiple ways to mark the end of a command.  The most common
way is to end the last line of the command with a period (@samp{.}) as
described in the previous section (@pxref{Tokens}).  A blank line, or
one that consists only of white space or comments, also ends a command.
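
The structure described above can be sketched with
@cmd{FREQUENCIES}.  The variable names @code{age} and @code{income}
are hypothetical and are assumed to exist in the active dataset.  The
second command is intended to be equivalent to the first, with the
command name and each subcommand name abbreviated, assuming those
abbreviations are unambiguous.

@example
FREQUENCIES /VARIABLES=age income /FORMAT=NOTABLE /STATISTICS=MEAN STDDEV.
FREQ /VAR=age income /FORM=NOTABLE /STAT=MEAN STDDEV.
@end example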

@node Syntax Variants
@section Syntax Variants

@cindex Batch syntax
@cindex Interactive syntax

There are three variants of command syntax, which vary only in how
they detect the end of one command and the start of the next.

In @dfn{interactive mode}, which is the default for syntax typed at a
command prompt, a period as the last non-blank character on a line
ends a command.  A blank line also ends a command.

In @dfn{batch mode}, an end-of-line period or a blank line also ends a
command.  Additionally, it treats any line that has a non-blank
character in the leftmost column as beginning a new command.  Thus, in
batch mode the second and subsequent lines in a command must be
indented.

Regardless of the syntax mode, a plus sign, minus sign, or period in
the leftmost column of a line is ignored and causes that line to begin
a new command.  This is most useful in batch mode, in which the first
line of a new command could not otherwise be indented, but it is
accepted regardless of syntax mode.

The default mode for reading commands from a file is @dfn{auto mode}.
It is the same as batch mode, except that a line with a non-blank in
the leftmost column only starts a new command if that line begins with
the name of a @pspp{} command.  This correctly interprets most valid @pspp{}
syntax files regardless of the syntax mode for which they are
intended.

The @option{--interactive} (or @option{-i}) or @option{--batch} (or
@option{-b}) options set the syntax mode for files listed on the @pspp{}
command line.  @xref{Main Options}, for more details.
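
The following sketch is written so that it should be read the same
way in all three modes.  It assumes an active dataset with
hypothetical numeric variables @code{x} and @code{y}.  Every command
ends with a period, continuation lines are indented, and a plus sign
in the leftmost column allows the nested @cmd{COMPUTE} commands to be
indented while still beginning new commands.

@example
DESCRIPTIVES
  /VARIABLES=x y
  /STATISTICS=MEAN.
DO IF x > 0.
+  COMPUTE z = 1.
ELSE.
+  COMPUTE z = 0.
END IF.
@end example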

@node Types of Commands
@section Types of Commands

Commands in @pspp{} are divided roughly into six categories:

@table @strong
@item Utility commands
@cindex utility commands
Set or display various global options that affect @pspp{} operations.
May appear anywhere in a syntax file.  @xref{Utilities, , Utility
commands}.

@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''.  Most of these commands replace any previous
data or variables with new data or variables.  At least one file
definition command must appear before the first command in any of the
categories below.  @xref{Data Input and Output}.

@item Input program commands
@cindex input program commands
Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.

@item Transformations
@cindex transformations
Perform operations on data and write data to output files.  Transformations
are not carried out until a procedure is executed.

@item Restricted transformations
@cindex restricted transformations
Transformations that cannot appear in certain contexts.  @xref{Order
of Commands}, for details.

@item Procedures
@cindex procedures
Analyze data, writing results of analyses to the listing file.  Cause
transformations specified earlier in the file to be performed.  In a
more general sense, a @dfn{procedure} is any command that causes the
active dataset (the data) to be read.
@end table
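
As an illustration, the short syntax file below uses one command from
several of these categories.  The variable names and data values are
invented for the example.

@example
* Utility command.
SET FORMAT=F8.2.
* File definition command, followed by inline data.
DATA LIST LIST /height weight.
BEGIN DATA.
1.70 65
1.82 80
END DATA.
* Transformation, not carried out until a procedure is executed.
COMPUTE bmi = weight / height**2.
* Procedure, which reads the data and so performs the COMPUTE.
DESCRIPTIVES /VARIABLES=bmi.
@end example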

@node Order of Commands
@section Order of Commands
@cindex commands, ordering
@cindex order of commands

@pspp{} does not place many restrictions on ordering of commands.  The
main restriction is that variables must be defined before they are otherwise
referenced.  This section describes the details of command ordering,
but most users will have no need to refer to them.

@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program},
@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states.  (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)

@pspp{} starts in the initial state.  Each successful completion
of a command may cause a state transition.  Each type of command has its
own rules for state transitions:

@table @strong
@item Utility commands
@itemize @bullet
@item
Valid in any state.
@item
Do not cause state transitions.  Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
transformation state.
@end itemize

@item @cmd{DATA LIST}
@itemize @bullet
@item
Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.
@item
Clears the active dataset if executed in the procedure or transformation
state.
@end itemize

@item @cmd{INPUT PROGRAM}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the input-program state.
@item
Clears the active dataset.
@end itemize

@item @cmd{FILE TYPE}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the file-type state.
@item
Clears the active dataset.
@end itemize

@item Other file definition commands
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Cause a transition to the transformation state.
@item
Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
and @cmd{UPDATE}.
@end itemize

@item Transformations
@itemize @bullet
@item
Invalid in initial and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Restricted transformations
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Procedures
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the procedure state.
@end itemize
@end table
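
The rules above can be traced through a short sketch.  The file name
@file{scores.txt} and the variable names are hypothetical.  Each
comment gives the state that, according to the rules in this section,
@pspp{} should be in after the command that follows it.

@example
* Initial state to transformation state.
DATA LIST LIST FILE='scores.txt' /x.
* Remains in the transformation state.
COMPUTE y = x * 2.
* Transformation state to procedure state.
LIST.
* Procedure state back to transformation state.
N OF CASES 2.
* Transformation state to procedure state again.
LIST.
@end example

Placing @cmd{LIST} before @cmd{DATA LIST} would be an error, because
procedures are invalid in the initial state.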

@node Missing Observations
@section Handling missing observations
@cindex missing values
@cindex values, missing

@pspp{} includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}.  This ``value'' actually indicates the
absence of a value; it means that the actual value is unknown.  Procedures
automatically exclude from analyses those observations or cases that
have missing values.  Details of missing value exclusion depend on the
procedure and can often be controlled by the user; refer to
descriptions of individual procedures for details.

The system-missing value exists only for numeric variables.  String
variables always have a defined value, even if it is only a string of
spaces.

Variables, whether numeric or string, can have designated
@dfn{user-missing values}.  Every user-missing value is an actual value
for that variable.  However, most of the time user-missing values are
treated in the same way as the system-missing value.

For more information on missing values, see the following sections:
@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
documentation on individual procedures for information on how they
handle missing values.
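
The difference between the two kinds of missing values can be
sketched as follows.  The variable names, the data, and the choice of
@code{-9} as a missing-value code are assumptions made for this
example.  The lone period in the third case is expected to be read as
the system-missing value, while @code{-9} is an actual value that
@cmd{MISSING VALUES} designates as user-missing for @code{age}.

@example
DATA LIST LIST /age income.
BEGIN DATA.
25 30000
-9 45000
40 .
END DATA.
MISSING VALUES age (-9).
DESCRIPTIVES /VARIABLES=age income.
@end example

With its default settings, @cmd{DESCRIPTIVES} should then compute
statistics for each variable from the two cases in which that
variable is not missing.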

@node Datasets
@section Datasets
@cindex dataset
@cindex variable
@cindex dictionary

@pspp{} works with data organized into @dfn{datasets}.  A dataset
consists of a set of @dfn{variables}, which taken together are said to
form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
has one value for each variable.

At any given time @pspp{} has exactly one distinguished dataset, called
the @dfn{active dataset}.  Most @pspp{} commands work only with the
active dataset.  In addition to the active dataset, @pspp{} also supports
any number of additional open datasets.  The @cmd{DATASET} commands
can choose a new active dataset from among those that are open, as
well as create and destroy datasets (@pxref{DATASET}).
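
As a brief sketch, the commands below create a second dataset, make
it active, modify it, and then return to the first.  The dataset
names @code{original} and @code{doubled} and the variable @code{x}
are hypothetical.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
END DATA.
DATASET NAME original.
DATASET COPY doubled.
DATASET ACTIVATE doubled.
COMPUTE x = x * 2.
EXECUTE.
DATASET DISPLAY.
DATASET ACTIVATE original.
DATASET CLOSE doubled.
@end example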

The sections below describe variables in more detail.

@menu
* Attributes::                  Attributes of variables.
* System Variables::            Variables automatically defined by @pspp{}.
* Sets of Variables::           Lists of variable names.
* Input and Output Formats::    Input and output formats.
* Scratch Variables::           Variables deleted by procedures.
@end menu

@node Attributes
@subsection Attributes of Variables
@cindex variables, attributes of
@cindex attributes of variables
Each variable has a number of attributes, including:

@table @strong
@item Name
An identifier, up to 64 bytes long.  Each variable must have a different name.
@xref{Tokens}.

Some system variable names begin with @samp{$}, but user-defined
variables' names may not begin with @samp{$}.

@cindex @samp{.}
@cindex period
@cindex variable names, ending with period
The final character in a variable name should not be @samp{.}, because
such an identifier will be misinterpreted when it is the final token
on a line: @code{FOO.} is divided into two separate tokens,
@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.

@cindex @samp{_}
The final character in a variable name should not be @samp{_}, because
some such identifiers are used for special purposes by @pspp{}
procedures.

As with all @pspp{} identifiers, variable names are not case-sensitive.
@pspp{} capitalizes variable names on output the same way they were
capitalized at their point of definition in the input.

@cindex variables, type
@cindex type of variables
@item Type
Numeric or string.

@cindex variables, width
@cindex width of variables
@item Width
(string variables only) String variables with a width of 8 characters or
fewer are called @dfn{short string variables}.  Short string variables
may be used in a few contexts where @dfn{long string variables} (those
with widths greater than 8) are not allowed.

@item Position
Variables in the dictionary are arranged in a specific order.
@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.

@item Initialization
Either reinitialized to 0 or spaces for each case, or left at its
existing value.  @xref{LEAVE}.

@cindex missing values
@cindex values, missing
@item Missing values
Optionally, up to three values, or a range of values, or a specific
value plus a range, can be specified as @dfn{user-missing values}.
There is also a @dfn{system-missing value} that is assigned to an
observation when there is no other obvious value for that observation.
Observations with missing values are automatically excluded from
analyses.  User-missing values are actual data values, while the
system-missing value is not a value at all.  @xref{Missing Observations}.

@cindex variable labels
@cindex labels, variable
@item Variable label
A string that describes the variable.  @xref{VARIABLE LABELS}.

@cindex value labels
@cindex labels, value
@item Value label
Optionally, these associate each possible value of the variable with a
string.  @xref{VALUE LABELS}.

@cindex print format
@item Print format
Display width, format, and (for numeric variables) number of decimal
places.  This attribute does not affect how data are stored, just how
they are displayed.  Example: a width of 8, with 2 decimal places.
@xref{Input and Output Formats}.

@cindex write format
@item Write format
Similar to print format, but used by the @cmd{WRITE} command
(@pxref{WRITE}).

@cindex measurement level
@item Measurement level
@anchor{Measurement Level}
One of the following:

@table @asis
@item Nominal
Each value of a nominal variable represents a distinct category.  The
possible categories are finite and often have value labels.  The order
of categories is not significant.  Political parties, US states, and
yes/no choices are nominal.  Numeric and string variables can be
nominal.

@item Ordinal
Ordinal variables also represent distinct categories, but their values
are arranged according to some natural order.  Likert scales, e.g.@:
from strongly disagree to strongly agree, are ordinal.  Data grouped
into ranges, e.g.@: age groups or income groups, are ordinal.  Both
numeric and string variables can be ordinal.  String values are
ordered alphabetically, so letter grades from A to F will work as
expected, but @code{poor}, @code{satisfactory}, @code{excellent} will
not.

@item Scale
Scale variables are ones for which differences and ratios are
meaningful.  These are often values which have a natural unit
attached, such as age in years, income in dollars, or distance in
miles.  Only numeric variables can be scale variables.
@end table

Variables created by @cmd{COMPUTE} and similar transformations,
obtained from external sources, etc., initially have an unknown
measurement level.  Any procedure that reads the data will then assign
a default measurement level.  @pspp{} can assign some defaults without
reading the data:

@itemize @bullet
@item
Nominal, if it's a string variable.

@item
Nominal, if the variable has a WKDAY or MONTH print format.

@item
Scale, if the variable has a DOLLAR, CCA through CCE, or time or date
print format.
@end itemize

Otherwise, @pspp{} reads the data and decides based on its
distribution:

@itemize @bullet
@item
Nominal, if all observations are missing.

@item
Scale, if one or more valid observations are noninteger or negative.

@item
Scale, if no valid observation is less than 10.

@item
Scale, if the variable has 24 or more unique valid values.  The value
24 is the default and can be adjusted (@pxref{SET SCALEMIN}).
@end itemize

Finally, if none of the above is true, @pspp{} assigns the variable a
nominal measurement level.

@cindex custom attributes
@item Custom attributes
User-defined associations between names and values.  @xref{VARIABLE
ATTRIBUTE}.

@cindex variable role
@item Role
The intended role of a variable for use in dialog boxes in graphical
user interfaces.  @xref{VARIABLE ROLE}.
@end table
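
Several of these attributes can be set with the commands referenced
above.  The following is only a sketch: the variable names, labels,
and the missing-value code @code{9} are invented for illustration.

@example
DATA LIST LIST /id (F4.0) gender (A1) satisfaction (F1.0) income (F8.2).
BEGIN DATA.
1 F 3 52000
2 M 9 48000
END DATA.
VARIABLE LABELS satisfaction 'Overall satisfaction'.
VALUE LABELS satisfaction 1 'Low' 2 'Medium' 3 'High'.
MISSING VALUES satisfaction (9).
FORMATS income (DOLLAR10.2).
VARIABLE LEVEL gender (NOMINAL) /satisfaction (ORDINAL) /income (SCALE).
DISPLAY DICTIONARY.
@end example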

@node System Variables
@subsection Variables Automatically Defined by @pspp{}
@cindex system variables
@cindex variables, system

There are eight system variables.  These are not like ordinary
variables because system variables are not always stored.  They can be used only
in expressions.  These system variables, whose values and output formats
cannot be modified, are described below.

@table @code
@cindex @code{$CASENUM}
@item $CASENUM
Case number of the current case.  This changes as cases are
shuffled around.

@cindex @code{$DATE}
@item $DATE
Date the @pspp{} process was started, in format A9, following the
pattern @code{DD-MMM-YY}.

@cindex @code{$DATE11}
@item $DATE11
Date the @pspp{} process was started, in format A11, following the
pattern @code{DD-MMM-YYYY}.

@cindex @code{$JDATE}
@item $JDATE
Number of days between 15 Oct 1582 and the time the @pspp{} process
was started.

@cindex @code{$LENGTH}
@item $LENGTH
Page length, in lines, in format F11.

@cindex @code{$SYSMIS}
@item $SYSMIS
System missing value, in format F1.

@cindex @code{$TIME}
@item $TIME
Number of seconds between midnight 14 Oct 1582 and the time the active dataset
was read, in format F20.

@cindex @code{$WIDTH}
@item $WIDTH
Page width, in characters, in format F3.
@end table
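
Because system variables can be used only in expressions, they
typically appear on the right-hand side of a transformation.  A
minimal sketch, with hypothetical variables @code{x} and @code{row}:

@example
DATA LIST LIST /x.
BEGIN DATA.
5
-3
7
END DATA.
* Record each case's number in an ordinary variable.
COMPUTE row = $CASENUM.
* Replace negative values with the system-missing value.
IF (x < 0) x = $SYSMIS.
LIST.
@end example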

@node Sets of Variables
@subsection Lists of variable names
@cindex @code{TO} convention
@cindex convention, @code{TO}

To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas.  To include a
range of variables from the dictionary in the list, write the name of
the first and last variable in the range, separated by @code{TO}.  For
instance, if the dictionary contains six variables with the names
@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.

Commands that define variables, such as @cmd{DATA LIST}, give
@code{TO} an alternate meaning.  With these commands, @code{TO} defines
sequences of variables whose names end in consecutive integers.  The
syntax is two identifiers that begin with the same root and end with
numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.

After a set of variables has been defined with @cmd{DATA LIST} or
another command with this method, the same set can be referenced on
later commands using the same syntax.
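
Both meanings of @code{TO} appear in the sketch below, which uses
invented variable names and data.  On @cmd{DATA LIST},
@code{x1 TO x3} defines three variables.  On @cmd{LIST},
@code{x2 TO goal} refers to the variables from @code{x2} through
@code{goal} in dictionary order, that is, @code{x2}, @code{x3}, and
@code{goal}.

@example
DATA LIST LIST /id x1 TO x3 goal met.
BEGIN DATA.
1 10 20 30 25 1
END DATA.
LIST /VARIABLES=x2 TO goal.
DESCRIPTIVES /VARIABLES=x1 TO x3.
@end example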

@node Input and Output Formats
@subsection Input and Output Formats

@cindex formats
An @dfn{input format} describes how to interpret the contents of an
input field as a number or a string.  It might specify that the field
contains an ordinary decimal number, a time or date, a number in binary
or hexadecimal notation, or one of several other notations.  Input
formats are used by commands such as @cmd{DATA LIST} that read data or
syntax files into the @pspp{} active dataset.

Every input format corresponds to a default @dfn{output format} that
specifies the formatting used when the value is output later.  It is
always possible to explicitly specify an output format that resembles
the input format.  Usually, this is the default, but in cases where the
input format is unfriendly to human readability, such as binary or
hexadecimal formats, the default output format is an easier-to-read
decimal format.

Every variable has two output formats, called its @dfn{print format} and
@dfn{write format}.  Print formats are used in most output contexts;
write formats are used only by @cmd{WRITE} (@pxref{WRITE}).  Newly
created variables have identical print and write formats, and
@cmd{FORMATS}, the most commonly used command for changing formats
(@pxref{FORMATS}), sets both of them to the same value as well.  Thus,
most of the time, the distinction between print and write formats is
unimportant.

Input and output formats are specified to @pspp{} with
a @dfn{format specification} of the
form @subcmd{@var{TYPE}@var{w}} or @subcmd{@var{TYPE}@var{w}.@var{d}}, where
@var{TYPE} is one of the format types described later, @var{w} is a
field width measured in columns, and @var{d} is an optional number of
decimal places.  If @var{d} is omitted, a value of 0 is assumed.  Some
formats do not allow a nonzero @var{d} to be specified.
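
The following sketch shows format specifications in use.  It assumes a
single made-up variable, @code{price}, and relies on @cmd{FORMATS} to
set both output formats and on @cmd{PRINT FORMATS} to change only the
print format afterward:

@example
@group
DATA LIST FREE /price.
BEGIN DATA.
3141.59
END DATA.
* Set the print and write formats together.
FORMATS price (DOLLAR10.2).
* Change only the print format.
PRINT FORMATS price (COMMA9.2).
DISPLAY DICTIONARY.
@end group
@end example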

The following sections describe the input and output formats supported
by @pspp{}.

@menu
* Basic Numeric Formats::
* Custom Currency Formats::
* Legacy Numeric Formats::
* Binary and Hexadecimal Numeric Formats::
* Time and Date Formats::
* Date Component Formats::
* String Formats::
@end menu

@node Basic Numeric Formats
@subsubsection Basic Numeric Formats

@cindex numeric formats
The basic numeric formats are used for input and output of real numbers
in standard or scientific notation.  The following table shows an
example of how each format displays positive and negative numbers with
the default decimal point setting:

@float
@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
@headitem Format @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item F8.2       @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item COMMA9.2   @tab @code{@tie{}3,141.59}  @tab @code{-3,141.59}
@item DOT9.2     @tab @code{@tie{}3.141,59}  @tab @code{-3.141,59}
@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
@item PCT9.2     @tab @code{@tie{}3141.59%}  @tab @code{-3141.59%}
@item E8.1       @tab @code{@tie{}3.1E+003}  @tab @code{-3.1E+003}
@end multitable
@end float

On output, numbers in F format are expressed in standard decimal
notation with the requested number of decimal places.  The other formats
output some variation on this style:

@itemize @bullet
@item
Numbers in COMMA format are additionally grouped every three digits by
inserting a grouping character.  The grouping character is ordinarily a
comma, but it can be changed to a period (@pxref{SET DECIMAL}).

@item
DOT format is like COMMA format, but it interchanges the role of the
decimal point and grouping characters.  That is, the current grouping
character is used as a decimal point and vice versa.

@item
DOLLAR format is like COMMA format, but it prefixes the number with
@samp{$}.

@item
PCT format is like F format, but adds @samp{%} after the number.

@item
The E format always produces output in scientific notation.
@end itemize

On input, the basic numeric formats accept positive and negative numbers in
standard decimal notation or scientific notation.  Leading and trailing
spaces are allowed.  An empty or all-spaces field, or one that contains
only a single period, is treated as the system missing value.

In scientific notation, the exponent may be introduced by a sign
(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
(in uppercase or lowercase), or by a letter followed by a sign.  A
single space may follow the letter or the sign or both.

On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
other contexts, decimals are implied when the field does not contain a
decimal point.  In F6.5 format, for example, the field @code{314159} is
taken as the value 3.14159 with implied decimals.  Decimals are never
implied if an explicit decimal point is present or if scientific
notation is used.
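
As an illustration of implied decimals, the sketch below reads three
6-column fields in F6.5 format.  The variable name and data are
hypothetical.  Following the rules above, the first field should be
read as 3.14159 with implied decimals, whereas the second and third
contain an explicit decimal point or scientific notation and so should
be read as 3.1415 and 3.1 without any implied decimals:

@example
@group
DATA LIST FIXED /x (F6.5).
BEGIN DATA.
314159
3.1415
3.1E+0
END DATA.
LIST.
@end group
@end example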

E and F formats accept the basic syntax already described.  The other
formats allow some additional variations:

@itemize @bullet
@item
COMMA, DOLLAR, and DOT formats ignore grouping characters within the
integer part of the input field.  The identity of the grouping
character depends on the format.

@item
DOLLAR format allows a dollar sign to precede the number.  In a negative
number, the dollar sign may precede or follow the minus sign.

@item
PCT format allows a percent sign to follow the number.
@end itemize

All of the basic numeric formats have a maximum field width of 40 and
accept no more than 16 decimal places, on both input and output.  Some
additional restrictions apply:

@itemize @bullet
@item
As input formats, the basic numeric formats allow no more decimal places
than the field width.  As output formats, the field width must be
greater than the number of decimal places; that is, large enough to
allow for a decimal point and the number of requested decimal places.
DOLLAR and PCT formats must allow an additional column for @samp{$} or
@samp{%}.

@item
The default output format for a given input format increases the field
width enough to make room for optional input characters.  If an input
format calls for decimal places, the width is increased by 1 to make
room for an implied decimal point.  COMMA, DOT, and DOLLAR formats also
increase the output width to make room for grouping characters.  DOLLAR
and PCT further increase the output field width by 1 to make room for
@samp{$} or @samp{%}.  The increased output width is capped at 40, the
maximum field width.

@item
The E format is exceptional.  For output, E format has a minimum width
of 7 plus the number of decimal places.  The default output format for
an E input format is an E format with at least 3 decimal places and
thus a minimum width of 10.
@end itemize

More details of basic numeric output formatting are given below:

@itemize @bullet
@item
Output rounds to nearest, with ties rounded away from zero.  Thus, 2.5
is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.2
format.

@item
The system-missing value is output as a period in a field of spaces,
placed in the decimal point's position, or in the rightmost column if no
decimal places are requested.  A period is used even if the decimal
point character is a comma.

@item
A number that does not fill its field is right-justified within the
field.

@item
A number that is too large for its field causes decimal places to be dropped
to make room.  If dropping decimals does not make enough room,
scientific notation is used if the field is wide enough.  If a number
does not fit in the field, even in scientific notation, the overflow is
indicated by filling the field with asterisks (@samp{*}).

@item
COMMA, DOT, and DOLLAR formats insert grouping characters only if space
is available for all of them.  Grouping characters are never inserted
when all decimal places must be dropped.  Thus, 1234.56 in COMMA5.2
format is output as @samp{@tie{}1235} without a comma, even though there
is room for one, because all decimal places were dropped.

@item
DOLLAR or PCT format drops the @samp{$} or @samp{%} only if the number
would not fit at all without it.  Scientific notation with @samp{$} or
@samp{%} is preferred to ordinary decimal notation without it.

@item
Except in scientific notation, a decimal point is included only when
it is followed by a digit.  If the integer part of the number being
output is 0, and a decimal point is included, then @pspp{} ordinarily
drops the zero before the decimal point.  However, in @code{F},
@code{COMMA}, or @code{DOT} formats, @pspp{} keeps the zero if
@code{SET LEADZERO} is set to @code{ON} (@pxref{SET LEADZERO}).

In scientific notation, the number always includes a decimal point,
even if it is not followed by a digit.

@item
A negative number includes a minus sign only in the presence of a
nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
@samp{@tie{}@tie{}.0} in F4.1 format.  Thus, a ``negative zero'' never
includes a minus sign.

@item
In negative numbers output in DOLLAR format, the dollar sign follows the
negative sign.  Thus, -9.99 in DOLLAR6.2 format is output as
@code{-$9.99}.

@item
In scientific notation, the exponent is output as @samp{E} followed by
@samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
less than 10**-999 or larger than 10**999 are not supported by most
computers, but if they are supported then their output is considered
to overflow the field and they are output as asterisks.

@item
On most computers, no more than 15 decimal digits are significant in
output, even if more are printed.  In any case, output precision cannot
be any higher than input precision; few data sets are accurate to 15
digits of precision.  Unavoidable loss of precision in intermediate
calculations may also reduce precision of output.

@item
Special values such as infinities and ``not a number'' values are
usually converted to the system-missing value before printing.  In a few
circumstances, these values are output directly.  In fields of width 3
or greater, special values are output as however many characters
fit from @code{+Infinity} or @code{-Infinity} for infinities, from
@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
(if any are supported by the system).  In fields under 3 columns wide,
special values are output as asterisks.
@end itemize
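
Some of the cases in the list above can be tried out with a sketch like
the following.  The variable names are arbitrary; the three values and
formats are the ones cited above (1234.56 in COMMA5.2, -0.01 in F4.1,
and 2.5 in F1.0).  Each value is listed in all three formats, so the
output should also show what happens when a value does not fit its
field:

@example
@group
DATA LIST FREE /x.
BEGIN DATA.
1234.56 -0.01 2.5
END DATA.
COMPUTE y = x.
COMPUTE z = x.
FORMATS x (COMMA5.2) y (F4.1) z (F1.0).
LIST.
@end group
@end example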

@node Custom Currency Formats
@subsubsection Custom Currency Formats

@cindex currency formats
The custom currency formats are closely related to the basic numeric
formats, but they allow users to customize the output format.  The
SET command configures custom currency formats, using the syntax
@display
SET CC@var{x}=@t{"}@var{string}@t{"}.
@end display
@noindent
where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
characters long.

@var{string} must contain exactly three commas or exactly three periods
(but not both), except that a single quote character may be used to
``escape'' a following comma, period, or single quote.  If three commas
are used, commas are used for grouping in output, and a period
is used as the decimal point.  Use of periods reverses these roles.

The commas or periods divide @var{string} into four fields, called the
@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
suffix}, respectively.  The prefix and suffix are added to output
whenever space is available.  The negative prefix and negative suffix
are always added to a negative number when the output includes a nonzero
digit.

The following syntax shows how custom currency formats could be used to
reproduce basic numeric formats:

@example
@group
SET CCA="-,,,".  /* Same as COMMA.
SET CCB="-...".  /* Same as DOT.
SET CCC="-,$,,". /* Same as DOLLAR.
SET CCD="-,,%,". /* Like PCT, but groups with commas.
@end group
@end example

Here are some more examples of custom currency formats.  The final
example shows how to use a single quote to escape a delimiter:

@example
@group
SET CCA=",EUR,,-".   /* Euro.
SET CCB="(,USD ,,)". /* US dollar.
SET CCC="-.R$..".    /* Brazilian real.
SET CCD="-,, NIS,".  /* Israeli shekel.
SET CCE="-.Rp'. ..". /* Indonesian rupiah.
@end group
@end example

@noindent These formats would yield the following output:

@float
@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
@headitem Format @tab @code{@tie{}3145.59}         @tab @code{-3145.59}
@item CCA12.2 @tab @code{@tie{}EUR3,145.59}        @tab @code{EUR3,145.59-}
@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
@item CCC11.2 @tab @code{@tie{}R$3.145,59}         @tab @code{-R$3.145,59}
@item CCD13.2 @tab @code{@tie{}3,145.59 NIS}       @tab @code{-3,145.59 NIS}
@item CCE10.0 @tab @code{@tie{}Rp. 3.146}          @tab @code{-Rp. 3.146}
@end multitable
@end float

The default for all the custom currency formats is @samp{-,,,},
equivalent to COMMA format.
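
Defining a custom currency format does not by itself change how any
variable is displayed; the format must still be assigned to a variable.
A minimal sketch, using the euro definition from above and a made-up
variable named @code{amount}:

@example
@group
SET CCA=",EUR,,-".
DATA LIST FREE /amount.
BEGIN DATA.
3145.59 -3145.59
END DATA.
FORMATS amount (CCA12.2).
LIST.
@end group
@end example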

@node Legacy Numeric Formats
@subsubsection Legacy Numeric Formats

The N and Z numeric formats provide compatibility with legacy file
formats.  They have much in common:

@itemize @bullet
@item
Output is rounded to the nearest representable value, with ties rounded
away from zero.

@item
Numbers too large to display are output as a field filled with asterisks
(@samp{*}).

@item
The decimal point is always implicitly the specified number of digits
from the right edge of the field, except that Z format input allows an
explicit decimal point.

@item
Scientific notation may not be used.

@item
The system-missing value is output as a period in a field of spaces.
The period is placed just to the right of the implied decimal point in
Z format, or at the right end in N format or in Z format if no decimal
places are requested.  A period is used even if the decimal point
character is a comma.

@item
Field width may range from 1 to 40.  Decimal places may range from 0 up
to the field width, to a maximum of 16.

@item
When a legacy numeric format used for input is converted to an output
format, it is changed into the equivalent F format.  The field width is
increased by 1 if any decimal places are specified, to make room for a
decimal point.  For Z format, the field width is increased by 1 more
column, to make room for a negative sign.  The output field width is
capped at 40 columns.
@end itemize

@subsubheading N Format

The N format supports input and output of fields that contain only
digits.  On input, leading or trailing spaces, a decimal point, or any
other non-digit character causes the field to be read as the
system-missing value.  As a special exception, an N format used on
@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
equivalent F format.

On output, N pads the field on the left with zeros.  Negative numbers
are output like the system-missing value.

@subsubheading Z Format

The Z format is a ``zoned decimal'' format used on IBM mainframes.  Z
format encodes the sign as part of the final digit, which must be one of
the following:
@example
0123456789
@{ABCDEFGHI
@}JKLMNOPQR
@end example
@noindent
where the characters in each row represent digits 0 through 9 in order.
Characters in the first two rows indicate a positive sign; those in the
third indicate a negative sign.

On output, Z fields are padded on the left with spaces.  On input,
leading and trailing spaces are ignored.  Any character in an input
field other than spaces, the digit characters above, and @samp{.} causes
the field to be read as system-missing.

The decimal point character for input and output is always @samp{.},
even if the decimal point character is a comma (@pxref{SET DECIMAL}).

Nonzero, negative values output in Z format are marked as negative even
when no nonzero digits are output.  For example, -0.2 is output in Z1.0
format as @samp{J}.  The ``negative zero'' value supported by most
machines is output as positive.
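
The sign encoding can be illustrated with a small sketch.  The variable
name and data are hypothetical.  Going by the table above, the final
characters @samp{4} and @samp{D} both stand for a positive digit 4 and
@samp{M} for a negative digit 4, so with two implied decimal places the
three fields should be read as 12.34, 12.34, and -12.34:

@example
@group
DATA LIST FIXED /amount (Z4.2).
BEGIN DATA.
1234
123D
123M
END DATA.
LIST.
@end group
@end example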

@node Binary and Hexadecimal Numeric Formats
@subsubsection Binary and Hexadecimal Numeric Formats

@cindex binary formats
@cindex hexadecimal formats
The binary and hexadecimal formats are primarily designed for
compatibility with existing machine formats, not for human readability.
All of them therefore have an F format as default output format.  Some of
these formats are only portable between machines with compatible byte
ordering (endianness) or floating-point format.

Binary formats use byte values that in text files are interpreted as
special control functions, such as carriage return and line feed.  Thus,
data in binary formats should not be included in syntax files or read
from data files with variable-length records, such as ordinary text
files.  They may be read from or written to data files with fixed-length
records.  @xref{FILE HANDLE}, for information on working with
fixed-length records.
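
As a rough sketch, binary data might be read along the following lines.
The file name @file{records.bin}, the handle name @code{binfile}, and
the record layout (a 4-byte integer followed by an 8-byte real, for a
record length of 12) are all assumptions made for the example:

@example
@group
FILE HANDLE binfile /NAME='records.bin' /MODE=IMAGE /LRECL=12.
DATA LIST FIXED FILE=binfile /id (IB4) measure (RB8).
LIST.
@end group
@end example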

@subsubheading P and PK Formats

These are binary-coded decimal formats, in which every byte (except the
last, in P format) represents two decimal digits.  The most-significant
4 bits of the first byte are the most-significant decimal digit, the
least-significant 4 bits of the first byte are the next decimal digit,
and so on.

In P format, the most-significant 4 bits of the last byte are the
least-significant decimal digit.  The least-significant 4 bits represent
the sign: decimal 15 indicates a negative value, decimal 13 indicates a
positive value.

Numbers are rounded downward on output.  The system-missing value and
numbers outside representable range are output as zero.

The maximum field width is 16.  Decimal places may range from 0 up to
the number of decimal digits represented by the field.

The default output format is an F format with twice the input field
width, plus one column for a decimal point (if decimal places were
requested).

@subsubheading IB and PIB Formats

These are integer binary formats.  IB reads and writes 2's complement
binary integers, and PIB reads and writes unsigned binary integers.  The
byte ordering is by default the host machine's, but SET RIB may be used
to select a specific byte ordering for reading (@pxref{SET RIB}) and
SET WIB, similarly, for writing (@pxref{SET WIB}).

The maximum field width is 8.  Decimal places may range from 0 up to the
number of decimal digits in the largest value representable in the field
width.

The default output format is an F format whose width is the number of
decimal digits in the largest value representable in the field width,
plus 1 if the format has decimal places.

@subsubheading RB Format

This is a binary format for real numbers.  By default it reads and
writes the host machine's floating-point format, but SET RRB may be
used to select an alternate floating-point format for reading
(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
WRB}).

The recommended field width depends on the floating-point format.
NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
width of 4.  Other field widths do not produce useful results.  The
maximum field width is 8.  No decimal places may be specified.

The default output format is F8.2.

@subsubheading PIBHEX and RBHEX Formats

These are hexadecimal formats, for reading and writing binary formats
where each byte has been recoded as a pair of hexadecimal digits.

A hexadecimal field consists solely of hexadecimal digits
@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}.  Uppercase and
lowercase are accepted on input; output is in uppercase.

Other than the hexadecimal representation, these formats are equivalent
to PIB and RB formats, respectively.  However, bytes in PIBHEX format
are always ordered with the most-significant byte first (big-endian
order), regardless of the host machine's native byte order or @pspp{}
settings.

Field widths must be even and between 2 and 16.  RBHEX format allows no
decimal places; PIBHEX allows as many decimal places as a PIB format
with half the given width.

@node Time and Date Formats
@subsubsection Time and Date Formats

@cindex time formats
@cindex date formats
In @pspp{}, a @dfn{time} is an interval.  The time formats translate
between human-friendly descriptions of time intervals and @pspp{}'s
internal representation of time intervals, which is simply the number of
seconds in the interval.  @pspp{} has three time formats:

@float
@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
@headitem Time Format @tab Template                  @tab Example
@item MTIME    @tab @code{MM:SS.ss}             @tab @code{91:17.01}
@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{01:31:17.01}
@item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
@end multitable
@end float

A @dfn{date} is a moment in the past or the future.  Internally, @pspp{}
represents a date as the number of seconds since the @dfn{epoch},
midnight, Oct. 14, 1582.  The date formats translate between
human-readable dates and @pspp{}'s numeric representation of dates and
times.  @pspp{} has several date formats:

@float
@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
@headitem Date Format @tab Template                  @tab Example
@item DATE     @tab @code{dd-mmm-yyyy}          @tab @code{01-OCT-1978}
@item ADATE    @tab @code{mm/dd/yyyy}           @tab @code{10/01/1978}
@item EDATE    @tab @code{dd.mm.yyyy}           @tab @code{01.10.1978}
@item JDATE    @tab @code{yyyyjjj}              @tab @code{1978274}
@item SDATE    @tab @code{yyyy/mm/dd}           @tab @code{1978/10/01}
@item QYR      @tab @code{q Q yyyy}             @tab @code{3 Q 1978}
@item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
@item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
@item YMDHMS   @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-10-01 04:31:17.01}
@end multitable
@end float

The templates in the preceding tables describe how the time and date
formats are input and output:

@table @code
@item dd
Day of month, from 1 to 31.  Always output as two digits.

@item mm
@itemx mmm
Month.  In output, @code{mm} is output as two digits, @code{mmm} as the
first three letters of an English month name (January, February,
@dots{}).  In input, both of these formats, plus Roman numerals, are
accepted.

@item yyyy
Year.  In output, DATETIME and YMDHMS always produce 4-digit years;
other formats can produce a 2- or 4-digit year.  The century assumed
for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
In output, a year outside the epoch causes the whole field to be
filled with asterisks (@samp{*}).

@item jjj
Day of year (Julian day), from 1 to 366.  This is exactly three digits
giving the count of days from the start of the year.  January 1 is
considered day 1.

@item q
Quarter of year, from 1 to 4.  Quarters start on January 1, April 1,
July 1, and October 1.

@item ww
Week of year, from 1 to 53.  Output as exactly two digits.  January 1 is
the first day of week 1.

@item DD
Count of days, which may be positive or negative.  Output as at least
two digits.

@item hh
Count of hours, which may be positive or negative.  Output as at least
two digits.

@item HH
Hour of day, from 0 to 23.  Output as exactly two digits.

@item MM
In MTIME, count of minutes, which may be positive or negative.  Output
as at least two digits.

In other formats, minute of hour, from 0 to 59.  Output as exactly two
digits.

@item SS.ss
Seconds within minute, from 0 to 59.  The integer part is output as
exactly two digits.  On output, seconds and fractional seconds may or
may not be included, depending on field width and decimal places.  On
input, seconds and fractional seconds are optional.  The DECIMAL setting
controls the character accepted and displayed as the decimal point
(@pxref{SET DECIMAL}).
@end table

For output, the date and time formats use the delimiters indicated in
the table.  For input, date components may be separated by spaces or by
one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
time components may be separated by spaces or @samp{:}.  On
input, the @samp{Q} separating quarter from year and the @samp{WK}
separating week from year may be uppercase or lowercase, and the spaces
around them are optional.

On input, all time and date formats accept any amount of leading and
trailing white space.

The maximum width for time and date formats is 40 columns.  Minimum
input and output width for each of the time and date formats is shown
below:

@float
@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
@item DATE @tab 8 @tab 9 @tab 4-digit year
@item ADATE @tab 8 @tab 8 @tab 4-digit year
@item EDATE @tab 8 @tab 8 @tab 4-digit year
@item JDATE @tab 5 @tab 5 @tab 4-digit year
@item SDATE @tab 8 @tab 8 @tab 4-digit year
@item QYR @tab 4 @tab 6 @tab 4-digit year
@item MOYR @tab 6 @tab 6 @tab 4-digit year
@item WKYR @tab 6 @tab 8 @tab 4-digit year
@item DATETIME @tab 17 @tab 17 @tab seconds
@item YMDHMS @tab 12 @tab 16 @tab seconds
@item MTIME @tab 4 @tab 5
@item TIME @tab 5 @tab 5 @tab seconds
@item DTIME @tab 8 @tab 8 @tab seconds
@end multitable
@end float
@noindent
In the table, ``Option'' describes what increased output width enables:

@table @asis
@item 4-digit year
A field 2 columns wider than the minimum includes a 4-digit year.
(DATETIME and YMDHMS formats always include a 4-digit year.)

@item seconds
A field 3 columns wider than the minimum includes seconds as well as
minutes.  A field 5 columns wider than minimum, or more, can also
include a decimal point and fractional seconds (but no more than allowed
by the format's decimal places).
@end table

For the time and date formats, the default output format is the same as
the input format, except that @pspp{} increases the field width, if
necessary, to the minimum allowed for output.

Times or dates narrower than the field width are right-justified within
the field.

When a time or date exceeds the field width, characters are trimmed from
the end until it fits.  This can occur in an unusual situation, @i{e.g.}@:
with a year greater than 9999 (which adds an extra digit), or for a
negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).

@c What about out-of-range values?

The system-missing value is output as a period at the right end of the
field.
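
The sketch below reads a date and a time and then displays a copy of
the date in a different date format.  The variable names and the data
line are made up.  Because every date format shares the same internal
representation, the copy @code{d2} is expected to be listed in the
@code{yyyy/mm/dd} style shown in the table of date formats:

@example
@group
DATA LIST LIST /d (ADATE10) t (TIME8).
BEGIN DATA.
10/01/1978 01:31:17
END DATA.
COMPUTE d2 = d.
FORMATS d2 (SDATE10).
LIST.
@end group
@end example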

@node Date Component Formats
@subsubsection Date Component Formats

The WKDAY and MONTH formats provide input and output for the names of
weekdays and months, respectively.

On output, these formats convert a number between 1 and 7, for WKDAY, or
between 1 and 12, for MONTH, into the English name of a day or month,
respectively.  If the name is longer than the field, it is trimmed to
fit.  If the name is shorter than the field, it is padded on the right
with spaces.  Values outside the valid range, and the system-missing
value, are output as all spaces.

On input, English weekday or month names (in uppercase or lowercase) are
converted back to their corresponding numbers.  Weekday and month names
may be abbreviated to their first 2 or 3 letters, respectively.

The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
MONTH.  No decimal places are allowed.

The default output format is the same as the input format.
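
For illustration, the following sketch reads abbreviated names in mixed
case and then widens the output formats so that longer names can be
displayed.  The variable names and data are hypothetical:

@example
@group
DATA LIST LIST /day (WKDAY3) mon (MONTH3).
BEGIN DATA.
mon oct
FRI jan
END DATA.
FORMATS day (WKDAY9) mon (MONTH9).
LIST.
@end group
@end example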

@node String Formats
@subsubsection String Formats

@cindex string formats
The A and AHEX formats are the only ones that may be assigned to string
variables.  Neither format allows any decimal places.

In A format, the entire field is treated as a string value.  The field
width may range from 1 to 32,767, the maximum string width.  The default
output format is the same as the input format.

In AHEX format, the field is composed of characters in a string encoded
as hex digit pairs.  On output, hex digits are output in uppercase; on
input, uppercase and lowercase are both accepted.  The default output
format is A format with half the input width.
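
A minimal sketch of AHEX input follows.  It assumes an ASCII-compatible
character encoding, in which the hex digit pairs @code{50} and
@code{53} stand for @samp{P} and @samp{S}, so that the 8-column field
should be read as the 4-character string @samp{PSPP}.  The variable
name is arbitrary:

@example
@group
DATA LIST FIXED /s (AHEX8).
BEGIN DATA.
50535050
END DATA.
LIST.
@end group
@end example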

@node Scratch Variables
@subsection Scratch Variables

@cindex scratch variables
Most of the time, variables don't retain their values between cases.
Instead, either they're being read from a data file or the active dataset,
in which case they assume the value read, or, if created with
@cmd{COMPUTE} or
another transformation, they're initialized to the system-missing value
or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value
between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
use a @dfn{scratch variable}.  Scratch variables are variables whose
names begin with an octothorpe (@samp{#}).

Scratch variables have the same properties as variables left with
@cmd{LEAVE}: they retain their values between cases, and for the first
case they are initialized to 0 or blanks.  They have the additional
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis.  To use
a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
to copy its value into an ordinary variable, then use that ordinary
variable in the analysis.
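
One common use, sketched below with made-up names and data, is a
running total.  The scratch variable @code{#total} starts at 0 and
keeps its value from case to case, and @cmd{COMPUTE} copies it into the
ordinary variable @code{running} so that it survives into the
procedure.  For this data, @code{running} is expected to hold 1, 3, 6,
and 10:

@example
@group
DATA LIST FREE /x.
BEGIN DATA.
1 2 3 4
END DATA.
COMPUTE #total = #total + x.
COMPUTE running = #total.
LIST.
@end group
@end example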

@node Files
@section Files Used by @pspp{}

@pspp{} makes use of many files each time it runs.  Some of these it
reads, some it writes, some it creates.  Here is a table listing the
most important of these files:

@table @strong
@cindex file, command
@cindex file, syntax file
@cindex command file
@cindex syntax file
@item command file
@itemx syntax file
These names (synonyms) refer to the file that contains instructions
that tell @pspp{} what to do.  The syntax file's name is specified on
the @pspp{} command line.  Syntax files can also be read with
@cmd{INCLUDE} (@pxref{INCLUDE}).

@cindex file, data
@cindex data file
@item data file
Data files contain raw data in text or binary format.  Data can also
be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.

@cindex file, output
@cindex output file
@item listing file
One or more output files are created by @pspp{} each time it is
run.  The output files receive the tables and charts produced by
statistical procedures.  The output files may be in any number of formats,
depending on how @pspp{} is configured.

@cindex system file
@cindex file, system
@item system file
System files are binary files that store a dictionary and a set of
cases.  @cmd{GET} and @cmd{SAVE} read and write system files.

@cindex portable file
@cindex file, portable
@item portable file
Portable files are files in a text-based format that store a dictionary
and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
portable files.
@end table

@node File Handles
@section File Handles
@cindex file handles

A @dfn{file handle} is a reference to a data file, system file, or
portable file.  Most often, a file handle is specified as the
name of a file as a string, that is, enclosed within @samp{'} or
@samp{"}.

A file name string that begins or ends with @samp{|} is treated as the
name of a command to pipe data to or from.  You can use this feature
to read data over the network using a program such as @samp{curl}
(@i{e.g.}@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
read compressed data from a file using a program such as @samp{zcat}
(@i{e.g.}@: @code{GET '|zcat mydata.sav.gz'}), and for many other
purposes.

@pspp{} also supports declaring named file handles with the @cmd{FILE
HANDLE} command.  This command associates an identifier of your choice
(the file handle's name) with a file.  Later, the file handle name can
be substituted for the name of the file.  When @pspp{} syntax accesses a
file multiple times, declaring a named file handle simplifies updating
the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
also required to read data files in binary formats.  @xref{FILE HANDLE},
for more information.
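
As a short sketch, a named file handle might be declared and used as
follows.  The handle name @code{survey} and the file name
@file{survey.sav} are placeholders:

@example
@group
FILE HANDLE survey /NAME='survey.sav'.
GET FILE=survey.
LIST.
@end group
@end example

@noindent
With this arrangement, switching to a different input file only
requires changing the @cmd{FILE HANDLE} command.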

In some circumstances, @pspp{} must distinguish whether a file handle
refers to a system file or a portable file.  When this is necessary to
read a file, @i{e.g.}@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
@pspp{} uses the file's contents to decide.  In the context of writing a
file, @i{e.g.}@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
decides based on the file's name: if it ends in @samp{.por} (with any
capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
writes a system file.

INLINE is reserved as a file handle name.  It refers to the ``data
file'' embedded into the syntax file between @cmd{BEGIN DATA} and
@cmd{END DATA}.  @xref{BEGIN DATA}, for more information.

The file to which a file handle refers may be reassigned on a later
@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
HANDLE}.  @xref{CLOSE FILE HANDLE}, for
more information.

@node BNF
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@cindex command syntax, description of
@cindex description of command syntax

The syntax of some parts of the @pspp{} language is presented in this
manual using the formalism known as @dfn{Backus-Naur Form}, or BNF@.
The following list describes the BNF conventions used:

@itemize @bullet
@cindex keywords
@cindex terminals
@item
Words in all-uppercase are @pspp{} keyword tokens.  In BNF, these are
often called @dfn{terminals}.  There are some special terminals, which
are written in lowercase for clarity:

@table @asis
@cindex @code{number}
@item @code{number}
A real number.

@cindex @code{integer}
@item @code{integer}
An integer number.

@cindex @code{string}
@item @code{string}
A string.

@cindex @code{var-name}
@item @code{var-name}
A single variable name.

@cindex operators
@cindex punctuators
@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
Operators and punctuators.

@cindex @code{.}
@item @code{.}
The end of the command.  This is not necessarily an actual dot in the
syntax file (@pxref{Commands}).
@end table

@item
@cindex productions
@cindex nonterminals
Other words in all lowercase are names defined by BNF
@dfn{productions}.  These names are known as @dfn{nonterminals}.
Some nonterminals are very common, so they are defined here in
English for clarity:

@table @code
@cindex @code{var-list}
@item var-list
A list of one or more variable names or the keyword @code{ALL}.

@cindex @code{expression}
@item expression
An expression.  @xref{Expressions}, for details.
@end table

@item
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
the name of the nonterminal being defined.  The right side of @samp{::=}
gives the definition of that nonterminal.  If the right side is empty,
then one possible expansion of that nonterminal is nothing.  A BNF
definition is called a @dfn{production}.

@item
@cindex terminals and nonterminals, differences
The key difference between a terminal and a nonterminal is that a
terminal cannot be broken into smaller parts---in fact, every terminal
is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
composed of a (possibly empty) sequence of terminals and nonterminals.
Thus, terminals indicate the deepest level of syntax description.  (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)

@item
@cindex start symbol
@cindex symbol, start
The first nonterminal defined in a set of productions is called the
@dfn{start symbol}.  The start symbol defines the entire syntax of
the command being described.
@end itemize
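
To illustrate these conventions, consider the following productions
for a made-up command.  @code{FROBNICATE} is not a real @pspp{}
command; it, the @code{LIMIT} keyword, and the nonterminal names
@code{frobnicate} and @code{opt-limit} are invented for this sketch:

@example
frobnicate ::= FROBNICATE var-list opt-limit .
opt-limit ::=
opt-limit ::= / LIMIT = integer
@end example

@noindent
Here @code{frobnicate} is the start symbol, because it is defined
first.  @code{FROBNICATE} and @code{LIMIT} are keyword terminals,
@code{integer} is one of the special lowercase terminals, and
@code{var-list} is one of the common nonterminals defined above.  The
empty right side of the first @code{opt-limit} production means that
the @code{LIMIT} part may be omitted.  Under these hypothetical
rules, both @code{FROBNICATE ALL.} and
@code{FROBNICATE x y /LIMIT=5.} would match.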