@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node Language
@chapter The @pspp{} language
@cindex language, @pspp{}
@cindex @pspp{}, language

This chapter discusses elements common to many @pspp{} commands.
Later chapters describe individual commands in detail.

@menu
* Tokens::                      Characters combine to form tokens.
* Commands::                    Tokens combine to form commands.
* Syntax Variants::             Batch vs. Interactive mode
* Types of Commands::           Commands come in several flavors.
* Order of Commands::           Commands combine to form syntax files.
* Missing Observations::        Handling missing observations.
* Datasets::                    Data organization.
* Files::                       Files used by @pspp{}.
* File Handles::                How files are named.
* BNF::                         How command syntax is described.
@end menu


@node Tokens
@section Tokens
@cindex language, lexical analysis
@cindex language, tokens
@cindex tokens
@cindex lexical analysis

@pspp{} divides most syntax file lines into a series of short chunks
called @dfn{tokens}.
Tokens are then grouped to form commands, each of which tells
@pspp{} to take some action---read in data, write out data, perform
a statistical procedure, etc.  Each type of token is
described below.

@table @strong
@cindex identifiers
@item Identifiers
Identifiers are names that typically specify variables, commands, or
subcommands.  The first character in an identifier must be a letter,
@samp{#}, or @samp{@@}.  The remaining characters in the identifier
must be letters, digits, or one of the following special characters:

@example
@center @.  _  $  #  @@
@end example

@cindex case-sensitivity
Identifiers may be any length, but only the first 64 bytes are
significant.  Identifiers are not case-sensitive: @code{foobar},
@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
different representations of the same identifier.

@cindex identifiers, reserved
@cindex reserved identifiers
Some identifiers are reserved.  Reserved identifiers may not be used
in any context besides those explicitly described in this manual.  The
reserved identifiers are:

@example
@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
@end example

@item Keywords
Keywords are a subclass of identifiers that form a fixed part of
command syntax.  For example, command and subcommand names are
keywords.  Keywords may be abbreviated to their first 3 characters if
this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
characters are also accepted: @samp{FRE}, @samp{FREQ}, and
@samp{FREQUENCIES} are equivalent when the last is a keyword.)

Reserved identifiers are always used as keywords.  Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.

@item Numbers
@cindex numbers
@cindex integers
@cindex reals
Numbers are expressed in decimal.  A decimal point is optional.
Numbers may be expressed in scientific notation by adding @samp{e} and
a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
are some more examples of valid numbers:

@example
-5  3.14159265359  1e100  -.707  8945.
@end example

Negative numbers are expressed with a @samp{-} prefix.  However, in
situations where a literal @samp{-} token is expected, what appears to
be a negative number is treated as @samp{-} followed by a positive
number.

No white space is allowed within a number token, except for horizontal
white space between @samp{-} and the rest of the number.

The last example above, @samp{8945.}, is interpreted as two
tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
@xref{Commands, , Forming commands of tokens}.

@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
Strings are literal sequences of characters enclosed in pairs of
single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
character used for quoting in the string, double it, @i{e.g.}@:
@samp{'it''s an apostrophe'}.  White space and case of letters are
significant inside strings.

Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
'c'} is equivalent to @samp{'abc'}.  So that a long string may be
broken across lines, a line break may precede or follow, or both
precede and follow, the @samp{+}.  (However, an entirely blank line
preceding or following the @samp{+} is interpreted as ending the
current command.)

Strings may also be expressed as hexadecimal character values by
prefixing the initial quote character by @samp{x} or @samp{X}.
Regardless of the syntax file or active dataset's encoding, the
hexadecimal digits in the string are interpreted as Unicode characters
in UTF-8 encoding.

Individual Unicode code points may also be expressed by specifying the
hexadecimal code point number in single or double quotes preceded by
@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
musical G clef character, could be expressed as @code{U'1D11E'}.
Invalid Unicode code points (above U+10FFFF or between U+D800 and
U+DFFF) are not allowed.

When strings are concatenated with @samp{+}, each segment's prefix is
considered individually.  For example, @code{'The G clef symbol is:' +
u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
plain text string.

@item Punctuators and Operators
@cindex punctuators
@cindex operators
These tokens are the punctuators and operators:

@example
@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
@end example

Most of these appear within the syntax of commands, but the period
(@samp{.}) punctuator is used only at the end of a command.  It is a
punctuator only as the last character on a line (except white space).
When it is the last non-space character on a line, a period is not
treated as part of another token, even if it would otherwise be part
of, @i{e.g.}@:, an identifier or a floating-point number.
@end table
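
As a brief illustration of the string forms described above, the
following sketch uses @cmd{ECHO}, chosen here only as a convenient way
to display a string.  Going by the rules above, each command should
print the text named in the comment that precedes it:

@example
* Doubled quote character; prints the phrase with one apostrophe.
ECHO 'it''s an apostrophe'.

* Concatenation across a line break; prints abc.
ECHO "a" +
     'b' + 'c'.

* Hexadecimal UTF-8 bytes 41, 42, and 43; prints ABC.
ECHO X'414243'.

* Unicode code point U+00E9, a lowercase e with an acute accent.
ECHO 'caf' + U'E9'.
@end example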

@node Commands
@section Forming commands of tokens

@cindex @pspp{}, command structure
@cindex language, command structure
@cindex commands, structure

Most @pspp{} commands share a common structure.  A command begins with a
command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
CASES}.  The command name may be abbreviated to its first word, and
each word in the command name may be abbreviated to its first three
or more characters, where these abbreviations are unambiguous.

The command name may be followed by one or more @dfn{subcommands}.
Each subcommand begins with a subcommand name, which may be
abbreviated to its first three letters.  Some subcommands accept a
series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign
(@samp{=}). Specifications may be separated from each other
by commas or spaces.  Each subcommand must be separated from the next (if any)
by a forward slash (@samp{/}).

There are multiple ways to mark the end of a command.  The most common
way is to end the last line of the command with a period (@samp{.}) as
described in the previous section (@pxref{Tokens}).  A blank line, or
one that consists only of white space or comments, also ends a command.
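
For instance, assuming an active dataset with the hypothetical
variables @code{age} and @code{income}, the two commands below are
intended to be equivalent.  The second abbreviates the command name and
each subcommand name to an unambiguous prefix of three or more
characters, and separates the variable names with a comma instead of a
space:

@example
FREQUENCIES VARIABLES=age income
  /STATISTICS=MEAN MEDIAN.

FREQ VAR=age, income /STAT=MEAN MEDIAN.
@end example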

@node Syntax Variants
@section Syntax Variants

@cindex Batch syntax
@cindex Interactive syntax

There are three variants of command syntax, which vary only in how
they detect the end of one command and the start of the next.

In @dfn{interactive mode}, which is the default for syntax typed at a
command prompt, a period as the last non-blank character on a line
ends a command.  A blank line also ends a command.

In @dfn{batch mode}, an end-of-line period or a blank line also ends a
command.  Additionally, it treats any line that has a non-blank
character in the leftmost column as beginning a new command.  Thus, in
batch mode the second and subsequent lines in a command must be
indented.

Regardless of the syntax mode, a plus sign, minus sign, or period in
the leftmost column of a line is ignored and causes that line to begin
a new command.  This is most useful in batch mode, in which the first
line of a new command could not otherwise be indented, but it is
accepted regardless of syntax mode.

The default mode for reading commands from a file is @dfn{auto mode}.
It is the same as batch mode, except that a line with a non-blank character in
the leftmost column only starts a new command if that line begins with
the name of a @pspp{} command.  This correctly interprets most valid @pspp{}
syntax files regardless of the syntax mode for which they are
intended.

The @option{--interactive} (or @option{-i}) or @option{--batch} (or
@option{-b}) options set the syntax mode for files listed on the @pspp{}
command line.  @xref{Main Options}, for more details.
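
As an illustration, the fragment below is a sketch of batch-mode
syntax, using the hypothetical variables @code{age} and @code{income}.
Neither command ends with a period.  The second command is recognized
because its first line starts in the leftmost column, and the
continuation lines are indented so that they do not start new commands:

@example
FREQUENCIES VARIABLES=age
    /STATISTICS=MEAN
DESCRIPTIVES VARIABLES=income
    /STATISTICS=MEAN
@end example

A leading plus sign lets nested commands appear indented while each
still begins a new command:

@example
DO IF age >= 18.
+   COMPUTE adult = 1.
ELSE.
+   COMPUTE adult = 0.
END IF.
@end example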

@node Types of Commands
@section Types of Commands

Commands in @pspp{} are divided roughly into six categories:

@table @strong
@item Utility commands
@cindex utility commands
Set or display various global options that affect @pspp{} operations.
May appear anywhere in a syntax file.  @xref{Utilities, , Utility
commands}.

@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''.  Most of these commands replace any previous
data or variables with new data or
variables.  At least one file definition command must appear before the first command in any of
the categories below.  @xref{Data Input and Output}.

@item Input program commands
@cindex input program commands
Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.

@item Transformations
@cindex transformations
Perform operations on data and write data to output files.  Transformations
are not carried out until a procedure is executed.

@item Restricted transformations
@cindex restricted transformations
Transformations that cannot appear in certain contexts.  @xref{Order
of Commands}, for details.

@item Procedures
@cindex procedures
Analyze data, writing results of analyses to the listing file.  Cause
transformations specified earlier in the file to be performed.  In a
more general sense, a @dfn{procedure} is any command that causes the
active dataset (the data) to be read.
@end table
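
The short syntax file below sketches how the most common categories fit
together.  The variable names and data are invented for illustration:

@example
* File definition: describe and read the data.
DATA LIST LIST /height weight.
BEGIN DATA.
1.80 75
1.65 60
END DATA.

* Transformation: recorded now, carried out later.
COMPUTE bmi = weight / height**2.

* Procedure: reads the active dataset, which also causes the
  pending transformation to be performed.
DESCRIPTIVES VARIABLES=bmi.
@end example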

@node Order of Commands
@section Order of Commands
@cindex commands, ordering
@cindex order of commands

@pspp{} does not place many restrictions on ordering of commands.  The
main restriction is that variables must be defined before they are otherwise
referenced.  This section describes the details of command ordering,
but most users will have no need to refer to them.

@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program},
@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states.  (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)

@pspp{} starts in the initial state.  Each successful completion
of a command may cause a state transition.  Each type of command has its
own rules for state transitions:

@table @strong
@item Utility commands
@itemize @bullet
@item
Valid in any state.
@item
Do not cause state transitions.  Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
transformation state.
@end itemize

@item @cmd{DATA LIST}
@itemize @bullet
@item
Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.
@item
Clears the active dataset if executed in the procedure or transformation
state.
@end itemize

@item @cmd{INPUT PROGRAM}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the input-program state.
@item
Clears the active dataset.
@end itemize

@item @cmd{FILE TYPE}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the file-type state.
@item
Clears the active dataset.
@end itemize

@item Other file definition commands
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Cause a transition to the transformation state.
@item
Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
and @cmd{UPDATE}.
@end itemize

@item Transformations
@itemize @bullet
@item
Invalid in initial and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Restricted transformations
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Procedures
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the procedure state.
@end itemize
@end table
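
The annotated sketch below traces these rules through a small syntax
file.  The state named in each comment is the one that the rules above
imply after the preceding command completes; the variables and data are
invented for illustration:

@example
* The session starts in the initial state.
DATA LIST LIST /x.
BEGIN DATA.
1
2
END DATA.
* DATA LIST ran in the initial state: now in the transformation state.

COMPUTE y = 2 * x.
* A transformation: still in the transformation state.

LIST.
* A procedure: now in the procedure state.

N OF CASES 1.
* N OF CASES ran in the procedure state: back in the transformation state.
@end example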

@node Missing Observations
@section Handling missing observations
@cindex missing values
@cindex values, missing

@pspp{} includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}.  This ``value'' actually indicates the
absence of a value; it means that the actual value is unknown.  Procedures
automatically exclude from analyses those observations or cases that
have missing values.  Details of missing value exclusion depend on the
procedure and can often be controlled by the user; refer to
descriptions of individual procedures for details.

The system-missing value exists only for numeric variables.  String
variables always have a defined value, even if it is only a string of
spaces.

Variables, whether numeric or string, can have designated
@dfn{user-missing values}.  Every user-missing value is an actual value
for that variable.  However, most of the time user-missing values are
treated in the same way as the system-missing value.

For more information on missing values, see the following sections:
@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
documentation on individual procedures for information on how they
handle missing values.
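
The following sketch, with invented data, shows both kinds of missing
value.  The second case contains only a period, so it should be read as
the system-missing value, and @cmd{MISSING VALUES} declares 999 as a
user-missing value for the hypothetical variable @code{age}.  With its
default settings, the procedure would then be expected to base its
statistics on the first case alone:

@example
DATA LIST LIST /age.
BEGIN DATA.
25
.
999
END DATA.
MISSING VALUES age (999).
DESCRIPTIVES VARIABLES=age.
@end example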

@node Datasets
@section Datasets
@cindex dataset
@cindex variable
@cindex dictionary

@pspp{} works with data organized into @dfn{datasets}.  A dataset
consists of a set of @dfn{variables}, which taken together are said to
form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
has one value for each variable.

At any given time @pspp{} has exactly one distinguished dataset, called
the @dfn{active dataset}.  Most @pspp{} commands work only with the
active dataset.  In addition to the active dataset, @pspp{} also supports
any number of additional open datasets.  The @cmd{DATASET} commands
can choose a new active dataset from among those that are open, as
well as create and destroy datasets (@pxref{DATASET}).
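
As a rough sketch, assuming that an active dataset already exists, the
commands below give it a name, copy it, and make the copy active.  The
dataset names @code{survey} and @code{backup} are hypothetical:

@example
DATASET NAME survey.
DATASET COPY backup.
DATASET ACTIVATE backup.
DATASET DISPLAY.
@end example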

The sections below describe variables in more detail.

@menu
* Attributes::                  Attributes of variables.
* System Variables::            Variables automatically defined by @pspp{}.
* Sets of Variables::           Lists of variable names.
* Input and Output Formats::    Input and output formats.
* Scratch Variables::           Variables deleted by procedures.
@end menu

@node Attributes
@subsection Attributes of Variables
@cindex variables, attributes of
@cindex attributes of variables
Each variable has a number of attributes, including:

@table @strong
@item Name
An identifier, up to 64 bytes long.  Each variable must have a different name.
@xref{Tokens}.

Some system variable names begin with @samp{$}, but user-defined
variables' names may not begin with @samp{$}.

@cindex @samp{.}
@cindex period
@cindex variable names, ending with period
The final character in a variable name should not be @samp{.}, because
such an identifier will be misinterpreted when it is the final token
on a line: @code{FOO.} is divided into two separate tokens,
@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.

@cindex @samp{_}
The final character in a variable name should not be @samp{_}, because
some such identifiers are used for special purposes by @pspp{}
procedures.

As with all @pspp{} identifiers, variable names are not case-sensitive.
@pspp{} capitalizes variable names on output the same way they were
capitalized at their point of definition in the input.

@cindex variables, type
@cindex type of variables
@item Type
Numeric or string.

@cindex variables, width
@cindex width of variables
@item Width
(string variables only) String variables with a width of 8 characters or
fewer are called @dfn{short string variables}.  Short string variables
may be used in a few contexts where @dfn{long string variables} (those
with widths greater than 8) are not allowed.

@item Position
Variables in the dictionary are arranged in a specific order.
@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.

@item Initialization
Either reinitialized to 0 or spaces for each case, or left at its
existing value.  @xref{LEAVE}.

@cindex missing values
@cindex values, missing
@item Missing values
Optionally, up to three values, or a range of values, or a specific
value plus a range, can be specified as @dfn{user-missing values}.
There is also a @dfn{system-missing value} that is assigned to an
observation when there is no other obvious value for that observation.
Observations with missing values are automatically excluded from
analyses.  User-missing values are actual data values, while the
system-missing value is not a value at all.  @xref{Missing Observations}.

@cindex variable labels
@cindex labels, variable
@item Variable label
A string that describes the variable.  @xref{VARIABLE LABELS}.

@cindex value labels
@cindex labels, value
@item Value label
Optionally, these associate each possible value of the variable with a
string.  @xref{VALUE LABELS}.

@cindex print format
@item Print format
Display width, format, and (for numeric variables) number of decimal
places.  This attribute does not affect how data are stored, just how
they are displayed.  Example: a width of 8, with 2 decimal places.
@xref{Input and Output Formats}.

@cindex write format
@item Write format
Similar to print format, but used by the @cmd{WRITE} command
(@pxref{WRITE}).

@cindex measurement level
@item Measurement level
@anchor{Measurement Level}
One of the following:

@table @asis
@item Nominal
Each value of a nominal variable represents a distinct category.  The
possible categories are finite and often have value labels.  The order
of categories is not significant.  Political parties, US states, and
yes/no choices are nominal.  Numeric and string variables can be
nominal.

@item Ordinal
Ordinal variables also represent distinct categories, but their values
are arranged according to some natural order.  Likert scales, e.g.@:
from strongly disagree to strongly agree, are ordinal.  Data grouped
into ranges, e.g.@: age groups or income groups, are ordinal.  Both
numeric and string variables can be ordinal.  String values are
ordered alphabetically, so letter grades from A to F will work as
expected, but @code{poor}, @code{satisfactory}, @code{excellent} will
not.

@item Scale
Scale variables are ones for which differences and ratios are
meaningful.  These are often values which have a natural unit
attached, such as age in years, income in dollars, or distance in
miles.  Only numeric variables can be scale.
@end table

Variables created by @cmd{COMPUTE} and similar transformations,
obtained from external sources, etc., initially have an unknown
measurement level.  Any procedure that reads the data will then assign
a default measurement level.  @pspp{} can assign some defaults without
reading the data:

@itemize @bullet
@item
Nominal, if it's a string variable.

@item
Nominal, if the variable has a WKDAY or MONTH print format.

@item
Scale, if the variable has a DOLLAR, CCA through CCE, or time or date
print format.
@end itemize

Otherwise, @pspp{} reads the data and decides based on its
distribution:

@itemize @bullet
@item
Nominal, if all observations are missing.

@item
Scale, if one or more valid observations are noninteger or negative.

@item
Scale, if no valid observation is less than 10.

@item
Scale, if the variable has 24 or more unique valid values.  The value
24 is the default and can be adjusted (@pxref{SET SCALEMIN}).
@end itemize

Finally, if none of the above is true, @pspp{} assigns the variable a
nominal measurement level.

@cindex custom attributes
@item Custom attributes
User-defined associations between names and values.  @xref{VARIABLE
ATTRIBUTE}.

@cindex variable role
@item Role
The intended role of a variable for use in dialog boxes in graphical
user interfaces.  @xref{VARIABLE ROLE}.
@end table
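
As a sketch of how several of these attributes are set, the commands
below assume an active dataset with the hypothetical numeric variables
@code{sex} and @code{income}.  The final command assigns measurement
levels explicitly rather than leaving them to the defaults described
above:

@example
VARIABLE LABELS sex 'Sex of respondent'
               /income 'Annual income'.
VALUE LABELS sex 1 'Female' 2 'Male'.
MISSING VALUES income (-1).
FORMATS income (DOLLAR10.2).
VARIABLE LEVEL sex (NOMINAL) /income (SCALE).
@end example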

@node System Variables
@subsection Variables Automatically Defined by @pspp{}
@cindex system variables
@cindex variables, system

There are eight system variables.  Unlike ordinary variables, system
variables are not always stored, and they can be used only in
expressions.  Their values and output formats cannot be modified.  They
are described below.

@table @code
@cindex @code{$CASENUM}
@item $CASENUM
Case number of the current case.  This changes as cases are
shuffled around.

@cindex @code{$DATE}
@item $DATE
Date the @pspp{} process was started, in format A9, following the
pattern @code{DD-MMM-YY}.

@cindex @code{$DATE11}
@item $DATE11
Date the @pspp{} process was started, in format A11, following the
pattern @code{DD-MMM-YYYY}.

@cindex @code{$JDATE}
@item $JDATE
Number of days between 15 Oct 1582 and the time the @pspp{} process
was started.

@cindex @code{$LENGTH}
@item $LENGTH
Page length, in lines, in format F11.

@cindex @code{$SYSMIS}
@item $SYSMIS
System missing value, in format F1.

@cindex @code{$TIME}
@item $TIME
Number of seconds between midnight 14 Oct 1582 and the time the active dataset
was read, in format F20.

@cindex @code{$WIDTH}
@item $WIDTH
Page width, in characters, in format F3.
@end table
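
Because system variables can be used only in expressions, they
typically appear on the right-hand side of transformations.  The sketch
below assumes an active dataset with a hypothetical numeric variable
@code{score}:

@example
* Record each case's current position in a new variable.
COMPUTE id = $CASENUM.

* Replace negative scores with the system-missing value.
IF (score < 0) score = $SYSMIS.
@end example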

@node Sets of Variables
@subsection Lists of variable names
@cindex @code{TO} convention
@cindex convention, @code{TO}

To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas.  To include a
range of variables from the dictionary in the list, write the name of
the first and last variable in the range, separated by @code{TO}.  For
instance, if the dictionary contains six variables with the names
@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.

Commands that define variables, such as @cmd{DATA LIST}, give
@code{TO} an alternate meaning.  With these commands, @code{TO} defines
sequences of variables whose names end in consecutive integers.  The
syntax is two identifiers that begin with the same root and end with
numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.

After a set of variables has been defined with @cmd{DATA LIST} or
another command with this method, the same set can be referenced on
later commands using the same syntax.
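
The sketch below, with invented names and data, shows both meanings of
@code{TO}.  On @cmd{DATA LIST} it defines @code{x1} through @code{x5};
on the later commands it selects a range of variables by their
position in the dictionary:

@example
DATA LIST LIST /id x1 TO x5 goal.
BEGIN DATA.
1 10 20 30 40 50 1
END DATA.

* Refers to x2, x3, and x4.
DESCRIPTIVES VARIABLES=x2 TO x4.

* Refers to x5 and goal, which are adjacent in the dictionary.
LIST VARIABLES=x5 TO goal.
@end example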

@node Input and Output Formats
@subsection Input and Output Formats

@cindex formats
An @dfn{input format} describes how to interpret the contents of an
input field as a number or a string.  It might specify that the field
contains an ordinary decimal number, a time or date, a number in binary
or hexadecimal notation, or one of several other notations.  Input
formats are used by commands such as @cmd{DATA LIST} that read data or
syntax files into the @pspp{} active dataset.

Every input format corresponds to a default @dfn{output format} that
specifies the formatting used when the value is output later.  It is
always possible to explicitly specify an output format that resembles
the input format.  Usually, this is the default, but in cases where the
input format is unfriendly to human readability, such as binary or
hexadecimal formats, the default output format is an easier-to-read
decimal format.

Every variable has two output formats, called its @dfn{print format} and
@dfn{write format}.  Print formats are used in most output contexts;
write formats are used only by @cmd{WRITE} (@pxref{WRITE}).  Newly
created variables have identical print and write formats, and
@cmd{FORMATS}, the most commonly used command for changing formats
(@pxref{FORMATS}), sets both of them to the same value as well.  Thus,
most of the time, the distinction between print and write formats is
unimportant.

Input and output formats are specified to @pspp{} with
a @dfn{format specification} of the
form @subcmd{@var{TYPE}@var{w}} or @code{TYPE@var{w}.@var{d}}, where
@var{TYPE} is one of the format types described later, @var{w} is a
field width measured in columns, and @var{d} is an optional number of
decimal places.  If @var{d} is omitted, a value of 0 is assumed.  Some
formats do not allow a nonzero @var{d} to be specified.

The following sections describe the input and output formats supported
by @pspp{}.

@menu
* Basic Numeric Formats::
* Custom Currency Formats::
* Legacy Numeric Formats::
* Binary and Hexadecimal Numeric Formats::
* Time and Date Formats::
* Date Component Formats::
* String Formats::
@end menu

@node Basic Numeric Formats
@subsubsection Basic Numeric Formats

@cindex numeric formats
The basic numeric formats are used for input and output of real numbers
in standard or scientific notation.  The following table shows an
example of how each format displays positive and negative numbers with
the default decimal point setting:

@float
@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
@headitem Format @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item F8.2       @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item COMMA9.2   @tab @code{@tie{}3,141.59}  @tab @code{-3,141.59}
@item DOT9.2     @tab @code{@tie{}3.141,59}  @tab @code{-3.141,59}
@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
@item PCT9.2     @tab @code{@tie{}3141.59%}  @tab @code{-3141.59%}
@item E8.1       @tab @code{@tie{}3.1E+003}  @tab @code{-3.1E+003}
@end multitable
@end float

On output, numbers in F format are expressed in standard decimal
notation with the requested number of decimal places.  The other formats
output some variation on this style:

@itemize @bullet
@item
Numbers in COMMA format are additionally grouped every three digits by
inserting a grouping character.  The grouping character is ordinarily a
comma, but it can be changed to a period (@pxref{SET DECIMAL}).

@item
DOT format is like COMMA format, but it interchanges the role of the
decimal point and grouping characters.  That is, the current grouping
character is used as a decimal point and vice versa.

@item
DOLLAR format is like COMMA format, but it prefixes the number with
@samp{$}.

@item
PCT format is like F format, but adds @samp{%} after the number.

@item
The E format always produces output in scientific notation.
@end itemize
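
To see these output styles side by side, one might read the same value
into several hypothetical variables and give each a different format.
With the default decimal point setting, @cmd{LIST} should then display
the values as in the corresponding rows of the table above:

@example
DATA LIST LIST /a b c.
BEGIN DATA.
3141.59 3141.59 3141.59
END DATA.
FORMATS a (F8.2) b (COMMA9.2) c (DOLLAR10.2).
LIST.
@end example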

On input, the basic numeric formats accept positive and negative numbers in
standard decimal notation or scientific notation.  Leading and trailing
spaces are allowed.  An empty or all-spaces field, or one that contains
only a single period, is treated as the system missing value.

In scientific notation, the exponent may be introduced by a sign
(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
(in uppercase or lowercase), or by a letter followed by a sign.  A
single space may follow the letter or the sign or both.

On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
other contexts, decimals are implied when the field does not contain a
decimal point.  In F6.5 format, for example, the field @code{314159} is
taken as the value 3.14159 with implied decimals.  Decimals are never
implied if an explicit decimal point is present or if scientific
notation is used.
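
As a sketch of implied decimals, the fixed-format input below reads
columns 1 through 6 into a hypothetical variable @code{x}; the
@code{(5)} is meant to request five implied decimal places, as in F6.5
format.  Under the rules above, the first field should be read as
3.14159, because it has no decimal point, and the second as 3.1415,
because its explicit decimal point suppresses the implied decimals:

@example
DATA LIST FIXED /x 1-6 (5).
BEGIN DATA.
314159
3.1415
END DATA.
LIST.
@end example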

E and F formats accept the basic syntax already described.  The other
formats allow some additional variations:

@itemize @bullet
@item
COMMA, DOLLAR, and DOT formats ignore grouping characters within the
integer part of the input field.  The identity of the grouping
character depends on the format.

@item
DOLLAR format allows a dollar sign to precede the number.  In a negative
number, the dollar sign may precede or follow the minus sign.

@item
PCT format allows a percent sign to follow the number.
@end itemize

All of the basic number formats have a maximum field width of 40 and
accept no more than 16 decimal places, on both input and output.  Some
additional restrictions apply:

@itemize @bullet
@item
As input formats, the basic numeric formats allow no more decimal places
than the field width.  As output formats, the field width must be
greater than the number of decimal places; that is, large enough to
allow for a decimal point and the number of requested decimal places.
DOLLAR and PCT formats must allow an additional column for @samp{$} or
@samp{%}.

@item
The default output format for a given input format increases the field
width enough to make room for optional input characters.  If an input
format calls for decimal places, the width is increased by 1 to make
room for an implied decimal point.  COMMA, DOT, and DOLLAR formats also
increase the output width to make room for grouping characters.  DOLLAR
and PCT further increase the output field width by 1 to make room for
@samp{$} or @samp{%}.  The increased output width is capped at 40, the
maximum field width.

@item
The E format is exceptional.  For output, E format has a minimum width
of 7 plus the number of decimal places.  The default output format for
an E input format is an E format with at least 3 decimal places and
thus a minimum width of 10.
@end itemize

More details of basic numeric output formatting are given below:

@itemize @bullet
@item
Output rounds to nearest, with ties rounded away from zero.  Thus, 2.5
is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.2
format.

@item
The system-missing value is output as a period in a field of spaces,
placed in the decimal point's position, or in the rightmost column if no
decimal places are requested.  A period is used even if the decimal
point character is a comma.

@item
A number that does not fill its field is right-justified within the
field.

@item
A number that is too large for its field causes decimal places to be dropped
to make room.  If dropping decimals does not make enough room,
scientific notation is used if the field is wide enough.  If a number
does not fit in the field, even in scientific notation, the overflow is
indicated by filling the field with asterisks (@samp{*}).

@item
COMMA, DOT, and DOLLAR formats insert grouping characters only if space
is available for all of them.  Grouping characters are never inserted
when all decimal places must be dropped.  Thus, 1234.56 in COMMA5.2
format is output as @samp{@tie{}1235} without a comma, even though there
is room for one, because all decimal places were dropped.

@item
DOLLAR or PCT format drops the @samp{$} or @samp{%} only if the number
would not fit at all without it.  Scientific notation with @samp{$} or
@samp{%} is preferred to ordinary decimal notation without it.

@item
Except in scientific notation, a decimal point is included only when
it is followed by a digit.  If the integer part of the number being
output is 0, and a decimal point is included, then @pspp{} ordinarily
drops the zero before the decimal point.  However, in @code{F},
@code{COMMA}, or @code{DOT} formats, @pspp{} keeps the zero if
@code{SET LEADZERO} is set to @code{ON} (@pxref{SET LEADZERO}).

In scientific notation, the number always includes a decimal point,
even if it is not followed by a digit.

@item
A negative number includes a minus sign only in the presence of a
nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
@samp{@tie{}@tie{}.0} in F4.1 format.  Thus, a ``negative zero'' never
includes a minus sign.

@item
In negative numbers output in DOLLAR format, the dollar sign follows the
negative sign.  Thus, -9.99 in DOLLAR6.2 format is output as
@code{-$9.99}.

@item
In scientific notation, the exponent is output as @samp{E} followed by
@samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
less than 10**-999 or larger than 10**999 are not supported by most
computers, but if they are supported then their output is considered
to overflow the field and they are output as asterisks.

@item
On most computers, no more than 15 decimal digits are significant in
output, even if more are printed.  In any case, output precision cannot
be any higher than input precision; few data sets are accurate to 15
digits of precision.  Unavoidable loss of precision in intermediate
calculations may also reduce precision of output.

@item
Special values such as infinities and ``not a number'' values are
usually converted to the system-missing value before printing.  In a few
circumstances, these values are output directly.  In fields of width 3
or greater, special values are output as however many characters
fit from @code{+Infinity} or @code{-Infinity} for infinities, from
@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
(if any are supported by the system).  In fields under 3 columns wide,
special values are output as asterisks.
@end itemize
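
The rounding, sign, and leading-zero rules above can be sketched in a
few lines of Python.  This is only an illustration, not @pspp{}'s
implementation: the function name @code{format_f} is hypothetical, it
assumes @code{SET LEADZERO} is off, and it reports overflow with
asterisks instead of attempting scientific notation.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_f(value, width, decimals):
    """Sketch of F-format output (hypothetical helper, not PSPP code)."""
    number = Decimal(str(value))
    # Try the requested decimal places first, then drop them one by one.
    for places in range(decimals, -1, -1):
        step = Decimal(1).scaleb(-places)
        # ROUND_HALF_UP rounds ties away from zero.
        rounded = number.quantize(step, rounding=ROUND_HALF_UP)
        text = format(abs(rounded), "f")
        # Drop the zero before the decimal point (assumes LEADZERO is off).
        if text.startswith("0."):
            text = text[1:]
        # A minus sign appears only when a nonzero digit is output.
        if number < 0 and rounded != 0:
            text = "-" + text
        if len(text) <= width:
            return text.rjust(width)
    # Scientific notation is not modeled here; signal overflow instead.
    return "*" * width


print(format_f(-0.01, 4, 2))
print(format_f(-0.01, 4, 1))
```

The two calls print the F4.2 and F4.1 renderings of -0.01 discussed
above.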

@node Custom Currency Formats
@subsubsection Custom Currency Formats

@cindex currency formats
The custom currency formats are closely related to the basic numeric
formats, but they allow users to customize the output format.  The
SET command configures custom currency formats, using the syntax
@display
SET CC@var{x}=@t{"}@var{string}@t{"}.
@end display
@noindent
where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
characters long.

@var{string} must contain exactly three commas or exactly three periods
(but not both), except that a single quote character may be used to
``escape'' a following comma, period, or single quote.  If three commas
are used, commas are used for grouping in output, and a period
is used as the decimal point.  Use of periods reverses these roles.

The commas or periods divide @var{string} into four fields, called the
@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
suffix}, respectively.  The prefix and suffix are added to output
whenever space is available.  The negative prefix and negative suffix
are always added to a negative number when the output includes a nonzero
digit.

The following syntax shows how custom currency formats could be used to
reproduce basic numeric formats:

@example
@group
SET CCA="-,,,".  /* Same as COMMA.
SET CCB="-...".  /* Same as DOT.
SET CCC="-,$,,". /* Same as DOLLAR.
SET CCD="-,,%,". /* Like PCT, but groups with commas.
@end group
@end example

Here are some more examples of custom currency formats.  The final
example shows how to use a single quote to escape a delimiter:

@example
@group
SET CCA=",EUR,,-".   /* Euro.
SET CCB="(,USD ,,)". /* US dollar.
SET CCC="-.R$..".    /* Brazilian real.
SET CCD="-,, NIS,".  /* Israel shekel.
SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
@end group
@end example

@noindent These formats would yield the following output:

@float
@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
@headitem Format @tab @code{@tie{}3145.59}         @tab @code{-3145.59}
@item CCA12.2 @tab @code{@tie{}EUR3,145.59}        @tab @code{EUR3,145.59-}
@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
@item CCC11.2 @tab @code{@tie{}R$3.145,59}         @tab @code{-R$3.145,59}
@item CCD13.2 @tab @code{@tie{}3,145.59 NIS}       @tab @code{-3,145.59 NIS}
@item CCE10.0 @tab @code{@tie{}Rp. 3.146}          @tab @code{-Rp. 3.146}
@end multitable
@end float

The default for all the custom currency formats is @samp{-,,,},
equivalent to COMMA format.
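
To make the structure of @var{string} concrete, here is a minimal
Python sketch that splits a custom currency specification into its four
fields.  The names @code{parse_cc} and @code{CCStyle} are hypothetical,
and treating a single quote that does not precede a comma, period, or
single quote as an ordinary character is an assumption.

```python
from collections import namedtuple

# Hypothetical container for the parsed pieces of a CC string.
CCStyle = namedtuple(
    "CCStyle", "neg_prefix prefix suffix neg_suffix grouping decimal")


def parse_cc(spec):
    """Split a custom currency string into its four fields (sketch)."""
    if len(spec) > 16:
        raise ValueError("string is longer than 16 characters")
    # First pass: resolve single-quote escapes into (char, is_literal).
    tokens = []
    i = 0
    while i < len(spec):
        if spec[i] == "'" and i + 1 < len(spec) and spec[i + 1] in ",.'":
            tokens.append((spec[i + 1], True))
            i += 2
        else:
            tokens.append((spec[i], False))
            i += 1
    commas = sum(1 for ch, literal in tokens if ch == "," and not literal)
    periods = sum(1 for ch, literal in tokens if ch == "." and not literal)
    if (commas, periods) == (3, 0):
        grouping, decimal = ",", "."
    elif (commas, periods) == (0, 3):
        grouping, decimal = ".", ","
    else:
        raise ValueError("need exactly three commas or three periods")
    # Second pass: unescaped delimiters separate the four fields.
    fields = [""]
    for ch, literal in tokens:
        if ch == grouping and not literal:
            fields.append("")
        else:
            fields[-1] += ch
    return CCStyle(*fields, grouping, decimal)


print(parse_cc("-.Rp'. .."))
```

For the escaped example, the prefix comes out as @samp{Rp. }, with the
period kept as part of the prefix rather than treated as a delimiter.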

@node Legacy Numeric Formats
@subsubsection Legacy Numeric Formats

The N and Z numeric formats provide compatibility with legacy file
formats.  They have much in common:

@itemize @bullet
@item
Output is rounded to the nearest representable value, with ties rounded
away from zero.

@item
Numbers too large to display are output as a field filled with asterisks
(@samp{*}).

@item
The decimal point is always implicitly the specified number of digits
from the right edge of the field, except that Z format input allows an
explicit decimal point.

@item
Scientific notation may not be used.

@item
The system-missing value is output as a period in a field of spaces.
The period is placed just to the right of the implied decimal point in
Z format, or at the right end in N format or in Z format if no decimal
places are requested.  A period is used even if the decimal point
character is a comma.

@item
Field width may range from 1 to 40.  Decimal places may range from 0 up
to the field width, to a maximum of 16.

@item
When a legacy numeric format used for input is converted to an output
format, it is changed into the equivalent F format.  The field width is
increased by 1 if any decimal places are specified, to make room for a
decimal point.  For Z format, the field width is increased by 1 more
column, to make room for a negative sign.  The output field width is
capped at 40 columns.
@end itemize

@subsubheading N Format

The N format supports input and output of fields that contain only
digits.  On input, leading or trailing spaces, a decimal point, or any
other non-digit character causes the field to be read as the
system-missing value.  As a special exception, an N format used on
@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
equivalent F format.

On output, N pads the field on the left with zeros.  Negative numbers
are output like the system-missing value.
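
As an illustration of these rules, the following Python sketch models
N format input and output.  The function names are hypothetical,
@code{None} stands in for the system-missing value, and
@code{DATA LIST FREE} handling is not modeled.

```python
from decimal import Decimal, ROUND_HALF_UP


def n_output(value, width, decimals=0):
    """Sketch of N-format output; None stands for system-missing."""
    if value is None or value < 0:
        # Negative numbers are output like the system-missing value:
        # a period at the right end of a field of spaces.
        return ".".rjust(width)
    # Shift the implied decimal point, then round ties away from zero.
    scaled = Decimal(str(value)).scaleb(decimals)
    digits = str(int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP)))
    if len(digits) > width:
        return "*" * width
    return digits.rjust(width, "0")


def n_input(field, decimals=0):
    """Sketch of N-format input: anything but digits is system-missing."""
    if not field or any(ch not in "0123456789" for ch in field):
        return None
    return Decimal(field).scaleb(-decimals)


print(n_output(42, 5))
print(n_input("0015", 1))
```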

@subsubheading Z Format

The Z format is a ``zoned decimal'' format used on IBM mainframes.  Z
format encodes the sign as part of the final digit, which must be one of
the following:
@example
0123456789
@{ABCDEFGHI
@}JKLMNOPQR
@end example
@noindent
where the characters in each row represent digits 0 through 9 in order.
Characters in the first two rows indicate a positive sign; those in the
third indicate a negative sign.

On output, Z fields are padded on the left with spaces.  On input,
leading and trailing spaces are ignored.  Any character in an input
field other than spaces, the digit characters above, and @samp{.} causes
the field to be read as system-missing.

The decimal point character for input and output is always @samp{.},
even if the decimal point character is a comma (@pxref{SET DECIMAL}).

Nonzero, negative values output in Z format are marked as negative even
when no nonzero digits are output.  For example, -0.2 is output in Z1.0
format as @samp{J}.  The ``negative zero'' value supported by most
machines is output as positive.
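
The sign encoding can be illustrated with a small Python decoder for Z
format input.  This is a sketch under stated assumptions: the name
@code{decode_zoned} is hypothetical, @code{None} stands in for the
system-missing value, a blank field is treated as system-missing, and
an explicit decimal point is assumed to override the implied one.

```python
from decimal import Decimal

DIGITS = "0123456789"
POSITIVE = "{ABCDEFGHI"   # final-digit characters meaning +0 through +9
NEGATIVE = "}JKLMNOPQR"   # final-digit characters meaning -0 through -9


def decode_zoned(field, decimals=0):
    """Sketch of Z-format input; returns None for system-missing."""
    text = field.strip(" ")
    if not text:
        return None
    last = text[-1]
    if last in DIGITS:
        sign, digit = 1, last
    elif last in POSITIVE:
        sign, digit = 1, str(POSITIVE.index(last))
    elif last in NEGATIVE:
        sign, digit = -1, str(NEGATIVE.index(last))
    else:
        return None
    body = text[:-1] + digit
    if body.count(".") > 1 or any(ch not in DIGITS + "." for ch in body):
        return None
    if "." in body:
        # Assumption: an explicit decimal point overrides the implied one.
        return sign * Decimal(body)
    return sign * Decimal(body).scaleb(-decimals)


print(decode_zoned("12L"))
```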

@node Binary and Hexadecimal Numeric Formats
@subsubsection Binary and Hexadecimal Numeric Formats

@cindex binary formats
@cindex hexadecimal formats
The binary and hexadecimal formats are primarily designed for
compatibility with existing machine formats, not for human readability.
All of them therefore have an F format as default output format.  Some of
these formats are only portable between machines with compatible byte
ordering (endianness) or floating-point format.

Binary formats use byte values that in text files are interpreted as
special control functions, such as carriage return and line feed.  Thus,
data in binary formats should not be included in syntax files or read
from data files with variable-length records, such as ordinary text
files.  They may be read from or written to data files with fixed-length
records.  @xref{FILE HANDLE}, for information on working with
fixed-length records.

@subsubheading P and PK Formats

These are binary-coded decimal formats, in which every byte (except the
last, in P format) represents two decimal digits.  The most-significant
4 bits of the first byte is the most-significant decimal digit, the
least-significant 4 bits of the first byte is the next decimal digit,
and so on.

In P format, the most-significant 4 bits of the last byte are the
least-significant decimal digit.  The least-significant 4 bits represent
the sign: decimal 15 indicates a positive value, decimal 13 indicates a
negative value.

Numbers are rounded downward on output.  The system-missing value and
numbers outside representable range are output as zero.

The maximum field width is 16.  Decimal places may range from 0 up to
the number of decimal digits represented by the field.

The default output format is an F format with twice the input field
width, plus one column for a decimal point (if decimal places were
requested).
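
The nibble layout can be illustrated with a short Python decoder.  It
is a sketch, not @pspp{}'s reader: the function names are hypothetical,
@code{None} stands in for the system-missing value, and it assumes the
conventional packed-decimal sign nibbles (hexadecimal F for positive, D
for negative), rejecting any other sign nibble.

```python
from decimal import Decimal


def _nibbles(data):
    """Return the 4-bit halves of each byte, most significant first."""
    result = []
    for byte in data:
        result.append(byte >> 4)
        result.append(byte & 0x0F)
    return result


def _to_number(nibbles):
    """Combine decimal digits into an integer, or None if one is invalid."""
    if any(n > 9 for n in nibbles):
        return None
    value = 0
    for n in nibbles:
        value = value * 10 + n
    return value


def decode_pk(data, decimals=0):
    """PK: every nibble is a decimal digit; there is no sign."""
    value = _to_number(_nibbles(data))
    return None if value is None else Decimal(value).scaleb(-decimals)


def decode_p(data, decimals=0):
    """P: the final nibble carries the sign instead of a digit."""
    if not data:
        return None
    nibbles = _nibbles(data)
    sign_nibble = nibbles.pop()
    if sign_nibble == 0xF:
        sign = 1
    elif sign_nibble == 0xD:
        sign = -1
    else:
        return None  # assumption: other sign nibbles are rejected
    value = _to_number(nibbles)
    return None if value is None else sign * Decimal(value).scaleb(-decimals)


print(decode_p(bytes([0x12, 0x3D]), 1))
```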

@subsubheading IB and PIB Formats

These are integer binary formats.  IB reads and writes 2's complement
binary integers, and PIB reads and writes unsigned binary integers.  The
byte ordering is by default the host machine's, but SET RIB may be used
to select a specific byte ordering for reading (@pxref{SET RIB}) and
SET WIB, similarly, for writing (@pxref{SET WIB}).

The maximum field width is 8.  Decimal places may range from 0 up to the
number of decimal digits in the largest value representable in the field
width.

The default output format is an F format whose width is the number of
decimal digits in the largest value representable in the field width,
plus 1 if the format has decimal places.
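
The difference between IB and PIB, and the effect of byte ordering, can
be seen in the following Python sketch.  The function names are
hypothetical, and the byte order is passed explicitly here rather than
taken from @code{SET RIB}; decimal places are assumed to shift an
implied decimal point.

```python
from decimal import Decimal


def read_ib(data, byteorder, decimals=0):
    """IB: 2's complement signed integer with implied decimal places."""
    value = int.from_bytes(data, byteorder, signed=True)
    return Decimal(value).scaleb(-decimals)


def read_pib(data, byteorder, decimals=0):
    """PIB: unsigned integer with implied decimal places."""
    value = int.from_bytes(data, byteorder, signed=False)
    return Decimal(value).scaleb(-decimals)


# The same two bytes read as signed and as unsigned, big-endian.
print(read_ib(b"\xff\xfe", "big"))
print(read_pib(b"\xff\xfe", "big"))
```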

@subsubheading RB Format

This is a binary format for real numbers.  By default it reads and
writes the host machine's floating-point format, but SET RRB may be
used to select an alternate floating-point format for reading
(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
WRB}).

The recommended field width depends on the floating-point format.
NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
width of 4.  Other field widths do not produce useful results.  The
maximum field width is 8.  No decimal places may be specified.

The default output format is F8.2.

@subsubheading PIBHEX and RBHEX Formats

These are hexadecimal formats, for reading and writing binary formats
where each byte has been recoded as a pair of hexadecimal digits.

A hexadecimal field consists solely of hexadecimal digits
@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}.  Uppercase and
lowercase are accepted on input; output is in uppercase.

Other than the hexadecimal representation, these formats are equivalent
to PIB and RB formats, respectively.  However, bytes in PIBHEX format
are always ordered with the most-significant byte first (big-endian
order), regardless of the host machine's native byte order or @pspp{}
settings.

Field widths must be even and between 2 and 16.  RBHEX format allows no
decimal places; PIBHEX allows as many decimal places as a PIB format
with half the given width.
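
A minimal Python sketch of PIBHEX with no decimal places follows.  The
function names are hypothetical; the point is only that the hex digits
are always interpreted most-significant byte first.

```python
def read_pibhex(field):
    """Sketch of PIBHEX input: hex digit pairs, big-endian, unsigned."""
    if len(field) % 2 != 0 or not 2 <= len(field) <= 16:
        raise ValueError("field width must be even and between 2 and 16")
    return int.from_bytes(bytes.fromhex(field), "big")


def write_pibhex(value, width):
    """Sketch of PIBHEX output: uppercase hex digits, big-endian."""
    if width % 2 != 0 or not 2 <= width <= 16:
        raise ValueError("field width must be even and between 2 and 16")
    return value.to_bytes(width // 2, "big").hex().upper()


print(write_pibhex(255, 4))
```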

@node Time and Date Formats
@subsubsection Time and Date Formats

@cindex time formats
@cindex date formats
In @pspp{}, a @dfn{time} is an interval.  The time formats translate
between human-friendly descriptions of time intervals and @pspp{}'s
internal representation of time intervals, which is simply the number of
seconds in the interval.  @pspp{} has three time formats:

@float
@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
@headitem Time Format @tab Template                  @tab Example
@item MTIME    @tab @code{MM:SS.ss}             @tab @code{91:17.01}
@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{01:31:17.01}
@item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
@end multitable
@end float
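
Because a time is just a count of seconds, the three templates differ
only in how that count is divided up.  The following Python sketch
(hypothetical function names, full-width output with two decimal places
only) renders a number of seconds in each template.

```python
def _centiseconds(seconds):
    """Return a sign string and the magnitude in hundredths of a second."""
    magnitude = round(abs(seconds) * 100)
    sign = "-" if seconds < 0 and magnitude else ""
    return sign, magnitude


def mtime_str(seconds):
    sign, cs = _centiseconds(seconds)
    minutes, rest = divmod(cs, 60 * 100)
    return "%s%02d:%02d.%02d" % (sign, minutes, rest // 100, rest % 100)


def time_str(seconds):
    sign, cs = _centiseconds(seconds)
    hours, rest = divmod(cs, 3600 * 100)
    minutes, rest = divmod(rest, 60 * 100)
    return "%s%02d:%02d:%02d.%02d" % (
        sign, hours, minutes, rest // 100, rest % 100)


def dtime_str(seconds):
    sign, cs = _centiseconds(seconds)
    days, rest = divmod(cs, 86400 * 100)
    hours, rest = divmod(rest, 3600 * 100)
    minutes, rest = divmod(rest, 60 * 100)
    return "%s%02d %02d:%02d:%02d.%02d" % (
        sign, days, hours, minutes, rest // 100, rest % 100)


print(mtime_str(5477.01))
print(time_str(5477.01))
```

The value 5477.01 seconds corresponds to both the MTIME and the TIME
examples in the table.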

A @dfn{date} is a moment in the past or the future.  Internally, @pspp{}
represents a date as the number of seconds since the @dfn{epoch},
midnight, Oct. 14, 1582.  The date formats translate between
human-readable dates and @pspp{}'s numeric representation of dates and
times.  @pspp{} has several date formats:

@float
@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
@headitem Date Format @tab Template                  @tab Example
@item DATE     @tab @code{dd-mmm-yyyy}          @tab @code{01-OCT-1978}
@item ADATE    @tab @code{mm/dd/yyyy}           @tab @code{10/01/1978}
@item EDATE    @tab @code{dd.mm.yyyy}           @tab @code{01.10.1978}
@item JDATE    @tab @code{yyyyjjj}              @tab @code{1978274}
@item SDATE    @tab @code{yyyy/mm/dd}           @tab @code{1978/10/01}
@item QYR      @tab @code{q Q yyyy}             @tab @code{4 Q 1978}
@item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
@item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
@item YMDHMS   @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-10-01 04:31:17.01}
@end multitable
@end float
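
The numeric representation of a date can be reproduced outside
@pspp{}.  The Python sketch below uses hypothetical helper names and
relies on Python's @code{datetime}, which uses the proleptic Gregorian
calendar; it converts between calendar dates and seconds since the
epoch and derives the JDATE text and the week of year.

```python
from datetime import datetime, timedelta

# The epoch described above: midnight, October 14, 1582.
EPOCH = datetime(1582, 10, 14)


def to_pspp_seconds(moment):
    """Seconds from the epoch to the given datetime."""
    return (moment - EPOCH).total_seconds()


def from_pspp_seconds(seconds):
    """Inverse of to_pspp_seconds."""
    return EPOCH + timedelta(seconds=seconds)


def jdate_str(moment):
    """Four-digit year followed by the three-digit day of year."""
    return moment.strftime("%Y%j")


def week_of_year(moment):
    """Week number, counting January 1 as the first day of week 1."""
    return (moment.timetuple().tm_yday - 1) // 7 + 1


when = datetime(1978, 10, 1)
print(to_pspp_seconds(when), jdate_str(when), week_of_year(when))
```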

The templates in the preceding tables describe how the time and date
formats are input and output:

@table @code
@item dd
Day of month, from 1 to 31.  Always output as two digits.

@item mm
@itemx mmm
Month.  In output, @code{mm} is output as two digits, @code{mmm} as the
first three letters of an English month name (January, February,
@dots{}).  In input, both of these formats, plus Roman numerals, are
accepted.

@item yyyy
Year.  In output, DATETIME and YMDHMS always produce 4-digit years;
other formats can produce a 2- or 4-digit year.  The century assumed
for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
In output, a year outside the epoch causes the whole field to be
filled with asterisks (@samp{*}).

@item jjj
Day of year (Julian day), from 1 to 366.  This is exactly three digits
giving the count of days from the start of the year.  January 1 is
considered day 1.

@item q
Quarter of year, from 1 to 4.  Quarters start on January 1, April 1,
July 1, and October 1.

@item ww
Week of year, from 1 to 53.  Output as exactly two digits.  January 1 is
the first day of week 1.

@item DD
Count of days, which may be positive or negative.  Output as at least
two digits.

@item hh
Count of hours, which may be positive or negative.  Output as at least
two digits.

@item HH
Hour of day, from 0 to 23.  Output as exactly two digits.

@item MM
In MTIME, count of minutes, which may be positive or negative.  Output
as at least two digits.

In other formats, minute of hour, from 0 to 59.  Output as exactly two
digits.

@item SS.ss
Seconds within minute, from 0 to 59.  The integer part is output as
exactly two digits.  On output, seconds and fractional seconds may or
may not be included, depending on field width and decimal places.  On
input, seconds and fractional seconds are optional.  The DECIMAL setting
controls the character accepted and displayed as the decimal point
(@pxref{SET DECIMAL}).
@end table

For output, the date and time formats use the delimiters indicated in
the table.  For input, date components may be separated by spaces or by
one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
time components may be separated by spaces or @samp{:}.  On
input, the @samp{Q} separating quarter from year and the @samp{WK}
separating week from year may be uppercase or lowercase, and the spaces
around them are optional.

On input, all time and date formats accept any amount of leading and
trailing white space.

The maximum width for time and date formats is 40 columns.  Minimum
input and output width for each of the time and date formats is shown
below:

@float
@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
@item DATE @tab 8 @tab 9 @tab 4-digit year
@item ADATE @tab 8 @tab 8 @tab 4-digit year
@item EDATE @tab 8 @tab 8 @tab 4-digit year
@item JDATE @tab 5 @tab 5 @tab 4-digit year
@item SDATE @tab 8 @tab 8 @tab 4-digit year
@item QYR @tab 4 @tab 6 @tab 4-digit year
@item MOYR @tab 6 @tab 6 @tab 4-digit year
@item WKYR @tab 6 @tab 8 @tab 4-digit year
@item DATETIME @tab 17 @tab 17 @tab seconds
@item YMDHMS @tab 12 @tab 16 @tab seconds
@item MTIME @tab 4 @tab 5
@item TIME @tab 5 @tab 5 @tab seconds
@item DTIME @tab 8 @tab 8 @tab seconds
@end multitable
@end float
@noindent
In the table, ``Option'' describes what increased output width enables:

@table @asis
@item 4-digit year
A field 2 columns wider than the minimum includes a 4-digit year.
(DATETIME and YMDHMS formats always include a 4-digit year.)

@item seconds
A field 3 columns wider than the minimum includes seconds as well as
minutes.  A field 5 columns wider than minimum, or more, can also
include a decimal point and fractional seconds (but no more than allowed
by the format's decimal places).
@end table

For the time and date formats, the default output format is the same as
the input format, except that @pspp{} increases the field width, if
necessary, to the minimum allowed for output.

Times or dates narrower than the field width are right-justified within
the field.

When a time or date exceeds the field width, characters are trimmed from
the end until it fits.  This can occur in an unusual situation, @i{e.g.}@:
with a year greater than 9999 (which adds an extra digit), or for a
negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).

@c What about out-of-range values?

The system-missing value is output as a period at the right end of the
field.

@node Date Component Formats
@subsubsection Date Component Formats

The WKDAY and MONTH formats provide input and output for the names of
weekdays and months, respectively.

On output, these formats convert a number between 1 and 7, for WKDAY, or
between 1 and 12, for MONTH, into the English name of a day or month,
respectively.  If the name is longer than the field, it is trimmed to
fit.  If the name is shorter than the field, it is padded on the right
with spaces.  Values outside the valid range, and the system-missing
value, are output as all spaces.

On input, English weekday or month names (in uppercase or lowercase) are
converted back to their corresponding numbers.  Weekday and month names
may be abbreviated to their first 2 or 3 letters, respectively.

The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
MONTH.  No decimal places are allowed.

The default output format is the same as the input format.
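
The trimming and padding behavior of MONTH can be sketched in Python as
follows.  The function names are hypothetical and @code{None} stands in
for the system-missing value.  Two details are assumptions rather than
documented behavior: names are written in uppercase, and any prefix of
at least three letters is accepted on input.

```python
MONTHS = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]


def month_output(value, width):
    """Sketch of MONTH output: trim or pad the name to the field width."""
    if value not in range(1, 13):
        # Out-of-range and system-missing values are output as spaces.
        return " " * width
    return MONTHS[int(value) - 1][:width].ljust(width)


def month_input(text):
    """Sketch of MONTH input: match a name or an abbreviation of it."""
    word = text.strip().upper()
    if len(word) < 3:
        return None
    for number, name in enumerate(MONTHS, start=1):
        if name.startswith(word):
            return number
    return None


print(repr(month_output(5, 9)))
print(month_input("oct"))
```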

@node String Formats
@subsubsection String Formats

@cindex string formats
The A and AHEX formats are the only ones that may be assigned to string
variables.  Neither format allows any decimal places.

In A format, the entire field is treated as a string value.  The field
width may range from 1 to 32,767, the maximum string width.  The default
output format is the same as the input format.

In AHEX format, the field is composed of characters in a string encoded
as hex digit pairs.  On output, hex digits are output in uppercase; on
input, uppercase and lowercase are both accepted.  The default output
format is A format with half the input width.
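
The AHEX encoding corresponds to ordinary hexadecimal dumps of the
string's bytes, as this Python sketch with hypothetical function names
shows.  Note that @code{bytes.fromhex} also skips white space, which is
more lenient than a fixed-width field would be.

```python
def ahex_output(value):
    """Encode a byte string as uppercase hex digit pairs."""
    return value.hex().upper()


def ahex_input(field):
    """Decode hex digit pairs; uppercase and lowercase are both accepted."""
    return bytes.fromhex(field)


print(ahex_output(b"AB"))
print(ahex_input("4a4B"))
```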

@node Scratch Variables
@subsection Scratch Variables

@cindex scratch variables
Most of the time, variables don't retain their values between cases.
Instead, either they're being read from a data file or the active dataset,
in which case they assume the value read, or, if created with
@cmd{COMPUTE} or
another transformation, they're initialized to the system-missing value
or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value
between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
use a @dfn{scratch variable}.  Scratch variables are variables whose
names begin with an octothorpe (@samp{#}).

Scratch variables have the same properties as variables left with
@cmd{LEAVE}: they retain their values between cases, and for the first
case they are initialized to 0 or blanks.  They have the additional
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis.  To use
a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
to copy its value into an ordinary variable, then use that ordinary
variable in the analysis.

@node Files
@section Files Used by @pspp{}

@pspp{} makes use of many files each time it runs.  Some of these it
reads, some it writes, some it creates.  Here is a table listing the
most important of these files:

@table @strong
@cindex file, command
@cindex file, syntax file
@cindex command file
@cindex syntax file
@item command file
@itemx syntax file
These names (synonyms) refer to the file that contains instructions
that tell @pspp{} what to do.  The syntax file's name is specified on
the @pspp{} command line.  Syntax files can also be read with
@cmd{INCLUDE} (@pxref{INCLUDE}).

@cindex file, data
@cindex data file
@item data file
Data files contain raw data in text or binary format.  Data can also
be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.

@cindex file, output
@cindex output file
@item listing file
One or more output files are created by @pspp{} each time it is
run.  The output files receive the tables and charts produced by
statistical procedures.  The output files may be in any number of formats,
depending on how @pspp{} is configured.

@cindex system file
@cindex file, system
@item system file
System files are binary files that store a dictionary and a set of
cases.  @cmd{GET} and @cmd{SAVE} read and write system files.

@cindex portable file
@cindex file, portable
@item portable file
Portable files are files in a text-based format that store a dictionary
and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
portable files.
@end table

@node File Handles
@section File Handles
@cindex file handles

A @dfn{file handle} is a reference to a data file, system file, or
portable file.  Most often, a file handle is specified as the
name of a file as a string, that is, enclosed within @samp{'} or
@samp{"}.

A file name string that begins or ends with @samp{|} is treated as the
name of a command to pipe data to or from.  You can use this feature
to read data over the network using a program such as @samp{curl}
(@i{e.g.}@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
read compressed data from a file using a program such as @samp{zcat}
(@i{e.g.}@: @code{GET '|zcat mydata.sav.gz'}), and for many other
purposes.

@pspp{} also supports declaring named file handles with the @cmd{FILE
HANDLE} command.  This command associates an identifier of your choice
(the file handle's name) with a file.  Later, the file handle name can
be substituted for the name of the file.  When @pspp{} syntax accesses a
file multiple times, declaring a named file handle simplifies updating
the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
also required to read data files in binary formats.  @xref{FILE HANDLE},
for more information.

In some circumstances, @pspp{} must distinguish whether a file handle
refers to a system file or a portable file.  When this is necessary to
read a file, @i{e.g.}@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
@pspp{} uses the file's contents to decide.  In the context of writing a
file, @i{e.g.}@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
decides based on the file's name: if it ends in @samp{.por} (with any
capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
writes a system file.
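
The naming rules described in this section are simple enough to state
as code.  The following Python sketch uses hypothetical function names
and only mirrors the rules given above; it does not inspect file
contents, which is what @pspp{} does when reading.

```python
def is_pipe(file_name):
    """A name that begins or ends with '|' names a command, not a file."""
    return file_name.startswith("|") or file_name.endswith("|")


def output_kind(file_name):
    """Kind of file written for a given name, per the '.por' rule."""
    if file_name.lower().endswith(".por"):
        return "portable"
    return "system"


print(is_pipe("|zcat mydata.sav.gz"))
print(output_kind("survey.POR"))
```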

INLINE is reserved as a file handle name.  It refers to the ``data
file'' embedded into the syntax file between @cmd{BEGIN DATA} and
@cmd{END DATA}.  @xref{BEGIN DATA}, for more information.

The file to which a file handle refers may be reassigned on a later
@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
HANDLE}.  @xref{CLOSE FILE HANDLE}, for
more information.

@node BNF
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@cindex command syntax, description of
@cindex description of command syntax

The syntax of some parts of the @pspp{} language is presented in this
manual using the formalism known as @dfn{Backus-Naur Form}, or BNF@.  The
following list describes BNF:

@itemize @bullet
@cindex keywords
@cindex terminals
@item
Words in all-uppercase are @pspp{} keyword tokens.  In BNF, these are
often called @dfn{terminals}.  There are some special terminals, which
are written in lowercase for clarity:

@table @asis
@cindex @code{number}
@item @code{number}
A real number.

@cindex @code{integer}
@item @code{integer}
An integer number.

@cindex @code{string}
@item @code{string}
A string.

@cindex @code{var-name}
@item @code{var-name}
A single variable name.

@cindex operators
@cindex punctuators
@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
Operators and punctuators.

@cindex @code{.}
@item @code{.}
The end of the command.  This is not necessarily an actual dot in the
syntax file (@pxref{Commands}).
@end table

@item
@cindex productions
@cindex nonterminals
Other words in all lowercase refer to BNF definitions, called
@dfn{productions}.  These productions are also known as
@dfn{nonterminals}.  Some nonterminals are very common, so they are
defined here in English for clarity:

@table @code
@cindex @code{var-list}
@item var-list
A list of one or more variable names or the keyword @code{ALL}.

@cindex @code{expression}
@item expression
An expression.  @xref{Expressions}, for details.
@end table

@item
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
the name of the nonterminal being defined.  The right side of @samp{::=}
gives the definition of that nonterminal.  If the right side is empty,
then one possible expansion of that nonterminal is nothing.  A BNF
definition is called a @dfn{production}.

@item
@cindex terminals and nonterminals, differences
So, the key difference between a terminal and a nonterminal is that a
terminal cannot be broken into smaller parts---in fact, every terminal
is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
composed of a (possibly empty) sequence of terminals and nonterminals.
Thus, terminals indicate the deepest level of syntax description.  (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)

@item
@cindex start symbol
@cindex symbol, start
The first nonterminal defined in a set of productions is called the
@dfn{start symbol}.  The start symbol defines the entire syntax for
that command.
@end itemize