File: q2c.texi

@node q2c Input Format
@appendix @code{q2c} Input Format

PSPP statistical procedures have a bizarre and somewhat irregular
syntax.  Despite this, a parser generator has been written that
adequately addresses many of the possibilities and tries to provide
hooks for the exceptional cases.  This parser generator is named
@code{q2c}.

@menu
* Invoking q2c::                q2c command-line syntax.
* q2c Input Structure::         High-level layout of the input file.
* Grammar Rules::               Syntax of the grammar rules.
@end menu

@node Invoking q2c
@section Invoking q2c

@example
q2c @var{input.q} @var{output.c}
@end example

@code{q2c} translates a @samp{.q} file into a @samp{.c} file.  It takes
exactly two command-line arguments, which are the input file name and
output file name, respectively.  @code{q2c} does not accept any
command-line options.

@node q2c Input Structure
@section @code{q2c} Input Structure

@code{q2c} input files are divided into two sections: the grammar rules
and the supporting code.  The @dfn{grammar rules}, which make up the
first part of the input, are used to define the syntax of the
statistical procedure to be parsed.  The @dfn{supporting code},
following the grammar rules, is copied largely unchanged to the output
file, except for certain escapes.

The most important lines in the grammar rules are used for defining
procedure syntax.  These lines can be prefixed with a dollar sign
(@samp{$}), which prevents Emacs' CC-mode from munging them.  Besides
this, a bang (@samp{!}) at the beginning of a line causes the line,
minus the bang, to be written verbatim to the output file (useful for
comments).  As a third special case, any line that begins with the exact
characters @code{/* *INDENT} is ignored and not written to the output.
This allows @code{.q} files to be processed through @code{indent}
without being munged.

The syntax of the grammar rules themselves is given in the following
sections.

The supporting code is passed into the output file largely unchanged.
However, the following escapes are supported.  Each escape must appear
on a line by itself.

@table @code
@item /* (header) */

Expands to a series of C @code{#include} directives which include the
headers that are required for the parser generated by @code{q2c}.

@item /* (decls @var{scope}) */

Expands to C variable and data type declarations for the variables and
@code{enum}s input and output by the @code{q2c} parser.  @var{scope}
must be either @code{local} or @code{global}.  @code{local} causes the
declarations to be output as function locals.  @code{global} causes them
to be declared as @code{static} module variables; thus, @code{global} is
a bit of a misnomer.

@item /* (parser) */

Expands to the entire parser.  Must be enclosed within a C function.

@item /* (free) */

Expands to a set of calls to the @code{free} function for variables
declared by the parser.  Only needs to be invoked if subcommands of type
@code{string} are used in the grammar rules.
@end table
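
As a rough sketch of how these escapes might be laid out, the supporting
code for a hypothetical command could look like the following.  The
function name @code{cmd_example} and its return value are illustrative
assumptions, not something @code{q2c} prescribes.  Each escape is placed
on a line of its own, and is left unindented here as a cautious choice.

@example
/* (header) */

/* (decls global) */

int
cmd_example (void)
@{
/* (parser) */

/* (free) */
  return 1;
@}
@end example

The @code{/* (free) */} line matters only when the grammar rules use
subcommands of type @code{string}, as noted above.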

@node Grammar Rules
@section Grammar Rules

The grammar rules describe the format of the syntax that the parser
generated by @code{q2c} will understand.  The way that the grammar rules
are included in a @code{q2c} input file is described above.

The grammar rules are divided into tokens of the following types:

@table @asis
@item Identifier (@code{ID})

An identifier token is a sequence of letters, digits, and underscores
(@samp{_}).  Identifiers are @emph{not} case-sensitive.

@item String (@code{STRING})

String tokens are initiated by a double-quote character (@samp{"}) and
consist of all the characters between that double quote and the next
double quote, which must be on the same line as the first.  Within a
string, a backslash can be used as a ``literal escape''.  The only
reasons to use a literal escape are to include a double quote or a
backslash within a string.

@item Special character

Any other character, apart from white space, constitutes a token in
itself.

@end table

The syntax of the grammar rules is as follows:

@example
grammar-rules ::= command-name opt-prefix : subcommands .
command-name ::= ID
             ::= STRING
opt-prefix ::=
           ::= ( ID )
subcommands ::= subcommand
            ::= subcommands ; subcommand
@end example

The syntax begins with an ID token that gives the name of the
procedure to be parsed.  For command names that contain multiple
words, a STRING token may be used instead, e.g.@: @samp{"FILE
HANDLE"}.  Optionally, an ID in parentheses specifies a prefix used
for all file-scope identifiers declared by the emitted code.

The rest of the syntax consists of subcommands separated by semicolons
(@samp{;}) and terminated with a full stop (@samp{.}).
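
For instance, a minimal set of grammar rules for a two-word command
might be written as below.  This is an illustrative sketch only: the
prefix @code{fh_} and the two subcommands are assumptions chosen for the
example, not necessarily the definition that PSPP itself uses.  The
@code{string} and @code{integer} special forms are described later in
this section.

@example
"FILE HANDLE" (fh_):
  name=string;
  lrecl=integer.
@end example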

@example
subcommand ::= default-opt arity-opt ID sbc-defn
default-opt ::=
            ::= *
arity-opt ::=
          ::= +
          ::= ^
sbc-defn ::= opt-prefix = specifiers
         ::= [ ID ] = array-sbc
         ::= opt-prefix = sbc-special-form
@end example

A subcommand that begins with an asterisk (@samp{*}) is the default
subcommand.  The keyword used for the default subcommand can be omitted
in the PSPP syntax file.

A plus sign (@samp{+}) indicates that a subcommand can appear more than
once.  A caret (@samp{^}) indicates that a subcommand must appear exactly
once.  A subcommand marked with neither character may appear once or not
at all, but not more than once.

The subcommand name appears after the leading option characters.
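
The following sketch shows how the leading option characters might be
combined.  The command name @code{example} and its subcommand names are
hypothetical.

@example
example:
  *^tables=custom;
  +variables=custom;
  missing=custom.
@end example

Under the rules above, @code{tables} is the default subcommand and must
appear exactly once, @code{variables} may appear any number of times,
and @code{missing} may appear at most once.  Note that the asterisk
precedes the arity character, as the grammar requires.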

There are three forms of subcommands.  The first and most common form
simply gives an equals sign (@samp{=}) and a list of specifiers, which
can each be set to a single setting.  The second form declares an array,
which is a set of flags that can be individually turned on by the user.
There are also several special forms that do not take a list of
specifiers.

Arrays require an additional @code{ID} argument.  This is used as a
prefix, prepended to the variable names constructed from the
specifiers.  The other forms also allow an optional prefix to be
specified.

@example
array-sbc ::= alternatives
          ::= array-sbc , alternatives
alternatives ::= ID
             ::= alternatives | ID
@end example

An array subcommand is a set of Boolean values that can independently be
turned on by the user.  The values are listed separated by commas
(@samp{,}).  If a value has more than one name then these names are
separated by pipes (@samp{|}).
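
As an illustration, an array subcommand might be declared as follows.
The subcommand name, the prefix @code{st_}, and the value names are all
hypothetical.

@example
+statistics[st_]=mean,stddev|sd,minimum|min,all
@end example

This declares four Boolean values.  The second can be turned on with
either @code{stddev} or @code{sd}, and the third with either
@code{minimum} or @code{min}.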

@example
specifiers ::= specifier
           ::= specifiers , specifier
specifier ::= opt-id : settings
opt-id ::=
       ::= ID
@end example

Ordinary subcommands (other than arrays and special forms) require a
list of specifiers.  Each specifier has an optional name and a list of
settings.  If the name is given then a correspondingly named variable
will be used to store the user's choice of setting.  If no name is given
then there is no way to tell which setting the user picked; in this case
the settings should probably have values attached.
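
A hypothetical subcommand with two named specifiers might be sketched as
follows.  The names @code{fmt} and @code{sort} and their settings are
assumptions made for the example.

@example
format=fmt:labels/nolabels,
       sort:avalue/dvalue
@end example

Here the user's choice between @code{labels} and @code{nolabels} is
stored in a variable named after @code{fmt}, and the choice between
@code{avalue} and @code{dvalue} in one named after @code{sort}.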

@example
settings ::= setting
         ::= settings / setting
setting ::= setting-options ID setting-value
setting-options ::=
                ::= *
                ::= !
                ::= * !
@end example

Individual settings are separated by forward slashes (@samp{/}).  Each
setting can be as little as an @code{ID} token, but options and values
can optionally be included.  The @samp{*} option means that, for this
setting, the @code{ID} can be omitted.  The @samp{!} option means that
this setting is the default for its specifier.
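
For example, a hypothetical specifier whose first setting is the default
could be written as:

@example
missing=miss:!exclude/include
@end example

With this declaration, @code{exclude} is taken as the setting for
@code{miss} when the user names neither setting.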

@example
setting-value ::=
              ::= ( setting-value-2 )
              ::= setting-value-2
setting-value-2 ::= setting-value-options setting-value-type : ID
setting-value-options ::=
                      ::= *
setting-value-type ::= N
                   ::= D
                   ::= S
@end example

Settings may have values.  If the value must be enclosed in parentheses,
then enclose the value declaration in parentheses.  Declare the setting
type as @samp{n}, @samp{d}, or @samp{s} for integer, floating-point,
or string type, respectively.  The given @code{ID} is used to
construct a variable name.
If option @samp{*} is given, then the value is optional; otherwise it
must be specified whenever the corresponding setting is specified.
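
The sketch below shows three ways in which a value declaration might be
attached to a setting.  All of the names are hypothetical, and the
fragment is meant only to illustrate the grammar above.

@example
format=table:limit(n:limit)/notable/!table,
       scale:factor d:factor,
       title:heading *s:text
@end example

The @code{limit} setting takes a required integer value that must be
enclosed in parentheses.  The @code{factor} setting takes a required
floating-point value declared without parentheses, and the
@code{heading} setting takes an optional string value.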

@example
sbc-special-form ::= VAR
                 ::= VARLIST varlist-options
                 ::= INTEGER opt-list
                 ::= DOUBLE opt-list
                 ::= PINT
                 ::= STRING @r{(the literal word STRING)}
                 ::= CUSTOM
varlist-options ::=
                ::= ( STRING )
opt-list ::=
         ::= LIST
@end example

The special forms are of the following types:

@table @code
@item VAR

A single variable name.

@item VARLIST

A list of variables.  If given, the string can be used to provide
@code{PV_@var{*}} options to the call to @code{parse_variables}.

@item INTEGER

A single integer value.

@item INTEGER LIST

A list of integers separated by spaces or commas.

@item DOUBLE

A single floating-point value.

@item DOUBLE LIST

A list of floating-point values.

@item PINT

A single positive integer value.

@item STRING

A string value.

@item CUSTOM

A custom function is used to parse this subcommand.  The function must
have prototype @code{int custom_@var{name} (void)}.  It should return 0
on failure (when it has already issued an appropriate diagnostic), 1 on
success, or 2 if it fails and the calling function should issue a syntax
error on behalf of the custom handler.

@end table
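
To tie the special forms together, a hypothetical set of grammar rules
that uses each of them might look like the following.  The command name,
the prefix, and the subcommand names are assumptions made for the
example, and the @code{PV_@var{*}} option shown is only illustrative.

@example
example (ex_):
  *variables=varlist("PV_NUMERIC");
  by=var;
  groups=integer list;
  cutoff=double;
  iterations=pint;
  title=string;
  weights=custom.
@end example

Because @code{title} is of type @code{string}, supporting code for such
a grammar would also need the @code{/* (free) */} escape described
earlier.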
@setfilename ignored