1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224
|
%
% PSI Programmer's Manual
%
% Input Parsing Section
%
% Justin T. Fermann, 1 February 1996
%
% Updated and improved(?) by Edward F. Valeev, 7 June 2000
% Updated to C++ by David Sherrill, 29 Jan 2008
%
The input parsing library is built for the purpose of reading in the
contents of an input file with the syntax of \inputdat\ and storing
the contents specific to certain keywords supplied. To perform such a
task \library{libipv1.a} has three parts: (1) the parser; (2) the
lexical scanner; (3) keyword storage and retrieval.
The format of \inputdat\ follows certain rules which should probably
referred to as the PSI input grammar. There is a description of most
of those rules in \PSIthree\ User's Manual. A complete definition of
the PSI input grammar is encoded in \file{parse.y} (see below). To
read a grammar we need a parser -- the first component of
\library{libipv1.a}. Then the identified lexical elements of
\inputdat\ (keywords and keyword values) need to be scanned for
presence of ``forbidden'' characters (e.g.\ a space may not be a part
of a string unless the string is placed between parentheses). This
task is performed by the lexical scanner --- the second component of
\library{libipv1.a}. Finally, scanned-in pairs of keyword-value(s) are
stored in a hierarchical data structure (a tree). When a particular
option is needed, the set of stored keywords and values is searched
for the one queried and the value returned. In this way, options of
varying type can be assigned, i.e.\ rather than having a line of
integers, each corresponding to a program variable, mnemonic character
string variables can be parsed and interpreted into program variables.
It's also easier to implement default options, allowing a more spartan
input deck. The set of input-parsing routines in \library{libipv1.a}
is really not complicated to use, but the manner in which data is
stored is somewhat painful to grasp at first.
The following is a list of the names of the individual source files in
\library{libipv1} and a summary of their contents. After that is a
list of the syntax of specific functions and their use. Last is a
simple illustration of the use of this library, taken mostly from
\PSIcscf.
\subsubsection{Source Files}
\begin{itemize}
\item Header files
\begin{itemize}
\item \file{ip\_error.h} Defines for error return values.
\item \file{ip\_global.h} cpp macros to make Curt happy.
\item \file{ip\_lib.h} \#include's everything.
\item \file{ip\_types.h} Various structures and unions specific to
\library{libipv1}.
\end{itemize}
\item Other Source
\begin{itemize}
\item \file{parse.y} Yacc source encoding the PSI input grammar.
Read by {\tt yacc} (or {\tt bison}) -- a parser generator program.
\item \file{scan.l} Lex source describing lexical elements allowed
in \inputdat. Read by {\tt lex} (or {\tt flex}) -- a lexer generator
program.
\item \file{*.gbl, *.lcl} cpp macros to mimic variable argument lists.
\end{itemize}
\item C source
\begin{itemize}
\item \file{ip\_alloc.cc} Allocates keyword tree elements.
\item \file{ip\_cwk.cc} Routines to manipulate the current working
keyword tree.
\item \file{ip\_data.cc} Routines to handle reading of arrays and
scaler keyword assignments in input.
\item \file{ip\_error.cc} Error reporting functions.
\item \file{ip\_karray.cc} Other things to deal with keyword arrays.
\item \file{ip\_print.cc} Routines to print sections of the keyword
tree.
\item \file{ip\_read.cc} All the file manipulation routines. Reading
of \inputdat\ and building the keyword tree from
which information is later plucked.
\end{itemize}
\end{itemize}
\subsubsection{Syntax}
\begin{center} \file{ip\_cwk.cc}\\ \end{center}
\celem{void ip\_cwk\_clear();} \\
Clears current working keyword. Used when initializing input or switching
from one section to another (:DEFAULT and :CSCF to :INTCO, for instance).
\celem{void ip\_cwk\_add(char *kwd);} \\
Adds \celem{kwd} to the list of current working keywords. Allows parsing of
variables under that keyword out of the input file (files) which has
(have) been read or will be read in the future using \celem{ip\_append}.
The keyword \celem{kwd} can only be removed from the list of current working
keywords by purging the entire list using \celem{ip\_cwk\_clear}.
{\em You must ensure that they keyword strings begin with a colon.}
\begin{center} \file{ip\_data.cc} \\ \end{center}
\celem{int ip\_count(char *kwd, int *count, int n);} \\
Counts the elements in the n'th element of the array \celem{kwd}.
\celem{int ip\_boolean(char *kwd, int *bool, int n);} \\
Parses n'th element of \celem{kwd} as boolean (true, 1, yes; false, 0, no)
into 1 or 0 returned in \celem{bool}.
\file{int ip\_exist(char *kwd, int n);} \\
Returns 1 if n'th element of \celem{kwd} exists. Unfortunately, n must be 0.
\file{int ip\_data(char *kwd, char *conv, void *value, int n
[, int o1, ..., int on]);} \\
Looks for keyword \celem{kwd}, finds the value associated with it,
converts it according to the format specification given in
\celem{conv}, and stores the result in \celem{value}. Note that
\celem{value} is a \celem{void *} so this routine can handle any data
type, but it is the programmer's responsibility to ensure that the
pointer passed to this routine is of the appropriate pointer type for
the data. The value found by the input parser depends on the value of
\celem{n} and any optional additional arguments. \celem{n} is the
number of additional arguments. If \celem{n} is 0, then there are no
additional arguments, and the keyword has only one value associated
with it. If the keyword has an array associated with it, then
\celem{n} is 1 and the one additional argument is which element of the
array to pick. If \celem{kwd} specifies an array of arrays, then
\celem{n} is 2, the first additional argument is the number of the
first array, and the second argument is the number of the element
within that array, etc. Deep in here, the code calls a
\celem{sscanf(read, conv, value);}, so that's the real meaning of
variables.
\celem{int ip\_string(char *kwd, char **value, int n, [int o1, ..., int on]);}\\
Parses the string associated with \celem{kwd} stores it in \celem{value}.
The role of \celem{n} and optional arguments is the same as that
described above for \celem{ip\_data()}.
\celem{int ip\_value(char *kwd, ip\_value\_t **ip\_val, int n);} \\
Grabs the section of keyword tree at \celem{kwd} and stores it in
\celem{ip\_val}
for the programmer's use - this is usually not used, since you need to
understand the structure of \celem{ip\_value\_t}.
\celem{int ip\_int\_array(char *kwd, int *arr, int n);} \\
Reads n integers into array \celem{arr}.
\begin{center} \celem{ip\_read.cc} \\ \end{center}
\celem{void ip\_set\_uppercase(int uc);} \\
Sets parsing to case sensitive if uc==0, I think.
\celem{void ip\_initialize(FILE *in, FILE *out);} \\
Calls \celem{yyparse();} followed by \celem{ip\_cwk\_clear();} followed by
\celem{ip\_internal\_values();}. This routine reads the entire input deck
and stores it into the keyword tree for access later.
\celem{void ip\_append(FILE *in, FILE *out);} \\
Same thing as \celem{ip\_initialize();}, except this doesn't clear the
\celem{cwk} first. Used for parsing another input file, such
as \celem{intco.dat}.
\celem{void ip\_done();} \\
Frees up the keyword tree.
\begin{center} \celem{ip\_read.cc} \\ \end{center}
\celem{void ip\_print\_tree(FILE *out, ip\_keyword\_tree\_t *tree);} \\
Prints out \celem{tree} to \celem{out}. If \celem{tree} is set to \celem{NULL},
then the current working keyword tree will be printed out.
This function is useful for debugging problems with parsing.
\subsubsection{Sample Use from \PSIcscf}
These are two slightly simplified pieces of (former versions of) actual code.
From \file{cscf.cc}:
\begin{verbatim}
#include <libipv1/ip_lib.h>
#include <libpsio/psio.h>
int main(int argc,char* argv[])
{
using namespace psi::cscf;
...
psi_start(&infile,&outfile,&psi_file_prefix,argc-1, argv+1, 0);
ip_cwk_add(":SCF");
\end{verbatim}
From \file{scf\_input.cc}:
\begin{verbatim}
errcod = ip_string("LABEL",&alabel,0);
if(errcod == IPE_OK) fprintf(outfile," label = %s\n",alabel);
reordr = 0; /* this sets the default that will be used in case the
user hasn't specified this keyword */
errcod = ip_boolean("REORDER",&reordr,0);
if(reordr) {
errcod = ip_count("MOORDER",&size,0);
for(i=0; i < size ; i++) {
errcod = ip_data("MOORDER","%d",&iorder[i],1,i);
errchk(errcod,"MOORDER");
}
}
second_root = 0;
if (twocon) {
errcod = ip_boolean("SECOND_ROOT",&second_root,0);
}
if(iopen) {
errcod = ip_count("SOCC",&size,0);
if(errcod == IPE_OK && size != num_ir) {
fprintf(outfile,"\n SOCC array is the wrong size\n");
fprintf(outfile," is %d, should be %d\n",size,num_ir);
exit(1);
}
if(errcod != IPE_OK) {
fprintf(outfile,"\n try adding some electrons buddy!\n");
fprintf(outfile," need SOCC\n");
ip_print_tree(outfile,NULL);
exit(1);
}
\end{verbatim}
|