File: ip.tex

package info (click to toggle)
psicode 3.4.0-6
links: PTS, VCS
area: main
in suites: bookworm, bullseye, buster, stretch
size: 46,416 kB
ctags: 18,563
sloc: cpp: 291,425; ansic: 12,788; fortran: 10,489; perl: 3,206; sh: 2,702; makefile: 2,205; ruby: 2,178; yacc: 110; lex: 53
file content (224 lines) | stat: -rw-r--r-- 9,353 bytes
parent folder | download | duplicates (4)
%
% PSI Programmer's Manual
%
% Input Parsing Section
%
% Justin T. Fermann, 1 February 1996
%
% Updated and improved(?) by Edward F. Valeev, 7 June 2000
% Updated to C++ by David Sherrill, 29 Jan 2008
%
The input parsing library is built for the purpose of reading in the
contents of an input file with the syntax of \inputdat\ and storing
the contents specific to certain keywords supplied. To perform such a
task \library{libipv1.a} has three parts: (1) the parser; (2) the
lexical scanner; (3) keyword storage and retrieval.

The format of \inputdat\ follows certain rules which should probably
referred to as the PSI input grammar. There is a description of most
of those rules in \PSIthree\ User's Manual. A complete definition of
the PSI input grammar is encoded in \file{parse.y} (see below).  To
read a grammar we need a parser -- the first component of
\library{libipv1.a}. Then the identified lexical elements of
\inputdat\ (keywords and keyword values) need to be scanned for
presence of ``forbidden'' characters (e.g.\ a space may not be a part
of a string unless the string is placed between parentheses).  This
task is performed by the lexical scanner --- the second component of
\library{libipv1.a}. Finally, scanned-in pairs of keyword-value(s) are
stored in a hierarchical data structure (a tree). When a particular
option is needed, the set of stored keywords and values is searched
for the one queried and the value returned.  In this way, options of
varying type can be assigned, i.e.\ rather than having a line of
integers, each corresponding to a program variable, mnemonic character
string variables can be parsed and interpreted into program variables.
It's also easier to implement default options, allowing a more spartan
input deck.  The set of input-parsing routines in \library{libipv1.a}
is really not complicated to use, but the manner in which data is
stored is somewhat painful to grasp at first.

The following is a list of the names of the individual source files in
\library{libipv1} and a summary of their contents.  After that is a
list of the syntax of specific functions and their use.  Last is a
simple illustration of the use of this library, taken mostly from
\PSIcscf.

\subsubsection{Source Files}

\begin{itemize}
\item Header files
  \begin{itemize}
  \item \file{ip\_error.h} Defines for error return values.
  \item \file{ip\_global.h} cpp macros to make Curt happy.
  \item \file{ip\_lib.h} \#include's everything.
  \item \file{ip\_types.h} Various structures and unions specific to
                          \library{libipv1}.
  \end{itemize}
\item Other Source
  \begin{itemize}
  \item \file{parse.y} Yacc source encoding the PSI input grammar.
  Read by {\tt yacc} (or {\tt bison}) -- a parser generator program.
  \item \file{scan.l} Lex source describing lexical elements allowed
  in \inputdat. Read by {\tt lex} (or {\tt flex}) -- a lexer generator
  program. 
  \item \file{*.gbl, *.lcl} cpp macros to mimic variable argument lists.
  \end{itemize}
\item C source
  \begin{itemize} 
  \item \file{ip\_alloc.cc} Allocates keyword tree elements.
  \item \file{ip\_cwk.cc} Routines to manipulate the current working
                       keyword tree.
  \item \file{ip\_data.cc} Routines to handle reading of arrays and
                         scaler keyword assignments in input.
  \item \file{ip\_error.cc} Error reporting functions.
  \item \file{ip\_karray.cc} Other things to deal with keyword arrays.
  \item \file{ip\_print.cc} Routines to print sections of the keyword
  tree.
  \item \file{ip\_read.cc} All the file manipulation routines.  Reading
                  of \inputdat\ and building the keyword tree from
                  which information is later plucked.
  \end{itemize}
\end{itemize}

\subsubsection{Syntax}

\begin{center} \file{ip\_cwk.cc}\\ \end{center}

\celem{void ip\_cwk\_clear();} \\
Clears current working keyword.  Used when initializing input or switching
from one section to another (:DEFAULT and :CSCF to :INTCO, for instance).

\celem{void ip\_cwk\_add(char *kwd);} \\
Adds \celem{kwd} to the list of current working keywords.  Allows parsing of 
variables under that keyword out of the input file (files) which has
(have) been read or will be read in the future using \celem{ip\_append}.
The keyword \celem{kwd} can only be removed from the list of current working
keywords by purging the entire list using \celem{ip\_cwk\_clear}. 
{\em You must ensure that they keyword strings begin with a colon.}

\begin{center} \file{ip\_data.cc} \\ \end{center}

\celem{int ip\_count(char *kwd, int *count, int n);} \\
Counts the elements in the n'th element of the array \celem{kwd}.

\celem{int ip\_boolean(char *kwd, int *bool, int n);} \\
Parses n'th element of \celem{kwd} as boolean (true, 1, yes; false, 0, no)
into 1 or 0 returned in \celem{bool}.

\file{int ip\_exist(char *kwd, int n);} \\
Returns 1 if n'th element of \celem{kwd} exists.  Unfortunately, n must be 0.

\file{int ip\_data(char *kwd, char *conv, void *value, int n 
      [, int o1, ..., int on]);} \\
Looks for keyword \celem{kwd}, finds the value associated with it,
converts it according to the format specification given in
\celem{conv}, and stores the result in \celem{value}.  Note that
\celem{value} is a \celem{void *} so this routine can handle any data
type, but it is the programmer's responsibility to ensure that the
pointer passed to this routine is of the appropriate pointer type for
the data.  The value found by the input parser depends on the value of
\celem{n} and any optional additional arguments.  \celem{n} is the
number of additional arguments.  If \celem{n} is 0, then there are no
additional arguments, and the keyword has only one value associated
with it.  If the keyword has an array associated with it, then
\celem{n} is 1 and the one additional argument is which element of the
array to pick.  If \celem{kwd} specifies an array of arrays, then
\celem{n} is 2, the first additional argument is the number of the
first array, and the second argument is the number of the element
within that array, etc.  Deep in here, the code calls a
\celem{sscanf(read, conv, value);}, so that's the real meaning of
variables.

\celem{int ip\_string(char *kwd, char **value, int n, [int o1, ..., int on]);}\\
Parses the string associated with \celem{kwd} stores it in \celem{value}.
The role of \celem{n} and optional arguments is the same as that
described above for \celem{ip\_data()}.

\celem{int ip\_value(char *kwd, ip\_value\_t **ip\_val, int n);} \\
Grabs the section of keyword tree at \celem{kwd} and stores it in 
\celem{ip\_val}
for the programmer's use - this is usually not used, since you need to 
understand the structure of \celem{ip\_value\_t}.

\celem{int ip\_int\_array(char *kwd, int *arr, int n);} \\
Reads n integers into array \celem{arr}.

\begin{center} \celem{ip\_read.cc} \\ \end{center}

\celem{void ip\_set\_uppercase(int uc);} \\
Sets parsing to case sensitive if uc==0, I think.

\celem{void ip\_initialize(FILE *in, FILE *out);} \\
Calls \celem{yyparse();} followed by \celem{ip\_cwk\_clear();} followed by 
\celem{ip\_internal\_values();}.  This routine reads the entire input deck
and stores it into the keyword tree for access later.

\celem{void ip\_append(FILE *in, FILE *out);} \\
Same thing as \celem{ip\_initialize();}, except this doesn't clear the 
\celem{cwk} first.  Used for parsing another input file, such 
as \celem{intco.dat}.

\celem{void ip\_done();} \\
Frees up the keyword tree.

\begin{center} \celem{ip\_read.cc} \\ \end{center}

\celem{void ip\_print\_tree(FILE *out, ip\_keyword\_tree\_t *tree);} \\
Prints out \celem{tree} to \celem{out}. If \celem{tree} is set to \celem{NULL},
then the current working keyword tree will be printed out.
This function is useful for debugging problems with parsing.

\subsubsection{Sample Use from \PSIcscf}
These are two slightly simplified pieces of (former versions of) actual code.  

From \file{cscf.cc}:
\begin{verbatim}
#include <libipv1/ip_lib.h>
#include <libpsio/psio.h>


int main(int argc,char* argv[])
{
  using namespace psi::cscf;

  ...

  psi_start(&infile,&outfile,&psi_file_prefix,argc-1, argv+1, 0);
  ip_cwk_add(":SCF");
\end{verbatim}

From \file{scf\_input.cc}:
\begin{verbatim}

   errcod = ip_string("LABEL",&alabel,0);
   if(errcod == IPE_OK) fprintf(outfile,"  label       = %s\n",alabel);

   reordr = 0;    /* this sets the default that will be used in case the
                     user hasn't specified this keyword */
   errcod = ip_boolean("REORDER",&reordr,0); 
   if(reordr) {
      errcod = ip_count("MOORDER",&size,0);
      for(i=0; i < size ; i++) {
         errcod = ip_data("MOORDER","%d",&iorder[i],1,i);
         errchk(errcod,"MOORDER");
         }
      }
   second_root = 0;
   if (twocon) {
      errcod = ip_boolean("SECOND_ROOT",&second_root,0);
      }

   if(iopen) {
      errcod = ip_count("SOCC",&size,0);
      if(errcod == IPE_OK && size != num_ir) {
         fprintf(outfile,"\n SOCC array is the wrong size\n");
         fprintf(outfile," is %d, should be %d\n",size,num_ir);
         exit(1);
         }
      if(errcod != IPE_OK) {
         fprintf(outfile,"\n try adding some electrons buddy!\n");
         fprintf(outfile," need SOCC\n");
         ip_print_tree(outfile,NULL);
         exit(1);
         }
\end{verbatim}