\section{Discussion of a typical \FORM\ run}
We discuss in the following what is happening inside \FORM\ when it executes a
given program. The discussion focuses more on the interplay between the various
parts of \FORM\ and on key concepts of the internal data representation than on
in-depth details of the code. For the latter, the reader is referred to section
\ref{sec:indepth}. For better comprehension, this section should be read with the
referenced \FORM\ source files open alongside.
We consider the following example \FORM\ program \C{test.frm} (which we run
with the command "\C{form test}"):
\begin{verbatim}
1 #define N "3"
2
3 Symbol x, y, z;
4
5 L f = (x+y)^2 - (x+z)^`N';
6 L g = f - x;
7
8 Brackets x;
9 Print;
10 .sort
11
12 #do i=2,3
13 Id x?^`i' = x;
14 #enddo
15
16 Print +s;
17 .end
\end{verbatim}
The entry function \C{main()} is in \C{startup.c}. It does various
initializations before it calls the preprocessor \C{PreProcessor()}, which
actually deals with the \FORM\ program. The code shows some typical features:
Preprocessor macros are frequently used to select code specific to certain
configurations. The two most common macros can be seen here: \C{WITHPTHREADS}
for a \TFORM\ executable and \C{PARALLEL} for a \PARFORM\ executable. Macros are
used to access the global data contained in the variable \C{A}, like
\C{AX.timeout} for example. The code usually uses its own functions instead of
the standard functions provided by the C library for common tasks. Examples in
\C{main()} are \C{strDup1()} or \C{MesPrint()} (replacing \C{printf()}). Another
very frequently used function is \C{Malloc1()}, which replaces \C{malloc()}. The
reasons are better portability and the inclusion of special features. \C{Malloc1()},
for example, makes a custom memory debugger available, while \C{MesPrint()} knows,
among other things, how to print encoded expressions from the internal buffers.
% needs to be rewritten ->
The initializations in \C{main()} are done in several steps. Some, like the
initialization of \C{A} with zeros, are done directly; most others are done by
calls to dedicated functions. The initializations are split up according to the
type of objects involved and the available information at this point. The
command line parameters passed to \FORM\ (none in our example run) are treated
in the function \C{DoTail()}. After that, files are opened and also parsed for
additional settings. Then, once all settings are known, the bulk of the
internal data is allocated and initialized. Finally, recovery settings are
checked, threads are started if necessary, timers are started, and variable
initializations that might need to be repeated later (e.g. clear modules) are
done in \C{IniVars()}.
The call to \C{OpenInput()} reads the actual \FORM\ program into memory. The
input is handled in an abstract fashion as character streams. The stream implementation
(\C{tools.c}) offers several functions to open, close, and read from a stream.
Streams can be of different types including files, in-memory data like parts of
other streams or dollar variables, as well as external channels. Access to
the characters is nevertheless uniform across all stream types. In
\C{OpenInput()} a stream represents our input file. Most of the logic
there deals with the jump to the requested module (skipping clear instructions).
It uses the function \C{GetInput()} to get the next character in the stream.
Which stream it reads from is determined by the variable \C{AC.CurrentStream}.
This global variable in the sub-struct \C{C\_const} of the \C{ALLGLOBALS}
variable \C{A} is an example of how the different parts of \FORM\ typically
communicate with each other by means of global variables.
Next is the preprocessor. The preprocessor is implemented in the function
\C{PreProcessor()} in \C{pre.c}. This function consists basically of two nested
for-loops without conditions (\C{for (;;) \{ \ldots \}}). The outer loop deals
with one \FORM\ module for each iteration, the inner loop deals with one input
line. In our example, certain initializations are done before the code enters
the inner loop, where \C{GetInput()} reads our input file. The variables
are all set such that reading starts from the beginning of our input file.
The input in variable \C{c} is tested for special cases. Whitespace is
skipped. Comments starting with a star \C{*} (unless \C{AP.ComChar} is set to a
different character) are also skipped including whole folds. The crucial check
on \C{c} is the if-clause that checks it for being a preprocessor command (\C{\#}),
a module statement (\C{.}), or something else which is usually an ordinary
statement.
\begin{verbatim}
1 #define N "3"
\end{verbatim}
In our case, we have a preprocessor command in the input. The function
\C{PreProInstruction()} is called to read and interpret the rest of the line.
The first part deals with the loading of the command in a dedicated buffer. For
the moment, we ignore the details for the special treatment of cases when we are
already inside an if or switch clause in a \FORM\ program. In our run, the
function \C{LoadInstruction(0)} is simply called.
\C{LoadInstruction()} copies input into the preprocessor instruction buffer.
Three variables govern this buffer: \C{AP.preStart} points to the start of the
buffer, \C{AP.preFill} to the point where new input can be copied to, and
\C{AP.preStop} to (roughly) the end of the buffer. This setup is quite typical
for buffers in \FORM. The memory is allocated at the start of \FORM. Later, like
at the end of \C{LoadInstruction()}, if the buffer gets too small, it can be
replaced by a larger block of memory with the help of utility functions like
\C{DoubleLList()}. The contents are copied from the old to the new buffer. Since
this dynamic resizing needs to be done occasionally for most buffers,
most buffers in \FORM\ store data in a way that makes copying easy,
i.e. C pointers are usually avoided and numbers representing
offsets are used instead. Since the preprocessor instruction buffer just contains
characters, there is no problem here.
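The three-pointer layout and the doubling step can be illustrated with a minimal sketch. It is not the \FORM\ code: the type \C{CharBuf} and the functions \C{charbuf\_init()}, \C{charbuf\_double()} and \C{charbuf\_put()} are hypothetical names chosen for this illustration, and plain \C{malloc()} is used where \FORM\ would use its own allocation routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical character buffer governed by three pointers, in the
   spirit of AP.preStart, AP.preFill and AP.preStop. */
typedef struct {
    char *start;  /* first byte of the buffer */
    char *fill;   /* where the next character is written */
    char *stop;   /* one past the last usable byte */
} CharBuf;

/* Allocate a buffer of the given capacity; returns 0 on success. */
int charbuf_init(CharBuf *b, size_t capacity)
{
    assert(capacity > 0);
    b->start = malloc(capacity);
    if (b->start == NULL) return -1;
    b->fill = b->start;
    b->stop = b->start + capacity;
    return 0;
}

/* Replace the buffer by one of twice the size and copy the contents.
   The fill position is carried over as an offset, not as a pointer. */
int charbuf_double(CharBuf *b)
{
    size_t used = (size_t)(b->fill - b->start);
    size_t newcap = 2 * (size_t)(b->stop - b->start);
    char *fresh = malloc(newcap);
    if (fresh == NULL) return -1;
    memcpy(fresh, b->start, used);
    free(b->start);
    b->start = fresh;
    b->fill = fresh + used;
    b->stop = fresh + newcap;
    return 0;
}

/* Append one character, growing the buffer when it is full. */
int charbuf_put(CharBuf *b, char c)
{
    if (b->fill >= b->stop && charbuf_double(b) != 0) return -1;
    *b->fill++ = c;
    return 0;
}
```

Any pointer into the old block becomes invalid after the doubling step, which is why data that refers to other data inside such a buffer is better stored as an offset.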
In \C{LoadInstruction()} with our input and the mode set to 5 the input is just
copied directly without any special actions taken, except for a zero that is
added at the end of the data. \C{PreProInstruction()} examines the data in the
preprocessor instruction buffer for special cases, and then does a look-up in
the \C{precommands} variable. This is a vector of type \C{KEYWORD} which enables
the translation of a string (the command) to a function pointer (the C function
that performs the operations requested by the preprocessor command).
\C{FindKeyword()} does this translation, and the function found is then called
through its pointer with the rest of the input in the instruction buffer as an argument.
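The idea of translating a command string into a function call through a table can be sketched as follows. The struct \C{Keyword}, the table \C{commands} and the functions \C{find\_keyword()} and \C{dispatch()} are hypothetical stand-ins; the real \C{KEYWORD} type has a different layout and \C{FindKeyword()} a different signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature: receives the rest of the line. */
typedef int (*Handler)(const char *args);

/* Hypothetical stand-in for a KEYWORD entry: a name and a function. */
typedef struct {
    const char *name;
    Handler func;
} Keyword;

/* Toy handlers; the first one just reports the argument length. */
static int do_define(const char *args) { return (int)strlen(args); }
static int do_undefine(const char *args) { (void)args; return 0; }

static const Keyword commands[] = {
    { "define",   do_define   },
    { "undefine", do_undefine },
};

/* Linear search for a command name; returns NULL if it is unknown. */
const Keyword *find_keyword(const char *name)
{
    size_t n = sizeof commands / sizeof commands[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(commands[i].name, name) == 0) return &commands[i];
    }
    return NULL;
}

/* Look up the command and call its handler; -1 for unknown commands. */
int dispatch(const char *name, const char *args)
{
    assert(name != NULL && args != NULL);
    const Keyword *k = find_keyword(name);
    return k ? k->func(args) : -1;
}
```

Adding a new command then only requires a new table entry and a handler function; the dispatching code itself stays unchanged.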
The function pointer will point to \C{DoDefine()} in our case. \C{DoDefine()}
just calls \C{TheDefine()} that does the work. The if-clauses for \\
\C{AP.PreSwitchModes} and \C{AP.PreIfStack} are present in most of the
functions dealing with preprocessor commands. They check whether we are in a
preprocessor if or switch block that is not to be considered, because the
condition didn't hold. Then, the standard action is to just exit the current
function leaving it with no effect. Since there are preprocessor commands like
\C{\#else} or \C{\#endif} this decision can only be taken at this level of the
execution and requires the repeated use of this idiom.
The function scans through possible arguments and the value. In the value, special
characters are interpreted. Ultimately, the preprocessor variable is created and
assigned in the called function \C{PutPreVar()}. The variable \C{chartype}
deserves an explanation. One will find it used very often in the C code that
does input parsing. \C{chartype} is actually a macro standing in for
\C{FG.cTable}. This global, statically initialized (in \C{inivar.h}) vector
contains a value for every possible ASCII character describing its parsing type.
The parsing type groups different ASCII characters such that the syntax checking
is facilitated, see \C{inivar.h} for details.
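A table-driven character classification of this kind might look like the following sketch. The class names and their values are invented for the illustration (the real ones are in \C{inivar.h}), and the table is filled by a hypothetical \C{init\_ctable()} at run time only to keep the example short.

```c
#include <assert.h>

/* Hypothetical parsing classes; the real values are in inivar.h. */
enum { CLASS_ALPHA, CLASS_DIGIT, CLASS_OPERATOR, CLASS_OTHER };

/* One class per 7-bit ASCII character. */
static unsigned char ctable[128];

/* Fill the table: letters, digits, a few operators, the rest "other". */
void init_ctable(void)
{
    const char *ops = "+-*/^";
    for (int c = 0; c < 128; c++) ctable[c] = CLASS_OTHER;
    for (int c = 'a'; c <= 'z'; c++) ctable[c] = CLASS_ALPHA;
    for (int c = 'A'; c <= 'Z'; c++) ctable[c] = CLASS_ALPHA;
    for (int c = '0'; c <= '9'; c++) ctable[c] = CLASS_DIGIT;
    for (int i = 0; ops[i] != '\0'; i++) ctable[(int)ops[i]] = CLASS_OPERATOR;
}

/* Classify one character with a single table look-up. */
int char_class(int c)
{
    assert(c >= 0 && c < 128);
    return ctable[c];
}
```

A parser can then switch on \C{char\_class(c)} instead of testing character ranges again and again.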
In \C{PutPreVar()} we get into the details of the name administration. We will
just comment on some of the more general features. \C{NumPre} and \C{PreVar} are
macros to access elements in \C{AP.PreVarList}. The type of \C{AP.PreVarList} is
\C{LIST}. This is a generic type for all kinds of lists and it is used for many
other variables in \FORM. A \C{LIST} stores list entries in a piece of
dynamically allocated memory that has no defined type (\C{void *}). The utility
functions for managing \C{LIST}s like \C{FromList()} are ignorant about the
actual contents and perform list-specific operations like adding, removing or
resizing a list. An actual entry can be accessed by some pointer arithmetic and
type casting. The \C{PreVar} macro contains such a cast to the type \C{PREVAR}
which represents a preprocessor variable.
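The combination of untyped storage and a casting macro can be sketched like this. The struct \C{List}, the function \C{list\_new\_entry()}, the entry type \C{PreVarEntry} and the macro \C{PREVARS} are hypothetical; they only mimic the roles of \C{LIST}, \C{FromList()}, \C{PREVAR} and \C{PreVar}, whose actual definitions differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic list: untyped storage plus bookkeeping. */
typedef struct {
    void *data;     /* untyped storage for all entries */
    size_t size;    /* size of one entry in bytes */
    int num;        /* number of entries in use */
    int maxnum;     /* number of entries allocated */
} List;

/* Return a pointer to a fresh, zeroed entry; grow the storage if needed.
   The function knows only the entry size, not the entry type. */
void *list_new_entry(List *L)
{
    assert(L->size > 0);
    if (L->num >= L->maxnum) {
        int newmax = L->maxnum ? 2 * L->maxnum : 4;
        void *p = realloc(L->data, (size_t)newmax * L->size);
        if (p == NULL) return NULL;
        L->data = p;
        L->maxnum = newmax;
    }
    void *entry = (char *)L->data + (size_t)L->num * L->size;
    memset(entry, 0, L->size);
    L->num++;
    return entry;
}

/* Hypothetical entry type for preprocessor variables. */
typedef struct {
    const char *name;
    const char *value;
} PreVarEntry;

static List prevar_list = { NULL, sizeof(PreVarEntry), 0, 0 };

/* Typed view on the untyped storage, analogous to the PreVar macro. */
#define PREVARS ((PreVarEntry *)prevar_list.data)
```

Because the macro re-reads \C{prevar\_list.data} on every use, it stays valid after the storage has been moved by a resize, whereas a pointer returned earlier by \C{list\_new\_entry()} does not.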
\C{PutPreVar()} creates a new list entry for us and basically copies the
contents of the parameter \C{value} to the memory allocated to \C{PREVAR}'s
\C{name}. So, by writing \C{PreVar[0]->name} or \C{PreVar[0]->value} we could
access the strings \C{N} or \C{3}.
In \C{TheDefine()} the function \C{Terminate()} is used several times. This
function ultimately exits the program, but first tries to clean up things and
print information about the problems causing this program termination.
\begin{verbatim}
2
3 Symbol x, y, z;
\end{verbatim}
In our run, we return to the function \C{PreProcessor()} and start a new inner
loop iteration that reads a new line. After skipping the empty line we end up
in the else-branch of the big if-clause testing \C{c} this time. Here the major
steps are: we check again whether we are in a preprocessor if or switch, call
\C{LoadStatement()} to read and prepare the input, and call
\C{CompileStatement()} to perform the actions requested by the statement. The
program enters the compiler stage.
We also see a call to \C{UngetChar()}, which puts back the character that has
been read into the input stream. This is necessary, because \\
\C{LoadStatement()} and \C{CompileStatement()} need the complete line for
parsing. The variable \C{AP.PreContinuation} is used several times. This variable
deals with statements that span several input lines. \C{LoadStatement()} can
recognize unfinished statements and sets this variable accordingly.
\C{LoadStatement()} basically copies the input to the compiler's input buffer at
\C{AC.iBuffer} (which has \C{AC.iPointer} and \C{AC.iStop} associated to it). It
modifies the copy if necessary. The modifications replace spaces by commas
or insert commas at the right spots to separate tokens. The interpretation steps
that follow rely on these syntactic conventions.
The call to \C{CompileStatement()} is done only if no errors occurred and all
lines of a statement have been gathered into the compiler's input buffer.
\C{CompileStatement()} is called with the address of this input buffer and tries
to identify the statement. As in the preprocessor, the input string is searched for
in a vector of \C{KEYWORD}s (in \C{compiler.c}), and if it is found, a function pointer
is used to call the function that actually deals with the command and its
options and arguments. Here, we have actually two vectors of \C{KEYWORD}s,
because some statements might be stated in abbreviated form. The function
\C{findcommand()} deals with the search. \C{CompileStatement()} does some small
extra work, like for example checking the correct order of statements. In our
case, it calls the function \C{CoSymbol()}. This function is in the file
\C{name.c}, because as a declaration it basically adds something to the name
administration. Functions for other statements can be found in \C{compcomm.c}
and \C{compexpr.c}.
\C{CoSymbol()} loops over the arguments and adds proper variable names together
with their options to the symbols list \C{AC.Symbols} and the name
administration (in the call to \C{AddSymbol()}). In our case, we have \C{x},
\C{y}, and \C{z} added. We have already encountered the basic mechanism of how a
specific struct is added to a \C{LIST}. The name administration was not
explained before, though.
Symbols can appear in expressions that need to be encoded. The coding for
symbols can simply be its entry index in the list \C{AC.Symbols}, but symbols
also need to be recognized when an expression is parsed. Therefore an efficient
look-up mechanism is required. This is achieved by a second data structure that
holds the name strings in a tree for fast searching. The data in the symbol list
does not contain the name string itself, but contains a reference (an index) into
this name string tree. The tree is managed by generalized functions and types
that are also used for other, similar objects like vectors, indices, etc. The
functions for name trees are located in the first part of the file \C{name.c}.
The types \C{NAMENODE} and \C{NAMETREE} are defined in \C{structs.h}.
\C{NAMENODE}s are the nodes of a balanced binary tree. A node does not hold the
name string itself, just an index into its \C{NAMETREE}. The actual data is contained
in the \C{NAMETREE}, which constitutes one tree. This type has buffers for the nodes and
for the name strings. This has the benefit of avoiding small malloc calls for
individual nodes. Also, since all referencing is done via offsets into these
buffers, a relocation or serialization of such a tree is very easy. In the
struct \C{C\_const} (aka the global \C{AC}) several name trees are defined, for
dollar variables, expressions, etc. The symbols added in our example program go
into the nametree referenced by \C{AC.activenames}, which is at this point equal
to \C{AC.varnames}.
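A name tree that refers to its nodes and strings only through indices can be sketched as follows. This is a simplified illustration, not the \FORM\ implementation: the types \C{Node} and \C{NameTree} and the functions \C{tree\_init()}, \C{tree\_find()} and \C{tree\_add()} are hypothetical, the buffers have a fixed size, and the tree is not rebalanced.

```c
#include <assert.h>
#include <string.h>

#define MAXNODES 32
#define MAXCHARS 256

/* Hypothetical node: all references are indices, never pointers. */
typedef struct {
    int left, right;  /* indices of the child nodes, -1 for none */
    int name;         /* offset of the name in the character buffer */
    int number;       /* index of the variable in its own list */
} Node;

/* Hypothetical tree: one buffer for nodes, one for name strings. */
typedef struct {
    Node nodes[MAXNODES];
    char names[MAXCHARS];
    int numnodes, namefill, root;
} NameTree;

void tree_init(NameTree *t)
{
    t->numnodes = 0;
    t->namefill = 0;
    t->root = -1;
}

/* Return the number stored under name, or -1 if it is not present. */
int tree_find(const NameTree *t, const char *name)
{
    int i = t->root;
    while (i >= 0) {
        int c = strcmp(name, t->names + t->nodes[i].name);
        if (c == 0) return t->nodes[i].number;
        i = c < 0 ? t->nodes[i].left : t->nodes[i].right;
    }
    return -1;
}

/* Insert a name; returns 0, or -1 if it exists or the buffers are full. */
int tree_add(NameTree *t, const char *name, int number)
{
    size_t len = strlen(name) + 1;
    assert(number >= 0);
    if (t->numnodes >= MAXNODES) return -1;
    if ((size_t)t->namefill + len > MAXCHARS) return -1;
    int *link = &t->root;
    while (*link >= 0) {
        int c = strcmp(name, t->names + t->nodes[*link].name);
        if (c == 0) return -1;
        link = c < 0 ? &t->nodes[*link].left : &t->nodes[*link].right;
    }
    Node *n = &t->nodes[t->numnodes];
    n->left = n->right = -1;
    n->name = t->namefill;
    n->number = number;
    memcpy(t->names + t->namefill, name, len);
    t->namefill += (int)len;
    *link = t->numnodes++;
    return 0;
}
```

Since no pointer is stored inside the structure, a plain struct copy of a \C{NameTree} yields a fully working second tree, which is the property that makes relocation and serialization easy.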
Our program returns to the \C{PreProcessor()} and starts parsing the next lines:
\begin{verbatim}
5 L f = (x+y)^2 - (x+z)^`N';
6 L g = f - x;
\end{verbatim}
This time the function \C{DoExpr()} will get called (via \C{CoLocal()}) for each
line to do the parsing. The function \C{DoExpr()} first tries to figure out
what type of \C{Local} statement we have. In our case we have an actual
assignment. With the call to \C{GetVar()} we check whether a variable of the same
name already exists. The search is done in the nametrees \C{AC.varnames} and
\C{AC.exprnames}. Since our names are new we don't find a previous variable and
simply call \C{EntVar()}. \C{EntVar()} creates an entry in \C{AC.ExpressionList}
and puts the name into the \C{AC.exprnames} nametree. The entry in
\C{AC.ExpressionList} is of type \C{struct ExPrEsSiOn}. There are more struct
elements than in the case of symbols, but the principle is the same. Up to now,
the right-hand-side (RHS) has not been looked at and therefore no information
about it is saved in the expression's entry yet. The connection between the
expression's entry in the \C{AC.ExpressionList} and the data containing the RHS
will be made via the elements \C{prototype} and \C{onfile} as we will describe
soon. The access to elements in \C{AC.ExpressionList} is facilitated by the
macro \C{Expressions}. The following code in \C{DoExpr()} builds up a so-called
prototype and puts the RHS in encoded form into the buffer system via the call
to \C{CompileAlgebra()}.
\FORM\ uses the allocated memory in \C{AT.WorkSpace} for operations like the
generation of terms. This memory stores \C{WORD}s and is used in a stack-like
fashion with the help of the pointer \C{AT.WorkPointer}. A function can write to
this memory and set \C{AT.WorkPointer} beyond the written data to ensure that
other functions that are called and might use the workspace as well do not
overwrite this data. It is the responsibility of the function to reset
\C{AT.WorkPointer} to its original value again (see variable \C{OldWork} in our
case). Every thread in \TFORM\ will have its own private work space.
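The stack-like discipline can be illustrated with a small sketch. The names \C{workspace}, \C{work\_pointer}, \C{sum\_of\_squares()} and \C{caller()} are hypothetical, and a fixed global array stands in for the memory that \FORM\ allocates for \C{AT.WorkSpace}.

```c
#include <assert.h>

typedef int WORD_T;  /* hypothetical stand-in for WORD */

#define WORKSIZE 64
static WORD_T workspace[WORKSIZE];
static WORD_T *work_pointer = workspace;  /* first free word */

/* Callee: uses n scratch words above the current work pointer. */
static long sum_of_squares(int n)
{
    WORD_T *old_work = work_pointer;      /* remember the entry state */
    WORD_T *scratch = work_pointer;
    long sum = 0;
    assert(n >= 0 && scratch + n <= workspace + WORKSIZE);
    work_pointer += n;                    /* claim the scratch words */
    for (int i = 0; i < n; i++) scratch[i] = (i + 1) * (i + 1);
    for (int i = 0; i < n; i++) sum += scratch[i];
    work_pointer = old_work;              /* release them again */
    return sum;
}

/* Caller: keeps its own data below the work pointer during the call. */
long caller(void)
{
    WORD_T *old_work = work_pointer;
    WORD_T *mine = work_pointer;
    assert(mine + 2 <= workspace + WORKSIZE);
    mine[0] = 10;
    mine[1] = 20;
    work_pointer += 2;                    /* protect mine[0] and mine[1] */
    long result = sum_of_squares(3);      /* callee writes above them */
    result += mine[0] + mine[1];          /* caller data is still intact */
    work_pointer = old_work;              /* restore the original value */
    return result;
}
```

If \C{caller()} forgot to advance the pointer before the call, \C{sum\_of\_squares()} would overwrite \C{mine[0]} and \C{mine[1]}.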
\FORM\ now uses \C{AT.WorkSpace} to build up a data structure that contains
everything that needs to be known at a later stage about the expression that is
parsed. The creation and the layout of the data is quite typical. First comes a
header that signifies what is coming. Here, it is \C{TYPEEXPRESSION}. Then comes
the length of the whole data, i.e. the total number of occupied \C{WORD}s. The
actual contents follow, in this case a so-called subexpression that we will
discuss soon. The contents are followed by a coefficient and a zero, which
signifies the end of the data.
{\bf Coefficients} are coded in \FORM\ always in the following manner: Since
coefficients can in general be fractional numbers, we encode an integer
numerator and an integer denominator. The integers can have arbitrary length
(limited only by the buffer sizes, see the setup variables \C{MaxNumberSize} and
\C{MaxTermSize}) and are encoded in \C{WORD}-pieces in little-endian convention.
The number of allocated \C{WORD}s is always the same for the numerator and the
denominator. The last word of the coefficient contains the size of the whole
coefficient in words. The formal structure of a coefficient is therefore like
this:
\begin{center}
{\it NUMERATOR WORDS, DENOMINATOR WORDS, LENGTH}.
\end{center}
The integers are always
unsigned, i.e. positive. Negative fractions are encoded by a negative length.
Examples (with 16-bit words): $2^{16}+2 = 65538$ gives the words $2,1,1,0,5$
(numerator $2,1$, denominator $1,0$ padded to the same size, length $5$) and $-5/2$
gives $5,2,-3$.
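The encoding rule can be made concrete with a short sketch that reproduces the two examples for 16-bit words. The function \C{encode\_coefficient()} is hypothetical and deliberately limited: numerator and denominator must fit into 32 bits, each element of the output array holds one 16-bit word, and the fraction is not reduced.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the fraction num/den (negated if `negative` is nonzero) as
   numerator words, denominator words, signed length.
   Each element of out[] holds one 16-bit word (0..65535); both
   integers are split in little-endian order and padded to the same
   number of words. Returns the number of words written (3 or 5). */
int encode_coefficient(uint32_t num, uint32_t den, int negative, int *out)
{
    assert(den != 0);
    int n = (num > 0xFFFFu || den > 0xFFFFu) ? 2 : 1;  /* words per integer */
    for (int i = 0; i < n; i++) {
        out[i]     = (int)((num >> (16 * i)) & 0xFFFFu);
        out[n + i] = (int)((den >> (16 * i)) & 0xFFFFu);
    }
    int len = 2 * n + 1;
    out[2 * n] = negative ? -len : len;  /* the sign lives in the length */
    return len;
}
```

Because the length is stored in the last word, code that holds a pointer to the end of a term can find the start of the coefficient by stepping back by the absolute value of that word.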
The data structure in \C{AT.WorkSpace} is basically an instruction for the
generator, a central function that does the main work during the execution of
the \FORM\ program, to generate an expression. The content of the expression is
a subexpression. This is a pointer to the real content of the expression and
will be substituted later after the execution. The main reason for this delayed
expression insertion is that it can often save a lot of intermediate operations
and data space and thereby speed up \FORM. Such a case arises
when an expression is used at different places and the different parts are
brought together by some operations. Then, cancellations may occur or terms can
be factored out, and when the expression is finally inserted the workload is
smaller.
In our example run, the data that will later instruct the generator to
create an expression looks in total like this:
\begin{center}
{\it TYPEEXPRESSION, SUBEXPSIZE+3, 9, SUBEXPRESSION, \\
SUBEXPSIZE, 0, 1, AC.cbufnum, 1, 1, 3, 0}
\end{center}
We used the macro names as in the actual code. \C{AC.cbufnum} is a variable that
is the index of the compile buffer used for this parsed statement.
At the end of the data preparation phase the pointer \C{AT.WorkPointer} is set
beyond the data on the trailing zero, and the pointer \C{AT.ProtoType}, which is
used shortly by the following functions, is set to the word \C{SUBEXPRESSION}.
The expression will be put into the scratch buffer system. This system comprises
the small and large buffers and the scratch files. Where new data
will be stored in the scratch buffers is of no concern to a function like
\C{DoExpr()}; it simply uses several utility functions for that purpose. Still, we need to
initialize the variable \C{pos} here that will indicate the position of the
data, i.e. the expression, in the scratch file.
Next, the function \C{CompileAlgebra()} is called to parse the right hand side
and put the codified expression into the \FORM\ buffers. It basically calls two
functions: \C{tokenize()} and \C{CompileSubExpressions()}. \C{tokenize()} is the
tokenizer that translates the input character string into a sanitized and partly
interpreted string of codes. It will look up the variables named in the input
string and put the index they have in the name administration into the tokenized
output. Our input string is transformed into the following code string:
\begin{verbatim}
(      -13   LPARENTHESIS
x       -1   TSYMBOL
         5
+      -26   TPLUS
y       -1   TSYMBOL
         6
)      -14   RPARENTHESIS
^      -25   TPOWER
2       -8   TNUMBER
         2
-      -27   TMINUS
(      -13   LPARENTHESIS
x       -1   TSYMBOL
         5
+      -26   TPLUS
z       -1   TSYMBOL
         7
)      -14   RPARENTHESIS
^      -25   TPOWER
`N'     -8   TNUMBER
         3
       -29   TENDOFIT
\end{verbatim}
This code string then lies in the \C{AC.tokens} buffer where it is used by
subsequent functions.
The function \C{CompileSubExpressions()} finds terms in an expression that might
be reused at another place and extracts them. As one can see in the code, the
function looks for terms in parentheses and works recursively. The end of such a
term is each time marked with \C{TENDOFIT}. Then, the function
\C{CodeGenerator()} called at the end of \C{CompileSubExpressions()} does the
real work.
In our example \C{CodeGenerator()} first gets the data
\begin{center}
{\it LPARENTHESIS, TSYMBOL, 5, TPLUS, TSYMBOL, 6, TENDOFIT}
\end{center}
as a parameter, which is the term $x+y$. It builds up the actual term encoding
in the workspace and first reserves enough space there for that. One can see the
pointer arithmetic using constants like \C{AM.MaxTal}, which is the maximum
number of words a number can occupy. It reserves space for the coefficient, an
integer, and the actual term. Once a token is recognized, the equivalent term
data is written to the workspace and the function \C{CompleteTerm()} is called.
This function completes the data to
\begin{center}
{\it 8, 1, 4, 5, 1, 1, 1, 3, 0}.
\end{center}
The first word is the total length, i.e. 8 words. This is the length of the
whole expression. The second word is the type of the term, which is a symbol. It
is the value \C{SYMBOL} as defined in \C{ftypes.h}. This macro definition
\C{SYMBOL} has the value 1 (in the \FORM\ version current at the time of
writing). Following the type word is the length of the term, which is
4. Several such terms could follow each other, but we only have one term at the
moment. Finally, we have the trailing words for the coefficient being 1 and a
terminating zero. The meaning and interpretation of the words in the data of a
single term after the type word and the length word are dependent on the type.
For symbols, we have pairs of words, where the first word is the index of the
symbol in the name administration and the second word is the exponent. Here we
have symbol 5 ($ = x$) with an exponent 1. After \C{CompleteTerm()} has
constructed the whole expression it copies the data to the compile buffers with
the help of the function \C{AddNtoC()}.
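The layout just described can be reproduced with a few lines of code. The function \C{build\_symbol\_term()} is a hypothetical helper written for this illustration only; it handles a single symbol with coefficient $+1$ and uses the value 1 for \C{SYMBOL} as quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define SYMBOL_TYPE 1  /* value of SYMBOL quoted in the text */

/* Write the encoding of sym^power with coefficient +1 into out[],
   followed by a terminating zero. The caller must provide room for
   9 words. Returns the total length (not counting the zero). */
int build_symbol_term(int sym, int power, int *out)
{
    assert(out != NULL);
    int *t = out + 1;    /* out[0] is filled in at the end */
    *t++ = SYMBOL_TYPE;  /* type word */
    *t++ = 4;            /* length: type, length, one (index, power) pair */
    *t++ = sym;          /* index of the symbol */
    *t++ = power;        /* its exponent */
    *t++ = 1;            /* coefficient: numerator 1 */
    *t++ = 1;            /*              denominator 1 */
    *t++ = 3;            /*              length of the coefficient */
    out[0] = (int)(t - out);
    *t = 0;              /* terminating zero */
    return out[0];
}
```

Called with symbol index 5 and power 1, it produces the words $8, 1, 4, 5, 1, 1, 1, 3, 0$ shown above.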
The compile buffers contain the instruction for the execution engine, the
\C{Processor()}, that will start when the \C{.sort} command is parsed. Our terms
are put into the right-hand-side buffers in the compile buffer. When the
\C{Processor()} will read these buffers one after the other, it will take the
terms and put them into the scratch buffer system. Then, they become the
expressions upon which further statements do act. The compile buffers are stored
in the list \C{AC.cbufList} and we get access to the elements via the cast
\C{((CBUF *)(AC.cbufList.lijst))}. This cast is defined as a preprocessor macro
called \C{cbuf}. The element \C{cbuf[0]->numrhs} (0 is the current compile
buffer we are using) gives the number of entries in \C{cbuf[0]->rhs}, which is
an array of pointers into \C{cbuf[0]->Buffer}. We have 3 elements:
\begin{verbatim}
cbuf[0]->rhs[1] -->
8, 1, 4, 5, 1, 1, 1, 3, 8, 1, 4, 6, 1, 1, 1, 3, 0
cbuf[0]->rhs[2] -->
8, 1, 4, 5, 1, 1, 1, 3, 8, 1, 4, 7, 1, 1, 1, 3, 0
cbuf[0]->rhs[3] -->
9, 6, 5, 1, 2, 0, 1, 1, 3, 9, 6, 5, 2, 3, 0, 1, 1, -3, 0
\end{verbatim}
\C{cbuf[0]->rhs[0]} is not used and the data lies consecutively in \\
\C{cbuf[0]->Buffer}. The meaning of the first two entries has already been
explained. These are expressions containing $x+y$ and $x+z$, respectively.
The last expression uses subexpressions that have the type \C{SUBEXPRESSION}
$ = 6$. The length of a subexpression is 5 and the contents $1,2,0$ means
that expression 1 needs to be inserted with an exponent of 2. The zero is a
dirty flag that signals to the processor the state of the subexpression. Here in
the compile buffers it is simply cleared to zero. The contents $2,3,0$ of the
second subexpression should be obvious. Finally, we have a negative
coefficient for the second subexpression which accounts for the minus sign
between the parentheses in our original expression.
We return to the function \C{DoExpr()} where the prototype of the expression is
put into the scratch system via the call \C{PutOut()} and we are finished with
this line in the input file. The next line defining a second local expression
works the same.
We come to the parsing of the following statements:
\begin{verbatim}
7
8 Brackets x;
9 Print;
\end{verbatim}
The bracket statement is dealt with in function \C{DoBrackets()}. It sets the
flag \C{AR.BracketOn} to 1 and constructs the term that will stand outside the
bracket. This term is copied into the \C{AT.BrackBuf} buffer, where it can be
used by the execution engine when it needs to insert this heading term into an
expression.
The print statement is parsed in function \C{DoPrint()}. Since we don't have any
arguments to \C{Print} all active expressions shall be printed.
\C{DoPrint()} just loops through the \C{Expressions} list and sets the
\C{printflag} to 1 for each expression.
With the next statement in our input file
\begin{verbatim}
10 .sort
\end{verbatim}
we will get to know the other central parts of \FORM: the processor and the
sorting routines. The code in the \C{PreProcessor()} will call
\C{ExecModule()}
which calls \C{DoExecute()}. We can ignore a lot of code there that is only for
parallelized versions of \FORM. There are three important function calls.
First, \C{RevertScratch()} is called. \FORM\ uses three scratch
buffers: the input buffer, the output buffer, and the hide buffer. The usual mode of
operation is to apply statements to expressions in the input buffer, sort and
normalize the result, and write it into the output buffer. This repeats for
every module that is executed, and therefore an important optimization is made: the
input buffer and the output buffer simply exchange their roles.
\C{RevertScratch()} does this job. The second and third important calls are to
\C{Processor()} and \C{WriteAll()}.
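The role exchange of the two buffers can be shown with a toy example. The type \C{Scratch} and the functions \C{swap\_scratch()} and \C{run\_module()} are hypothetical; small in-memory arrays stand in for the scratch buffers and files, and doubling every term stands in for the work of a module.

```c
#include <assert.h>

/* Hypothetical scratch buffer holding a few integer "terms". */
typedef struct {
    int nterms;
    int terms[16];
} Scratch;

static Scratch buffers[2];
static Scratch *in_buf = &buffers[0];
static Scratch *out_buf = &buffers[1];

/* Exchange the roles: the output of the previous module becomes the
   input of the next one, and the old input is reused as empty output.
   No data is copied. */
void swap_scratch(void)
{
    assert(in_buf != out_buf);
    Scratch *t = in_buf;
    in_buf = out_buf;
    out_buf = t;
    out_buf->nterms = 0;
}

/* Toy module: read every input term and write its double to the output. */
void run_module(void)
{
    swap_scratch();
    for (int i = 0; i < in_buf->nterms; i++) {
        out_buf->terms[out_buf->nterms++] = 2 * in_buf->terms[i];
    }
}
```

Only two pointers change per module, so the cost of the exchange does not depend on the size of the expressions.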
\C{Processor()} is, as the name suggests, the main processor that executes
statements and deals with the results. A lot of initialization work is done
before we go into the large loop over the expressions that spans almost the
whole function. As regular expressions from the scratch buffers, our
expressions have the \C{inmem} flag set to zero, so we go into the else branch of the
checking if-clause. There we go to the case of a \C{LOCALEXPRESSION}. The main
logic here is to do a single call to \C{GetTerm()} to get the first term from
the input file and copy that to the output with the call to \C{PutOut()}. This
first term, which is a subexpression, serves as a header for the expression.
A while-loop follows that calls \C{GetTerm()}, and as long as there are still terms,
the loop executes its body and calls \C{Generator()}. After this loop, some
clean-up and a final \C{EndSort()} are done, before the outer loop over the
expressions repeats. \C{Generator()} is the function where the read input, which
is {\it 9, 6, 5, 3, 1, 0, 1, 1, 3}, will be substituted and expanded.
\C{Generator()} gets the term in the workspace and first tries to do all
substitutions (\C{SUBEXPRESSION}), then applies the statements in the compile
buffers to the normalized terms, substitutes again if necessary, handles brackets,
and finally sorts the result.
The call to \C{TestSub()} does the search for subexpressions. \C{TestSub()} will
find a subexpression in our case and return the number (3) of this subexpression
and set other global variables ready for the following steps. In \C{Generator()}
we enter therefore the if-clause checking \C{replac}$> 0$. Depending on the
power of the subexpression different operations are taken. We have our
subexpression to the power one only, which is an easy case. The actual
substitution is performed by the function \C{InsertTerm()}. Since the new term
might again contain subexpressions we do a recursive call to \C{Generator()}.
Our expression contains several layers of subexpressions which are all dealt
with as described above. The powers of the other subexpressions are, however,
different from one, so slightly more work needs to be done, which involves the
expansion of the terms using binomials.
Finally, the call to \C{TestSub()} at the beginning of \C{Generator()} will
return zero. The function \C{Normalize()} is called, which puts the terms in a
canonical form, i.e. terms are ordered and collected with the correct
coefficient. In our example, as the first fully substituted term we have
{\it 12, 1, 4, 6, 1, 1, 4, 6, 1, 1, 1, 3} before the call to
\C{Normalize()}, which means we have a term $x*x$. \C{Normalize()}
makes this into {\it 8, 1, 4, 6, 2, 1, 1, 3}, which is $x^2$.
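The merging step in this example can be sketched for the special case of terms that contain only symbols. The function \C{merge\_symbols()} is hypothetical and covers a tiny part of what \C{Normalize()} does: it adds the powers of equal symbols, orders the symbols by index, drops symbols with power zero, and copies the coefficient unchanged. The value 1 for \C{SYMBOL} is the one quoted earlier.

```c
#include <assert.h>

#define SYMBOL_TYPE 1  /* value of SYMBOL quoted in the text */
#define MAXPAIRS 16

/* Merge all SYMBOL subterms of the term in[] into one sorted subterm
   in out[]. Assumes the term holds only SYMBOL subterms followed by a
   coefficient. Returns the length of the new term. */
int merge_symbols(const int *in, int *out)
{
    int sym[MAXPAIRS], power[MAXPAIRS], n = 0;
    int total = in[0];
    int coeflen = in[total - 1] < 0 ? -in[total - 1] : in[total - 1];
    const int *coef = in + total - coeflen;
    assert(coeflen >= 3);

    /* Collect (symbol, power) pairs, adding powers of equal symbols. */
    for (const int *p = in + 1; p < coef; p += p[1]) {
        assert(p[0] == SYMBOL_TYPE);
        for (int k = 2; k < p[1]; k += 2) {
            int j = 0;
            while (j < n && sym[j] != p[k]) j++;
            if (j == n) {
                assert(n < MAXPAIRS);
                sym[n] = p[k];
                power[n] = 0;
                n++;
            }
            power[j] += p[k + 1];
        }
    }

    /* Insertion sort by symbol index. */
    for (int i = 1; i < n; i++) {
        int s = sym[i], e = power[i], j = i;
        while (j > 0 && sym[j - 1] > s) {
            sym[j] = sym[j - 1];
            power[j] = power[j - 1];
            j--;
        }
        sym[j] = s;
        power[j] = e;
    }

    /* Write one SYMBOL subterm (if any symbol survives) and the coefficient. */
    int *t = out + 1;
    int m = 0;
    for (int i = 0; i < n; i++) if (power[i] != 0) m++;
    if (m > 0) {
        *t++ = SYMBOL_TYPE;
        *t++ = 2 + 2 * m;
        for (int i = 0; i < n; i++) {
            if (power[i] != 0) { *t++ = sym[i]; *t++ = power[i]; }
        }
    }
    for (int i = 0; i < coeflen; i++) *t++ = coef[i];
    out[0] = (int)(t - out);
    return out[0];
}
```

Applied to the twelve words quoted above, the sketch yields the eight words of the normalized term.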
Then, we loop over the statements in the compile buffer. \C{level} is the
instruction counter. We have a long switch-clause that interprets the statement
type identifiers like \C{TYPECOUNT}. Statements with \C{TYPEEXPRESSION} are not
treated here. So we loop over all the compile buffer statements here and only
call \C{TestMatch()} at the loop's end. This function has no effect in our
example, because we have no pattern matching going on.
Then, the function \C{PutBracket()} is called to deal with brackets. Brackets
are implemented by putting the special code \C{HAAKJE} inside the expression.
The terms before the \C{HAAKJE} are outside the bracket, everything following it
will be inside the bracket.
At the end of the loop over the terms in the expressions, the function
\C{StoreTerm()} is called. This function puts the result of the processing in
the output scratch buffers. Finally, we return to \C{Processor()}. There the
final sorting is started. Also, the printing of the expressions is done here.
The parsing in \C{PreProcessor()} continues with
\begin{verbatim}
11
12 #do i=2,3
13 Id x?^`i' = x;
14 #enddo
\end{verbatim}
Here we have a somewhat more complicated example of preprocessor instructions.
The do-loop is treated in \C{DoDo()} which sets up data structures (\C{DOLOOP})
to guide the preprocessor when it is parsing the loop body. The statement line
will then be presented to the compiler twice, each time with the correct value of
the preprocessor variable \C{i}. The compiler deals with this statement in
\C{CoId()}, which just calls \C{CoIdExpression()}. \C{CoIdExpression()}
puts a \C{TYPEIDNEW} code into the lhs compile buffer. This tells the processor
later how to do the pattern matching. The rhs is the term \C{x} that will be
inserted.
The parsing continues and ends with
\begin{verbatim}
15
16 Print +s;
17 .end
\end{verbatim}
The way these statements are treated and how the program is executed has already
been described. The pattern matching is something that has not occurred before,
though. We will not describe it here, since there is a dedicated section in this
manual for that. After the final sorting, \FORM\ will clean up temporary files and
other resources that are not automatically freed by the operating system before
\FORM\ terminates.