\section{Overview of the source code}
\label{sec:source}
Here we will discuss general aspects of the source code, i.e. the files contained in the directory
\C{sources}.
\FORM\ is written in ANSI C. The code is split into header files \C{*.h} and source files
\C{*.c}. Files usually do not come in pairs of a header file with the declarations and a source file
with the definitions; instead, most declarations are collected in a few headers. The function
prototypes, for example, are declared in \C{declare.h}. The most prominent exceptions are
\C{parallel.h} and \C{minos.h}.
Each file usually contains many hundreds of lines of code. To make the files more accessible, the code
is structured by so--called folds. If you use the editor STedi, the folds will be displayed
correctly. If you use a vi--compatible editor that supports folding, it is advisable to activate
folds (in Vim, \C{set foldmethod=marker}) and set the fold markers with \C{set foldmarker=\#[,\#]}
% Folds in Emacs anybody??
\subsection{The header files}
% INDENTATION HACK to be improved!
$\quad\;\:$\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{declare.h} & Contains the declarations of all publicly relevant functions as
well as of commonly used macros like \C{NCOPY} or \C{LOCK}. \\
\C{form3.h} & Global settings and macro definitions like word size or version
number. It includes several different system
header files depending on the computer's architecture.\\
\C{fsizes.h} & Defines macros that determine the size and layout of \FORM's internal data like the
sizes of the work buffers etc. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{ftypes.h} & Contains preprocessor definitions of the codes used in the internal representation of
parsed input and expressions. \\
\C{fwin.h} & Special settings for the Windows operating system. \\
\C{inivar.h} & Contains the initialization of various global data like the
\FORM\
function names or the character table for parsing. It also defines the global
struct \C{A}, and for \TFORM\ the struct pointer \C{AB}. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{minos.h} & Header dedicated to the \C{minos.c} source file. \\
\C{parallel.h} & Header dedicated to the \C{parallel.c} source file. \\
\C{portsignals.h} & Preprocessor definition of the OS signals \FORM\ can deal with. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{structs.h} & Defines the structs that contain almost all of
\FORM's internal data. \\
\C{unix.h} & Special definitions for Unix--like operating systems. \\
\C{variable.h} & Some convenience preprocessor definitions to ease the access to
global variables, like \C{cbuf} or \C{AC}. \\
\end{tabular}
\subsection{The source files}
% INDENTATION HACK to be improved!
$\quad\;\:$\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{argument.c} & Code for the \C{argument} and \C{term}
\FORM\ statements. \\
\C{bugtool.c} & Low-level debugging code. \\
\C{checkpoint.c} & Code to test for checkpoint conditions, to create
snapshots, and to recover from snapshot data. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{comexpr.c} & Functions the compiler calls to translate a statement that
involves an algebraic expression, e.g. \C{Local} or \C{Id}. \\
\C{compcomm.c} & Functions the compiler calls to translate a statement that
neither involves an algebraic expression nor is a variable declaration. \\
\C{compiler.c} & Main compiler code. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{compress.c} & Code for GZIP (de-)compression in sort files. \\
\C{comtool.c} & Utility functions for the compiler, like \C{AddRHS}. \\
\C{dollar.c} & Code dealing with dollar variables. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{execute.c} & Code for the execution phase of a module. Also, code dealing
with brackets in \FORM\ expressions. \\
\C{extcmd.c} & External command code. \\
\C{factor.c} & Simple factorizing code for dollar variables and expressions. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{findpat.c} & Pattern matching for symbols and dot products. \\
\C{function.c} & Pattern matching for functions. \\
\C{if.c} & Code for the \C{if} statement. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{index.c} & Code for bracket indexing. \\
\C{lus.c} & Code to find loops in index contractions. \\
\C{message.c} & Text output functions, like \C{MesPrint} or \C{PrintTerm}. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{minos.c} & The minos database. \\
\C{module.c} & Code for module execution and the \C{moduleoption}, \C{exec} and
\C{pipe} statements. \\
\C{mpi2.c} & MPI2 code for \PARFORM. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{mpi.c} & MPI1 code for \PARFORM. \\
\C{names.c} & Name administration code to deal with the declaration of
\FORM\ variables. \\
\C{normal.c} & Code to normalize terms, i.e. bring them to standard form. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{opera.c} & Code for doing traces, contractions, and tensor conversions. \\
\C{optim.c} & Code to optimize FORTRAN or C output. \\
\C{parallel.c} & \PARFORM\ (MPI-independent code). \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{pattern.c} & General pattern matching and substitution. \\
\C{poly.c} & Code for polynomial arithmetic (experimental). \\
\C{polynito.c} & Code for polynomial arithmetic and manipulation. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{pre.c} & The preprocessor. \\
\C{proces.c} & The central processor. \\
\C{ratio.c} & Partial fractioning and summing functions. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{reken.c} & Code for numerics. \\
\C{reshuf.c} & Utility functions for the renumbering of dummy indices, and for
statements like \C{shuffle}, \C{stuffle}, \C{multiply}. \\
\C{sch.c} & Code for the textual output of terms and expressions. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{setfile.c} & Code to deal with setup parameters and setup files. \\
\C{smart.c} & Code doing optimized pattern matching. \\
\C{sort.c} & Code for the sorting of expressions. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{startup.c} & Start of program (\C{main()}). Code for the startup and shutdown
phase of \FORM. \\
\C{store.c} & Code to read from disk or write to disk terms and expressions.
Also, store file and save file management. \\
\C{symmetr.c} & Pattern matching for functions with symmetric properties. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{tables.c} & Code for the tablebases. \\
\C{threads.c} & \TFORM. Almost all of the \TFORM\ specific code. \\
\C{token.c} & The tokenizer. \\
\end{tabular}
\begin{tabular}{p{0.2\textwidth}p{0.65\textwidth}}
\C{tools.c} & Utility functions to deal with streams, files, strings, memory
management, and timers. \\
\C{unixfile.c} & Wrapper functions for UNIX file I/O functions. \\
\C{wildcard.c} & Code for wildcards.
\end{tabular}
\subsection{The global structs}
\FORM\ keeps its data organized in several global structs. These structs are defined in
\C{structs.h} (in the fold \C{A}) and go by the names \C{M\_const}, \C{P\_const}, \ldots. The
various global variables are grouped in these structs according to their r\^ole in the
program. The comments in the folds give details on this. For example, \C{M\_const} is for global
settings at startup and \C{.clear}.
The various structs are collected in the struct \C{AllGlobals}. In the case of sequential \FORM,
this struct is made into the type \C{ALLGLOBALS}, and in \C{inivar.h}, the global variable \C{A} is
defined having this type. This global variable \C{A} holds all the data defined in the various
structs. In \C{variable.h} several macros are defined to simplify (and more importantly unify) the
access to the struct elements. For example, one can access the variable \C{S0} in \C{T\_const} as
\C{AT.S0}.
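The layout described above can be sketched in a few lines of C. This is a heavily reduced illustration, not the actual content of \C{structs.h}, \C{inivar.h}, or \C{variable.h}: the member \C{MaxTer}, the macro \C{AM}, the helper \C{set\_s0}, and the stand-in definition of \C{WORD} are assumptions made for the example. Only the names \C{M\_const}, \C{T\_const}, \C{AllGlobals}, \C{ALLGLOBALS}, \C{A}, and \C{AT.S0} are taken from the text.

```c
typedef int WORD;                  /* stand-in; the real type is set in form3.h */

/* Two reduced stand-ins for the per-role structs of structs.h. */
struct M_const { WORD MaxTer; };   /* hypothetical member */
struct T_const { WORD *S0; };      /* S0 is the member named in the text */

/* The role structs are collected in one struct ... */
typedef struct AllGlobals {
    struct M_const M;
    struct T_const T;
} ALLGLOBALS;

/* ... of which exactly one global instance exists. */
ALLGLOBALS A;

/* Access macros in the spirit of variable.h (AM is assumed by analogy). */
#define AM A.M
#define AT A.T

/* Hypothetical helper that accesses the data only through the macros. */
WORD *set_s0(WORD *buffer)
{
    AT.S0 = buffer;                /* expands to A.T.S0 */
    return AT.S0;
}
```

Because all code goes through the macros, the place where \C{S0} is actually stored can change without touching the functions that use it.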
With the multi-threaded version \TFORM\ things are a little more complicated, because some data
needs to be replicated and made private to each thread. This kind of data resides in the
structs \C{N\_const}, \C{R\_const}, and \C{T\_const}. For \TFORM, these structs are collected in the
struct \C{AllPrivates} (which makes up the type \C{ALLPRIVATES}); all other structs go into the
\C{AllGlobals} struct. The global variable \C{A} now contains only the non-thread-specific data. For
each thread an \C{AllPrivates} struct is dynamically allocated, and the global pointer variable
\C{AB} (defined in \C{inivar.h}) holds the references to them. \C{AB} is an array of pointers where the index
corresponds to the thread number. The macros defined in \C{variable.h} to access the global struct
data are written such that they work transparently with the \C{AB} array. The user doesn't need to care
about these details and can still write \C{AT.S0}, as in the previous example. This keeps the code
of sequential \FORM\ and multi-threaded \TFORM\ uniform.
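The following sketch shows one way the split into shared and per-thread data could look. It is an illustration under assumptions, not the code of \TFORM: the member \C{MaxTer}, the macro \C{AM}, the helpers \C{alloc\_privates} and \C{set\_s0}, and the exact expansion of \C{AT} are invented for the example.

```c
#include <stdlib.h>

typedef int WORD;                  /* stand-in; the real type is set in form3.h */

struct M_const { WORD MaxTer; };   /* shared data (hypothetical member) */
struct T_const { WORD *S0; };      /* data private to each thread */

typedef struct AllGlobals  { struct M_const M; } ALLGLOBALS;
typedef struct AllPrivates { struct T_const T; } ALLPRIVATES;

ALLGLOBALS    A;                   /* non-thread-specific data */
ALLPRIVATES **AB;                  /* one pointer per thread, index = thread number */

#define AM A.M
#define AT B->T                    /* assumes a pointer B to this thread's block is in scope */

/* Hypothetical helper: allocate one zeroed AllPrivates block per thread. */
int alloc_privates(int nthreads)
{
    int i;
    AB = calloc((size_t)nthreads, sizeof *AB);
    if (AB == NULL) return -1;
    for (i = 0; i < nthreads; i++) {
        AB[i] = calloc(1, sizeof **AB);
        if (AB[i] == NULL) return -1;
    }
    return 0;
}

/* The function body reads exactly as in a sequential build. */
WORD *set_s0(ALLPRIVATES *B, WORD *buffer)
{
    AT.S0 = buffer;                /* expands to B->T.S0 */
    return AT.S0;
}
```

Only the definition of \C{AT} differs from a sequential build. The statement \C{AT.S0 = buffer} is unchanged, which is the uniformity the macros are meant to provide.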
The only small price one has to pay for this uniform access by macros is that
every function in \FORM\ must know in which thread it is executed. The \C{AN}, \C{AR}, and \C{AT} macros
use a variable \C{B}, which is set to the correct entry in \C{AB} in one of two ways. First, a
function can use the macro \C{GETIDENTITY} (defined in \C{declare.h}). In \TFORM, it calls
\C{WhoAmI()} to get the thread number, declares the pointer \C{B}, and sets \C{B} to point to the
correct entry in \C{AB}. In sequential \FORM\ this macro is empty. The second way is to get the
variable \C{B} as a parameter from the caller. For this method the macros \C{PHEAD}, \C{PHEAD0},
\C{BHEAD}, and \C{BHEAD0} exist (defined in \C{ftypes.h}), which can be used in the parameter list of
a function. The variants with a zero differ only in that they omit the trailing comma,
which is not allowed if no other parameters follow. Usually, \C{PHEAD} is
used in the declaration (it includes type information), while \C{BHEAD} appears in the call of
a function. Which way of setting \C{B} is chosen depends on the use of the function. The \C{PHEAD} method
is faster than \C{GETIDENTITY} and should be preferred in functions that are called very often. On
the other hand, \C{GETIDENTITY} is more general, as it does not rely on every caller to supply \C{B}.
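The two ways of obtaining \C{B} might look as follows in a threaded build. The macro bodies below are plausible guesses that match the description above, not copies from \C{declare.h} or \C{ftypes.h}. The functions \C{AddCount}, \C{GetCount}, and \C{CountTwice}, the member \C{count}, the variable \C{current\_thread}, and the trivial \C{WhoAmI()} are hypothetical.

```c
typedef int WORD;                          /* stand-in for the real WORD */
typedef struct AllPrivates { struct { WORD count; } T; } ALLPRIVATES;

#define NTHREADS 2
static ALLPRIVATES privates[NTHREADS];     /* zero-initialized per-thread blocks */
ALLPRIVATES *AB[NTHREADS] = { &privates[0], &privates[1] };

int current_thread = 0;                    /* stand-in for real thread identification */
int WhoAmI(void) { return current_thread; }

/* Assumed macro bodies for a threaded build. */
#define GETIDENTITY ALLPRIVATES *B = AB[WhoAmI()];
#define PHEAD   ALLPRIVATES *B,            /* declaration, more parameters follow */
#define PHEAD0  ALLPRIVATES *B             /* declaration, no further parameters */
#define BHEAD   B,                         /* call, more arguments follow */
#define BHEAD0  B                          /* call, no further arguments */
#define AT      B->T

/* Second way: B is supplied by the caller. */
void AddCount(PHEAD WORD n) { AT.count += n; }
WORD GetCount(PHEAD0)       { return AT.count; }

/* First way: the function looks up its own thread. */
WORD CountTwice(WORD n)
{
    GETIDENTITY
    AddCount(BHEAD n);
    AddCount(BHEAD n);
    return GetCount(BHEAD0);
}
```

In a sequential build all five macros would expand to nothing, so \C{AddCount(BHEAD n)} becomes the ordinary call \C{AddCount(n)}. This is why the comma has to be part of the macro and why the zero variants are needed.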
The elements of the structs are of various types. Some types are just simple macros mapping directly
to built-in types (see \C{form3.h}) like \C{WORD}, others are names for structs that are defined
(mostly) in \C{structs.h}. Often, variables of the same type are grouped together to help the
compiler with alignment. Also, a lot of structs use macros like \C{PADLONG} (\C{unix.h} or
\C{fwin.h}) to pad a struct such that its size is a multiple of a built-in type size. This again
is to help with the data alignment.
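The idea behind such padding can be illustrated with a small sketch. The macro \C{PAD\_BYTES}, the struct \C{three\_words}, and the function \C{padded\_size} are hypothetical and only mimic the purpose of \C{PADLONG}; the real macros are defined differently and take other arguments.

```c
#include <stddef.h>

typedef short WORD;    /* stand-in; the real type is set in form3.h */

/* Hypothetical helper: number of filler bytes that rounds `used` bytes up
   to a multiple of sizeof(long). The result is never zero, so the array
   below always has a valid length. */
#define PAD_BYTES(used) (sizeof(long) - (used) % sizeof(long))

struct three_words {
    WORD a, b, c;                                  /* members of equal type, grouped */
    unsigned char pad[PAD_BYTES(3 * sizeof(WORD))]; /* explicit filler */
};

/* Size of the padded struct; a multiple of sizeof(long) by construction,
   provided the compiler adds no padding of its own between the members. */
size_t padded_size(void)
{
    return sizeof(struct three_words);
}
```

Without the \C{pad} member the size of the struct would be three times the size of \C{WORD}, which is not a multiple of \C{sizeof(long)} when \C{WORD} is narrower than \C{long}.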
Most struct elements have comments that explain their use. These comments often state
where the element was located in the old version 2 of \FORM\ (this is the pair
of parentheses with or without a capital letter inside). Pointers come in two flavors: some
pointers reference a dynamically allocated piece of memory, basically owning this memory. Others
just reference another variable or point into allocated memory. The first kind is usually marked
with \C{[D]} for easy identification. These pointers often need special treatment, e.g. during
snapshot creation, when recovering, or when shutting down.
During startup (\C{main()}), all the memory of these global structs, i.e. their member variables, is
initialized to zero.
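The distinction between the two kinds of pointers, together with the zero initialization at startup, can be sketched as follows. The struct is a reduced stand-in. The members \C{WorkSpace} and \C{WorkTop} and the functions \C{startup\_clear}, \C{alloc\_workspace}, and \C{shutdown\_free} are assumptions made for the example and do not reproduce the actual startup or shutdown code.

```c
#include <stdlib.h>
#include <string.h>

typedef int WORD;      /* stand-in; the real type is set in form3.h */

struct T_const {
    WORD *WorkSpace;   /* [D] owns a dynamically allocated buffer */
    WORD *WorkTop;     /* points into WorkSpace, owns nothing */
};

struct T_const T;

/* Startup: clear the whole struct. All-zero bytes yield null pointers
   on the common platforms. */
void startup_clear(void)
{
    memset(&T, 0, sizeof T);
}

/* Hypothetical allocation of the owned buffer. */
int alloc_workspace(size_t nwords)
{
    T.WorkSpace = malloc(nwords * sizeof *T.WorkSpace);
    if (T.WorkSpace == NULL) return -1;
    T.WorkTop = T.WorkSpace + nwords;   /* one past the last word */
    return 0;
}

/* Shutdown: only the [D] pointer is freed; the other one is just reset. */
void shutdown_free(void)
{
    free(T.WorkSpace);
    T.WorkSpace = NULL;
    T.WorkTop = NULL;
}
```

The same asymmetry matters for snapshots: the memory behind a \C{[D]} pointer has to be saved and restored, whereas a pointer of the second kind only has to be made to point into the restored memory again.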
\subsection{Configuration}
The source code evaluates several preprocessor definitions that can be set by the user.
Depending on these definitions, the executable is configured in different ways. By default, the
sequential version of \FORM\ is generated. But if, for example, the preprocessor variable
\C{WITHPTHREADS} is defined, the multi-threaded version \TFORM\ will be compiled. These preprocessor
variables can be set when calling the compiler, for example
\C{gcc -c -DWITHPTHREADS -o pre.o pre.c}
The most commonly considered preprocessor variables are: \\ \C{WITHPTHREADS}, \C{PARALLEL},
\C{WITHZLIB}, \C{WITHGMP}, \C{WITHSORTBOTS}, \C{LINUX}, \\ \C{OPTERON}, \C{DEBUGGING}. The first two
select the flavor of the executable: \TFORM\ or \PARFORM, respectively. The next two configure whether \FORM\ uses
the zlib library for compression during sorts and the GMP library for arbitrary precision arithmetic.
\C{WITHSORTBOTS} decides whether \TFORM\ uses dedicated sorting threads. \C{LINUX}
specifies that the executable is to be compiled for a Linux or UNIX compliant operating system. An
alternative here would be to set the variable \C{ALPHA} or \C{MYWIN64} instead, but these builds are
less common. \C{OPTERON} has to be set if one compiles a 64-bit executable. \C{DEBUGGING} enables
some features for a non-release debugging version of the executable (commonly named \C{vorm} or
\C{tvorm}).
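How such preprocessor variables select code at compile time can be illustrated with a minimal sketch. The functions \C{flavor} and \C{uses\_zlib} are hypothetical and do not exist in the \FORM\ sources; only the variable names \C{WITHPTHREADS}, \C{PARALLEL}, and \C{WITHZLIB} are taken from the text.

```c
/* Hypothetical function: report which flavor was selected at compile time. */
const char *flavor(void)
{
#if defined(WITHPTHREADS)
    return "TFORM";        /* compiled with -DWITHPTHREADS */
#elif defined(PARALLEL)
    return "ParFORM";      /* compiled with -DPARALLEL */
#else
    return "FORM";         /* default: sequential version */
#endif
}

/* Hypothetical function: report whether zlib support was compiled in. */
int uses_zlib(void)
{
#ifdef WITHZLIB
    return 1;
#else
    return 0;
#endif
}
```

Compiling this file with \C{-DWITHPTHREADS -DWITHZLIB} changes what both functions return without any change to the source, which is the mechanism the build relies on.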
When using the autoconf setup, the settings concerning the operating system, architecture (32/64-bit), and
flavor of the executable are chosen automatically. Additional settings like \C{WITHZLIB} can be
changed by manually editing the file \C{config.h}, which is included in \C{form3.h}.
Version numbers and production date can also be set: in a manual compilation setup, edit the
appropriate lines in \C{form3.h}; in an autoconf setup, edit \C{configure.ac}.