
This chapter describes Easel from a developer's perspective. It shows
how a module's source code is organized, written, tested, and
documented. It should help you with implementing new Easel code, and
also with understanding the structure of existing Easel code.

We expect Easel to constantly evolve, both in code and in style.
Talking about our code style does not mean we enforce foolish
consistency. Rather, the goal is aspirational; one way we try to
manage the complexity of our growing codebase is to continuously
cajole Easel code toward a clean and consistent presentation. We try
to organize code modules in similar ways, use certain naming
conventions, and channel similar functions towards common
\esldef{interfaces} that provide common calling conventions and
behaviors.

But because it evolves, not all Easel code obeys the code style
described in this chapter. Easel code style is like a local building
ordinance. Any new construction should comply. Older construction is
grandfathered in and does not have to immediately conform to the
current rules. When it comes time to renovate, it's also time to bring
the old work up to the current standards.

For a concrete example we will focus primarily on one Easel module,
the \eslmod{buffer} module. We'll take a top-down approach, starting
from the overall organization of the module and working down into
details. If you're a starting developer, you might have preferred a
bottom-up description; you might just want to know how to write or
improve a single Easel function, for example. In that case, skim
ahead.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Table: Easel naming conventions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{table}
\begin{minipage}{\textwidth}
\begin{tabular}{l>{\raggedright}p{3.5in}l}
\textbf{What}        & \textbf{Explanation}              & \textbf{Example} \\ \hline
Easel module
  &
    Module names should be 10 characters or less.\footnote{sqc assumes
    this in output formatting, for example.}
    Many modules are organized around a single Easel object
    that they implement. The name of the module matches the
    name of the object. For example, \ccode{esl\_buffer.c} implements \ccode{ESL\_BUFFER}.
  & \eslmod{buffer} \\ \\

tag name
  & Names in the module are constructed either using the module's full
    name or sometimes with a shorter abbreviation, usually 3
    characters (sometimes 2 or 4).
  & \ccode{buf} \\ \\

source file
  & Each module has one source file, named \ccode{esl\_}\itcode{modulename}\ccode{.c}.
  & \ccode{esl\_buffer.c} \\ \\

header file
  & Each module has one header file, named \ccode{esl\_}\itcode{modulename}\ccode{.h}.
  & \ccode{esl\_buffer.h} \\  \\

documentation 
  & Each module has one documentation chapter, named \ccode{esl\_}\itcode{modulename}\ccode{.tex}.
  & \ccode{esl\_buffer.tex} \\ \\

Easel object          
  & Easel ``objects'' are typedef'ed C structures (usually) or
    types (rarely\footnote{\ccode{ESL\_DSQ} is a \ccode{uint8\_t}, for example.}).
  & \ccode{ESL\_BUFFER} \\ \\  

external function 
  & All exposed functions have tripartite names \ccode{esl\_}\itcode{module}\ccode{\_specificname}().
    The specific part of function names often adhere to a standardized API
    ``interface'' nomenclature. (All \ccode{\_Open()} functions must follow the same standardized
    behavior guidelines, for example.) Functions in the base \ccode{easel.c} module
    have a bipartite name, omitting the module name. The specific 
    name part generally uses mixed case capitalization.
  & \ccode{esl\_buffer\_OpenFile()} \\ \\

static function 
  & Internal functions (static within a module file) drop the
    \ccode{esl\_} prefix, and are 
    named \itcode{modulename}\ccode{\_function}.
  & \ccode{buffer\_refill()} \\ \\

macro 
  & Macros follow the same naming convention as external functions,
    except they are all upper case.
  & \ccode{ESL\_ALLOC()} \\ \\ 

defined constant
  & Defined constants in Easel modules are named
    \ccode{esl}\itcode{MODULENAME}\ccode{\_FOO}. Constants defined
    in the base \ccode{easel.h} module are named just 
    \ccode{eslFOO}.
   & \ccode{eslBUFFER\_SLURPSIZE}\\ \\

return codes
  & Return codes are constants defined in \ccode{easel.h}, so 
    they obey the rules of other defined constants in the base module (\ccode{eslOK},
    \ccode{eslFAIL}). Additionally, error codes start with
    \ccode{E}, as in \ccode{eslE}\itcode{ERRTYPE}.
  & \ccode{eslENOTFOUND} \\ \\

config constant
  & Constants that don't start with \ccode{esl} are almost always 
    configuration (compile-time) constants determined by the autoconf
    \ccode{./configure} script and defined in \ccode{esl\_config.h}.
  & \ccode{HAVE\_STDINT\_H} \\ \\
\end{tabular}
\end{minipage}
\caption{\textbf{Easel naming conventions.} }
\end{table}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{An Easel module}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Each module consists of three files: a .c C code file, a .h header
file, and a .tex documentation file. These filenames are constructed
from the module name. For example, the \eslmod{buffer} module is
implemented in \ccode{esl\_buffer.c}, \ccode{esl\_buffer.h}, and
\ccode{esl\_buffer.tex}.

%%%%%%%%%%%%%%%%
\subsection{The .c file}
%%%%%%%%%%%%%%%%

Easel \ccode{.c} files are larger than most coding styles would
advocate. Easel module code is designed to be \emph{read}, to be
\emph{self-documenting}, to contain its own \emph{testing methods},
and to provide useful \emph{working examples}.  Thus the size of the
files is a little deceptive, compared to C code that's solely
implementing some functions. In general, only about a quarter of
an Easel module's \ccode{.c} file is the actual module implementation.
Typically, around half of an Easel \ccode{.c} file is documentation,
and much of this gets automatically parsed into the PDF userguide. The
rest consists of drivers for unit testing and examples.

Module files are organized into a somewhat stereotypical set of
sections, to facilitate navigating the code, as follows.

The \ccode{.c} file starts with a comment that contains the {\bfseries
  table of contents}. The table of contents helps us navigate a long
Easel source file. This initial comment also includes a short
description of the module's purpose. It may also contain miscellaneous
notes.

For example, from the \eslmod{buffer} module:

\input{cexcerpts/header_example}

None of this is parsed automatically. Its structure is just
convention.

The short description lines in the table of contents match section
headings in comments later in the file. A search forward with the text
of a heading will move you to that section of the code.

Next come the {\bfseries includes} and any {\bfseries definitions}. Of the
include files, the \ccode{esl\_config.h} header must always be
included first. It contains platform-dependent configuration code
that may affect even the standard library header files. Standard
headers like \ccode{stdio.h} come next, then Easel's main header
\ccode{easel.h}; then headers of any other Easel modules this module
depends on, then the module's own header. For example, the
\ccode{\#include}'s in the \eslmod{buffer} module look like:

\input{cexcerpts/include_example}

Next come the {\bfseries private function declarations}.  We declare
all private functions at the top of the file, where they can be seen
easily by a developer who's casually reading the source. Their
definitions are buried deeper, in one or more sections following the
implementation of the exposed API.

\input{cexcerpts/statics_example}

The rest of the file is the {\bfseries code}. It is split into
sections. Each section is numbered and given a one-line title that
appears in the table of contents.  Each section starts with a section
header, a comment block in front of each code section in the
\ccode{.c} file.  These section headers match comments in front of
that section's declarations in the \ccode{.h} file. Because of the
numbering and titling, a particular section of code can be located by
searching on the number or title.  A common section structure includes
the following, in this order:


\begin{description}
\item[\textbf{The \ccode{FOOBAR} object.}]
  The first section of the file provides the API for creating and
  destroying the object that this module implements.

\item[\textbf{The rest of the API.}]
  Everything else that is part of the API for this module.
  This might be split across multiple sections.

\item[\textbf{Debugging/dev code.}]
  Most objects can be validated or dumped to an output stream
  for inspection.

\item[\textbf{Private functions.}]
  Easel isn't rigorous about where private (non-exposed) functions go,
  but they often go in a separate section in about the middle of the
  \ccode{.c} file, after the API and before the drivers.

\item[\textbf{Optional drivers.}] Stats, benchmark, and regression
  drivers, if any. 

\item [\textbf{Unit tests.}]
  The unit tests are internal controls that test that the module's API
  works as advertised.

\item [\textbf{Test driver.}]
  All modules have an automated test driver: a \ccode{main()} that
  runs the unit tests.
 
\item [\textbf{Examples.}]
  All modules have at least one \ccode{main()} showing an example of
  how to use the main features of the module.

\end{description}

%%%%%%%%%%%%%%%%
\subsection{The .h file}
%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%
\subsection{Special syntax in Easel C comments}
%%%%%%%%%%%%%%%%

Easel comments sometimes include special syntax recognized by tools other
than the compiler.  Here are some quick explanations of the special
stuff a developer needs to be aware of. 

\begin{table}
\begin{tabular}{l>{\raggedright}p{3.5in}l}
\textbf{Special syntax}  & \textbf{Description}  & \textbf{Parsed by}\\ \hline

\ccode{/* Function: }\itcode{funcname} 
  & Function documentation that gets converted to \LaTeX\ and included
    in Easel's PDF documentation.
  & \emcode{autodoc} \\ \\

\ccode{ *\# }\itcode{x.\ secheading} 
  & Section heading corresponding to section number x in a \ccode{.c}
    file's table of contents. This is automatically extracted as part
    of creating a summary table in the PDF documentation.
  & \emcode{autodoc -t} \\ \\

\ccode{/*::cexcerpt::} ...
  & Comments that mark the beginning and end of code that is extracted
    verbatim into the documentation.
  & \emcode{cexcerpt} \\ \\

\hline
\end{tabular}
\caption{{\bfseries Summary of special syntax in Easel C comments.}}
\end{table}

%%%%
\subsubsection{function documentation}
%%%%

Any comment that starts with
\begin{cchunk}
/* Function:  ...
\end{cchunk}
will be recognized and parsed by our \prog{autodoc} program, 
which assumes it is looking at a structured function documentation
header.

See section XX for details on how these headers work.

We want all external functions in the Easel API to be documented
automatically by \prog{autodoc}. We don't want internal functions to
appear in the documentation, but we do want them documented in the
code.  To keep \prog{autodoc} from recognizing the function header of
an internal (static) function, we just leave off the \ccode{Function:}
tag in the comment block.   
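
As a minimal sketch of the difference, here is a hypothetical
\ccode{demo} module (not part of Easel) with one exposed function and
one static helper. The fields after \ccode{Function:} follow the layout
we believe Easel headers use; treat the exact field names as an
assumption. The constant \ccode{demoOK} is a local stand-in for
\ccode{eslOK}, so the fragment compiles without Easel.

\begin{cchunk}
#include <stddef.h>

#define demoOK 0   /* hypothetical stand-in for eslOK */

static int demo_is_blank(char c);

/* Function:  esl_demo_CountBlanks()
 * Synopsis:  Count spaces and tabs in a string.
 *
 * Purpose:   Count the space and tab characters in NUL-terminated
 *            string <s>, and return the count in <*ret_n>.
 *
 * Args:      s     - input string (not modified)
 *            ret_n - RETURN: number of blanks in <s>
 *
 * Returns:   <demoOK> on success.
 */
int
esl_demo_CountBlanks(const char *s, int *ret_n)
{
  int n = 0;

  for (; *s != '\0'; s++)
    if (demo_is_blank(*s)) n++;
  *ret_n = n;
  return demoOK;
}

/* demo_is_blank()
 * Static helper: documented here in the code, but with no Function
 * tag, so autodoc does not pick it up.
 * Returns 1 if <c> is a space or tab, else 0.
 */
static int
demo_is_blank(char c)
{
  return (c == ' ' || c == '\t');
}
\end{cchunk}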

%%%%
\subsubsection{section headings}
%%%%

The automatically generated \LaTeX\ code for a module's documentation
includes a table summarizing the functions in the exposed API. This
table is constructed automatically from the source code by
\prog{autodoc -t}. The list of functions in this table is extracted
from the function documentation (above). The table is broken into
sections, just as the module code is, using section headings. The
comment block marking the start of a section heading for exposed API
code has an extra \ccode{\#}:

\begin{cchunk}
/*****************************************************************
 *# 1. ESL_BUFFER object: opening/closing.
 *****************************************************************/
\end{cchunk}

Section headings for internal functions omit the \ccode{\#}, and
\prog{autodoc} ignores them:

\begin{cchunk}
/*****************************************************************
 * 10. Unit tests
 *****************************************************************/
\end{cchunk}

%%%%
\subsubsection{excerpting}
%%%%

This book includes many examples of C code extracted verbatim from
Easel source.  These {\bfseries excerpts} are marked with specially
formatted comments in the C file:

\begin{cchunk}
/*::cexcerpt::my_example::begin::*/
   while (esl_sq_Read(sqfp, sq) == eslOK)
     { n++; }
/*::cexcerpt::my_example::end::*/
\end{cchunk}

When we build the Easel documentation from its source, our
\prog{cexcerpt} program extracts all marked excerpts from \ccode{.c}
and \ccode{.h} files, and places them in individual files in a
temporary \ccode{cexcerpts/} directory, from where they are included
in the main \LaTeX\ documentation.



%%%%%%%%%%%%%%%%
\subsection{Driver programs}
%%%%%%%%%%%%%%%%

An unusual (innovative?) thing about Easel modules is how we embed
{\bfseries driver programs} directly in the module's \ccode{.c}
file. Driver programs include our unit tests, benchmarks, and working
examples. These small programs are enclosed in standardized
\ccode{\#ifdef}'s that enable them to be conditionally compiled.

None of these programs are installed by \ccode{make install}.  Test
drivers are compiled as part of \ccode{make check}.  A \ccode{make
  dev} compiles all driver programs.

There are six main types of drivers used in Easel:

\begin{description} 

\item[\textbf{Unit test driver(s).}] (Mandatory.) Each module has one (and only one)
  \ccode{main()} that runs the unit tests and any other automated tests for
  the module. The test driver is compiled and run by the testsuite in
  \ccode{testsuite/testsuite.sqc} when one does a \ccode{make check}
  on the package. It is also run by several of the automated tools
  used in development, including the coverage (\ccode{gcov}) and
  memory (\ccode{valgrind}) tests. A test driver takes no arguments
  (it must generate any input files it needs). If it succeeds, it
  returns 0, with no output. If it fails, it returns nonzero and calls
  \ccode{esl\_fatal()} to issue a short error message on
  \ccode{stderr}. Our test harness, \emcode{sqc}, depends on these
  output and exit status conventions. Optionally, it may use a flag
  (usually \ccode{-v}, for verbose) to show more useful output when
  it's run interactively.
  The test driver is enclosed by
  \ccode{\#ifdef esl}\itcode{MODULE}\ccode{\_TESTDRIVE} for
  conditional compilation.

\item[\textbf{Regression/comparison test(s).}] (Optional.) These tests
  link to one or more libraries that provide identical or comparable
  functionality, such as previous versions of Easel, the old
  \prog{SQUID} library, \prog{LAPACK} or the GNU Scientific Library.
  They test that Easel's functionality performs at least as well as it
  used to, or as well as the `competition'. These tests are run on demand,
  and not included in automated testing, because the other libraries
  may only be present on a subset of our development machines. They
  are enclosed by \ccode{\#ifdef
    esl}\itcode{MODULE}\ccode{\_REGRESSION} for conditional
  compilation.

\item[\textbf{Benchmark(s).}] (Optional.) These tests run a
  standardized performance benchmark and collect time and/or memory
  statistics. They may generate output suitable for graphing. They are
  run on demand, not by automated tools. They typically use 
  \eslmod{stopwatch} for timing. They are enclosed by
  \ccode{\#ifdef esl}\itcode{MODULE}\ccode{\_BENCHMARK}  for
  conditional compilation.

\item[\textbf{Statistics generator(s).}] (Optional.) These tests collect
  statistics used to characterize the module's scientific performance,
  such as its accuracy at some task. They may generate graphing
  output. They are run on demand, not by automated tools. They are
  enclosed by \ccode{\#ifdef esl}\itcode{MODULE}\ccode{\_STATS}
  for conditional compilation.

\item[\textbf{Experiment(s).}] (Optional.) These are other reproducible
  experiments we've done on the module code, essentially the same as
  statistics generators. They are
  enclosed by \ccode{\#ifdef esl}\itcode{MODULE}\ccode{\_EXPERIMENT}
  for conditional compilation.

\item[\textbf{Example(s).}] (Mandatory.) Every module has at least one example
  \ccode{main()} that provides a ``hello world'' level example of
  using the module's API. Examples are enclosed in \ccode{cexcerpt}
  tags for extraction and verbatim inclusion in the documentation.
  They are enclosed by \ccode{\#ifdef esl}\itcode{MODULE}\ccode{\_EXAMPLE} 
  for conditional compilation.
\end{description}  

All modules have at least one test driver and one example. Other tests
and examples are optional. When there is more than one \ccode{main()}
of a given type, the additional tags are numbered starting from 2: for
example, a module with three example \ccode{main()'s} would have three
tags for conditional compilation, \ccode{eslFOO\_EXAMPLE},
\ccode{eslFOO\_EXAMPLE2}, and \ccode{eslFOO\_EXAMPLE3}.
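
A minimal sketch of how a test driver sits inside a module's
\ccode{.c} file is shown below. The \ccode{demo} module, the function
\ccode{esl\_demo\_Add()}, and the tag \ccode{eslDEMO\_TESTDRIVE} are
all hypothetical, and \ccode{fprintf()} plus \ccode{exit()} stand in
for \ccode{esl\_fatal()} so that the fragment compiles without Easel.

\begin{cchunk}
#include <stdio.h>
#include <stdlib.h>

/* Exposed API of a hypothetical "demo" module (not part of Easel). */
int
esl_demo_Add(int a, int b)
{
  return a + b;
}

/*****************************************************************
 * Unit tests
 *****************************************************************/
#ifdef eslDEMO_TESTDRIVE
static void
utest_Add(void)
{
  /* Stand-in for esl_fatal(): short message, nonzero exit status. */
  if (esl_demo_Add(2, 3) != 5) {
    fprintf(stderr, "Add unit test failed\n");
    exit(1);
  }
}
#endif /*eslDEMO_TESTDRIVE*/

/*****************************************************************
 * Test driver
 *****************************************************************/
#ifdef eslDEMO_TESTDRIVE
int
main(void)
{
  utest_Add();
  return 0;   /* success: exit status 0, no output */
}
#endif /*eslDEMO_TESTDRIVE*/
\end{cchunk}

Without the tag defined, only \ccode{esl\_demo\_Add()} is compiled, so
the same file serves as library source. With
\ccode{-DeslDEMO\_TESTDRIVE} on the compiler command line, the unit
test and its \ccode{main()} are compiled as well.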

The format of the conditional compilation tags for all the drivers
(including test and example drivers) must be obeyed. Some test scripts
scan the \ccode{.c} files and identify these tags
automatically. For instance, the driver compilation test identifies any
tag named
\ccode{esl}\itcode{MODULENAME}\ccode{\_\{TESTDRIVE,EXAMPLE,REGRESSION,BENCHMARK,STATS\}*}
and attempts to compile the code with that tag defined.

Which driver is compiled (if any) is controlled by conditional
compilation of the module's \ccode{.c} file with the appropriate
tag. For example, to compile and run the \eslmod{sqio} test driver as
a standalone module:

\begin{cchunk}
   %  gcc -g -Wall -I. -o esl_sqio_utest -DeslSQIO_TESTDRIVE esl_sqio.c easel.c -lm
   %  ./esl_sqio_utest
\end{cchunk}

or to compile and run it in full library configuration:

\begin{cchunk}
   %  gcc -g -Wall -I. -L. -o esl_sqio_utest -DeslSQIO_TESTDRIVE esl_sqio.c -leasel -lm
   %  ./esl_sqio_utest
\end{cchunk}


\begin{table}
\begin{tabular}{llll}
\textbf{Driver type}     &  \textbf{Compilation flag}                       & \textbf{Driver program name}                     & \textbf{Notes}\\ \hline
Unit test                &  \ccode{esl}\itcode{MODULE}\ccode{\_TESTDRIVE}   & \ccode{esl\_}\itcode{module}\ccode{\_utest}      & output and exit status standardized for \emcode{sqc}\\
Regression test          &  \ccode{esl}\itcode{MODULE}\ccode{\_REGRESSION}  & \ccode{esl\_}\itcode{module}\ccode{\_regression} & may require other libraries installed\\
Benchmark                &  \ccode{esl}\itcode{MODULE}\ccode{\_BENCHMARK}   & \ccode{esl\_}\itcode{module}\ccode{\_benchmark}  & \\
Statistics collection    &  \ccode{esl}\itcode{MODULE}\ccode{\_STATS}       & \ccode{esl\_}\itcode{module}\ccode{\_stats}      & \\
Experiment               &  \ccode{esl}\itcode{MODULE}\ccode{\_EXPERIMENT}  & \ccode{esl\_}\itcode{module}\ccode{\_experiment} & \\
Example                  &  \ccode{esl}\itcode{MODULE}\ccode{\_EXAMPLE}     & \ccode{esl\_}\itcode{module}\ccode{\_example}    & \\
\end{tabular}
\caption{{\bfseries Summary of types of driver programs in Easel.}}
\end{table}









%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Writing an Easel function}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Documentation of functions, particularly in the structured comment
header that's parsed by the \emcode{autodoc} program, is described in
a different section of its own.

%%%%
\subsubsection{conventions for function names}
%%%%

Function names are tripartite, constructed as
\ccode{esl\_}\itcode{moduletag\_funcname}.  

The \itcode{moduletag} should generally be the module's full name;
sometimes (historically) it is an abbreviated tag name for the module
(such as \ccode{abc} for the \eslmod{alphabet} module); on occasion,
it is the name of an Easel object or datatype that has not yet budded
off into its own module. Long versus short \itcode{moduletag}'s are
sometimes used to indicate functions that operate directly on objects
via common interfaces, versus other functions in the exposed API. The
long form may indicate functions that obey a common interface, such as
\ccode{esl\_alphabet\_Create()}.\footnote{This is a clumsy C version
  of what C++ would do with namespaces, object methods, and
  constructors/destructors.} Miscellaneous exposed functions in the API
  of a module may be named by the three-letter short tag, such as
  \ccode{esl\_abc\_Digitize()}.

The function's \itcode{funcname} can be anything. Some names
are standard and indicate the use of a common {\bfseries interface}.
This part of the name is usually in mixed-case capitalization.

Only exposed (\ccode{extern}) functions must follow these rules. In
general, private (\ccode{static}) functions can have any
name. However, it's common in Easel for private functions to obey the
same naming conventions except without the \ccode{esl\_} prefix.

Sometimes essentially the same function must be provided for different
data types. In these cases one-letter prefixes are used to indicate
datatype:

\begin{tabular}{ll}
\ccode{C} & \ccode{char} type, or a standard C string \\
\ccode{X} & \ccode{ESL\_DSQ} type, or an Easel digitized sequence\\
\ccode{I} & \ccode{int} type \\
\ccode{F} & \ccode{float} type \\
\ccode{D} & \ccode{double} type \\
\end{tabular}

For example, \eslmod{vectorops} uses this convention heavily;
\ccode{esl\_vec\_FNorm()} normalizes a vector of floats and
\ccode{esl\_vec\_DNorm()} normalizes a vector of doubles.  A second
example is in \eslmod{randomseq}, which provides routines for shuffling
either text strings or digitized sequences, such as
\ccode{esl\_rsq\_CShuffle()} and \ccode{esl\_rsq\_XShuffle()}.
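
The one-letter prefix convention can be sketched with a pair of
hypothetical functions, \ccode{demo\_vec\_FNorm()} and
\ccode{demo\_vec\_DNorm()}. These are stand-ins written for this
example, not the \eslmod{vectorops} implementations. We assume here
that ``normalize'' means scaling the elements so that they sum to 1,
and this sketch simply leaves an all-zero vector unchanged.

\begin{cchunk}
/* Hypothetical stand-ins for an F/D function pair; not Easel's code. */

/* Scale float vector <v> of length <n> so its elements sum to 1. */
void
demo_vec_FNorm(float *v, int n)
{
  float sum = 0.0f;
  int   i;

  for (i = 0; i < n; i++) sum += v[i];
  if (sum != 0.0f)
    for (i = 0; i < n; i++) v[i] /= sum;
}

/* Same operation, same argument order, for a vector of doubles. */
void
demo_vec_DNorm(double *v, int n)
{
  double sum = 0.0;
  int    i;

  for (i = 0; i < n; i++) sum += v[i];
  if (sum != 0.0)
    for (i = 0; i < n; i++) v[i] /= sum;
}
\end{cchunk}

The two functions differ only in the element type, so a caller can
switch precision by changing one letter in the function name.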

%%%%
\subsubsection{conventions for argument names}
%%%%

When using pointers in C, it can be hard to tell which arguments are
for input data (which are provided by the caller and will not be
modified), output data (which are created and returned by the
function), and modified data (which are both input and output).  

For output consisting of pointers to nonscalar types such as objects
or arrays, it also can be hard to distinguish when the caller is
supposed to provide pre-allocated storage for the result, versus the
storage being newly allocated by the function.\footnote{A common
strategy in C library design is to strive for \emph{no} allocation in
the library, so the caller is always responsible for explicit
alloc/free pairs. I feel this puts a tedious burden of allocation code
on an application.}

When functions return more than one kind of result, it is convenient
to make all the individual results optional, so the caller doesn't
have to deal with managing storage for results it isn't interested in.
In Easel, an optional result pointer is passed as \ccode{NULL} to
indicate a possible result is not wanted (and is not allocated, if
returning that result required new allocation).

Easel uses a prefix convention on pointer argument names to indicate
these situations:

\begin{table}[h]
\begin{center}
{\small
\begin{tabular}{cp{2.5in}p{3in}}
 \textbf{prefix} &  \textbf{argument type}                  & \textbf{allocation (if any):}\\
none           & If qualified as \ccode{const}, a pointer
                 to input data, not modified by the call. 
                 If unqualified, a pointer to data modified
                 by the call (it's both input and output). & by caller\\ 
\ccode{ret\_}  & Pointer to result.                        & in the function \\
\ccode{opt\_}  & Pointer to optional result.               
                 If non-\ccode{NULL}, result is obtained. & in the function \\
\end{tabular}
}
\end{center}
\end{table}
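
The prefixes in the table above can be sketched with one hypothetical
function, \ccode{demo\_Squares()}, which is not part of Easel. The
constants \ccode{demoOK} and \ccode{demoEMEM} are local stand-ins for
\ccode{eslOK} and \ccode{eslEMEM}. The unprefixed \ccode{const}
argument is input only, \ccode{ret\_sq} is a required result allocated
in the function, and \ccode{opt\_sum} is an optional result that the
caller may pass as \ccode{NULL}.

\begin{cchunk}
#include <stdlib.h>

#define demoOK    0   /* hypothetical stand-in for eslOK   */
#define demoEMEM  1   /* hypothetical stand-in for eslEMEM */

/* Square each of the <n> elements of <x>.
 *   x       - input; const, so not modified
 *   ret_sq  - RETURN: newly allocated array of squares; caller frees
 *   opt_sum - optRETURN: sum of the squares; pass NULL if not wanted
 */
int
demo_Squares(const int *x, int n, int **ret_sq, int *opt_sum)
{
  int *sq  = NULL;
  int  sum = 0;
  int  i;

  sq = malloc(sizeof(int) * (n > 0 ? n : 1));
  if (sq == NULL) {
    *ret_sq = NULL;               /* ground state on failure */
    if (opt_sum) *opt_sum = 0;
    return demoEMEM;
  }

  for (i = 0; i < n; i++) { sq[i] = x[i] * x[i]; sum += sq[i]; }

  *ret_sq = sq;
  if (opt_sum) *opt_sum = sum;    /* only written if the caller asked */
  return demoOK;
}
\end{cchunk}

A caller that wants both results passes \ccode{\&sq} and \ccode{\&sum};
a caller that only wants the array passes \ccode{NULL} for
\ccode{opt\_sum}.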



%%%%
\subsubsection{Return status}
%%%%

%%%%
\subsubsection{conventions for exception handling}
%%%%

Easel functions {\bfseries should never exit except through an Easel
  return code or through the Easel exception handler}. When you write
Easel code you must {\bfseries always} deal with the case when the
caller has registered a nonfatal exception handler, causing thrown
exceptions to return a nonzero code rather than exiting. The Easel
library is designed to be used in programs that can't just suddenly
crash out with an error message (such as a graphical user interface
environment), and programs that have specialized error handlers
because they don't even have access to a \ccode{stderr} stream on a
terminal (such as a UNIX daemon).

This means that Easel functions must clean up their memory and set
appropriate return status and return arguments, even in the case of
thrown exceptions.


%%%%
\subsubsection{Easel's idiomatic function structure}
%%%%

To deal with the above strictures of return status, returned
arguments, and exception handling and cleanup, most Easel functions
follow an idiomatic structure.  The following snippet illustrates the
key ideas:

\begin{cchunk}
1    int
2    esl_example_Hello(char **opt_hello, int *opt_len)
3    {
4      char *msg = NULL;
5      int   n;
6      int   status;

7      if ( (status = esl_strdup("hello world!\n", -1, &msg)) != eslOK) goto ERROR;
8      n = strlen(msg);

9      if (opt_hello) *opt_hello = msg; else free(msg);
10     if (opt_len)   *opt_len   = n;
11     return eslOK;

12  ERROR:
13     if (msg)        free(msg);
14     if (opt_hello) *opt_hello = NULL;
15     if (opt_len)   *opt_len   = 0;
16     return status;
17  }
\end{cchunk}

The stuff to notice here:

\begin{itemize}
\item[line 2:] The \ccode{opt\_hello} and \ccode{opt\_len} arguments
  are optional. The caller might want only one of them (or neither,
  but that would be weird). We're expecting calls like
  \ccode{esl\_example\_Hello(\&hello, \&n)},
  \ccode{esl\_example\_Hello(\&hello, NULL)}, or
  \ccode{esl\_example\_Hello(NULL, \&n)}.

\item[line 4:] Anything we allocate, we initialize its pointer to \ccode{NULL}. 
  Now, if an exception occurs and we have to break out of the function early,
  we can tell whether the allocation has already happened (and hence we need
  to clean up its memory), if the pointer has become non-\ccode{NULL}.

\item[line 6:] Most functions have an explicit \ccode{status} variable.
  Standard error-handling macros (\ccode{ESL\_XEXCEPTION()} for example) expect it to be present,
  as do standard allocation macros (\ccode{ESL\_ALLOC()} for example).
  If we have to handle an exception, we're going to make sure the status
  is set how we want it, then jump to a cleanup block.

\item[line 7:] When any Easel function calls another Easel function,
  it must check the return status for both normal errors and thrown
  exceptions. If an exception has already been thrown by a callee,
  usually the caller just relays the exception status up the call
  stack. The idiom is to set the return \ccode{status} and go
  immediately to the error cleanup block, \ccode{ERROR:}. We use a
  \ccode{goto} for this, Dijkstra notwithstanding.

\item[lines 9,10:] When we set optional arguments for a normal return,
  we first check whether a valid return pointer was provided. If the
  optional pointer is \ccode{NULL} the caller doesn't want the result,
  and we clean up any memory we need to (line 9).

\item[line 13:] In the error cleanup block, we first free any memory
  that got allocated before the failure point. The idiom of
  immediately initializing all allocated pointers to \ccode{NULL} 
  enables us to tell which things have been allocated or not.

\item[line 14:] When we return from a function with an unsuccessful 
  status, we also make sure that any returned arguments are in 
  a documented ground state, usually \ccode{NULL}'s and \ccode{0}'s.
\end{itemize}
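
To try the same structure without linking Easel, here is a sketch in
standard C with two allocations and one non-allocation error. The
function \ccode{demo\_SplitPair()} and the \ccode{demo} constants are
hypothetical stand-ins for Easel return codes, and plain
\ccode{malloc()} replaces Easel's allocation macros.

\begin{cchunk}
#include <stdlib.h>
#include <string.h>

#define demoOK       0   /* hypothetical stand-in for eslOK      */
#define demoEMEM     1   /* hypothetical stand-in for eslEMEM    */
#define demoEFORMAT  2   /* hypothetical stand-in for eslEFORMAT */

/* Split "key=value" into two newly allocated strings. */
int
demo_SplitPair(const char *s, char **ret_key, char **opt_value)
{
  char       *key   = NULL;   /* allocated pointers start out NULL */
  char       *value = NULL;
  const char *eq;
  size_t      klen, vlen;
  int         status;

  if ((eq = strchr(s, '=')) == NULL) { status = demoEFORMAT; goto ERROR; }
  klen = (size_t) (eq - s);
  vlen = strlen(eq + 1);

  if ((key   = malloc(klen + 1)) == NULL) { status = demoEMEM; goto ERROR; }
  if ((value = malloc(vlen + 1)) == NULL) { status = demoEMEM; goto ERROR; }
  memcpy(key, s, klen);
  key[klen] = '\0';
  memcpy(value, eq + 1, vlen + 1);   /* copies the terminating NUL too */

  *ret_key = key;
  if (opt_value) *opt_value = value; else free(value);
  return demoOK;

 ERROR:
  free(key);                         /* free(NULL) is a no-op */
  free(value);
  *ret_key = NULL;                   /* results go to a ground state */
  if (opt_value) *opt_value = NULL;
  return status;
}
\end{cchunk}

If the second \ccode{malloc()} fails, \ccode{key} is already
non-\ccode{NULL} and is freed in the cleanup block, while
\ccode{value} is still \ccode{NULL}. One cleanup block therefore
serves every failure point.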

%%%%
\subsubsection{reentrancy: plan for threads}
%%%%

Easel code must expect to be called in multithreaded applications. All
functions must be reentrant. There should be no use of global or
static variables. 
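
As a minimal sketch of what this rules out, compare a counter that
hides its state in a \ccode{static} variable with one that keeps its
state in a caller-owned object. The names \ccode{demo\_counter\_bad()},
\ccode{DEMO\_COUNTER}, \ccode{demo\_counter\_Init()}, and
\ccode{demo\_counter\_Next()} are hypothetical and not part of Easel.

\begin{cchunk}
#include <stdint.h>

/* NOT reentrant: the hidden static state is shared by every caller
 * and every thread, so concurrent calls interfere with each other.
 */
uint32_t
demo_counter_bad(void)
{
  static uint32_t count = 0;
  return ++count;
}

/* Reentrant: all state lives in an object the caller owns. */
typedef struct {
  uint32_t count;
} DEMO_COUNTER;

void
demo_counter_Init(DEMO_COUNTER *c)
{
  c->count = 0;
}

uint32_t
demo_counter_Next(DEMO_COUNTER *c)
{
  return ++c->count;
}
\end{cchunk}

Each thread can own its own \ccode{DEMO\_COUNTER}. Sharing a single
object between threads would still need the caller to provide its own
locking.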





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Standard Easel function interfaces}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Some function names are shared and have common behaviors across
modules, like \ccode{\_Get*()} and \ccode{\_Set*()} functions.  These
special names are called \esldef{common interfaces}.

\begin{table}
\begin{minipage}{\textwidth}
\begin{tabular}{l>{\raggedright}p{3.0in}ll}
\textbf{Function name}        & \textbf{Description}              & \textbf{Returns} &  \textbf{Example} \\ \hline
 \multicolumn{4}{c}{\bfseries Creating and destroying new objects}\\
\ccode{\_Create}
  & Create a new object.
  & \ccode{ESL\_}\itcode{FOO}\ccode{ *}
  & \ccode{esl\_alphabet\_Create()} \\

\ccode{\_Destroy}
  & Free an object.
  & \ccode{void}
  & \ccode{esl\_alphabet\_Destroy()} \\

\ccode{\_Clone}
  & Duplicate an object, by creating and allocating a new one.
  & \ccode{ESL\_}\itcode{FOO}\ccode{ *}
  & \ccode{esl\_msa\_Clone()} \\

\ccode{\_Shadow}
  & Partially duplicate an object, creating a dependent shadow.
  & \ccode{ESL\_}\itcode{FOO}\ccode{ *}
  & \ccode{p7\_oprofile\_Shadow()} \\

\ccode{\_Copy}
  & Make a copy of an object, using an existing allocated object for space.
  & [standard]
  & \ccode{esl\_msa\_Copy()} \\

 \multicolumn{4}{c}{\bfseries Opening and closing input sources}\\
\ccode{\_Open} 
  & Open an input source, associating it with an Easel object. 
  & [standard]
  & \ccode{esl\_buffer\_Open()} \\

\ccode{\_Close}
  & Close an Easel object corresponding to an input source.
  & [standard]
  & \ccode{esl\_buffer\_Close()} \\

 \multicolumn{4}{c}{\bfseries Managing memory allocation}\\

\ccode{\_Grow}
  & Expand the allocation in an existing object, typically by doubling.
  & [standard]
  & \ccode{esl\_tree\_Grow()} \\

\ccode{\_GrowTo}
  & Reallocate object (if needed) for some new data size.
  & [standard]
  & \ccode{esl\_sq\_GrowTo()} \\

\ccode{\_Reuse}
  & Recycle an object, reinitializing it while reusing as much of its existing
    allocation(s) as possible.
  & [standard]
  & \ccode{esl\_keyhash\_Reuse()} \\

\ccode{size\_t \_Sizeof}
  & Return the allocation size of an object
  & size, in bytes
  & - \\



 \multicolumn{4}{c}{\bfseries Accessing information in objects}\\

\ccode{\_Is}
  & Return \ccode{TRUE} or \ccode{FALSE} for some query of the
    internal state of an object.
  & \ccode{TRUE | FALSE}
  & \ccode{esl\_opt\_IsOn()} \\

\ccode{\_Get}
  & Return a value for some query of the internal state of an object.
  & value
  & \ccode{esl\_buffer\_Get()} \\

\ccode{\_Read}
  & Get a value in the object and return it in a location provided (and possibly allocated) by the caller.
  & [standard]
  & \ccode{esl\_buffer\_Read()} \\

\ccode{\_Fetch}
  & Get a value in the object and return it in newly allocated space;
    the caller becomes responsible for the newly allocated space.
  & [standard]
  & \ccode{esl\_buffer\_FetchLine()} \\  

\ccode{\_Set}
  & Set a value in the object.
  & [standard]
  & \ccode{esl\_buffer\_Set()} \\

\ccode{\_Format}
  & Set a string in the object using \ccode{sprintf()}-like
    semantics.
  & [standard]
  & \ccode{esl\_msa\_FormatName()} \\



 \multicolumn{4}{c}{\bfseries Debugging}\\
\ccode{\_Validate}
  & Run validation tests on the internal state of an object.
  & [standard]
  & \ccode{esl\_tree\_Validate()} \\

\ccode{\_Compare}
  & Compare two objects to each other for equality (or close enough).
  & [standard]
  & \ccode{esl\_msa\_Compare()} \\

\ccode{\_Dump}
  & Dump a verbose, possibly ugly, but developer-readable output 
    of the internal state of an object.
  & [standard]
  & \ccode{esl\_keyhash\_Dump()} \\

\ccode{\_TestSample}
  & Sample a mostly syntactically correct object for test purposes.
  & [standard]
  & \ccode{p7\_tophits\_TestSample()} \\



 \multicolumn{4}{c}{\bfseries Miscellaneous}\\

\ccode{\_Write}
  & Write something from an object to an output stream.
  & [standard]
  & \ccode{esl\_msa\_Write()} \\

\ccode{\_Encode}
  & Convert a user-readable string (such as ``fasta'') to an
    internal Easel code (such as \ccode{eslSQFILE\_FASTA}).
  & [standard]
  & \ccode{esl\_msa\_EncodeFormat()} \\

\ccode{\_Decode}
  & Convert an internal Easel code (such as \ccode{eslSQFILE\_FASTA}) 
    to a user-readable string (such as ``fasta'').
  & [standard]
  & \ccode{esl\_msa\_DecodeFormat()} \\
\end{tabular}
\end{minipage}
\caption{\textbf{Standard function ``interfaces''.} }
\end{table}


%%%%%%%%%%%%%%%%
\subsection{Creating and destroying new objects}
%%%%%%%%%%%%%%%%

Most Easel objects are allocated and free'd by a
\ccode{\_Create()/\_Destroy()} interface. Creating an object often
just means allocating space for it, so that some other routine can
fill data into it. It does not necessarily mean that the object
contains valid data.

\begin{sreapi}


\hypertarget{ifc:Create} 
{\item[\_Create(n)]}

A \ccode{\_Create()} interface takes any necessary initialization or
size information as arguments (there often aren't any), and it returns a
pointer to the newly allocated object. If an (optional) number of
elements \ccode{n} is provided, this specifies the number of elements
that the object is going to contain (for a fixed-size object) or the
initial allocation size (for a resizable object). In the event of an
allocation failure, a \ccode{\_Create} procedure throws \ccode{NULL}.

(If any error other than an allocation failure can happen, you should
use \ccode{\_Build()} instead. A caller is allowed to assume that a
\ccode{NULL} return from \ccode{\_Create()} is equivalent to
\ccode{eslEMEM}.)

The internals of some resizable objects have an \ccode{nredline}
parameter that controls an additional memory management rule. These
objects are allowed to grow to arbitrary size (either by doubling with
\ccode{\_Grow} or by a specific allocation with \ccode{\_Reinit} or
\ccode{\_GrowTo}), but when such an object is reused for new data, it
can be reallocated \emph{downward}, back to the redline
limit. Specifically, if the allocation size exceeds \ccode{nredline},
a \ccode{\_Reuse()} or \ccode{\_Reinit()} call will shrink the
allocation back to the \ccode{nredline} limit.  The idea is for a
frequently-reused object to be able to briefly handle a rare
exceptionally large problem, while not permanently committing the
resizable object to an extreme allocation size.

At least one module (\ccode{esl\_tree}) allows for creating either a
fixed-size or a resizable object; in this case, there is a
\ccode{\_CreateGrowable()} call for the resizable version.




\hypertarget{ifc:Build} 
{\item[\_Build()]}

A \ccode{\_Build()} interface is the same as \ccode{\_Create()}, but
instead of returning a pointer to the new object, we return an Easel
error code, and the new object is returned through a \ccode{*ret\_obj}
argument.





\hypertarget{ifc:Destroy} 
{\item[\_Destroy(obj)]}
A \ccode{\_Destroy()} interface takes an object pointer as an
argument, and frees all the memory associated with it. A
\ccode{\_Destroy} procedure returns \ccode{void} (there is no useful
information to return about a failure; the only calls are to 
\ccode{free()} and if that fails, we're in trouble).
\end{sreapi}

For example:
\begin{cchunk}
   ESL_SQ *sq;
   sq = esl_sq_Create();
   esl_sq_Destroy(sq);
\end{cchunk}
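
The following is a minimal sketch of how the three interfaces above
might look for a small resizable object. Nothing in it is Easel code:
the \ccode{DEMO\_VEC} type, the \ccode{demo\_vec\_*} functions, and the
\ccode{demoOK}, \ccode{demoEMEM}, and \ccode{demoEINVAL} codes are
hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes, standing in for Easel's error codes. */
enum { demoOK = 0, demoEMEM = 1, demoEINVAL = 2 };

/* Hypothetical resizable object. */
typedef struct {
  int *data;     /* storage for elements         */
  int  n;        /* number of elements in use    */
  int  nalloc;   /* number of elements allocated */
} DEMO_VEC;

/* _Destroy(): free everything; return void. Accepts NULL and
 * partially built objects, so _Create() can use it for cleanup. */
void
demo_vec_Destroy(DEMO_VEC *v)
{
  if (v == NULL) return;
  free(v->data);                /* free(NULL) is a no-op */
  free(v);
}

/* _Create(): return the new object, or NULL on allocation failure. */
DEMO_VEC *
demo_vec_Create(int nalloc)
{
  DEMO_VEC *v;

  assert(nalloc > 0);           /* caller's contract, not a runtime error */
  if ((v = malloc(sizeof(DEMO_VEC))) == NULL) return NULL;
  v->n      = 0;
  v->nalloc = nalloc;
  v->data   = malloc(sizeof(int) * (size_t) nalloc);
  if (v->data == NULL) { demo_vec_Destroy(v); return NULL; }
  return v;
}

/* _Build(): used when something other than allocation can fail.
 * Returns a status code; the object comes back through *ret_v,
 * which is NULL on any failure. */
int
demo_vec_Build(int nalloc, DEMO_VEC **ret_v)
{
  *ret_v = NULL;
  if (nalloc <= 0) return demoEINVAL;
  if ((*ret_v = demo_vec_Create(nalloc)) == NULL) return demoEMEM;
  return demoOK;
}
```

In this sketch the only difference between the two constructors is
where the failure information goes: \ccode{demo\_vec\_Create()} can
only say ``it failed'' by returning \ccode{NULL}, whereas
\ccode{demo\_vec\_Build()} can distinguish a bad argument from an
allocation failure.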




%%%%%%%%%%%%%%%%
  \subsubsection{opening and closing input streams}
%%%%%%%%%%%%%%%%

Some objects (such as \ccode{ESL\_SQFILE} and \ccode{ESL\_MSAFILE})
correspond to open input streams -- usually an open file, but possibly
reading from a pipe. Such objects are \ccode{\_Open()}'ed and
\ccode{\_Close()}'d, not created and destroyed.

Input stream objects have to be capable of handling normal failures,
because of bad user input. Input stream objects contain an
\ccode{errbuf[eslERRBUFSIZE]} field to capture informative parse error
messages. 

\begin{sreapi}
\hypertarget{ifc:Open} 
{\item[\_Open(file, formatcode, \&ret\_obj)]}

Opens \ccode{file} for reading, in the format indicated by
\ccode{formatcode}, and returns the open input object in
\ccode{ret\_obj}. A \ccode{formatcode} of 0 typically means unknown,
in which case the \ccode{\_Open()} procedure attempts to autodetect
the format. If the \ccode{file} is \ccode{"-"}, the object is
configured to read from the \ccode{stdin} stream instead of opening a
file. If the \ccode{file} ends in a \ccode{.gz} suffix, the object is
configured to read from a pipe from \ccode{gzip -dc}. Returns
\ccode{eslENOTFOUND} if \ccode{file} cannot be opened, and
\ccode{eslEFORMAT} if autodetection is attempted but the format cannot
be determined. 

Newer \ccode{\_Open} procedures return a standard Easel error code,
and on a normal error they also return the allocated object, using the
object's error message buffer to report the reason for the failed
open.

\hypertarget{ifc:Close} 
{\item[\_Close(obj)]}

Closes the input stream \ccode{obj}. Should return a standard Easel
error code. There are cases where an error in an input stream is only
detected at closing time (inputs using \ccode{popen()}/\ccode{pclose()}
  are an example).
\end{sreapi}

For example:
\begin{cchunk}
    char        *seqfile = "foo.fa";
    ESL_SQFILE  *sqfp;

    esl_sqfile_Open(seqfile, eslSQFILE_FASTA, NULL, &sqfp);
    esl_sqfile_Close(sqfp);
\end{cchunk}


%%%%
  \subsubsection{making copies of objects}
%%%%

\begin{sreapi}

\hypertarget{ifc:Clone}
{\item[\_Clone(obj)]}

Creates and returns a pointer to a duplicate of \ccode{obj}.
Equivalent to (and is a shortcut for, and is generally implemented as)
\ccode{dest = \_Create(); \_Copy(src, dest)}. Caller is responsible
for free'ing the duplicate object, just as if it had been
\ccode{\_Create}'d. Throws \ccode{NULL} if allocation fails.


\hypertarget{ifc:Copy}
{\item[\_Copy(src, dest)]}

Copies \ccode{src} object into \ccode{dest}, where the caller has
already created an appropriately allocated and empty \ccode{dest}
object (or buffer, or whatever). Returns \ccode{eslOK} on success;
throws \ccode{eslEINCOMPAT} if the objects are not compatible (for
example, two matrices that are not the same size).

Note that the order of the arguments is always \ccode{src}
$\rightarrow$ \ccode{dest} (unlike the C library's \ccode{strcpy()}
convention, which is the opposite order).


\hypertarget{ifc:Shadow}
{\item[\_Shadow(obj)]}

Creates and returns a pointer to a partial, dependent copy of
\ccode{obj}. Shadow creation arises in multithreading, when threads
can share some but not all internal object data. A shadow keeps
constant data as pointers to the original object.  The object needs to
know whether it is a shadow or not, so that \ccode{\_Destroy()} works
properly on both the original and its shadows.

\end{sreapi}
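
As a rough sketch of the \ccode{\_Clone()} and \ccode{\_Copy()}
relationship described above, consider a fixed-size vector object.
The \ccode{DEMO\_DVEC} type, the \ccode{demo\_dvec\_*} functions, and
the \ccode{demoOK} and \ccode{demoEINCOMPAT} codes are hypothetical
stand-ins, not Easel names. Shadows are not covered here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { demoOK = 0, demoEINCOMPAT = 1 };   /* hypothetical status codes */

typedef struct {      /* hypothetical fixed-size vector object */
  double *x;
  int     n;
} DEMO_DVEC;

DEMO_DVEC *
demo_dvec_Create(int n)
{
  DEMO_DVEC *v;

  assert(n > 0);
  if ((v = malloc(sizeof(DEMO_DVEC))) == NULL) return NULL;
  if ((v->x = calloc((size_t) n, sizeof(double))) == NULL) { free(v); return NULL; }
  v->n = n;
  return v;
}

void
demo_dvec_Destroy(DEMO_DVEC *v)
{
  if (v != NULL) { free(v->x); free(v); }
}

/* _Copy(): argument order is src -> dest; dest is already allocated. */
int
demo_dvec_Copy(const DEMO_DVEC *src, DEMO_DVEC *dest)
{
  assert(src != NULL && dest != NULL);
  if (src->n != dest->n) return demoEINCOMPAT;   /* sizes must match */
  memcpy(dest->x, src->x, sizeof(double) * (size_t) src->n);
  return demoOK;
}

/* _Clone(): a _Create() followed by a _Copy(). Returns NULL on
 * allocation failure; the caller owns the returned object. */
DEMO_DVEC *
demo_dvec_Clone(const DEMO_DVEC *src)
{
  DEMO_DVEC *dest = demo_dvec_Create(src->n);

  if (dest == NULL) return NULL;
  if (demo_dvec_Copy(src, dest) != demoOK) { demo_dvec_Destroy(dest); return NULL; }
  return dest;
}
```

The clone is a deep copy: it has its own \ccode{x} array, so
destroying the original leaves the clone intact.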

%%%%%%%%%%%%%%%%
  \subsection{Managing memory allocation}
%%%%%%%%%%%%%%%%

%%%%
  \subsubsection{resizable objects}
%%%%

Some objects need to be reallocated and expanded during their use.
These objects are called \esldef{resizable}.

In some cases, the whole purpose of the object is to have elements
added to it, such as \ccode{ESL\_STACK} (pushdown stacks) and
\ccode{ESL\_HISTOGRAM} (histograms). In these cases, the normal
\ccode{\_Create()} interface performs an initial allocation, and the
object keeps track of both its current contents size (often
\ccode{obj->N}) and the current allocation size (often
\ccode{obj->nalloc}). 

In at least one case, an object might be either growable or not,
depending on how it's being used. This happens, for instance, when we
have routines for parsing input data to create a new object, and we
need to dynamically reallocate as we go because the input doesn't tell
us the total size when we start. For instance, with \ccode{ESL\_TREE}
(phylogenetic trees), sometimes we know exactly the size of the tree
we need to create (because we're making a tree ourselves), and
sometimes we need to create a resizable object (because we're reading a
tree from a file). In these cases, the normal \ccode{\_Create()}
interface creates a static, nongrowable object of known size, and a
\ccode{\_CreateGrowable()} interface specifies an initial allocation
for a resizable object.

Easel usually handles its own reallocation of resizable objects. For
instance, many resizable objects have an interface called something
like \ccode{\_Add()} or \ccode{\_Push()} for storing the next element
in the object, and this interface will deal with increasing allocation
size as needed.  In a few cases, a public \ccode{\_Grow()} interface
is provided for reallocating an object to a larger size, in cases
where a caller might need to grow the object itself. \ccode{\_Grow()}
only increases an allocation when it is necessary, and it makes that
check immediately and efficiently, so that a caller can call
\ccode{\_Grow()} before every attempt to add a new element without
worrying about efficiency. An example of where a public
\ccode{\_Grow()} interface is generally provided is when an object
might be input from different file formats, and an application may
need to create its own parser. Although creating an input parser
requires familiarity with the Easel object's internal data structures,
at least the \ccode{\_Grow()} interface frees the caller from having
to understand its memory management.

Resizable objects necessarily waste some memory, because they are
overallocated in order to reduce the number of calls to
\ccode{malloc()}.  The wastage is bounded (to a maximum of two-fold,
for the default doubling strategies, once an object has exceeded its
initial allocation size) but nonetheless may not always be tolerable.
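
The two-fold bound can be seen with a short worked argument. It
assumes a pure doubling strategy and an object that only grows (no
elements are removed); the symbols $A$ (allocation size, in elements)
and $N$ (number of stored elements) are introduced here only for
illustration. A doubling happens only when the object is full, $N =
A$, and one more element arrives. Immediately afterwards,
\[
  A' = 2A, \qquad N' = A + 1, \qquad \frac{A'}{N'} = \frac{2A}{A+1} < 2 .
\]
As more elements are added, $N'$ rises toward $A'$ and the ratio falls
toward 1, until the next doubling brings it back to just under 2.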

In summary: 

\begin{sreapi}
\hypertarget{ifc:Grow}
{\item[\_Grow(obj)]}

A \ccode{\_Grow()} function checks to see if \ccode{obj} can hold
another element. If not, it increases the allocation, according to
internally stored rules on reallocation strategy (usually, by
doubling). 
\end{sreapi}

\begin{sreapi}
\hypertarget{ifc:GrowTo}
{\item[\_GrowTo(obj, n)]}

A \ccode{\_GrowTo()} function checks to see if \ccode{obj} is large
enough to hold \ccode{n} elements. If not, it reallocates to at least
that size.
\end{sreapi}
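
A minimal sketch of the two interfaces, assuming a simple integer
stack. The \ccode{DEMO\_STACK} type, the \ccode{demo\_stack\_*}
functions, and the status codes are hypothetical; the point is only
the division of labor between \ccode{\_Grow()}, \ccode{\_GrowTo()},
and a \ccode{\_Push()}-style call that hides both from the caller.

```c
#include <assert.h>
#include <stdlib.h>

enum { demoOK = 0, demoEMEM = 1 };   /* hypothetical status codes */

typedef struct {
  int *data;
  int  n;        /* elements in use    */
  int  nalloc;   /* elements allocated */
} DEMO_STACK;

/* _GrowTo(): make sure there is room for at least n elements. */
int
demo_stack_GrowTo(DEMO_STACK *s, int n)
{
  int *tmp;

  assert(s != NULL);
  if (n <= s->nalloc) return demoOK;        /* cheap early exit */
  tmp = realloc(s->data, sizeof(int) * (size_t) n);
  if (tmp == NULL) return demoEMEM;         /* s is left unchanged */
  s->data   = tmp;
  s->nalloc = n;
  return demoOK;
}

/* _Grow(): make sure there is room for one more element,
 * doubling the allocation if the object is full. */
int
demo_stack_Grow(DEMO_STACK *s)
{
  assert(s != NULL && s->nalloc > 0);
  if (s->n < s->nalloc) return demoOK;      /* cheap early exit */
  return demo_stack_GrowTo(s, 2 * s->nalloc);
}

/* _Push(): the caller never sees the reallocation. */
int
demo_stack_Push(DEMO_STACK *s, int x)
{
  int status;

  if ((status = demo_stack_Grow(s)) != demoOK) return status;
  s->data[s->n++] = x;
  return demoOK;
}
```

Because both growth functions return immediately when no reallocation
is needed, calling \ccode{demo\_stack\_Grow()} before every insertion
costs one comparison in the common case.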

%%%%
  \subsubsection{reusable objects}
%%%%

Memory allocation is computationally expensive. An application needs
to minimize \ccode{malloc()/free()} calls in performance-critical
regions. In loops where one \ccode{\_Destroy()}'s an old object only
to \ccode{\_Create()} the next one, such as a sequential input loop
that processes objects from a file one at a time, one generally wants
to \ccode{\_Reuse()} the same object instead:

\begin{sreapi}
\hypertarget{ifc:Reuse}
{\item[\_Reuse(obj)]}

A \ccode{\_Reuse()} interface takes an existing object and
reinitializes it as a new object, while reusing as much memory as
possible. Any state information that was specific to the problem the
object was just used for is reinitialized. Any allocations and state
information specific to those allocations are preserved (to the extent
possible).  A \ccode{\_Reuse()} call should exactly replace (and be
equivalent to) a \ccode{\_Destroy()/\_Create()} pair. If the object is
growable, it typically would keep the last allocation size, and it
must keep at least the same allocation size that a default
\ccode{\_Create()} call would give.

If the object is arbitrarily resizable and it has an \ccode{nredline}
control on its memory, the allocation is shrunk back to
\ccode{nredline} (which must be at least the default initial
allocation).

\end{sreapi}

For example:

\begin{cchunk}
   ESL_SQFILE *sqfp;
   ESL_SQ     *sq;

   esl_sqfile_Open("foo.fa", eslSQFILE_FASTA, NULL, &sqfp);
   sq = esl_sq_Create();
   while (esl_sqio_Read(sqfp, sq) == eslOK)
    {
       /* do stuff with this sq */
       esl_sq_Reuse(sq);   /* recycle sq for the next record */
    }
   esl_sq_Destroy(sq);
   esl_sqfile_Close(sqfp);
\end{cchunk}
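
On the implementation side, a \ccode{\_Reuse()} with a redline might
look roughly like the sketch below. The \ccode{DEMO\_TEXT} type and
\ccode{demo\_text\_Reuse()} are hypothetical; only the field name
\ccode{nredline} is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>

enum { demoOK = 0, demoEMEM = 1 };   /* hypothetical status codes */

/* Hypothetical resizable text buffer with a redline. */
typedef struct {
  char  *buf;
  size_t n;          /* bytes in use                                */
  size_t nalloc;     /* bytes allocated                             */
  size_t nredline;   /* _Reuse() shrinks larger allocations to this */
} DEMO_TEXT;

/* _Reuse(): forget the old contents and keep the allocation, but
 * pull an oversized allocation back down to the redline. */
int
demo_text_Reuse(DEMO_TEXT *t)
{
  assert(t != NULL && t->nredline > 0);
  if (t->nalloc > t->nredline) {
    char *tmp = realloc(t->buf, t->nredline);
    if (tmp == NULL) return demoEMEM;   /* t is unchanged and still valid */
    t->buf    = tmp;
    t->nalloc = t->nredline;
  }
  t->n = 0;                             /* problem-specific state is reset */
  return demoOK;
}
```

An object whose allocation is at or below the redline is left alone,
so the common case involves no call to the allocator at all.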

%%%%
  \subsubsection{other}
%%%%
\begin{sreapi}
\hypertarget{ifc:Sizeof}
{\item[size\_t \_Sizeof(obj)]}

Returns the total size of an object and its allocations, in bytes.
\end{sreapi}


%%%%%%%%%%%%%%%%
 \subsection{Accessing information in objects}
%%%%%%%%%%%%%%%%

\begin{sreapi}

\hypertarget{ifc:Is}
{\item[\_Is*(obj)]}

Performs some specific test of the internal state of an
object, and returns \ccode{TRUE} or \ccode{FALSE}.

\hypertarget{ifc:Get}
{\item[value = \_Get*(obj, ...)]}

Retrieves some specified data from \ccode{obj} and returns it
directly. Because no error code can be returned, a \ccode{\_Get}
call must be a simple access call within the object, guaranteed to
succeed. \ccode{\_Get()} methods may often be implemented as macros.
(\ccode{\_Read} or \ccode{\_Fetch} interfaces are for more complex
access methods that might fail, and require an error code return.)

\hypertarget{ifc:Read}
{\item[\_Read*(obj, ..., \&ret\_value)]}

Retrieves some specified data from \ccode{obj} and puts it in
\ccode{ret\_value}, where caller has provided (and already allocated,
if needed) the space for \ccode{ret\_value}.

\hypertarget{ifc:Fetch}
{\item[\_Fetch*(obj, ..., \&ret\_value)]}

Retrieves some specified data from \ccode{obj} and puts it in
\ccode{ret\_value}, where space for the returned value is allocated by
the function. Caller becomes responsible for free'ing that space.

\hypertarget{ifc:Set}
{\item[\_Set*(obj, value)]}

Sets some value(s) in \ccode{obj} to \ccode{value}. If a value was
already set, it is replaced with the new one. If any memory needs to
be reallocated or free'd, this is done. \ccode{\_Set} functions have
some appropriate longer name, like \ccode{\_SetZero()} (set something
in an object to zero(s)), or \ccode{esl\_dmatrix\_SetIdentity()} (set
a dmatrix to an identity matrix).

\hypertarget{ifc:Format}
{\item[\_Format*(obj, fmtstring, ...)]}

Like \ccode{\_Set}, but with \ccode{sprintf()}-style semantics.  Sets
some string value in \ccode{obj} according to the
\ccode{sprintf()}-style \ccode{fmtstring} and any subsequent
\ccode{sprintf()}-style arguments. If a value was already set, it is
replaced with the new one. If any memory needs to be reallocated or
free'd, this is done.  \ccode{\_Format} functions have some
appropriate longer name, like
\ccode{esl\_msa\_FormatSeqDescription()}.

Because \ccode{fmtstring} is a \ccode{printf()}-style format string,
every '\%' character in it is interpreted as the start of a conversion
specification; a literal percent sign must be written as '\%\%'.
\ccode{\_Format*} functions should only be used with format strings
set by a program; they should not be used to copy user input that
might contain '\%' characters.
\end{sreapi}
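
The difference between \ccode{\_Set*} and \ccode{\_Format*} can be
sketched as follows, assuming an object with a single string field.
The \ccode{DEMO\_REC} type, both functions, and the status codes are
hypothetical. The two-pass use of the standard C99
\ccode{vsnprintf()} (measure, then write) is one common way to size
the allocation; it is not necessarily how Easel does it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { demoOK = 0, demoEMEM = 1, demoEINVAL = 2 };   /* hypothetical */

typedef struct { char *name; } DEMO_REC;   /* name is NULL until set */

/* _Set*(): replace any previous value with a copy of <name>. */
int
demo_rec_SetName(DEMO_REC *r, const char *name)
{
  char *tmp;

  assert(r != NULL && name != NULL);
  if ((tmp = malloc(strlen(name) + 1)) == NULL) return demoEMEM;
  strcpy(tmp, name);            /* copied verbatim; '%' is not special */
  free(r->name);                /* old value, if any, is released      */
  r->name = tmp;
  return demoOK;
}

/* _Format*(): same job, but with printf()-style arguments. */
int
demo_rec_FormatName(DEMO_REC *r, const char *fmt, ...)
{
  va_list ap;
  char   *tmp;
  int     len;

  assert(r != NULL && fmt != NULL);
  va_start(ap, fmt);
  len = vsnprintf(NULL, 0, fmt, ap);           /* pass 1: measure */
  va_end(ap);
  if (len < 0) return demoEINVAL;

  if ((tmp = malloc((size_t) len + 1)) == NULL) return demoEMEM;
  va_start(ap, fmt);
  vsnprintf(tmp, (size_t) len + 1, fmt, ap);   /* pass 2: write   */
  va_end(ap);

  free(r->name);
  r->name = tmp;
  return demoOK;
}
```

With these definitions, untrusted text is safe to pass either to
\ccode{demo\_rec\_SetName()} directly or to
\ccode{demo\_rec\_FormatName()} as an argument of a program-supplied
\ccode{"\%s"} format, but never as the format string itself.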


%%%%%%%%%%%%%%%%
\subsection{Debugging, testing, development}
%%%%%%%%%%%%%%%%

\begin{sreapi}
\hypertarget{ifc:Validate}
{\item[\_Validate*(obj, errbuf...)]}

Checks that the internals of \ccode{obj} are all right. Returns
\ccode{eslOK} if they are, and returns \ccode{eslFAIL} if they
aren't. Additionally, if the caller provides a non-\ccode{NULL}
message buffer \ccode{errbuf}, on failure, an informative message
describing the reason for the failure is formatted and left in
\ccode{errbuf}. If the caller provides this message buffer, it must
allocate it for at least \ccode{eslERRBUFSIZE} characters.

Failures in \ccode{\_Validate()} routines are handled by
\ccode{ESL\_FAIL()} (or \ccode{ESL\_XFAIL()}, if the validation
routine needs to do any memory cleanup).  Validation failures are
classified as normal (returned) errors so that \ccode{\_Validate()}
routines can be used in production code -- for example, to validate
user input.

At the same time, because the \ccode{ESL\_FAIL()} and
\ccode{ESL\_XFAIL()} macros call the stub \ccode{esl\_fail()}, you can
set a debugging breakpoint on \ccode{esl\_fail} to make a
\ccode{\_Validate()} routine stop immediately at whatever test
failed. 

The \ccode{errbuf} message therefore can be coarse-grained
(``validation of object X failed'') or fine-grained (``in object X,
data element Y fails test Z''). A validation of user input (which we
expect to fail often) should be fine-grained, to return maximally
useful information about what the user did wrong. A validation of
internal data can be very coarse-grained, knowing that a developer can
simply set a breakpoint in \ccode{esl\_fail()} to get at exactly where
a validation failed.

A \ccode{\_Validate()} function is not intended to test all possible
invalid states of an object, even if that were feasible. Rather, the
goal is to automatically catch future problems we've already seen in
past debugging and testing. So a \ccode{\_Validate()} function is a
place to systematically organize a set of checks that essentially
amount to regression tests against past debugging/testing efforts.

\hypertarget{ifc:Compare}
{\item[\_Compare*(obj1, obj2...)]}

Compares \ccode{obj1} to \ccode{obj2}. Returns \ccode{eslOK} if the
contents are judged to be identical, and \ccode{eslFAIL} if they
differ. When the comparison involves floating point scalar
comparisons, a fractional tolerance argument \ccode{tol} is also
passed. 

Failures in \ccode{\_Compare()} functions are handled by
\ccode{ESL\_FAIL()} (or \ccode{ESL\_XFAIL()}, if the comparison
routine needs to do any memory cleanup), because they may be used in a
context where a ``failure'' is expected; for example, when using
\ccode{esl\_dmatrix\_Compare()} as a test for successful convergence
of a matrix algebra routine. 

However, the main use of \ccode{\_Compare()} functions is in unit
tests. During debugging and development, we want to see exactly where
a comparison failed, and we don't want to have to write a bunch of
laboriously informative error messages to get that information.
Instead we can exploit the fact that the \ccode{ESL\_FAIL()} and
\ccode{ESL\_XFAIL()} macros call the stub \ccode{esl\_fail()}; you can
set a debugging breakpoint in \ccode{esl\_fail()} to stop execution in
the failure macros.

\hypertarget{ifc:Dump}
{\item[\_Dump*(FILE *fp, obj...)]}

Prints the internals of an object in human-readable, easily parsable
tabular ASCII form. Useful during debugging and development to view
the entire object at a glance. Returns \ccode{eslOK} on success.
Unlike a more robust \ccode{\_Write()} call, a \ccode{\_Dump()} call may
assume that all its writes will succeed, and does not need to check
return status of \ccode{fprintf()} or other system calls, because it
is not intended for production use.


\hypertarget{ifc:TestSample}
{\item[\_TestSample(ESL\_RANDOMNESS *rng, ..., OBJTYPE **ret\_obj)]}

Create an object filled with randomly sampled values for all data
elements. The aim is to exercise valid values and ranges, and
presence/absence of optional information and allocations, but not to
obsess about internal semantic consistency. For example, we use
\ccode{\_TestSample()} calls in testing MPI send/receive
communications routines, where we don't care so much about the meaning
of the object's contents, as we do about faithful transmission of any
object with valid contents. 

A \ccode{\_TestSample()} call produces an object that is sufficiently
valid for other debugging tools, including \ccode{\_Dump()},
\ccode{\_Compare()}, and \ccode{\_Validate()}. However, because
elements may be randomly sampled independently, in ways that don't
respect interdependencies, the object may contain data inconsistencies
that make the object invalid for other purposes.  Contrast
\ccode{\_Sample()} routines, which generate fully valid objects for
all purposes, but which may not exercise the object's fields as
thoroughly.

\end{sreapi}
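
The \ccode{\_Validate()} and \ccode{\_Compare()} conventions can be
sketched for a small probability vector. Everything named
\ccode{demo*} or \ccode{DEMO*} is hypothetical, including
\ccode{demo\_fail()}, which is only a stand-in for the role that the
\ccode{esl\_fail()} stub plays as a breakpoint target. The particular
fractional-tolerance test used here (difference relative to the larger
magnitude) is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>

#define demoERRBUFSIZE 128             /* stand-in for eslERRBUFSIZE */
enum { demoOK = 0, demoFAIL = 1 };     /* hypothetical status codes  */

typedef struct { const double *p; int n; } DEMO_PROBS;   /* hypothetical */

/* Stand-in for a failure stub: does nothing, but gives a debugger
 * one place to break on for every failed check. */
void demo_fail(void) { }

static double absd(double x) { return (x < 0.0) ? -x : x; }

/* _Validate(): returns OK or FAIL; errbuf is optional (may be NULL). */
int
demo_probs_Validate(const DEMO_PROBS *v, char *errbuf)
{
  double sum = 0.0;
  int    i;

  assert(v != NULL);
  for (i = 0; i < v->n; i++) {
    if (v->p[i] < 0.0 || v->p[i] > 1.0) {
      if (errbuf) snprintf(errbuf, demoERRBUFSIZE, "p[%d] = %g is out of range", i, v->p[i]);
      demo_fail();
      return demoFAIL;
    }
    sum += v->p[i];
  }
  if (absd(sum - 1.0) > 1e-6) {
    if (errbuf) snprintf(errbuf, demoERRBUFSIZE, "probabilities sum to %g, not 1", sum);
    demo_fail();
    return demoFAIL;
  }
  return demoOK;
}

/* _Compare(): equality within a fractional tolerance tol. */
int
demo_probs_Compare(const DEMO_PROBS *a, const DEMO_PROBS *b, double tol)
{
  int i;

  if (a->n != b->n) { demo_fail(); return demoFAIL; }
  for (i = 0; i < a->n; i++) {
    double x     = absd(a->p[i]);
    double y     = absd(b->p[i]);
    double scale = (x > y) ? x : y;
    if (absd(a->p[i] - b->p[i]) > tol * scale) { demo_fail(); return demoFAIL; }
  }
  return demoOK;
}
```

Both functions report failure as an ordinary return value, so they can
be used where a failure is an expected outcome, while a breakpoint on
\ccode{demo\_fail} still stops at the exact check that tripped.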

%%%%%%%%%%%%%%%%
\subsection{Miscellaneous other interfaces}
%%%%%%%%%%%%%%%%

\begin{sreapi}
\hypertarget{ifc:Write}
{\item[\_Write(fp, obj)]}
Writes something from an object to an output stream \ccode{fp}. Used
for exporting and saving files in official data exchange formats.
\ccode{\_Write()} functions must be robust to system write errors,
such as filling or unexpectedly disconnecting a disk. They must check
return status of all system calls, and throw an \ccode{eslEWRITE}
error on any failures.




\hypertarget{ifc:Encode}
{\item[code = \_Encode*(char *s)]}

Given a string \ccode{s}, match it case-insensitively against a list
of possible string values and convert this visible representation to
its internal \ccode{\#define} or \ccode{enum} code. For example,
\ccode{esl\_sqio\_EncodeFormat("fasta")} returns
\ccode{eslSQFILE\_FASTA}. If the string is not recognized, returns a
code signifying ``unknown''. This needs to be a normal return (not a
thrown error) because the string might come from user input, and might
be invalid.


\hypertarget{ifc:Decode}
{\item[char *s = \_Decode*(int code)]}

Given an internal code (an \ccode{enum} or \ccode{\#define} constant),
return a pointer to an informative string value, for diagnostics and
other output. The string is static. If the code is not recognized,
throws an \ccode{eslEINVAL} exception and returns \ccode{NULL}.

\end{sreapi}
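
A small table-driven sketch of an \ccode{\_Encode*()} and
\ccode{\_Decode*()} pair follows. The \ccode{demoFMT\_*} codes and the
two functions are hypothetical. One simplification is labeled in the
code: the decoder here just returns \ccode{NULL} for an unrecognized
code, without the exception that the text above describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format codes; 0 signifies "unknown". */
enum { demoFMT_UNKNOWN = 0, demoFMT_FASTA = 1, demoFMT_EMBL = 2 };

/* One table serves both directions. */
static const struct { const char *name; int code; } demo_formats[] = {
  { "fasta", demoFMT_FASTA },
  { "embl",  demoFMT_EMBL  },
};
static const size_t demo_nformats = sizeof(demo_formats) / sizeof(demo_formats[0]);

/* Case-insensitive string equality, in portable ISO C. */
static int
same_nocase(const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) return 0;
  return (*a == *b);            /* both strings must end together */
}

/* _Encode*(): string -> code. An unrecognized string is a normal
 * return of the "unknown" code, because s may be user input. */
int
demo_EncodeFormat(const char *s)
{
  size_t i;

  assert(s != NULL);
  for (i = 0; i < demo_nformats; i++)
    if (same_nocase(s, demo_formats[i].name)) return demo_formats[i].code;
  return demoFMT_UNKNOWN;
}

/* _Decode*(): code -> static string. Simplified: returns NULL for
 * an unrecognized code, with no exception thrown. */
const char *
demo_DecodeFormat(int code)
{
  size_t i;

  for (i = 0; i < demo_nformats; i++)
    if (demo_formats[i].code == code) return demo_formats[i].name;
  return NULL;
}
```

The returned string points into the static table, so the caller must
not free or modify it.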






%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Writing unit tests}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

An Easel test driver runs a set of individual unit tests one after
another.  Sometimes there is one unit test assigned to each exposed
function in the API. Sometimes, it makes sense to test several exposed
functions in a single unit test function.

A unit test for \ccode{esl\_foo\_Baz()} is named \ccode{static void
utest\_Baz()}. 

Upon success, unit tests return void.

Upon any failure, a unit test calls \ccode{esl\_fatal()} with an error
message, and terminates. It should not use any other error-catching
mechanism. It aids debugging if the test program terminates
immediately, using a single function that we can easily breakpoint at
(\ccode{break esl\_fatal} in GDB). It must not use \ccode{abort()},
for example, because this will screw up the output of scripts running
automated tests in \ccode{make check} and \ccode{make dcheck}, such as
\emcode{sqc}. \emcode{sqc} traps \ccode{stderr} from
\ccode{esl\_fatal()} correctly. A unit test must not use
\ccode{exit(1)} either, because that leaves no error message, so
someone running a test program on the command line can't easily tell
that it failed.

Unit tests should attempt to deliberately generate exceptions and
failures, and test that the appropriate error code is returned.  Unit
tests must temporarily register a nonfatal error handler when testing
exceptions. 

Every function, procedure, and macro in the exposed API shall be
tested by one or more unit tests. The unit tests aim for complete code
coverage. This is measured by code coverage tests using \ccode{gcov}.
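
The shape of a unit test that follows these rules might look like the
sketch below. It is self-contained, so it does not use Easel:
\ccode{demo\_fatal()} is a hypothetical stand-in for
\ccode{esl\_fatal()}, and \ccode{demo\_foo\_Baz()} is a made-up
function under test.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a fatal-error call: report on stderr, exit nonzero.
 * A single function like this is easy to set a breakpoint on. */
void
demo_fatal(const char *fmt, ...)
{
  va_list ap;

  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  fputc('\n', stderr);
  exit(1);
}

/* Hypothetical function under test: doubles x; rejects negatives
 * with a nonzero status. */
int
demo_foo_Baz(int x, int *ret_y)
{
  assert(ret_y != NULL);
  if (x < 0) { *ret_y = 0; return 1; }
  *ret_y = 2 * x;
  return 0;
}

/* The unit test: returns void on success; on any failure it
 * terminates through demo_fatal() with a message. */
static void
utest_Baz(void)
{
  int y;

  if (demo_foo_Baz(21, &y) != 0) demo_fatal("utest_Baz: unexpected failure status");
  if (y != 42)                   demo_fatal("utest_Baz: expected 42, got %d", y);

  /* Deliberately provoke the failure path and check its status. */
  if (demo_foo_Baz(-1, &y) == 0) demo_fatal("utest_Baz: negative input was accepted");
}
```

A test driver would simply call \ccode{utest\_Baz()} along with its
other unit tests; reaching the end of the driver means every test
passed.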



%%%%%%%%%%%%%%%%
\subsection{Dealing with expected stochastic failures in unit tests}
%%%%%%%%%%%%%%%%

Many unit tests are based on statistical samples and/or random number
generation.  For example, we test a maximum likelihood parameter
fitting routine by fitting to samples generated with known parameters,
and testing that the estimated parameters are close enough to the true
parameters.  The trouble is defining ``close enough''. There may be a
small but finite probability that such a test will fail. I call these
``stochastic failures''.  We don't want tests to fail due to expected
statistical deviations, but neither do we want to set p-values so
loose that a flaw escapes notice.

Current Easel strategy is to have such unit tests reinitialize the RNG
to a predetermined fixed seed known to work. Optionally, the test can
be made to use the RNG without reinitialization (therefore allowing
stochastic failures to occur), with a \ccode{-x} option to the test
driver. 
% example: esl_mixdchlet

In the test driver, these unit tests need to be run last; unit tests
that don't have a stochastic failure mode are run first. This is so
the \ccode{-s <seed>} option for setting the RNG seed takes effect
properly. (Otherwise, having a unit test reset the RNG seed would
override the \ccode{-s <seed>} setting.)

Apart from that, the default for \ccode{<seed>} should be 0, so that
all other tests are randomized from run to run.

In some older Easel code, fixed RNG seeds are used for tests that can
stochastically fail. The newer approach is preferable because it gives
more fine-grained control: only some utests need to deal with
stochastic failure, not all of them.

%%%%%%%%%%%%%%%%
\subsection{Using temporary files in unit tests}
%%%%%%%%%%%%%%%%

If a unit test or testdriver needs to create a named temporary file
(to test i/o), the tmpfile is created with
\ccode{esl\_tmpfile\_named()}:

\begin{cchunk}
   char  tmpfile[16] = "esltmpXXXXXX";
   FILE *fp;

   if (esl_tmpfile_named(tmpfile, &fp) != eslOK) esl_fatal("failed to create tmpfile");
   write_stuff_to(fp);
   fclose(fp);

   if ((fp = fopen(tmpfile, "r")) == NULL) esl_fatal("failed to open tmpfile");
   read_stuff_from(fp);
   fclose(fp);

   remove(tmpfile);
\end{cchunk}

Thus tmp files created by Easel's test suite have a common naming
convention, and are put in the current working directory. On a test
failure, the tmp file remains, to assist debugging; on a test success,
the tmp file is removed. The \ccode{make clean} targets in Makefiles
are looking to remove files matching the target \ccode{esltmp??????}.

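It is important to declare it as \ccode{char tmpfile[16]} rather than
\ccode{char *tmpfile}, because \ccode{esl\_tmpfile\_named()}
overwrites the \ccode{XXXXXX} template in place with the actual file
name, so the storage must be writable. Compilers are allowed to treat
the string in a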
\ccode{char *foo = "bar"} initialization as a read-only constant.
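
To make the in-place overwrite concrete, here is a toy version of the
name-filling step. \ccode{demo\_fill\_template()} is hypothetical and
much simpler than a real temporary-file call (it creates no file and
makes no attempt at uniqueness); it only shows that the callee writes
into the caller's buffer.

```c
#include <assert.h>
#include <string.h>

enum { demoOK = 0, demoEINVAL = 1 };   /* hypothetical status codes */

/* Overwrite the six trailing X's of the template, in place, with
 * the zero-padded decimal digits of id (low six digits only). */
int
demo_fill_template(char *tmpl, unsigned int id)
{
  size_t len;
  size_t i;

  assert(tmpl != NULL);
  len = strlen(tmpl);
  if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0) return demoEINVAL;
  for (i = 0; i < 6; i++) {
    tmpl[len - 1 - i] = (char) ('0' + id % 10);   /* writes into caller's memory */
    id /= 10;
  }
  return demoOK;
}
```

Called on \ccode{char name[16] = "esltmpXXXXXX"} this is fine, because
the array is the caller's own writable storage. Called on \ccode{char
*name = "esltmpXXXXXX"} it would write into a string literal, which is
undefined behavior in C.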





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Easel development environment; using development tools}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Easel is developed primarily on GNU/Linux and Mac OS/X systems with
the following tools installed:

\begin{tabular}{ll}
{\bfseries Tool}  & {\bfseries Use} \\
\emcode{emacs}    &  editor   \\
\emcode{gcc}      &  GNU compiler \\
\emcode{icc}      &  Intel compiler \\
\emcode{gdb}      &  debugger\\
\emcode{autoconf} &  platform-independent configuration manager, Makefile generator\\
\emcode{make}     &  build/compilation management\\
\emcode{valgrind} &  memory bounds and leak checking\\
\emcode{gcov}     &  code coverage analysis\\
\emcode{gprof}    &  profiling and optimization (GNU)\\
\emcode{shark}    &  profiling and optimization (Mac OS/X)\\
\LaTeX            &  documentation typesetting\\
Subversion        &  revision control\\
Bourne shell (\ccode{/bin/sh}) & scripting\\
Perl              &  scripting\\
\end{tabular}

Most of these are standard and well-known. The following sections
describe some Easel work patterns with some of the less commonly used
tools.

%%%%%%%%%%%%%%%%
\subsection{Using valgrind to find memory leaks and more}
%%%%%%%%%%%%%%%%

We use \emcode{valgrind} to check for memory leaks and other problems,
especially on the unit tests:

\begin{cchunk}
  % valgrind ./esl_buffer_utest
\end{cchunk}

The \ccode{valgrind\_report.pl} script in \ccode{testsuite} automates
valgrind testing for all Easel modules. To run it:

\begin{cchunk} 
   % cd testsuite
   % ./valgrind_report.pl > valgrind.report
\end{cchunk}




%%%%%%%%%%%%%%%%
\subsection{Using gcov to measure unit test code coverage}
%%%%%%%%%%%%%%%%

We use \emcode{gcov} to measure code coverage of our unit
testing. \emcode{gcov} works best with unoptimized code.  The code
must be compiled with \emcode{gcc} and it needs to be compiled with
\ccode{-fprofile-arcs -ftest-coverage}. The configure script knows
about this: give it the \ccode{--enable-gcov} option. An example:

\begin{cchunk}
  % make distclean
  % ./configure --enable-gcov
  % make esl_buffer_utest
  % ./esl_buffer_utest
  % gcov esl_buffer.c
  File 'esl_buffer.c'
  Lines executed:73.85% of 589
  esl_buffer.c:creating 'esl_buffer.c.gcov'
  % emacs esl_buffer.c.gcov
\end{cchunk}

The file \ccode{esl\_buffer.c.gcov} contains an annotated source listing
of the \ccode{.c} file, showing which lines were and weren't covered
by the test suite.

The \ccode{coverage\_report.pl} script in \ccode{testsuite} automates coverage
testing for all Easel modules. To run it:

\begin{cchunk} 
   % cd testsuite
   % ./coverage_report.pl > coverage.report
\end{cchunk}


%%%%%%%%%%%%%%%%
\subsection{Using gprof for performance profiling}
%%%%%%%%%%%%%%%%

On a Linux machine (gprof does not work on Mac OS/X, apparently):

\begin{cchunk}
   % make distclean
   % ./configure --enable-gprof
   % make
\end{cchunk}

Run any program you want to profile, then:

\begin{cchunk}
   % gprof -l <progname>
\end{cchunk}

%%%%%%%%%%%%%%%%
\subsection{Using the clang static analyzer, checker}
%%%%%%%%%%%%%%%%

The clang static analyzer for Mac OS/X is at
\url{http://clang-analyzer.llvm.org/}. I install it by moving its
entire distro directory (checker-276, for example) to
\ccode{/usr/local}, and symlinking to \ccode{checker}.
My \ccode{bashrc} has:

\begin{cchunk}
test -d /usr/local/checker         && PATH=${PATH}:/usr/local/checker
\end{cchunk}

and that puts \prog{scan-build} in my \ccode{PATH}.

To use it:

\begin{cchunk}
   % scan-build ./configure --enable-debugging
   % scan-build make
\end{cchunk}

It'll give you a scan-view command line, including the name of its
output html file, so you can then visualize and interact with the
results:

\begin{cchunk}
   % scan-view /var/folders/blah/baz/foo
\end{cchunk}






%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Documentation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%
\subsection{Structured function headers read by autodoc}
%%%%%%%%%%%%%%%%
The documentation for Easel's functions is embedded in the source code
itself, rather than being in separate files. A homegrown documentation
extraction tool (\prog{autodoc}) is used to process the source files
and extract and format the documentation.

An important part of the documentation is the documentation for
individual functions.  Each Easel function is preceded by
documentation in the form of a structured comment header that is
parsed by \prog{autodoc}. For example:

\input{cexcerpts/function_comment_example}

\prog{autodoc} can do one of three things with the text that follows
these tags: it can ignore it, use it verbatim, or process
it. \esldef{Ignored} text is documentation that resides only in the
source code, like the incept date and the notebook
crossreferences.\footnote{Eventually, we will probably process the
\ccode{Args:} part of the header, but for now it is ignored.}
\esldef{Verbatim} text is picked up by \prog{autodoc} and formatted as
\verb+\ccode{}+ in the \LaTeX\ documentation. \esldef{Processed} text
is interpreted as \LaTeX\ code, with a special addition that angle
brackets are used to enclose C code words, such as the argument names.
\prog{autodoc} recognizes the angle brackets and formats the enclosed
text as \verb+\ccode{}+.  Unprotected underscore characters are
allowed inside these angle brackets; \prog{autodoc} protects them
appropriately when it generates the \LaTeX. Citations, such as
\verb+\citep{MolerVanLoan03}+, are formatted for the \LaTeX\
\verb+natbib+ package.

The various fields are:

\begin{sreitems}{\textbf{Function:}}
\item[\textbf{Function:}] 
  The name of the function.  \prog{autodoc} uses this line to
  determine that it's supposed to generate a documentation entry here.
  \prog{autodoc} checks that it matches the name of the immediately
  following C function. One line; verbatim; required.

\item[\textbf{Synopsis:}] 
  A short one-line summary of the function. \ccode{autodoc -t} uses this
  line to generate the API summary tables that appear in this guide.
  One line; processed; not required for \prog{autodoc} itself, but
  required by \ccode{autodoc -t}. 

\item[\textbf{Incept:}] Records the author/date of first
  draft. \prog{autodoc} doesn't use this line.  Used to help track
  development history. The definition of ``incept'' is often fuzzy,
  because Easel is a palimpsest of rewritten code. This line often
  also includes a location, such as \ccode{[AA 673 over Greenland]},
  for no reason other than to remember how many weird places I've
  managed to get work done in.

\item[\textbf{Purpose:}] The main body. \prog{autodoc} processes this
  to produce the \LaTeX\ documentation. It explains the purpose of the
  function, then precisely defines what the caller must provide in
  each input argument, and what the caller will get back in each
  output argument. It should be written and referenced as if it will
  appear in the user guide (because it will). Multiline; processed by
  \prog{autodoc}; required.

\item[\textbf{Args:}] A tabular-ish summary of each argument. Not
  picked up by \prog{autodoc}, at least not at present. The
  \ccode{Purpose:} section instead documents each argument in free text.
  Multiline and tabular-ish; ignored by \prog{autodoc}; optional.

\item[\textbf{Returns:}] The possible return values from the function,
  starting with what happens on successful completion (usually, return
  of an \ccode{eslOK} code). Also indicates codes for unsuccessful
  calls that are normal (returned) errors. If there are output
  argument pointers, documents what they will contain upon successful
  and unsuccessful return, and whether any of the output involved
  allocating memory that the caller must free.

\item[\textbf{Throws:}] The possible exceptions thrown by the
  function, listing what a program that's handling its own exceptions
  will have to deal with. (Programs should never assume that this list
  is complete.) Programs that are letting Easel handle exceptions do
  not have to worry about any of the thrown codes.  The state of
  output argument pointers is documented -- generally, all output is
  set to \ccode{NULL} or \ccode{0} values when exceptions happen.
  After a thrown exception, there is never any memory allocation in
  output pointers that the caller must free.

\item[\textbf{Xref:}] Crossreferences to notebooks (paper or
  electronic) and to literature, to help track the history of the
  function's development and rationale.\footnote{A typical reference
  to one of SRE's notebooks is \ccode{STL10/143}, indicating St. Louis
  notebook 10, page 143.} Personal developer notebooks are of course
  not immediately available to all developers (especially bound paper
  ones) but still, these crossreferences can be traced if necessary.
\end{sreitems}
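
As a sketch of how these fields fit together, here is a small
hypothetical function with a header in this style. The function
\ccode{demo\_UpperDup()} and the status codes \ccode{demoOK} and
\ccode{demoEMEM} are invented stand-ins so that the example compiles
without \ccode{easel.h}; real Easel code would use \ccode{eslOK} and
\ccode{eslEMEM}. The \ccode{Incept:} and \ccode{Xref:} lines are left
out because there is no real history to record.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for Easel status codes; assumptions for this sketch only. */
enum { demoOK = 0, demoEMEM = 1 };

/* Function:  demo_UpperDup()
 * Synopsis:  Make an uppercased copy of a string.
 *
 * Purpose:   Allocate a copy of <s> with every character converted
 *            to upper case, and return it in <*ret_dup>.
 *
 * Args:      s        - NUL-terminated input string
 *            ret_dup  - RETURN: newly allocated uppercased copy
 *
 * Returns:   <demoOK> on success. <*ret_dup> is allocated here, and
 *            the caller frees it with <free()>.
 *
 * Throws:    <demoEMEM> on allocation failure. <*ret_dup> is <NULL>,
 *            and there is nothing for the caller to free.
 */
int
demo_UpperDup(const char *s, char **ret_dup)
{
  size_t n, i;
  char  *dup;

  assert(s != NULL && ret_dup != NULL);
  n   = strlen(s);
  dup = malloc(n + 1);
  if (dup == NULL) { *ret_dup = NULL; return demoEMEM; }

  for (i = 0; i < n; i++) dup[i] = (char) toupper((unsigned char) s[i]);
  dup[n]   = '\0';
  *ret_dup = dup;
  return demoOK;
}
```

The point of the sketch is the contract stated in \ccode{Returns:}
and \ccode{Throws:}: the caller owns the allocation only on success,
and the output pointer is \ccode{NULL} on failure.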

\subsection{cexcerpt - extracting C source snippets}

The \prog{cexcerpt} program extracts snippets of C code verbatim from
Easel's C source files.

The \ccode{documentation/Makefile} runs \prog{cexcerpt} on every
module .c and .h file. The extracted cexcerpts are placed in .tex
files in the temporary \ccode{cexcerpts/} subdirectory.

Usage: \ccode{cexcerpt <file.c> <dir>}. Processes C source file
\ccode{file.c}, extracts all tagged excerpts, and writes each one to
its own file in directory \ccode{<dir>}.

An excerpt is marked with special comments in the C file:
\begin{cchunk}
/*::cexcerpt::my_example::begin::*/
   while (esl_sq_Read(sqfp, sq) == eslOK)
     { n++; }
/*::cexcerpt::my_example::end::*/
\end{cchunk}

The cexcerpt marker's format is \ccode{::cexcerpt::<tag>::begin::} (or
end). A comment containing a cexcerpt marker must be the first text on
the source line. A cexcerpt comment may be followed on the line by
whitespace or a second comment.
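
A rough C sketch of recognizing such a marker follows. It is not the
code in \prog{cexcerpt} itself; the names
\ccode{parse\_cexcerpt\_marker()} and \ccode{enum marker\_kind} are
hypothetical, and the sketch assumes that leading whitespace before
the comment is tolerated and that the comment opens directly with the
marker, as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum marker_kind { MARKER_NONE = 0, MARKER_BEGIN, MARKER_END };

/* Hypothetical helper, not taken from cexcerpt.
 * If <line> starts with a comment holding ::cexcerpt::<tag>::begin::
 * (or ::end::), copy the tag into <tag> (of size <n>) and return
 * MARKER_BEGIN or MARKER_END. Otherwise return MARKER_NONE and leave
 * <tag> untouched. Anything after the marker comment is ignored.
 */
static enum marker_kind
parse_cexcerpt_marker(const char *line, char *tag, size_t n)
{
  static const char prefix[] = "/*::cexcerpt::";
  enum marker_kind  kind;
  const char       *p, *q;
  size_t            len;

  assert(line != NULL && tag != NULL);
  p = line;
  while (*p == ' ' || *p == '\t') p++;    /* assumption: leading whitespace is allowed */
  if (strncmp(p, prefix, sizeof(prefix) - 1) != 0) return MARKER_NONE;
  p += sizeof(prefix) - 1;

  q = strstr(p, "::");                    /* the tag runs up to the next "::" */
  if (q == NULL || q == p) return MARKER_NONE;
  len = (size_t) (q - p);

  q += 2;
  if      (strncmp(q, "begin::*/", 9) == 0) kind = MARKER_BEGIN;
  else if (strncmp(q, "end::*/",   7) == 0) kind = MARKER_END;
  else return MARKER_NONE;

  if (len + 1 > n) return MARKER_NONE;    /* tag does not fit in the caller's buffer */
  memcpy(tag, p, len);
  tag[len] = '\0';
  return kind;
}
```

A marker that appears after other code on the same line is rejected,
which matches the rule that the marker comment must be the first text
on its line.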

The \ccode{<tag>} is used to construct the file name, as
\ccode{<tag>.tex}.  In the example, the tag \ccode{my\_example} creates
a file \ccode{my\_example.tex} in \ccode{<dir>}.

All the text between the cexcerpt markers is put in the file.  In
addition, this text is wrapped in a \ccode{cchunk} environment.  This
file can then be included in a \LaTeX\ file.

For best results, the C source should be free of TAB characters. In
Emacs, \ccode{M-x untabify} on the region will clean them out.

Cexcerpts can't overlap or nest in any way in the C file. Only one tag
can be active at a time.
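
The one-active-tag rule amounts to a two-state scan over the source
lines. The following C sketch shows one way to check it; the function
\ccode{check\_cexcerpt\_pairing()} is hypothetical and is not part of
\prog{cexcerpt}. As a simplification, it does not compare the tag on
an end marker against the tag on the matching begin marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical checker, not taken from cexcerpt.
 * Returns 0 if the begin/end markers in <lines> pair up one at a
 * time. Otherwise returns the 1-based number of the first offending
 * line. A file that ends with an excerpt still open reports
 * <nlines>+1.
 */
static int
check_cexcerpt_pairing(const char **lines, int nlines)
{
  int open = 0;    /* 1 while an excerpt is active */
  int i;

  assert(lines != NULL);
  for (i = 0; i < nlines; i++)
    {
      if (strstr(lines[i], "::cexcerpt::") == NULL) continue;

      if (strstr(lines[i], "::begin::") != NULL)
        {
          if (open) return i + 1;     /* begin inside an active excerpt */
          open = 1;
        }
      else if (strstr(lines[i], "::end::") != NULL)
        {
          if (!open) return i + 1;    /* end with nothing active */
          open = 0;
        }
    }
  return open ? nlines + 1 : 0;
}
```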

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The .tex file}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Portability notes}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Easel is intended to be widely portable. We adhere to the ANSI C99
standard. Any dependency on higher-level functionality (including
POSIX, X/Open, or system-specific stuff) is optional, and Easel is
capable of working around its absence at compile-time.

Although we do not currently include Windows machines in our
development environment, we are planning for the day when we do. Easel
should not include any required UNIX-specific code that wouldn't port
to Windows.\footnote{Though it probably does, which we'll discover
  when we first try to compile for Windows.}


% xref J7/83.
\paragraph{Why not define \ccode{\_POSIX\_C\_SOURCE}?} You might think
it would be a good idea to define \ccode{\_POSIX\_C\_SOURCE} to
\ccode{200112L} or some such, to try to enforce the portability of our
POSIX-dependent code. This doesn't work; don't do it.  According to
the standards, if you define \ccode{\_POSIX\_C\_SOURCE}, the host must
\emph{disable} anything that's \emph{not} in the POSIX
standard. However, Easel \emph{is} allowed to optionally use
system-dependent non-POSIX code. A good example is
\ccode{esl\_threads.c::esl\_threads\_CPUCount()}. There is no
POSIX-compliant way to check for the number of available processors on
a system.\footnote{Apparently the POSIX threads standards committee
  intends it that way; see
  \url{http://ansi.c.sources.free.fr/threads/butenhof.txt}.} 
Easel's implementation tries to find one of several system-specific
alternatives, including the non-POSIX function \ccode{sysctl()}.
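
A minimal sketch of the compile-time fallback idea is shown below. It
is not Easel's \ccode{esl\_threads\_CPUCount()}; the function
\ccode{cpu\_count\_sketch()} is hypothetical, and it tries only one
alternative, \ccode{sysconf()} with the non-POSIX name
\ccode{\_SC\_NPROCESSORS\_ONLN}, where Easel's real function may
select among several. The sketch assumes a UNIX-like host that
provides \ccode{unistd.h}.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical sketch, not Easel's implementation.
 * Returns the number of online processors if the host offers a way
 * to ask, and falls back to 1 otherwise. The answer is never less
 * than 1.
 */
static int
cpu_count_sketch(void)
{
  long n = 1;   /* portable default when no system-specific call is available */

#if defined(_SC_NPROCESSORS_ONLN)
  /* Not in the POSIX standard, but available on many UNIX-like systems.
   * Some hosts hide this name when _POSIX_C_SOURCE is defined. */
  n = sysconf(_SC_NPROCESSORS_ONLN);
#endif

  if (n < 1) n = 1;   /* sysconf() returns -1 on error */
  assert(n >= 1);
  return (int) n;
}
```

Because the system-specific call sits behind a preprocessor test, the
code still compiles on a host that lacks it, which is the behavior
this section asks for from any optional dependency.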