File: coding_standards.tex

package info (click to toggle)
code-saturne 6.0.2-2
links: PTS, VCS
area: main
in suites: bookworm, bullseye, sid
size: 63,340 kB
sloc: ansic: 354,724; f90: 119,812; python: 87,716; makefile: 4,653; cpp: 4,272; xml: 2,839; sh: 1,228; lex: 170; yacc: 100
file content (442 lines) | stat: -rw-r--r-- 19,338 bytes
%-------------------------------------------------------------------------------

% This file is part of Code_Saturne, a general-purpose CFD tool.
%
% Copyright (C) 1998-2019 EDF S.A.
%
% This program is free software; you can redistribute it and/or modify it under
% the terms of the GNU General Public License as published by the Free Software
% Foundation; either version 2 of the License, or (at your option) any later
% version.
%
% This program is distributed in the hope that it will be useful, but WITHOUT
% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
% FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
% details.
%
% You should have received a copy of the GNU General Public License along with
% this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
% Street, Fifth Floor, Boston, MA 02110-1301, USA.

%-------------------------------------------------------------------------------

\section{Coding style guidelines}

\subsection{Master rule}

Keep the style consistent !

This rule should be observed above all others. The coding style in \CS
has evolved over the years, but unless you are ready to update a whole
file to a more current style (in which case the other guidelines should be
followed), try to remain consistent with the style in the current file.

For new files, use recently updated examples, such as
\texttt{src/base/cs\_field.c} and  \texttt{src/basecs\_field.h} for C,
\texttt{src/base/field.f90} for Fortran modules, or
\texttt{src/base/codits.f90} for other Fortran files.

\subsection{General rules}

The following general rules are strongly recommended:

\begin{itemize}
\item Except for files in which they have a special meaning (such as
      Makefiles), use spaces, not tabs. \emph{Absolutely} avoid this in
      Python code \footnote{Keeping to Python's humoristic example style,
      anybody doing this should learn ``how not to be seen''}. Most importantly,
      use a decent text editor that does not randomly mix spaces and tabs.
      \CS has a \texttt{sbin/rmb} script which removes trailing
      white-space and replaces tabs with spaces, but this may appear to damage
      indentation when it was defined with an odd mix of spaces and tabs.
\item 80 characters maximum line length; split lines longer than this
      to ensure readability on small screens, or when viewing code side-by-side
      on wider screens. This rule is less important for \LaTeX documentation
      sources (one could argue that using one line per paragraph and relying
      on line wrapping would actually make revision merging simpler).
\end{itemize}

For new developements, prefer C to Fortran, as the code should progressively
mve to purely C code. As many variables and arrays are still accessible
only through Fortran modules, this is not always possible, but defining
Fortran/C bindings such as in the \texttt{field.f90} module helps
make data accessible to both languages, easing the progressive migration
from Fortran to C. Fortran bindings should only be defined when access
to C functions or variables from Fortran is required, and may be removed
for parts of the code purely handled in C.

\subsection{C coding style}

\subsubsection{Punctuation}

Except when adding additional white space to align similar definitions
or arguments on several lines, standard English punctuation rules should be
followed:

\begin{itemize}
\item no white space before a punctuation mark (, ; . ), one white space
      after a punctuation mark.

\item white space before an opening parenthesis, no white space after an opening
      parenthesis.

\item no white space before a closing parenthesis, white-space after a closing
      parenthesis.
\end{itemize}

\subsubsection{General rules}

The following presentation rules are strongly recommended:
\begin{itemize}
\item indentation step: 2 characters (4 characters in \texttt{cs\_gui\_*}
      files).
\item always use lowercase characters for instructions and identifiers,
      except for enumerations and macros which should be in uppercase.
\end{itemize}

The following coding rules are strongly recommended:

\begin{itemize}

\item header (\texttt{.h}) files should have a mechanism to prevent
  multiple inclusions;

\item all macro parameters must be enclosed inside parentheses;

\item a function's return type must always be defined.

\item variables should be initialized before use
  (pointers are set to NULL). A good compiler should issue warnings when
  this is not the case, and those warnings must be acted upon;

\item when a structure definition is only needed in a single file,
  it is preferred to define it directly in the C source file,
  so as to make as little visible as possible in the matching header file.
  structures only used through pointers may be made opaque in this
  manner, which ensures that their possible future modification should
  not have unexpected side-effects.

\item When a public function is defined in a C source file, a matching
  header file containing its prototype must be included.

\item usage of global variables must be kept to a minimum, though such
  variables may be useful to maintain state or references to mesh or
  variable structures in C code callable by Fortran code.
  If a global variable is only needed inside a single file, it should
  be declared ``static''. It it is needed in other files, then it must
  instead  be declared ``extern'' in the matching header file.

\item a \texttt{const} type must not be cast into a non-\texttt{const}
  type;

\item every \texttt{switch} construct should have a \texttt{default}
  clause (which may reduce to \texttt{assert(0) to check code paths in
  debug mode, but at least this much must be ensured);}

\item a \texttt{const} attribute should be used when an array or structure
  is not  modified. Recall that for example \texttt{const cs\_mesh\_t *m}
  means that the contents of mesh structure \texttt{m} are not modified
  by the function, while \texttt{cs\_mesh\_t *const m} only means that
  the pointer to \texttt{m} is not modified;
  \texttt{const cs\_mesh\_t *const m} means both, but its usage in
  a function prototype gives no additional useful information on
  the function's side effects than the first form
  (\texttt{const cs\_mesh\_t *m}), so that form is preferred, as it
  does not clutter the code;

\item when an array is passed to a function, describing it as
  \texttt{array[]} is preferred to \texttt{*array}, as the array
  nature of the argument is better conveyed.

\item where both a macro or an enumerated constant could be used,
  an enumeration is preferred, as values will appear with the
  enumerated value's name under a debugger, while only a macro's
  expanded value will appear. An additional advantage of enumerated
  values is that a compiler may issue a warning when a \texttt{switch}
  construct has no \texttt{case} for a given enumeration value.
\end{itemize}

\subsubsection{Language\label{sec:regle.lang}}

ANSI C 1999 or above is required, so C99-specific constructs are allowed,
though C++ style comments should be avoided, so as to maintain a consistent
style. C99 variable-length arrays should be avoided, as it is not
always clear whether they are allocated on the stack or heap, and are
an optional feature only in the C newer 2011 standard (though we could
expect that support for those constructs will remain available on
general-purpose architectures, and removed only in the embedded space).

\subsubsection{Assertions}

Assertions are conditions which must always be verified. Several
expanded macro libraries may be available, but a standard C language
assertion has the following properties:

\begin{itemize}

\item it is only compiled in debug mode (and so incur no run-time
  performance penalty in production code, where the \texttt{NDEBUG}
  macro is defined);

\item when its predicate are not verified, it causes a core dump;
  when running under a debugger, the code is stopped inside the
  assertion, but does not exit, which simplifies debugging.

\end{itemize}

Assertions are thus very useful to ensure that conditions
which are always expected (and not dependent on program input)
are met. They also make code more readable, in the sense that
it is made clear that conditions checked by an assertion
are always expected, and that not handling other cases is not an
programming error or omission.

If a condition may not be met for some program inputs,
and not just in case of programmer error, a more complete
test and call to an error handler (such as \texttt{bft\_error})
is preferred.

\subsection{Naming conventions}

The following rules should be followed:

\begin{itemize}

\item identifier lengths should not exceed 31~characters if avoidable;
  this was a portability requirement using C89, and is now more a
  readability recommendation;

\item identifier names are in lowercase, except for macro or enumeration
  definitions, which are in uppercase; words in an identifier are
  separated by an underscore character (for example,
  \verb=n_elt_groups_=).

\item global identifier names are prefixed by the matching library prefix,
  such as \verb=cs_= or \verb=BFT_=;

\item local identifiers should be prefixed by an underscore character.

\item Index arrays used with $0$ to $n-1$ numbering should be named
  using a \verb=idx_= or \verb=index_= prefix or suffix, while
  similar arrays using a $0$ to $n-1$ numbering (usually those that may be
  also used in Fortran code) should be named using a \verb=pos_=
  prefix or suffix.

\end{itemize}

\subsubsection{Naming of enumerations}

The following form is preferred for enumerations:

\begin{quote}
\begin{alltt}
typedef myclass \{ CS_MYCLASS_ENUM1,
                   CS_MYCLASS_ENUM2,
                \( /* etc. */ \)
                \} cs_myclass_t;
\end{alltt}
\end{quote}

\subsubsection{Naming of structures and associated functions}

Macros and enumerations related to myclass structures
are prefixed by \verb=CS_MYCLASS_=.

Public functions implementing methods are named
 \texttt{cs\_\textit{class\_method}}, while private functions are simply named:
\texttt{\_\textit{class\_method}} and are declared static.

Files containing these functions are named \texttt{\_\textit{class}.c}.

\subsubsection{Integer types}

Several integer types are found in \CS:

\begin{itemize}
\item \texttt{cs\_lnum\_t} should be used for local entity (i.e. vertex, face,
      cell) numbers or connectivity. It is a signed integer, normally identical
      to \texttt{int}, but a larger size could be used in the future for very
      large meshes on shared memory machines.

\item \texttt{cs\_gnum\_t} should be used for global entity numbers, usually
      necessary only for I/O aspects. It is an unsigned 32 or 64-bit integer,
      depending on whether the code was configured with the
      \texttt{--enable-long-gnum} option. Global numbers should always use
      this type, as for very large meshes, they may exceed the maximum size
      of a 32-bit integer (2~147~483~648). The choice of unsigned integers
      is two-fold: it doubles the range of available values, and good compilers
      will issue warnings when this type is mixed without precaution with
      the usual integer types. These warnings should be heeded, as they may
      avoid many hours of debugging.

\item \texttt{cs\_int\_t} is deprecated; it was used for integer variables or arrays
      passed between C and Fortran, though using \texttt{integer(kind)} statements
      in Fortran should be a better future solution. In practice,
      \texttt{cs\_int\_t} and \texttt{cs\_lnum\_t} are identical. The former
      is more commonly found in older code, but the latter should be used where
      applicable for better clarity.

\item in all other cases, the standard C types \texttt{int} and \texttt{size\_t}
      should be preferred (for example for loops over variables, probes, or
      any entity independent of mesh size.
\end{itemize}

\subsection{Base functions and types}

In the \CS kernel, it is preferable to use base functions provided by the
BFT subsystem to the usual C functions, as those logging, exit and error-handling
functions will work correctly when running in parallel, and the memory
management macros ensure return value checking and allow additional logging.

The array below summarizes the replacements for usual functions:

\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
\multicolumn{1}{|c}{C function}  & \multicolumn{1}{|c}{\CS macro or function} & \multicolumn{1}{|c|}{Header}\\
\hline
\verb=exit()=    & \verb=cs_exit()=         & \verb=cs_base.h=\\
                 & \verb=bft_error()=       & \verb=bft_error.h=\\
\verb=printf()=  & \verb=bft_printf()=      & \verb=bft_printf.h=\\
\verb=malloc(=   & \verb=BFT_MALLOC()=      & \verb=bft_mem.h=\\
\verb=realloc()= & \verb=BFT_REALLOC()=     & \verb=bft_mem.h=\\
\verb=free()=    & \verb=BFT_FREE()=        & \verb=bft_mem.h=\\
\hline
\end{tabular}
\end{center}

\subsection{Internationalization}

Internationalization of messages uses the \texttt{gettext()} mechanism.
Messages should always be defined in US English in the source
code (which avoids using extended characters and the accompanying
text encoding issues in source code), and a French translation
is defined and maintained using a translation file \texttt{po/fr.po}.
Translations to other languages are of course possible, and only
require a volunteer.

Using the \texttt{gettext()} mechanism has several advantages:

\begin{itemize}

\item accented or otherwise extended characters appear normally whether
  using a Latin-1 (or Latin-9 or Latin-15) environment or whether
  using a ``Unicode'' (or generally UTF-8) environment (assuming
  that a terminal's encoding matches that of the {\tt LANG} environment
  variable, usually {\tt LANG=fr\_FR} or {\tt LANG=fr\_FR.UTF-8}
  for French;

\item if a message is not translated, it simply appears in its
  untranslated version;

\item maintenance of the translations only requires editing a single
  file, gettext related tools also make it easy to check that
  translations are consistent (i.e. matching format descriptors
  or line returns) without requiring complete code coverage tests.
  In fact, translations could be maintained by a non-programmer.

\item internationalization may be disabled using the
 {\tt --disable-nls} configure option, so possible comfort vs. speed
 trade-offs may be decided by the user;

\end{itemize}

To make internationalization possible, translatable strings should be
encased in a {\tt\_( )} macro (actually an abbreviation for a call
to {\tt gettext()} if available, which reverts to an empty (identity)
macro is internationalization is unavailable or disabled).
Strings assigned to variables must be encased in a {\tt N\_( )}
macro (which is an ``empty'' macro, used by the {\tt gettext} toolchain
to determine that those strings should appear in the translation dictionary),
and the variable to which such a string is assigned should be
encased in the {\tt\_( )} macro where used.

Note that with UTF-8 type character strings, accented or otherwise
extended characters are represented on multiple bytes. The {\tt strlen()}
C function will return the string's real size, which may be greater
than the number of output columns it uses. In the preprocessor, the
{\tt ecs\_print\_padded\_str()} may be used to print such a string
and padding it with the correct number of white spaces so as to meet
a given format width. No such function is used or currently needed in the
main code, though it could be added if needed.

\subsection{Fortran coding style}

\subsubsection{Conventions inherited from Fortran 77}

The following coding conventions were applied when the code used
Fortran 77, prior to conversion to Fortran 95. Some of them should be
updated, as long as we maintain consistency within a given
file.

\begin{itemize}

\item one routine per file (except if all routines except the first
      are ``private''). This rule has a few exceptions, such as in modules,
      in the \texttt{cs\_user\_parameters.f90} user file which contains several
      subroutines (it initially followed the rule, but subroutines were split,
      while the file was not), and Fortran wrappers for several C
      functions defined in a single C file are also usually defined
      in a single source, as they are a consistent whole.

\item exactly 6 characters per routine name, with no underscores.
      As Fortran 95 allows longer identifiers, the 6 character limit is
      obsolete, buy avoiding \texttt{\_} characters is still recommended
      in routines that might be called from C, unless \texttt{bind(C)}
      is used, as most compilers add an underscore character to the routine
      name, and often add a second underscore if no special options are given,
      while the C \texttt{CS\_PROCF} macro does not handle this situation,
      possibly leading to link issues.
      For routine names longer than 6 character, more verbose names
      are recommended (using underscores as word separators), though
      the Fortran 95 standard requires those names be limited to 31
      characters.
      Also, for routine names with underscores, the \texttt{CS\_PROCF}
      macro must not be used. ISO C bindings must be used instead.

\item at least 2 characters per variable names
      (banish variables \texttt{i}, \texttt{j}, or \texttt{k},
       preferring \texttt{ii}, \texttt{jj}, or \texttt{kk}, as this makes
       searching under a text editor easier for non-experts).

\item avoid commented example lines in user subroutines; otherwise,
      the code is never compiled and thus probably incorrect.
      Using a \texttt{if (iuse) ... endif} construct with
      \texttt{iuse = 0} instead is recommended.

\item use \texttt{do}~/~\texttt{enddo} constructs instead of
      \texttt{do}~/~\texttt{continue}.

\item avoid \texttt{goto} constructs where \texttt{select}~/~\texttt{case}
      would be more appropriate.

\item avoid print statements using \texttt{write(format, *)},
      or \texttt{print} constructs, to ensure that output is redirected
      correctly in parallel mode.

\item use \texttt{d} and not \texttt{e} to define double-precision
      floating-point constant definitions. Especially avoid
      constants with exponents such as \texttt{e50}, which are impossible
      in single precision (the limit is \texttt{e38}), and may thus not
      be accepted by ``strict'' compilers, or worst, lead to run-time
      exceptions.

\end{itemize}

\subsubsection{Language\label{sec:regle.lang}}

Fortran 1995 or above is required, and constructs or intrinsic functions
requiring Fortran 2003 or above should be avoided, except for the
Fortran 2003 ISO\_C\_BINDING module, which is available in all major
current Fortran compilers.

\subsection{Python coding style}

The GUI is not yet compatible with Python 3, due to differences in the
usual PyQt4 API and strings/unicode handling, but the rest of the Python code
should work both with Python 2 and Python 3.
This compatibility should be maintained or improved upon, not broken...

Details on the differences between major Python versions and associated
recommendations may be found at:
\url{http://docs.python.org/py3k/howto/pyporting.html}.