File: batch.tex

package info (click to toggle)
diet 2.8.0-1
links: PTS, VCS
area: main
in suites: wheezy
size: 38,140 kB
sloc: ansic: 65,575; cpp: 58,570; xml: 365; sh: 83; makefile: 29
file content (347 lines) | stat: -rw-r--r-- 14,631 bytes
%**
%*  @file  batch.tex
%*  @brief   DIET User's Manual: batch/parallel submissions 
%*  @author  - Yves Caniou (yves.caniou@ens-lyon.fr)
%*  @section Licence 
%*    |LICENCE|

\chapter{Batch and parallel submissions}\label{chapter:parallelSubmission}
\section{Introduction}

Most of resources in a grid are parallel, either clusters of
workstations or parallel machines. Computational grids are even
considered as hierachical sets of parallel resources, as we can see in
ongoing project like the french research grid project,
Grid'5000~\cite{grid5000} (for the moment, 9 sites are involved), or
like the \textsc{Egee}\footnote{\url{http://public.eu-egee.org/}}
project ({\it Enabling Grids for E-science in Europe}), composed of
more than a hundred centers in 48 countries. Then, in order to provide
transparent access to resources, grid middleware must supply efficient
mechanisms to provide parallel services.

Because parallel resources are managed differently on each site, it is
neither the purpose of \diet to deal with the deployment of parallel
tasks inside the site, nor manage copies of data which can possibly be
on NFS. \diet implements mechanisms for a \sed programmer to easily
provide a service that can be portable on different sites; for clients
to request services which can be explicitly sequential, parallel or
solved in the real transparent and efficient metacomputing way: only
the name of the service is given and \diet chooses the best resource
where to solve the problem.

%****************************************************************************%
\section{Terminology}
%****************************************************************************%

%Because a good understanding comes with correct terms, we provide here
%the definition of the terms that we will use thereafter.

Servers provide {\it services}, \eg instanciation of {problems} that a
server can solve: for example, two services can provide the resolution
of the same problem, one being sequential and the other parallel. A
\diet~{\it task}, also called a {\it job}, is created by the {\it
request} of a client: it refers to the resolution of a service on a
given server.

A service can be sequential or parallel, in which case its resolution
requires numerous processors of a parallel resource (a parallel
machine or a cluster of workstations). If parallel, the task can be
modeled with the MPI standard, or composed of multiple sequential
tasks (deployed for example with \verb!ssh!) resolving a single
service: it is often the case with data parallelism problems.

Note that when dealing with batch reservation systems, we will likely
speak about {\it jobs} rather than about {\it tasks}.

%****************************************************************************%
\section{Configuration for compilation}
%****************************************************************************%

You must enable the batch flag in cmake arguments. Typically, if you
build \diet from the command line, you can use the following: \\

\verb!   ccmake $diet_src_path !\\
     \verb!         -DDIET_USE_ALT_BATCH:BOOL=ON !

%\begin{lstlisting}[language=bash,basewidth={.5em,.4em},fontadjust]
%   ccmake $$diet_src_path                 \\
%          -DDIET_USE_ALT_BATCH:BOOL=ON
%\end{lstlisting}

%% $$$

%****************************************************************************%
\section{Parallel systems}
%****************************************************************************%

Single parallel systems are surely the less deployed in actual computing
grids. They are usually composed of a frontal node where clients log in, and
from which they can log on numerous nodes and execute their parallel jobs, {\it
  without any kind of reservation (time and space)}. Some problems occur with
such a use of parallel resources: multiple parallel tasks can share a single
processor, hence delaying the execution of all applications using it; during
the deployment, the application must at least check the connectivity of the
resources; if performance is wanted, some monitoring has to be performed by the
application.

%****************************************************************************%
\section{Batch system}
%****************************************************************************%

Generally, a parallel resource is managed by a batch system, and jobs are
submitted to a site queue. The batch system is responsible for managing
parallel jobs: it schedules each job, and determines and allocates the
resources needed for the execution of the job. 

There are many batch system, among which
Torque\footnote{\url{http://old.clusterresources.com/products/torque/}} (a fork
of
PBS\footnote{\url{http://www.clusterresources.com/pages/products/torque-resource-manager.php}}),
Loadleveler\footnote{\url{http://www-03.ibm.com/servers/eserver/clusters/software/loadleveler.html}}
(developped by IBM), Oracle Grid
Engine\footnote{\url{http://www.oracle.com/technetwork/oem/grid-engine-166852.html}}
(formerly SunGrid Engine\footnote{\url{http://www.sun.com/software/gridware/}}:
SGE, developped by Sun), OAR\footnote{\url{http://oar.imag.fr}} (developped at
the IMAG lab). Each one implements its own language syntax (with its own
mnemonics), as well as its own scheduler. Jobs can generally access the
identity of the reserved nodes through a file during their execution, and are
assured to exclusively possess them.

%****************************************************************************%
\section{Client extended API}
%****************************************************************************%

Even if older client codes must be recompiled (because internal
structures have evolved), they do not necessarily need modifications.

\diet provides means to request exclusively sequential services,
parallel services, or let \diet choose the best implementation of a
problem for efficiency purposes (according to the scheduling metric
and the performance function).

\begin{lstlisting}[language=c,basewidth={.5em,.4em},fontadjust]

/* To explicitly call a sequential service */
diet_error_t
diet_parallel_call(diet_profile_t * profile) ;

diet_error_t
diet_sequential_call_async(diet_profile_t* profile, diet_reqID_t* reqID);

/* To explicitly call a parallel service in sync or async way */
diet_error_t
diet_sequential_call(diet_profile_t * profile) ;

diet_error_t
diet_parallel_call_async(diet_profile_t* profile, diet_reqID_t* reqID);

/* To mark a profile as parallel or sequential. The default call to 
   diet_call() or diet_call_async() will perform a call to the correct
   previous call */
int
diet_profile_set_parallel(diet_profile_t * profile) ;
int
diet_profile_set_sequential(diet_profile_t * profile) ;

/* To let the user choose a given amount of resources */ 
int
diet_profile_set_nbprocs(diet_profile_t * profile, int nbprocs) ;

\end{lstlisting}

%****************************************************************************%
\section{Batch server extended API and configuration file}
%****************************************************************************%

There are too many diverse scenarii about the communication and execution of
parallel applications: the code can be a MPI code or composed of different
interacting programs possibly launched via \verb!ssh! on every nodes; input and
output files can use NFS if this file system is present, or they can be
splitted and uploaded to each node participating to the calculus.

Then, we will see: what supplementary information has to be provided
in the server configuration file; how to write a batch submission
meta-script in a \sed; and how to record the parallel/batch service.

\section{Server API}

\begin{lstlisting}[language=C,basewidth={.5em,.4em},fontadjust]

/* Set the status of the SeD among SERIAL and BATCH */
void 
diet_set_server_status( diet_server_status_t st ) ;

/* Set the nature of the service to be registered to the SeD */
int
diet_profile_desc_set_sequential(diet_profile_desc_t * profile) ;

int
diet_profile_desc_set_parallel(diet_profile_desc_t * profile) ;

/* A service MUST call this command to perform the submission to the batch system */
int
diet_submit_parallel(diet_profile_t * profile, const char * command) ;

\end{lstlisting}

\subsection{Registering the service}

A server is mostly built like described in section~\ref{ch:server}. In order to
let the \sed know that the service defined within the profile is a parallel
one, the \sed programmer {\bf must} use the function:

\begin{lstlisting}[language=c,basewidth={.5em,.4em},fontadjust]
void diet_profile_desc_set_parallel(diet_profile_desc_t* profile)
\end{lstlisting}

By default, a service is registered as sequential. Nevertheless, for
code readability reasons, we also give the pendant function to
explicitly register a sequential service:

\begin{lstlisting}[language=c,basewidth={.5em,.4em},fontadjust]
void diet_profile_desc_set_sequential(diet_profile_desc_t* profile)
\end{lstlisting}

\subsection{Server configuration file}

The programmer of a batch service available in a \sed do not have to worry
about which batch system to submit to, except for its name, because \diet
provides all the mechanisms to transparently submit the job to them.

\diet is currently able to submit batch scripts to
OAR (version 1.6 and 2.0), PBS/Torque, LoadLeveler, and SGE.
%
%among which: "condor", "dqs", "loadleveler", "lsf", "pbs", "sge" or
%"oar".
%
The name of the batch scheduler managing the parallel resource where
the \sed is running has to be incorporated with the keyword
\verb!batchName! in the server configuration file. This is how the \sed knows
the correct way of submitting a job.

Furthermore, if there is no default queue, the \diet deployer must
also provide the queue on which jobs have to be submitted, with the
keyword \verb!batchQueue!.

You also have to provide a directory where the \sed can read and write data on
the parallel resource. Please note that this directory is used by \diet to
store the new built script that is submitted to the batch scheduler. In
consequence, because certain batch schedulers (like OAR) need the script to be
available on all resources, {\it this directory might be on NFS} (remember that
\diet cannot replicate the script on all resources before submission because of
access rights).  Note that concerning OAR (v1.6), in order to use the
CoRI\_batch features for 0AR 1.6 (see Section~\ref{section:cori_batch}), the
Batch \sed deployer must also provide the keyword \verb$internQueue$ with the
corresponding name. For example, the server configuration file can contain the
following lines:

\begin{lstlisting}[language=bash,basewidth={.5em,.4em},fontadjust]
batchName = oar
batchQueue = queue_9_13
pathToNFS = /home/ycaniou/tmp/nfs
pathToTmp = /tmp/YC/
internOARbatchQueueName = 913
\end{lstlisting}

%% \subsubsection{\bf Parallel server}

%% With the aim to not extend the \diet API too much, we consider that a
%% parallel server corresponds to an ordinary shell that submits the
%% generated script. In consequence, the server configuration file should
%% contain the following line

%% \begin{lstlisting}[language=bash,label=dietConfig.sh,basewidth={.5em,.4em},fontadjust]
%% batchName = shellscript
%% \end{lstlisting}

\subsection{Server API for writing services}

The writing of a service corresponding to a parallel or batch job is
very simple. The \sed programmer builds a shell script that he would
have normally used to execute the job, \ie a script that must take
care of data replication and executable invocation depending on the
site.

In order for the service to be system independent, the \sed API
provides some meta-variables which can be used in the script. 

\begin{itemize}
\item \verb!$DIET_NAME_FRONTALE!: frontale name
\item \verb!$DIET_USER_NBPROCS!: number of processors
\item \verb!$DIET_BATCH_NODESLIST!: list of reserved nodes
\item \verb!$DIET_BATCH_NBNODES!: number of reserved nodes
\item \verb!$DIET_BATCH_NODESFILE!: name of the file containing the
identity of the reserved nodes
\item \verb!$DIET_BATCH_JOBID!: batch job ID
\item \verb!$DIET_BATCHNAME!: name of the batch system
\end{itemize}

%%% $$$

Once the script written in a string, it is given as an argument to the
following function:
\begin{lstlisting}[language=C,basewidth={.5em,.4em},fontadjust]
int 
diet_submit_parallel(diet_profile_t * pb, char * script)
\end{lstlisting}
\subsection{Example of the client/server 'concatenation' problem}

There are fully commented client/server examples in
\verb!<diet_src>/src/examples/Batch! directory. The root directory
contains a simple example, and \verb!TestAllBatch! and
\verb!Cori_cycle_stealing! are more practical, the latter being a code to
explain the \verb!CoRI_batch! API.

The root directory contains a simple basic example on how to use the batch API
is given here: no IN or INOUT args, the client receives as a result the number
of processors on which the service has been executed. The service only writes
to a file, with batch-independent mnemonics, some information on the batch
system.

The \verb!<diet_src>/src/examples/Batch/file_transfer! directory 
contains 3 servers, one sequential, one parallel and one batch, and one 
synchronous and one asynchronous
client. The client is configurable to simply ask for only sequential, or
explicitly parallel services, or to let \diet choose the best (by default, two
processors are used and the scheduling algorithm is Round-Robin). We
consequently give the MPI code which is called from the batch \sed, which
realizes the concatenation of two files sent by the client. Note that the user
{\it must change} some paths in the \sed codes, according to the site where he
deploys \diet. 

% We reproduce the codes
% here.

% \newpage

% %\parbox[b]{.5\textwidth}
% {
%   \tiny
%   \lstinputlisting[title={\bf Synchronous client code},language=c,label=client.c,basewidth={.5em,.4em},fontadjust]{Data/examples_Batch_client.c}
% }

% %\newpage

% {%\twocolumn
%   \tiny
%   \lstinputlisting[title={\bf Batch server code},language=c,basewidth={.5em,.4em},fontadjust]{Data/examples_Batch_batch_server.c}
% }

% {%\twocolumn
%   \tiny
% %  \lstinputlisting[title={\bf Parallel server code},language=c,basewidth={.5em,.4em},fontadjust]{@CMAKE_SOURCE_DIR@/src/examples/Batch/parallel_server.c}
% }

% {%\twocolumn
%   \tiny
%   \lstinputlisting[title={\bf Sequential server code},language=c,basewidth={.5em,.4em},fontadjust]{Data/examples_Batch_sequential_server.c}
% }

% \onecolumn

%%% Local Variables:
%%% mode: latex
%%% ispell-local-dictionary: "american"
%%% mode: flyspell
%%% fill-column: 79
%%% End: