File: plugin.tex

package info (click to toggle)
diet 2.8.0-1
links: PTS, VCS
area: main
in suites: wheezy
size: 38,140 kB
sloc: ansic: 65,575; cpp: 58,570; xml: 365; sh: 83; makefile: 29
file content (438 lines) | stat: -rw-r--r-- 20,160 bytes
%**
%*  @file  plugin.tex
%*  @brief   DIET User's Manual plugin scheduler chapter file 
%*  @author  - Alan SU (Alan.SU@ens-lyon.fr) 
%*  @section Licence 
%*    |LICENCE|

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \documentclass{article}

\newenvironment{code}
{\begin{list}{}{\setlength{\leftmargin}{1em}}\item\bfseries\tt}
{\end{list}}

\newenvironment{tinycode}
{\begin{list}{}{\setlength{\leftmargin}{1em}}\item\tiny\bfseries\tt}
{\end{list}}

%% \begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Scheduling in \diet}
\label{ch:plugin}

\section{Introduction}

We introduce a \emph{plugin scheduling} facility, designed to allow \diet
service developers to define application-specific performance measures and to
implement corresponding scheduling strategies. This section describes the
default scheduling policy in \diet and the interface to the plugin scheduling
facility.

\section{Default Scheduling Strategy}\label{sect:default_sched}

The \diet scheduling subsystem is based on the notion that, for the sake of
system efficacy and scalability, the work of determining the appropriate
schedule for a parallel workload should be distributed across the computational
platform.  When a task in such a parallel workload is submitted to the system
for processing, each Server Daemon (\sed) provides a \emph{performance
  estimate}~-- a collection of data pertaining to the capabilities of a
particular server in the context of a particular client request~-- for that
task.  These estimates are passed to the server's parent agent; agents then
sort these responses in a manner that optimizes certain performance criteria.
Effectively, candidate {\sed}s are identified through a distributed scheduling
algorithm based on pairwise comparisons between these performance estimations;
upon receiving server responses from its children, each agent performs a local
scheduling operation called \emph{server response aggregation}.  The end result
of the agent's aggregation phase is a list of server responses (from servers in
the subtree rooted at said agent), sorted according to the aggregation method
in effect. By default, the aggregation phase implements the following ordered
sequence of tests:

\begin{enumerate}
\item \textbf{Least recently used}: In the absence of application- and
  platform-specific performance data, the \diet scheduler attempts to
  probabilistically achieve load balance by assigning client requests based on
  the time they last finished to compute.  Essentially each server records a
  timestamp indicating the last time at which it was assigned a job for
  execution.  Each time a request is received, the \sed computes the time elapsed
  since its last execution, and among the responses it receives, \diet agents
  select {\sed}s with a longer elapsed time.
\item \textbf{Random}: If the {\sed} is unable to store timestamps, the \diet
  scheduler will chose randomly when comparing two otherwise equivalent {\sed}
  performance estimations.
\end{enumerate}

In principle, this scheduling policy prioritizes servers that are able to
provide useful performance prediction information. 
In general, this approach works well when all servers in a
given \diet hierarchy are capable of making such estimations. However, in
platforms composed of {\sed}s with varying capabilities, load imbalances may
occur: since \diet systematically prioritizes server responses containing scheduling data,
servers that do not respond with such performance data will never be chosen.

We have designed a plugin scheduler facility to enable the application
developer to tailor the \diet scheduling to the targeted application. This
functionality provides the application developer the means to extend the notion
of a performance estimation to include metrics that are application-specific,
and to instruct \diet how to treat those data in the aggregation phase.  We
describe these interfaces in the following sections.


\section{Plugin Scheduler Interface}

Distributed applications are varied and often exhibit performance behavior
specific to the domain from which they arise. Consequently,
application-specific scheduling approaches are often necessary to achieve
high-performance execution. We propose an extensible framework to build
\emph{plugin schedulers}, enabling application developers to specify
performance estimation metrics that are tailored to their individual needs.

%% This section introduces the principal components of the basic plugin
%% scheduler framework.

\subsection{Estimation Metric Vector}\label{sect:estvector}

The type \texttt{estVector\_t} represents an \emph{estimation vector},
logically a structure that can manage a dynamic collection of performance
estimation values. It contains values that represent the performance profile
provided by a {\sed} in response to a \diet service request. This collection of
values may include either standard performance measures that are available
through \diet, or developer-defined values that are meaningful solely in the
context of the application being developed.

\subsection{Standard Estimation Tags}\label{sect:estTags}

To access to the different fields of the \texttt{estVector\_t}, it is necessary
to specify the tag that correspond to a specific information type.
Table~\ref{t:tags} describes this correspondence. Some tags represent a list of
values, one has to use the \texttt{diet\_est\_array\_*} functions to have
access to them. In Table~\ref{t:tags},  the second column marks these
multi-value tags.

The tag \textit{ALLINFOS} is a special: his field is always empty, but it
allows to fill the vector with all known tags  by the particular collector.
 
\begin{table}[h]
 \tiny
 \centering
 \begin{tabular}[c]{|l|c|l|}\hline
%TAGS lines
 \textit{TCOMP        }&& the predicted time  to solve a problem \\[5pt]
  \hline
  \textit{TIMESINCELASTSOLVE} &   & time since last solve has been made (sec) \\[5pt]
  \hline
  \textit{FREECPU      }&& amount of free CPU power between 0 and 1 \\[5pt]
  \hline
  \textit{FREEMEM      }&& amount of free  memory (Mb) \\[5pt]
  \hline
  \textit{NBCPU        }&& number of available processors  \\[5pt]
  \hline
  \textit{CPUSPEED     }&x& frequency of CPUs (MHz) \\[5pt]
  \hline
  \textit{TOTALMEM     }&& total memory size (Mb)  \\[5pt]
  \hline
  \textit{AVGFREECPU   }&& average amount of free CPU power in [0..1] \\[5pt]
  \hline
  \textit{BOGOMIPS     }&x& CPUs' bogomips \\[5pt]
  \hline
  \textit{CACHECPU     }&x& cache size CPUs (Kb) \\[5pt]
  \hline
  \textit{TOTALSIZEDISK}&& size of the partition (Mb)\\[5pt]
  \hline
  \textit{FREESIZEDISK }&& amount of free place on partition (Mb)\\[5pt]
  \hline
  \textit{DISKACCESREAD}&& average time to read on disk (Mb/sec) \\[5pt]
  \hline
  \textit{DISKACCESWRITE}&& average time to write to disk (sec) \\[5pt]
  \hline
  \textit{ALLINFOS     }&x& [empty] fill all possible fields \\[5pt]
  \hline
  \textit{PARAL\_NB\_FREE\_RESOURCES\_IN\_DEFAULT\_QUEUE} & & number
  of idle resources \\[5pt] 
  \hline
 \end{tabular}
 \caption{Estimation tags}
 \label{t:tags}
\end{table}

\subsubsection{Standard Performance Metrics}

To access to the existing default performance estimation routines (as described
in Chapter~\ref{chapter:performance}), the following functions are available to
facilitate the construction of custom performance estimation functions:
\begin{itemize}
\item The time elapsed since the last execution (to enable the ``least recently used''
  scheduler) is stored in an estimation metric vector by calling
  \begin{tabbing}
    \texttt{int diet\_estimate\_lastexec(}\=\texttt{estVector\_t ev,} \\
    \> \texttt{const diet\_profile\_t* const profilePtr)};
  \end{tabbing}
  with an appropriate value for \texttt{ev} and the \texttt{profilePtr}
  corresponding to the current \diet request.
\item The number of waiting jobs when using the maximum concurrent jobs limit
  is stored in an estimation metric vector by calling
  \begin{tabbing}
    \texttt{int diet\_estimate\_waiting\_jobs(}\=\texttt{estVector\_t ev)};
  \end{tabbing}
\item CoRI allows to access in an easy way to basic performance
  prediction. See Chapter~\ref{sec:CORI} to know more about the use of it.
\end{itemize}

In the future, we plan to expand the suite of default estimation metrics to
include dynamic internal \diet system state information (\eg queue lengths).

\subsubsection{Developer-defined Performance Metrics}

Application developers may also define performance values to be included in a
\sed response to a client request.  For example, a \diet \sed that provides a
service to query particular databases may need to include information about
which databases are currently resident in its disk cache, in order that an
appropriate server be identified for each client request.  To store such
values, the \sed developer should first choose a unique integer identifier,
referred to as the \emph{tag} to denote each logical datum to be stored.
Values are associated with tags using the following interface:
\begin{code}
int diet\_est\_set(estVector\_t ev, int userTag, double value);
\end{code}
The \texttt{ev} parameter is the estimation vector where the value will be
stored, the \texttt{userTag} parameter denotes the chosen tag, and
\texttt{value} indicates the value to be associated with the tag.  Tagged data
are used to effect scheduling policies by defining custom server response
aggregation methods, described in Section~\ref{sect:agg_methods}.

\subsection{Estimation Function}\label{sect:est_fn}

The default behavior of a {\sed} when a service request arrives from its parent
agent is to store the following information in the request profile:
\begin{enumerate}
\item \textbf{Elapsed time since last execution}: To implement the default
  round-robin behavior in absence of scheduling facilities, each {\sed}
  stores a timestamp of its last execution.  When a service request arrives,
  the difference between that timestamp and the current time is added to the
  performance estimate.
\end{enumerate}
This is accomplished by using the \texttt{diet\_estimate\_lastexec} functions
described in Section~\ref{sect:estvector}.

To implement a plugin scheduler, we define an interface that admits
customizable performance estimation routines:
\begin{tabbing}
  \texttt{typedef void (* diet\_perfmetric\_t)(}
    \=\texttt{diet\_profile\_t*,} \\
    \>\texttt{estVector\_t);} \\
\end{tabbing}
\begin{tabbing}
  \texttt{diet\_perfmetric\_t} \\
  \texttt{diet\_service\_use\_perfmetric(diet\_perfmetric\_t perfmetric\_fn);}\\
\end{tabbing}
%% \begin{code}
%%   diet\_perfmetric\_t\\
%%   diet\_service\_use\_perfmetric(diet\_perfmetric\_t perfmetric\_fn);\\
%% \end{code}
Thus, the type \texttt{diet\_perfmetric\_t} is a function pointer that takes as
arguments a performance estimation (represented by the \texttt{estVector\_t}
object) and a \diet service request profile.  The application developer can
associate such a function, or \emph{performance estimation routine}, with \diet
services via the \texttt{diet\_service\_use\_perfmetric} interface.  This
interface returns the previously registered performance estimation routine, if
one was defined (and \texttt{NULL} otherwise).  At this point, a service added
using the \texttt{diet\_service\_table\_add} function will be associated with
the declared performance estimation routine.  Additionally, a performance
estimation routine so specified will be associated with \emph{all} services
added into the service table until another call to the
\texttt{diet\_service\_use\_perfmetric} interface is made.  In the performance
estimation routine, the {\sed} developer should store in the provided
estimation vector any performance data used in the server response aggregation
methods (described in the next section).

\subsection{Aggregation Methods}\label{sect:agg_methods}

At the time a \diet service is defined, an \emph{aggregation method}~-- the
logical mechanism by which {\sed} responses are sorted~-- is associated with
the service; the default behavior was described in
Section~\ref{sect:default_sched}.

If application-specific data \emph{are} supplied (\ie the estimation function
has been redefined), an alternative method for aggregation is needed.
Currently, a basic \emph{priority scheduler} has been implemented, enabling an
application developer to specify a series of performance values that are to be
optimized in succession.  A developer may implement a priority scheduler using
the following interface:
\begin{code}
\begin{tabbing}
diet\_aggregator\_desc\_t* \\
diet\_profile\_desc\_aggregator(diet\_profile\_desc\_t* profile); \\
\\
int diet\_aggregator\_set\_type(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_aggregator\_type\_t atype); \\
\\
int diet\_aggregator\_priority\_max(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_est\_tag\_t tag); \\
\\
int diet\_aggregator\_priority\_min(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_est\_tag\_t tag); \\
\\
int diet\_aggregator\_priority\_maxuser(\=diet\_aggregator\_desc\_t* agg, \\
\> int val); \\
\\
int diet\_aggregator\_priority\_minuser(\=diet\_aggregator\_desc\_t* agg, \\
\> int val); \\
\end{tabbing}
\end{code}
The \texttt{diet\_profile\_desc\_aggregator} and
\texttt{diet\_aggregator\_set\_type} functions fetch and configure the
aggregator corresponding to a \diet service profile, respectively.  In
particular, a priority scheduler is declared by invoking the latter function
with \texttt{DIET\_AGG\_PRIORITY} as the \texttt{agg} parameter.  Recall that
from the point of view of an agent, the aggregation phase is essentially a
sorting of the server responses from its children.  A priority scheduler
logically uses a series of user-specified tags to perform the pairwise server
comparisons needed to construct the sorted list of server responses.

To define the tags and the order in which they should be compared, four
functions are introduced.  These functions, of the form
\texttt{diet\_aggregator\_priority\_*}, serve to identify the estimation values
to be optimized during the aggregation phase.  The \texttt{\_min} and
\texttt{\_max} forms indicate that a standard performance metric (\eg time
elapsed since last execution, from the \texttt{diet\_estimate\_lastexec}
function) is to be either minimized or maximized, respectively.  Similarly, the
\texttt{\_minuser} and \texttt{\_maxuser} forms indicate the analogous
operations on user-supplied estimation values.  Calls to these functions
indicate the order of \textbf{precedence} of the tags.

Each time two server responses need to be compared, the values associated with
the tags specified in the priority aggregator are retrieved. In the specified
order, pairs of corresponding values are successively compared, passing to the
next tag only if the values for the current tag are identical. If one server
response contains a value for the metric currently being compared, and another
does not, the response with a valid value will be selected. If at any point
during the treatment of tags \emph{both} responses lack the necessary tag, the
comparison is declared indeterminate.  This process continues until one
response is declared superior to the other, or all tags in the priority
aggregator are exhausted (and the responses are judged equivalent).


\section{Example}

An example is present in the \diet distribution to illustrate the usage
of the plugin scheduler functionality; this code is available in the directory
\begin{code}
src/examples/plugin\_example/
\end{code}
A \diet server and client corresponding to a simulation of a database research
application are provided. If the construction of examples was enabled during
\diet configuration, two binaries \texttt{server} and \texttt{client} will be
built in this directory. Having deployed a \diet agent hierarchy, the server
may be instantiated:
\begin{code}
  \$ server <SeD\_config> <DB> [ <DB> ... ]
\end{code}
where \texttt{<DB>} are string(s) that represent the existence of a particular
database at the {\sed}'s site.  A client would pose a query against a set of
databases:
\begin{code}
  \$ client <client\_config> <DB> [ <DB> ... ]
\end{code}
The application uses the plugin scheduling facility to prioritize the existence
of databases in selecting a server, and thus, the expected result is that one
of the {\sed}s with the fewest number of database mismatches will be selected.

In the \texttt{main} function of the \texttt{server.c} file, the following
block of code (a)~specifies the use of the priority aggregator for this
service, (b)~declares a performance estimation function to supply the necessary
data at request-time, and (c)~defines the order of precedence of the
performance values (\ie minimizing the number of database mismatches, and then
maximizing the elapsed execution time).
\begin{verbatim}
  {
    /* new section of the profile: aggregator */
    diet_aggregator_desc_t *agg;
    agg = diet_profile_desc_aggregator(profile);

    /* for this service, use a priority scheduler */
    diet_aggregator_set_type(agg, DIET_AGG_PRIORITY);          /* (a) */

    /* install our custom performance function */
    diet_service_use_perfmetric(performanceFn);                /* (b) */

    /* define the precedence order */
    diet_aggregator_priority_minuser(agg, 0);                  /* (c) */
    diet_aggregator_priority_max(agg, EST_TIMESINCELASTSOLVE); /* (c) */
  }
\end{verbatim}
The performance function \texttt{performanceFn} is defined as follows:
\begin{verbatim}
static void performanceFn(diet_profile_t* pb, estVector_t perfValues);

[...]

/*
** performanceFn: the performance function to use in the DIET
**   plugin scheduling facility
*/
static void
performanceFn(diet_profile_t* pb, estVector_t perfValues)
{
  const char *target;
  int numMismatch;

  /* string value must be fetched from description; value is NULL */
  target = (diet_paramstring_get_desc(diet_parameter(pb, 0)))->param;
  numMismatch = computeMismatches(target);

  /*
  ** store the mismatch value in the user estimate space,
  ** using tag value 0
  */
  diet_est_set(perfValues, 0, numMismatch);

  /* also store the timestamp since last execution */
  diet_estimate_lastexec(perfValues, pb);
}
\end{verbatim}
The function \texttt{computeMismatches} (defined earlier in \texttt{server.c})
calculates the number of requested databases that are not present on the {\sed}
making the evaluation. Together, these two code segments serve to customize the
generation of performance information and the treatment of these data in the
context of the simulated database search. Finally, it should be noted that the
existence of a plugin scheduler is completely transparent to the client, and
thus client code need not be changed.

%
% Scheduling at agents level
%
\input{agent_scheduler.tex}

\section{Future Work}

We have two primary efforts planned for extensions to the plugin scheduler.
\begin{itemize}
\item \textbf{Additional information services}: We plan to add functionalities
  to enable the application developer to access and use data concerning the
  internal state of the \diet server (\eg the current length of request
  queues).  As other performance measurement and evaluation tools are developed
  both within and external to the \diet project (see
  Chapter~\ref{chapter:performance}), some tools are already available to
  enable such  information to be incorporated in the context of the plugin
  scheduler.
\item \textbf{Enhanced aggregation methods}: The plugin scheduler implemented
  in the current release enables the \diet system to account for user-defined
  factors in the server selection process.  However, the priority aggregation
  method is fairly rudimentary and lacks the power to express many imaginable
  comparison mechanisms.  We plan to investigate methods to embed code into
  \diet agents (\eg a simple expression interpreter) in a manner that is secure
  and that preserves performance.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Local Variables:
%%% mode: latex
%%% ispell-local-dictionary: "american"
%%% mode: flyspell
%%% fill-column: 79
%%% End: