1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438
|
%**
%* @file plugin.tex
%* @brief DIET User's Manual plugin scheduler chapter file
%* @author - Alan SU (Alan.SU@ens-lyon.fr)
%* @section Licence
%* |LICENCE|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \documentclass{article}
\newenvironment{code}
{\begin{list}{}{\setlength{\leftmargin}{1em}}\item\bfseries\tt}
{\end{list}}
\newenvironment{tinycode}
{\begin{list}{}{\setlength{\leftmargin}{1em}}\item\tiny\bfseries\tt}
{\end{list}}
%% \begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Scheduling in \diet}
\label{ch:plugin}
\section{Introduction}
We introduce a \emph{plugin scheduling} facility, designed to allow \diet
service developers to define application-specific performance measures and to
implement corresponding scheduling strategies. This section describes the
default scheduling policy in \diet and the interface to the plugin scheduling
facility.
\section{Default Scheduling Strategy}\label{sect:default_sched}
The \diet scheduling subsystem is based on the notion that, for the sake of
system efficacy and scalability, the work of determining the appropriate
schedule for a parallel workload should be distributed across the computational
platform. When a task in such a parallel workload is submitted to the system
for processing, each Server Daemon (\sed) provides a \emph{performance
estimate}~-- a collection of data pertaining to the capabilities of a
particular server in the context of a particular client request~-- for that
task. These estimates are passed to the server's parent agent; agents then
sort these responses in a manner that optimizes certain performance criteria.
Effectively, candidate {\sed}s are identified through a distributed scheduling
algorithm based on pairwise comparisons between these performance estimations;
upon receiving server responses from its children, each agent performs a local
scheduling operation called \emph{server response aggregation}. The end result
of the agent's aggregation phase is a list of server responses (from servers in
the subtree rooted at said agent), sorted according to the aggregation method
in effect. By default, the aggregation phase implements the following ordered
sequence of tests:
\begin{enumerate}
\item \textbf{Least recently used}: In the absence of application- and
platform-specific performance data, the \diet scheduler attempts to
probabilistically achieve load balance by assigning client requests based on
the time they last finished to compute. Essentially each server records a
timestamp indicating the last time at which it was assigned a job for
execution. Each time a request is received, the \sed computes the time elapsed
since its last execution, and among the responses it receives, \diet agents
select {\sed}s with a longer elapsed time.
\item \textbf{Random}: If the {\sed} is unable to store timestamps, the \diet
scheduler will chose randomly when comparing two otherwise equivalent {\sed}
performance estimations.
\end{enumerate}
In principle, this scheduling policy prioritizes servers that are able to
provide useful performance prediction information.
In general, this approach works well when all servers in a
given \diet hierarchy are capable of making such estimations. However, in
platforms composed of {\sed}s with varying capabilities, load imbalances may
occur: since \diet systematically prioritizes server responses containing scheduling data,
servers that do not respond with such performance data will never be chosen.
We have designed a plugin scheduler facility to enable the application
developer to tailor the \diet scheduling to the targeted application. This
functionality provides the application developer the means to extend the notion
of a performance estimation to include metrics that are application-specific,
and to instruct \diet how to treat those data in the aggregation phase. We
describe these interfaces in the following sections.
\section{Plugin Scheduler Interface}
Distributed applications are varied and often exhibit performance behavior
specific to the domain from which they arise. Consequently,
application-specific scheduling approaches are often necessary to achieve
high-performance execution. We propose an extensible framework to build
\emph{plugin schedulers}, enabling application developers to specify
performance estimation metrics that are tailored to their individual needs.
%% This section introduces the principal components of the basic plugin
%% scheduler framework.
\subsection{Estimation Metric Vector}\label{sect:estvector}
The type \texttt{estVector\_t} represents an \emph{estimation vector},
logically a structure that can manage a dynamic collection of performance
estimation values. It contains values that represent the performance profile
provided by a {\sed} in response to a \diet service request. This collection of
values may include either standard performance measures that are available
through \diet, or developer-defined values that are meaningful solely in the
context of the application being developed.
\subsection{Standard Estimation Tags}\label{sect:estTags}
To access to the different fields of the \texttt{estVector\_t}, it is necessary
to specify the tag that correspond to a specific information type.
Table~\ref{t:tags} describes this correspondence. Some tags represent a list of
values, one has to use the \texttt{diet\_est\_array\_*} functions to have
access to them. In Table~\ref{t:tags}, the second column marks these
multi-value tags.
The tag \textit{ALLINFOS} is a special: his field is always empty, but it
allows to fill the vector with all known tags by the particular collector.
\begin{table}[h]
\tiny
\centering
\begin{tabular}[c]{|l|c|l|}\hline
%TAGS lines
\textit{TCOMP }&& the predicted time to solve a problem \\[5pt]
\hline
\textit{TIMESINCELASTSOLVE} & & time since last solve has been made (sec) \\[5pt]
\hline
\textit{FREECPU }&& amount of free CPU power between 0 and 1 \\[5pt]
\hline
\textit{FREEMEM }&& amount of free memory (Mb) \\[5pt]
\hline
\textit{NBCPU }&& number of available processors \\[5pt]
\hline
\textit{CPUSPEED }&x& frequency of CPUs (MHz) \\[5pt]
\hline
\textit{TOTALMEM }&& total memory size (Mb) \\[5pt]
\hline
\textit{AVGFREECPU }&& average amount of free CPU power in [0..1] \\[5pt]
\hline
\textit{BOGOMIPS }&x& CPUs' bogomips \\[5pt]
\hline
\textit{CACHECPU }&x& cache size CPUs (Kb) \\[5pt]
\hline
\textit{TOTALSIZEDISK}&& size of the partition (Mb)\\[5pt]
\hline
\textit{FREESIZEDISK }&& amount of free place on partition (Mb)\\[5pt]
\hline
\textit{DISKACCESREAD}&& average time to read on disk (Mb/sec) \\[5pt]
\hline
\textit{DISKACCESWRITE}&& average time to write to disk (sec) \\[5pt]
\hline
\textit{ALLINFOS }&x& [empty] fill all possible fields \\[5pt]
\hline
\textit{PARAL\_NB\_FREE\_RESOURCES\_IN\_DEFAULT\_QUEUE} & & number
of idle resources \\[5pt]
\hline
\end{tabular}
\caption{Estimation tags}
\label{t:tags}
\end{table}
\subsubsection{Standard Performance Metrics}
To access to the existing default performance estimation routines (as described
in Chapter~\ref{chapter:performance}), the following functions are available to
facilitate the construction of custom performance estimation functions:
\begin{itemize}
\item The time elapsed since the last execution (to enable the ``least recently used''
scheduler) is stored in an estimation metric vector by calling
\begin{tabbing}
\texttt{int diet\_estimate\_lastexec(}\=\texttt{estVector\_t ev,} \\
\> \texttt{const diet\_profile\_t* const profilePtr)};
\end{tabbing}
with an appropriate value for \texttt{ev} and the \texttt{profilePtr}
corresponding to the current \diet request.
\item The number of waiting jobs when using the maximum concurrent jobs limit
is stored in an estimation metric vector by calling
\begin{tabbing}
\texttt{int diet\_estimate\_waiting\_jobs(}\=\texttt{estVector\_t ev)};
\end{tabbing}
\item CoRI allows to access in an easy way to basic performance
prediction. See Chapter~\ref{sec:CORI} to know more about the use of it.
\end{itemize}
In the future, we plan to expand the suite of default estimation metrics to
include dynamic internal \diet system state information (\eg queue lengths).
\subsubsection{Developer-defined Performance Metrics}
Application developers may also define performance values to be included in a
\sed response to a client request. For example, a \diet \sed that provides a
service to query particular databases may need to include information about
which databases are currently resident in its disk cache, in order that an
appropriate server be identified for each client request. To store such
values, the \sed developer should first choose a unique integer identifier,
referred to as the \emph{tag} to denote each logical datum to be stored.
Values are associated with tags using the following interface:
\begin{code}
int diet\_est\_set(estVector\_t ev, int userTag, double value);
\end{code}
The \texttt{ev} parameter is the estimation vector where the value will be
stored, the \texttt{userTag} parameter denotes the chosen tag, and
\texttt{value} indicates the value to be associated with the tag. Tagged data
are used to effect scheduling policies by defining custom server response
aggregation methods, described in Section~\ref{sect:agg_methods}.
\subsection{Estimation Function}\label{sect:est_fn}
The default behavior of a {\sed} when a service request arrives from its parent
agent is to store the following information in the request profile:
\begin{enumerate}
\item \textbf{Elapsed time since last execution}: To implement the default
round-robin behavior in absence of scheduling facilities, each {\sed}
stores a timestamp of its last execution. When a service request arrives,
the difference between that timestamp and the current time is added to the
performance estimate.
\end{enumerate}
This is accomplished by using the \texttt{diet\_estimate\_lastexec} functions
described in Section~\ref{sect:estvector}.
To implement a plugin scheduler, we define an interface that admits
customizable performance estimation routines:
\begin{tabbing}
\texttt{typedef void (* diet\_perfmetric\_t)(}
\=\texttt{diet\_profile\_t*,} \\
\>\texttt{estVector\_t);} \\
\end{tabbing}
\begin{tabbing}
\texttt{diet\_perfmetric\_t} \\
\texttt{diet\_service\_use\_perfmetric(diet\_perfmetric\_t perfmetric\_fn);}\\
\end{tabbing}
%% \begin{code}
%% diet\_perfmetric\_t\\
%% diet\_service\_use\_perfmetric(diet\_perfmetric\_t perfmetric\_fn);\\
%% \end{code}
Thus, the type \texttt{diet\_perfmetric\_t} is a function pointer that takes as
arguments a performance estimation (represented by the \texttt{estVector\_t}
object) and a \diet service request profile. The application developer can
associate such a function, or \emph{performance estimation routine}, with \diet
services via the \texttt{diet\_service\_use\_perfmetric} interface. This
interface returns the previously registered performance estimation routine, if
one was defined (and \texttt{NULL} otherwise). At this point, a service added
using the \texttt{diet\_service\_table\_add} function will be associated with
the declared performance estimation routine. Additionally, a performance
estimation routine so specified will be associated with \emph{all} services
added into the service table until another call to the
\texttt{diet\_service\_use\_perfmetric} interface is made. In the performance
estimation routine, the {\sed} developer should store in the provided
estimation vector any performance data used in the server response aggregation
methods (described in the next section).
\subsection{Aggregation Methods}\label{sect:agg_methods}
At the time a \diet service is defined, an \emph{aggregation method}~-- the
logical mechanism by which {\sed} responses are sorted~-- is associated with
the service; the default behavior was described in
Section~\ref{sect:default_sched}.
If application-specific data \emph{are} supplied (\ie the estimation function
has been redefined), an alternative method for aggregation is needed.
Currently, a basic \emph{priority scheduler} has been implemented, enabling an
application developer to specify a series of performance values that are to be
optimized in succession. A developer may implement a priority scheduler using
the following interface:
\begin{code}
\begin{tabbing}
diet\_aggregator\_desc\_t* \\
diet\_profile\_desc\_aggregator(diet\_profile\_desc\_t* profile); \\
\\
int diet\_aggregator\_set\_type(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_aggregator\_type\_t atype); \\
\\
int diet\_aggregator\_priority\_max(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_est\_tag\_t tag); \\
\\
int diet\_aggregator\_priority\_min(\=diet\_aggregator\_desc\_t* agg, \\
\> diet\_est\_tag\_t tag); \\
\\
int diet\_aggregator\_priority\_maxuser(\=diet\_aggregator\_desc\_t* agg, \\
\> int val); \\
\\
int diet\_aggregator\_priority\_minuser(\=diet\_aggregator\_desc\_t* agg, \\
\> int val); \\
\end{tabbing}
\end{code}
The \texttt{diet\_profile\_desc\_aggregator} and
\texttt{diet\_aggregator\_set\_type} functions fetch and configure the
aggregator corresponding to a \diet service profile, respectively. In
particular, a priority scheduler is declared by invoking the latter function
with \texttt{DIET\_AGG\_PRIORITY} as the \texttt{agg} parameter. Recall that
from the point of view of an agent, the aggregation phase is essentially a
sorting of the server responses from its children. A priority scheduler
logically uses a series of user-specified tags to perform the pairwise server
comparisons needed to construct the sorted list of server responses.
To define the tags and the order in which they should be compared, four
functions are introduced. These functions, of the form
\texttt{diet\_aggregator\_priority\_*}, serve to identify the estimation values
to be optimized during the aggregation phase. The \texttt{\_min} and
\texttt{\_max} forms indicate that a standard performance metric (\eg time
elapsed since last execution, from the \texttt{diet\_estimate\_lastexec}
function) is to be either minimized or maximized, respectively. Similarly, the
\texttt{\_minuser} and \texttt{\_maxuser} forms indicate the analogous
operations on user-supplied estimation values. Calls to these functions
indicate the order of \textbf{precedence} of the tags.
Each time two server responses need to be compared, the values associated with
the tags specified in the priority aggregator are retrieved. In the specified
order, pairs of corresponding values are successively compared, passing to the
next tag only if the values for the current tag are identical. If one server
response contains a value for the metric currently being compared, and another
does not, the response with a valid value will be selected. If at any point
during the treatment of tags \emph{both} responses lack the necessary tag, the
comparison is declared indeterminate. This process continues until one
response is declared superior to the other, or all tags in the priority
aggregator are exhausted (and the responses are judged equivalent).
\section{Example}
An example is present in the \diet distribution to illustrate the usage
of the plugin scheduler functionality; this code is available in the directory
\begin{code}
src/examples/plugin\_example/
\end{code}
A \diet server and client corresponding to a simulation of a database research
application are provided. If the construction of examples was enabled during
\diet configuration, two binaries \texttt{server} and \texttt{client} will be
built in this directory. Having deployed a \diet agent hierarchy, the server
may be instantiated:
\begin{code}
\$ server <SeD\_config> <DB> [ <DB> ... ]
\end{code}
where \texttt{<DB>} are string(s) that represent the existence of a particular
database at the {\sed}'s site. A client would pose a query against a set of
databases:
\begin{code}
\$ client <client\_config> <DB> [ <DB> ... ]
\end{code}
The application uses the plugin scheduling facility to prioritize the existence
of databases in selecting a server, and thus, the expected result is that one
of the {\sed}s with the fewest number of database mismatches will be selected.
In the \texttt{main} function of the \texttt{server.c} file, the following
block of code (a)~specifies the use of the priority aggregator for this
service, (b)~declares a performance estimation function to supply the necessary
data at request-time, and (c)~defines the order of precedence of the
performance values (\ie minimizing the number of database mismatches, and then
maximizing the elapsed execution time).
\begin{verbatim}
{
/* new section of the profile: aggregator */
diet_aggregator_desc_t *agg;
agg = diet_profile_desc_aggregator(profile);
/* for this service, use a priority scheduler */
diet_aggregator_set_type(agg, DIET_AGG_PRIORITY); /* (a) */
/* install our custom performance function */
diet_service_use_perfmetric(performanceFn); /* (b) */
/* define the precedence order */
diet_aggregator_priority_minuser(agg, 0); /* (c) */
diet_aggregator_priority_max(agg, EST_TIMESINCELASTSOLVE); /* (c) */
}
\end{verbatim}
The performance function \texttt{performanceFn} is defined as follows:
\begin{verbatim}
static void performanceFn(diet_profile_t* pb, estVector_t perfValues);
[...]
/*
** performanceFn: the performance function to use in the DIET
** plugin scheduling facility
*/
static void
performanceFn(diet_profile_t* pb, estVector_t perfValues)
{
const char *target;
int numMismatch;
/* string value must be fetched from description; value is NULL */
target = (diet_paramstring_get_desc(diet_parameter(pb, 0)))->param;
numMismatch = computeMismatches(target);
/*
** store the mismatch value in the user estimate space,
** using tag value 0
*/
diet_est_set(perfValues, 0, numMismatch);
/* also store the timestamp since last execution */
diet_estimate_lastexec(perfValues, pb);
}
\end{verbatim}
The function \texttt{computeMismatches} (defined earlier in \texttt{server.c})
calculates the number of requested databases that are not present on the {\sed}
making the evaluation. Together, these two code segments serve to customize the
generation of performance information and the treatment of these data in the
context of the simulated database search. Finally, it should be noted that the
existence of a plugin scheduler is completely transparent to the client, and
thus client code need not be changed.
%
% Scheduling at agents level
%
\input{agent_scheduler.tex}
\section{Future Work}
We have two primary efforts planned for extensions to the plugin scheduler.
\begin{itemize}
\item \textbf{Additional information services}: We plan to add functionalities
to enable the application developer to access and use data concerning the
internal state of the \diet server (\eg the current length of request
queues). As other performance measurement and evaluation tools are developed
both within and external to the \diet project (see
Chapter~\ref{chapter:performance}), some tools are already available to
enable such information to be incorporated in the context of the plugin
scheduler.
\item \textbf{Enhanced aggregation methods}: The plugin scheduler implemented
in the current release enables the \diet system to account for user-defined
factors in the server selection process. However, the priority aggregation
method is fairly rudimentary and lacks the power to express many imaginable
comparison mechanisms. We plan to investigate methods to embed code into
\diet agents (\eg a simple expression interpreter) in a manner that is secure
and that preserves performance.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:
%%% mode: latex
%%% ispell-local-dictionary: "american"
%%% mode: flyspell
%%% fill-column: 79
%%% End:
|