%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c)2015-2016 by The University of Queensland
% http://www.uq.edu.au
%
% Primary Business: Queensland, Australia
% Licensed under the Apache License, version 2.0
% http://www.apache.org/licenses/LICENSE-2.0
%
% Development from 2014 by Centre for Geoscience Computing (GeoComp)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%!TEX root = user.tex
\chapter{Escript subworlds}
\label{CHAP:subworld}
\section{Introduction}
By default, when \escript is used with multiple processes\footnote{That is, when MPI is employed
(probably in a cluster setting).
This discussion is not concerned with using multiple threads within a single process (OpenMP).
The functionality described here is compatible with OpenMP but orthogonal to it (in the same sense that MPI and OpenMP are
orthogonal).}
, it will use all the resources available to it to solve each PDE.
With high resolution domains, this is what you want.
However, in some cases, for example inversion, it may be desirable to solve a number of lower-resolution problems whose
total computational work is still significant.
In such cases, \escript allows you to subdivide its processes into smaller groups which can be assigned separate
tasks.
For example, if \escript is started with 8 MPI processes\footnote{see the \texttt{run-escript} launcher or
your system's mpirun for details.}, then these could be subdivided into 2 groups of 4 processes, 4 groups of 2, or 8 groups of 1 process
\footnote{1 group of 8 processes is also possible, but perhaps not particularly useful aside from debugging scripts.}.
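For instance, using the \texttt{SplitWorld} class described below, a run started with 8 processes could be partitioned in any of the following ways:
\begin{python}
sw=SplitWorld(2)   # 2 subworlds of 4 processes each
sw=SplitWorld(4)   # 4 subworlds of 2 processes each
sw=SplitWorld(8)   # 8 subworlds of 1 process each
\end{python}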
\subsection{Concepts}
\begin{description}
\item[Job] describes a task which you wish to perform; the task is represented by a python function.
There are some restrictions on the type of parameters which can be passed to the function.
The same function can be executed many times (possibly with different parameters) but each execution is
handled by a separate Job instance.
\item[Domain object] the representation of the discretized domain on which the PDEs are defined.
Behind the scenes, it also records which processes share information about the domain.
Normally in \escript this will be all of them, but for subworlds this can change.
\item[Subworld] the context in which a job executes.
All the processes which will be executing a particular job belong to the same subworld.
Each subworld supplies the domain which jobs running on it will use as well as some other variables.
Python scripts will not normally interact with subworlds directly.
\item[SplitWorld] a partition of the available processes into subworlds, and the python interface to that partition.
For example:
\begin{python}
sw=SplitWorld(4)
\end{python}
will create 4 equal sized subworlds.
If there are not enough processes to do this or the number of processes is not divisible by the
number of subworlds, then an exception will be raised.
Note that creating a SplitWorld object does not prevent \escript from solving PDEs normally.
You can have scripts which use both modes of operation.
Each subworld has its own domain object but those objects must represent the same content (and be created in the same way).
To do this call \texttt{buildDomains}:
\begin{python}
buildDomains(sw, Rectangle, 10, 10, optimize=True)
\end{python}
describes the creation of a domain which would normally be constructed with:
\begin{python}
Rectangle(10, 10, optimize=True)
\end{python}
Note that you do not create domains and pass them into the SplitWorld\footnote{
This would not work because such a domain would expect to use all the processes rather than a subset of them.}.
Also note that \texttt{buildDomains} (and the later \texttt{addJob} and \texttt{addVariable} calls) are not methods of SplitWorld\footnote{
This has to do with a limitation of the \texttt{boost::python} library used in the development of \escript.}.
The fact that each subworld has its own domain means that objects built directly (or indirectly) on domains
can't be passed directly into, out of or between subworlds.
This includes: Data, FunctionSpace, LinearPDE.
To submit a job for execution, use one of the following calls:
\begin{python}
addJob(sw, FunctionJob, myfunction, par1=value1, par2=value2)
addJobPerWorld(sw, FunctionJob, myfunction, par1=value1, par2=value2)
\end{python}
The first call adds a single job to execute on an available subworld.
The second creates a job instance on each subworld\footnote{
Note that this is not the same as calling \texttt{addJob} $n$ times where $n$ is the number of subworlds.
It is better not to make assumptions about how SplitWorld distributes jobs to subworlds.
}.
One use for this would be to perform some setup on each subworld.
The keyword parameters are illustrative only.
Since task functions run by jobs must have the parameter list \texttt{(self, **kwargs)}, keyword arguments are the best way
to get extra information into your function.
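As a sketch, a task function matching the calls above (\texttt{myfunction}, \texttt{par1} and \texttt{par2} are purely illustrative names) would read its extra information from \texttt{kwargs}:
\begin{python}
def myfunction(self, **kwargs):
    p1=kwargs['par1']   # value1 from the addJob call
    p2=kwargs['par2']   # value2 from the addJob call
    # ... do some work involving p1 and p2 here ...
\end{python}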
To execute submitted jobs:
\begin{python}
sw.runJobs()
\end{python}
If any of the jobs raised an exception (or there was some other problem), then an exception will be raised in the
top level python script.
This is intended to help diagnose the fault; it is not intended to allow resuming of the calculation under script control\footnote{
Some things preventing this:
\begin{itemize}
\item Only one of the exceptions will be reported (if multiple jobs raise, you will only see one message).
\item The exception does not contain the payload and type of the original exception.
\item We do not guarantee that all jobs have been attempted if an exception is raised.
\item Variables may be reset and so values may be lost.
\end{itemize}
}.
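For example, a top level script might report the failure and stop (a minimal sketch; the exact exception type raised is not guaranteed, so we catch broadly here):
\begin{python}
try:
    sw.runJobs()
except Exception as e:
    print("A job failed: %s"%str(e))
    raise   # do not attempt to resume the calculation
\end{python}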
\item[Variable] a value declared to persist between jobs and the main way to get complex information into and out of a job.
The task function for a job can read/write to variables with the \texttt{importValue}/\texttt{exportValue} calls.
\begin{python}
def mywork(self, **kwargs):
    x=self.importValue("x")
    v=kwargs['translate']
    self.exportValue('nx', x+v)
\end{python}
At present, \escript supports three types of variables:
\begin{python}
addVariable(sw, name, makeLocalOnly)
addVariable(sw, name, makeScalarReducer, op) # op=="SET" or "SUM"
addVariable(sw, name, makeDataReducer, op)
\end{python}
These calls are made in the top level python script and ensure that all subworlds know about all the possible variables.
The difference between the types is how they interact with other subworlds and what sort of values they can store.
\begin{itemize}
\item ``LocalOnly'' variables can store any python object but each subworld has its own value independent of the others.
Because they could be storing anything (and it is not guaranteed to be consistent across worlds), we do not provide an
interface to extract these variables from subworlds.
\item ``ScalarReducer'' variables store a floating point value.
Any values which are exported by jobs during a \texttt{runJobs} call are gathered together and combined (reduced) and then
shared with all interested worlds\footnote{
In this version, this will be all subworlds, but the capability exists to be
more selective.}.
There are currently two possible reduction operators: \emph{SET} and \emph{SUM}.
SUM adds all exported values together.
SET only accepts one value per \texttt{runJobs} call and raises an exception if more than one set is attempted.
It is possible to read these variables outside the splitworld with:
\begin{python}
z=sw.getDoubleVariable(name)
\end{python}
\item ``DataReducer'' variables store Data objects and support the same SET and SUM operations as ScalarReducer.
We do not provide an interface to extract Data objects from subworlds.
\end{itemize}
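Putting these calls together, a minimal sketch of a ``SUM'' reduction (the variable name \texttt{total} and the task function \texttt{partial} are illustrative) could look like:
\begin{python}
addVariable(sw, 'total', makeScalarReducer, "SUM")

def partial(self, **kwargs):
    # each job contributes one value; values are summed on reduction
    self.exportValue('total', kwargs['v'])

addJob(sw, FunctionJob, partial, v=1.0)
addJob(sw, FunctionJob, partial, v=2.0)
sw.runJobs()
print(sw.getDoubleVariable('total'))   # expect 3.0
\end{python}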
Note that all the normal save/load functions will work inside subworlds, so while Data objects can't be passed out of SplitWorlds,
they can be saved from inside the subworlds.
\end{description}
\section{Example}
\begin{python}
from esys.escript import *
from esys.escript.linearPDEs import Poisson
from esys.ripley import Rectangle
# Each process is in its own subworld
sw=SplitWorld(getMPISizeWorld())
buildDomains(sw, Rectangle, 100, 100)

# describe the work we want to do
# In this case we solve a Poisson equation
def task(self, **kwargs):
    v=kwargs['v']
    dom=self.domain
    pde=Poisson(dom)
    x=dom.getX()
    gammaD=whereZero(x[0])+whereZero(x[1])
    pde.setValue(f=v, q=gammaD)
    soln=pde.getSolution()
    soln.dump('soln%d.ncdf'%v)

# Now we add some jobs
for i in range(1,20):
    addJob(sw, FunctionJob, task, v=i)
# Run them
sw.runJobs()
\end{python}
\section{Classes and Functions}
\begin{methoddesc}[SplitWorld]{SplitWorld}{n}
Returns a SplitWorld which contains $n$ subworlds; will raise an exception if this is not possible.
\end{methoddesc}
\begin{funcdesc}{addVariable}{splitworld, name, constructor, args}
Adds a variable, built by the function \var{constructor} with arguments \var{args}, to each of the subworlds.
\end{funcdesc}
\begin{funcdesc}{makeLocalOnly}{}
Used to create a variable to store a python object which will be local to each subworld.
These values will not be transferred between or out of subworlds.
An example use case is calculating a list of values once on each world and caching them for use by later jobs.
\end{funcdesc}
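A minimal sketch of that use case (the variable name \texttt{cache} and both task functions are illustrative):
\begin{python}
addVariable(sw, 'cache', makeLocalOnly)

def setup(self, **kwargs):
    # computed once per subworld and kept for later jobs
    self.exportValue('cache', [1.0, 2.0, 3.0])

def work(self, **kwargs):
    values=self.importValue('cache')   # reuse the cached list
    # ... use values ...

addJobPerWorld(sw, FunctionJob, setup)
sw.runJobs()
addJob(sw, FunctionJob, work)
sw.runJobs()
\end{python}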
\begin{funcdesc}{makeScalarReducer}{name, op}
Used to create a variable to share and combine floating point values.
The operation can currently be ``SET'' (allows only one assignment for each \texttt{runJobs} call) or ``SUM''.
\end{funcdesc}
\begin{funcdesc}{makeDataReducer}{name, op}
Used to create a variable to share and combine Data values.
The operation can currently be ``SET'' (allows only one assignment for each \texttt{runJobs} call) or ``SUM''.
\end{funcdesc}
\begin{funcdesc}{buildDomains}{splitworld, constructor, args}
Indicates how subworlds should construct their domains.
\emph{Note that the splitworld code does not support multi-resolution ripley domains yet.}
\end{funcdesc}
\begin{funcdesc}{addJob}{splitworld, FunctionJob, function, args}
Submits a job running \var{function} with \var{args} in an available subworld.
\end{funcdesc}
\begin{funcdesc}{addJobPerWorld}{splitworld, FunctionJob, function, args}
Submits a job running \var{function} with \var{args} in each subworld.
Individual jobs can use properties of \var{self} such as \member{swid} or \member{jobid} to distinguish between
themselves.
\end{funcdesc}
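For example (a sketch; \texttt{report} is an illustrative name):
\begin{python}
def report(self, **kwargs):
    # swid and swcount identify the subworld this instance runs in
    print("job %d on subworld %d of %d"%(self.jobid, self.swid, self.swcount))

addJobPerWorld(sw, FunctionJob, report)
sw.runJobs()
\end{python}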
\subsection{SplitWorld methods}
All of these methods are only to be called by the top level python script.
Do not attempt to use them inside a job.
\begin{methoddesc}[SplitWorld]{removeVariable}{name}
Removes a variable and its value from all subworlds.
\end{methoddesc}
\begin{methoddesc}[SplitWorld]{clearVariable}{name}
Clears the value of the named variable on all worlds.
The variable no longer has a value but a new value can be exported for it.
\end{methoddesc}
\begin{methoddesc}[SplitWorld]{getVarList}{}
Return a python list of pairs \texttt{[name, hasvalue]} (one for each variable).
\var{hasvalue} is True if the variable currently has a value.
Mainly useful for debugging.
\end{methoddesc}
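For example, after the \texttt{mywork} sketch above has run (illustrative output):
\begin{python}
print(sw.getVarList())   # e.g. [['x', True], ['nx', True]]
\end{python}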
\begin{methoddesc}[SplitWorld]{getDoubleVariable}{name}
Extracts the floating point value of the named variable to the top level python script.
If the named variable does not support this, an exception will be raised.
\end{methoddesc}
\begin{methoddesc}[SplitWorld]{copyVariable}{source, dest}
Copies\footnote{This is a deep copy for Data objects.} the value of variable \var{source} into variable \var{dest}.
This avoids the need to create jobs merely to \texttt{importValue} and \texttt{exportValue} under the new name.
\end{methoddesc}
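For example, to feed one phase's output in as the next phase's input (using the variable names from the \texttt{mywork} sketch above, and assuming both were declared with \texttt{addVariable}):
\begin{python}
sw.copyVariable('nx', 'x')   # jobs importing 'x' now see the old 'nx'
\end{python}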
\begin{methoddesc}[SplitWorld]{getSubWorldCount}{}
Returns the number of subworlds.
\end{methoddesc}
\begin{methoddesc}[SplitWorld]{getSubWorldID}{}
Returns the id of the local world.
\end{methoddesc}
\subsection{FunctionJob methods and members}
A FunctionJob instance will be the \var{self} parameter in your task function.
There are other features of the class but the following are relevant to task functions.
\begin{memberdesc}[FunctionJob]{jobid}
Each job is given an id from an increasing sequence.
\end{memberdesc}
\begin{memberdesc}[FunctionJob]{swcount}
The number of subworlds in the SplitWorld.
\end{memberdesc}
\begin{memberdesc}[FunctionJob]{swid}
The id of the subworld this job is running in.
\end{memberdesc}
\begin{methoddesc}[FunctionJob]{importValue}{name}
Retrieves the value for the named variable.
This is a shallow copy, so modifications made in the function may affect the variable (LocalOnly).
Do not abuse this; use \texttt{exportValue} to change values instead.
\end{methoddesc}
\begin{methoddesc}[FunctionJob]{exportValue}{name, value}
Contributes a new value for the named variable.
\end{methoddesc}