/* StarPU --- Runtime system for heterogeneous multicore architectures.
*
* Copyright (C) 2009-2022 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
*
* StarPU is free software; you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation; either version 2.1 of the License, or (at
* your option) any later version.
*
* StarPU is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
* See the GNU Lesser General Public License in COPYING.LGPL for more details.
*/
/*
* NOTE: XXX: also update simgrid versions in 101_building.doxy !!
*/
/*! \page SimGridSupport SimGrid Support
StarPU can use SimGrid in order to simulate execution on an arbitrary
platform. This was tested with SimGrid versions 3.11 to 3.16 and 3.18 to
3.30. SimGrid version 3.25 needs to be configured with <c>-Denable_msg=ON</c>.
Other versions may have compatibility issues. Notably, 3.17 does not build at
all, and MPI simulation does not work with version 3.22.
If you have installed SimGrid by hand, make sure to add the directory where
\c simgrid.pc was installed to \c PKG_CONFIG_PATH:
\verbatim
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/where/simgrid/installed/lib/pkgconfig
\endverbatim
\section Preparing Preparing Your Application For Simulation
There are a few technical details which need to be handled for an application to
be simulated through SimGrid.
If the application uses <c>gettimeofday()</c> to make its
performance measurements, it will get the real time, which is meaningless in
simulation. To get the simulated time, it has to use starpu_timing_now(), which
returns the virtual timestamp in microseconds.
For technical reasons, the application's .c file which contains \c main() has
to be recompiled with \c starpu_simgrid_wrap.h. In the SimGrid case, that header
uses <c># define main()</c> to rename it to <c>starpu_main()</c>. \c libstarpu then
provides the real \c main() and calls the application's \c main(). Since
\c starpu.h already includes \c starpu_simgrid_wrap.h, you usually do not need
to include it explicitly. If for some reason including the whole \c starpu.h
header is not possible, you can include \c starpu_simgrid_wrap.h explicitly.
To be able to test with very large data sizes, one may want to allocate
application data only if the macro \c STARPU_SIMGRID is not defined. Passing a <c>NULL</c> pointer to
the data registration functions, such as starpu_vector_data_register(), is fine,
since StarPU never reads or writes the data in SimGrid mode anyway.
To be able to run the application with e.g. CUDA simulation on a system which
does not have CUDA installed, one can fill starpu_codelet::cuda_funcs with \c (void*)1 to
express that there is a CUDA implementation, even if one does not actually
provide it. StarPU does not actually run it in SimGrid mode anyway, unless the
::STARPU_CODELET_SIMGRID_EXECUTE or ::STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT
flags are set in the codelet.
\snippet simgrid.c To be included. You should update doxygen if you see this text.
\section Calibration Calibration
The idea is to first compile StarPU normally, and run the application,
so as to automatically benchmark the bus and the codelets.
\verbatim
$ ./configure && make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
$ ...
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST PASSED
\endverbatim
Note that we force the use of the <c>dmda</c> scheduler to generate
performance models for the application. The application may need to be
run several times before the model is calibrated.
\section Simulation Simulation
Then, recompile StarPU, passing \ref enable-simgrid "--enable-simgrid"
to <c>configure</c>. Make sure to keep all other <c>configure</c> options
the same, and notably options such as <c>--enable-maxcudadev</c>.
\verbatim
$ ./configure --enable-simgrid
\endverbatim
To specify the location of SimGrid, you can either set the environment
variables \c SIMGRID_CFLAGS and \c SIMGRID_LIBS, or use the \c configure
options \ref with-simgrid-dir "--with-simgrid-dir",
\ref with-simgrid-include-dir "--with-simgrid-include-dir" and
\ref with-simgrid-lib-dir "--with-simgrid-lib-dir", for example
\verbatim
$ ./configure --with-simgrid-dir=/opt/local/simgrid
\endverbatim
You can then re-run the application.
\verbatim
$ make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST FAILED !!!
\endverbatim
It is normal that the test fails: since the computations are not actually performed
(that is the whole point of SimGrid), the result is of course wrong.
If the performance model is not calibrated enough, the following error
message will be displayed:
\verbatim
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
is not calibrated, forcing calibration for this run. Use the
STARPU_CALIBRATE environment variable to control this.
[starpu][_starpu_simgrid_execute_job][assert failure] Codelet
matvecmult does not have a perfmodel, or is not calibrated enough
\endverbatim
The number of devices can be chosen as usual with \ref STARPU_NCPU,
\ref STARPU_NCUDA, and \ref STARPU_NOPENCL, and the amount of GPU memory
with \ref STARPU_LIMIT_CUDA_MEM, \ref STARPU_LIMIT_CUDA_devid_MEM,
\ref STARPU_LIMIT_OPENCL_MEM, and \ref STARPU_LIMIT_OPENCL_devid_MEM.
\section SimulationOnAnotherMachine Simulation On Another Machine
The SimGrid support even makes it possible to perform simulations on another
machine, typically your desktop. To achieve this, one still needs to perform the
Calibration step on the actual machine to be simulated. One then copies the
resulting performance models (the <c>$STARPU_HOME/.starpu</c> directory) to the
desktop machine. One can then perform the Simulation step on the desktop machine,
setting the environment variable \ref STARPU_HOSTNAME to the name of the actual
machine. StarPU then uses the performance models of the simulated machine even
on the desktop machine.
If the desktop machine does not have CUDA or OpenCL, StarPU is still able to
use SimGrid to simulate execution with CUDA/OpenCL devices. However, the application
source code will probably disable the CUDA and OpenCL codelets in that
case. Since the functions of the codelet are by default not actually called during
SimGrid execution, one can use dummy function pointers to still permit simulated
CUDA or OpenCL execution, such as \c (void*)1 as described in Section \ref Preparing.
\section SimulationExamples Simulation Examples
StarPU ships a few performance models for a couple of systems: \c attila,
\c mirage, \c idgraf, and \c sirocco. See Section \ref SimulatedBenchmarks for the details.
\section FakeSimulations Simulations On Fake Machines
It is possible to build fake machines which do not exist, by modifying the
platform file in <c>$STARPU_HOME/.starpu/sampling/bus/machine.platform.xml</c>
by hand: one can add more CPUs, add GPUs (but the performance model file has to
be extended as well), change the available GPU memory size, PCI memory bandwidth, etc.
\section TweakingSimulation Tweaking Simulation
The simulation can be tuned anywhere between a very accurate simulation and a
very simple one, which is thus close to scheduling theory results. This is done
through the \ref STARPU_SIMGRID_TRANSFER_COST, \ref STARPU_SIMGRID_CUDA_MALLOC_COST,
\ref STARPU_SIMGRID_CUDA_QUEUE_COST, \ref STARPU_SIMGRID_TASK_SUBMIT_COST,
\ref STARPU_SIMGRID_FETCHING_INPUT_COST and \ref STARPU_SIMGRID_SCHED_COST environment variables.
\section SimulationMPIApplications MPI Applications
StarPU-MPI applications can also be run in SimGrid mode. SMPI currently requires
StarPU to be built statically only, so <c>--disable-shared</c> needs to be
passed to <c>./configure</c>.
The application needs to be compiled with \c smpicc, and run using the
<c>starpu_smpirun</c> script, for instance:
\verbatim
$ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong
\endverbatim
Here, \c cluster.xml is a SimGrid-MPI platform description, and \c hostfile is the
list of MPI nodes to be used. StarPU currently only supports homogeneous MPI
clusters: for each MPI node, it simply replicates the architecture referred to by
\ref STARPU_HOSTNAME.
To use FxT traces, libfxt also needs to be built statically, <b>and</b>
with position-independent code (as for dynamic linking), i.e. with:
\verbatim
$ CFLAGS=-fPIC ./configure --enable-static
\endverbatim
\section SimulationDebuggingApplications Debugging Applications
By default, SimGrid uses its own implementation of threads, which prevents \c gdb
from being able to inspect stacks of all threads. To be able to fully debug an
application running with SimGrid, pass the <c>--cfg=contexts/factory:thread</c>
option to the application, to make SimGrid use system threads, which \c gdb will be
able to manipulate as usual.
It is also worth noting the <c>--cfg=simix/breakpoint</c> parameter introduced in
SimGrid 3.21. It allows putting a breakpoint at a precise (deterministic!) point
in time of the execution. If, for instance, we see in an execution trace that
something odd happens at time 19000ms, we can use
<c>--cfg=simix/breakpoint:19.000</c> and \c SIGTRAP will be raised at that point.
This interrupts execution within \c gdb and allows inspecting, e.g., the
scheduler state.
\section SimulationMemoryUsage Memory Usage
Since kernels are not actually run and data transfers are not actually
performed, the data memory does not need to be allocated. This makes it
possible, for instance, to simulate the execution of applications processing
very big data on a small laptop.
The application can, for instance, pass <c>1</c> (or any other bogus pointer)
to the StarPU data registration functions instead of allocating data. This
however requires the application to take care never to access the data, and
does not work in MPI mode, which performs transfers.
Another way is to pass the \ref STARPU_MALLOC_SIMULATION_FOLDED flag to the
starpu_malloc_flags() function. This makes it allocate a memory area which
can be read and written, but which does not actually consume memory. Of course,
the values read from such an area are bogus. However, this allows the
application to keep, e.g., its data loading, storing and initialization code
as is, and it also works in MPI mode.
Note, however, that some kernels, notably Linux, refuse obvious memory
overcommitting by default, so a single allocation typically cannot be bigger
than the amount of physical memory. See
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting for details.
This prevents, for instance, allocating a single huge matrix. Allocating a
huge matrix as several tiles is not a problem, however. <c>sysctl
vm.overcommit_memory=1</c> can also be used to allow such overcommit.
Note also that this folding is done by remapping the same file several times,
and Linux kernels also refuse to create too many memory areas. <c>sysctl
vm.max_map_count</c> can be used to check and change this limit, which is 65530
by default on current Linux kernels. By default, StarPU uses a 1MiB file, so that
it hopefully fits in the CPU cache. With the default limit, however, this caps
the amount of such folded memory at a bit below 64GiB. The
\ref STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the
size of the file.
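The limit above follows from simple arithmetic. Every mapping of the fold file uses one memory area, so the amount of folded memory is bounded by \c vm.max_map_count times the file size. The following sketch, with hypothetical helper names, computes that bound. It also computes the smallest file size needed for a given amount of folded memory, assuming \ref STARPU_MALLOC_SIMULATION_FOLD is expressed in MiB. It ignores the memory areas the process uses for other purposes, which is why the real limit is a bit lower.

```c
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)

// Upper bound on folded memory: each mapping of the fold file consumes one of
// the max_map_count memory areas the kernel allows per process. Areas used
// for code, heap, stacks, etc. are ignored, so the real limit is a bit lower.
uint64_t folded_memory_limit(uint64_t max_map_count, uint64_t fold_file_bytes)
{
	return max_map_count * fold_file_bytes;
}

// Smallest fold file size, in whole MiB, for which `wanted` bytes of folded
// memory fit within max_map_count mappings (both divisions round up).
uint64_t fold_file_mib_needed(uint64_t wanted, uint64_t max_map_count)
{
	uint64_t bytes_per_map = (wanted + max_map_count - 1) / max_map_count;
	return (bytes_per_map + MIB - 1) / MIB;
}
```

For example, with a limit of 65535 areas, 256GiB of folded memory needs a file of at least 5MiB. A 4MiB file only gives 65535 * 4MiB = 262140MiB, just short of 256GiB (262144MiB).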
*/