/*
This file is part of MADNESS.
Copyright (C) 2015 Stony Brook University
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
For more information please contact:
Robert J. Harrison
Oak Ridge National Laboratory
One Bethel Valley Road
P.O. Box 2008, MS-6367
email: harrisonrj@ornl.gov
tel: 865-241-3937
fax: 865-572-0680
*/
/**
\file gstart_load_balance.dox
\brief Getting started with MADNESS load and memory balancing.
\addtogroup gstart_load_balance
\todo Verify that the code snippets here aren't stale. They're imported from a 2010 document.
Load and memory balancing are critical issues on the current generation of shared- and distributed-memory computers. Many terascale and petascale computers have no virtual memory on the compute nodes, so careful memory management is essential.
Poor distribution of work (load imbalance) is the largest reason for inefficient parallel execution within MADNESS. Poor data distribution (data imbalance) contributes to load imbalance and also leads to out-of-memory problems due to one or more processes having too much data. Thus, we are interested in a uniform distribution of both work and data.
Many operations in MADNESS are entirely data driven (i.e., computation occurs on the process that "owns" the data) because there is insufficient work to justify moving data between processes (e.g., computing the inner product between two functions). However, a few expensive operations can have their work shipped to other processes.
There are presently three load balancing mechanisms within MADNESS:
- static and driven by the distribution of data,
- dynamic via random assignment of work, and
- dynamic via work stealing (currently only a prototype).
.
Until work stealing becomes production quality, we must exploit the first two forms. Random work assignment is controlled by options in the \c FunctionDefaults class:
- `FunctionDefaults::set_apply_randomize(bool)` controls the use of randomization in applying integral (convolution) operators. It is typically beneficial when computing to medium/high precision.
- `FunctionDefaults::set_project_randomize(bool)` controls the use of randomization in projecting from an analytic form (i.e., C++) into the discontinuous spectral element basis. It is typically beneficial unless there is already a good static data distribution. Both options are illustrated in the sketch following this list.
.
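The sketch below is not taken from the MADNESS sources; it simply shows how the two options might be enabled, assuming the dimension-templated `FunctionDefaults<NDIM>` interface, a 3-D calculation, and the usual `initialize`/`startup`/`finalize` boilerplate.
\code
#include <madness/mra/mra.h>
using namespace madness;

int main(int argc, char** argv) {
    World& world = initialize(argc, argv);
    startup(world, argc, argv);

    // Randomize where work is performed when applying integral (convolution)
    // operators; typically beneficial at medium/high precision.
    FunctionDefaults<3>::set_apply_randomize(true);

    // Randomize where work is performed when projecting analytic (C++)
    // functions into the discontinuous spectral element basis; beneficial
    // unless a good static data distribution is already in place.
    FunctionDefaults<3>::set_project_randomize(true);

    // ... construct and compute with functions as usual ...

    finalize();
    return 0;
}
\endcode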
Since these options are straightforward to enable, the remainder of this example focuses on static data redistribution. The process map (an instance of \c WorldDCPmapInterface) controls the mapping of data to processes, and it is actually quite easy to write your own (e.g., see \c WorldDCDefaultPmap or \c LevelPmap) that ensures a uniform data distribution. However, you may also want to incorporate estimates of the computational cost into the distribution. The class \c LBDeuxPmap (deux since it is the second such class) in `trunk/src/madness/mra/lbdeux.h` does this by examining the trees of the functions you provide and using user-supplied weights to estimate the computational cost.
\todo I'm sure I'll raise a point about this later, but can we get rid of the "deux" in the load balancing classes? As I recall, "une" was deleted some time ago.
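A minimal sketch of this machinery follows. It assumes the \c LoadBalanceDeux helper declared alongside \c LBDeuxPmap in `lbdeux.h`; the cost functor and its weights are purely illustrative and would normally be tuned empirically, and the exact redistribution call may differ between MADNESS versions.
\code
#include <madness/mra/mra.h>
#include <madness/mra/lbdeux.h>
using namespace madness;

// Illustrative cost functor: assign different weights to leaf and interior
// nodes of a function's tree.
struct NodeCost {
    double leaf, parent;
    NodeCost(double leaf, double parent) : leaf(leaf), parent(parent) {}
    double operator()(const Key<3>& key, const FunctionNode<double,3>& node) const {
        return node.is_leaf() ? leaf : parent;
    }
};

void rebalance(World& world, const real_function_3d& f) {
    LoadBalanceDeux<3> lb(world);

    // Accumulate the estimated cost of f's tree; add_tree() can be called
    // for several functions so that the new map reflects the aggregate work.
    lb.add_tree(f, NodeCost(1.0, 8.0), false);
    world.gop.fence();

    // Build the new process map (an LBDeuxPmap) and redistribute the data
    // of existing functions to match it.
    FunctionDefaults<3>::redistribute(world, lb.load_balance());
}
\endcode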
Communication costs are proportional to the number of broken links in the tree (i.e., parent-child pairs that reside on different processes). Since some operations work in the scaling function basis, some in the multiwavelet basis, and some in non-standard form, there is an element of empiricism in getting the best performance from most algorithms.
The example code in `trunk/src/examples/dataloadbal.cc` illustrates how the ideas discussed in this section can be applied.
Previous: \ref gstart_io; Next: \ref gstart_think_madness
*/