.. -*- rst -*-
Copyright (c) 2022-2023 Nanook Consulting. All rights reserved.
Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.
$COPYRIGHT$
Additional copyrights may follow
$HEADER$
.. The following line is included so that Sphinx won't complain
about this file not being directly included in some toctree
Overloading and Oversubscribing
===============================
This section explains the difference between the terms "overloading"
and "oversubscribing", which users often confuse. It walks through
several scenarios to illustrate how the two differ.
* ``--map-by :OVERSUBSCRIBE`` allows more processes on a node than
allocated
* ``--bind-to <object>:overload-allowed`` allows binding more than
one process to a CPU
The important thing to remember with *oversubscribing* is that it can
be defined separately from the actual number of CPUs on a node. This
allows the mapper to place more or fewer processes per node than
CPUs. By default, PRRTE uses cores to determine slots in the absence
of such information provided in the hostfile or by the resource
manager (except in the case of the ``--host`` option, as described in
the section on that command line option).
The important thing to remember with *overloading* is that it is
defined as binding more processes than CPUs. By default, PRRTE uses
cores as a means of counting the number of CPUs. However, the user can
adjust this. For example, when using the ``:HWTCPUS`` qualifier to the
``--map-by`` option, PRRTE will use hardware threads as a means of
counting the number of CPUs.
For the following examples consider a node with:
* 2 processor packages,
* 10 cores per package, and
* 8 hardware threads per core.
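
The CPU counts used throughout the examples follow directly from this
topology. The following is a minimal sketch of that arithmetic; the
constant and function names are illustrative and are not part of PRRTE.

```python
# Illustrative description of the example node (names are assumptions).
PACKAGES = 2
CORES_PER_PACKAGE = 10
HWTHREADS_PER_CORE = 8


def cpu_count(use_hwthreads: bool) -> int:
    """Count the CPUs on the node.

    Cores are counted by default; hardware threads are counted when they
    are treated as CPUs (as with the ``:HWTCPUS`` qualifier).
    """
    cores = PACKAGES * CORES_PER_PACKAGE
    return cores * HWTHREADS_PER_CORE if use_hwthreads else cores


print(cpu_count(use_hwthreads=False))  # 20 cores
print(cpu_count(use_hwthreads=True))   # 160 hardware threads
```
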
Consider the node from above with the hostfile below:
.. code::
$ cat myhostfile
node01 slots=32
node02 slots=32
The ``slots`` token tells PRRTE that it can place up to 32 processes
before *oversubscribing* the node.
If we run the following:
.. code::
prun --np 34 --hostfile myhostfile --map-by core --bind-to core hostname
It will return an error at binding time indicating an
*overloading* scenario.
The mapping mechanism assigns 32 processes to ``node01``, matching the
``slots`` specification in the hostfile. The binding mechanism binds
the first 20 processes to unique cores, leaving 12 processes
that it cannot bind without overloading one of the cores (putting more
than one process on the core).
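
The numbers in this scenario can be reproduced with a simplified model.
The sketch below assumes nodes are filled in hostfile order up to their
slot count; it is not PRRTE's implementation, and ``fill_nodes`` is a
hypothetical helper.

```python
def fill_nodes(nprocs, slots_per_node):
    """Fill nodes in order, up to each node's slot count.

    Returns the per-node process counts and the number of processes that
    could not be placed without exceeding the slots.
    """
    placed = []
    remaining = nprocs
    for slots in slots_per_node:
        take = min(remaining, slots)
        placed.append(take)
        remaining -= take
    return placed, remaining


CORES_PER_NODE = 20  # 2 packages x 10 cores on the example node

placed, leftover = fill_nodes(34, [32, 32])
# Processes on each node that cannot get a core of their own.
unbindable = [max(0, n - CORES_PER_NODE) for n in placed]

print(placed)      # [32, 2]  -> no node exceeds its slots
print(leftover)    # 0        -> not oversubscribed
print(unbindable)  # [12, 0]  -> node01 would need overloading
```
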
Using the ``overload-allowed`` qualifier to the ``--bind-to core``
option tells PRRTE that it may assign more than one process to a core.
If we run the following:
.. code::
prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname
This will run correctly, placing 32 processes on ``node01`` and 2
processes on ``node02``. On ``node01``, two processes are bound to
each of cores 0-11, which accounts for the overloading of those cores.
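
To see why cores 0-11 end up with two processes each, consider a
simplified model in which processes wrap around the cores in rank order.
This wrap-around order is an assumption made for illustration; it matches
the result described above but is not taken from PRRTE's source.

```python
from collections import Counter


def bind_round_robin(nprocs, ncores):
    """Count processes per core when ranks wrap around the cores."""
    return Counter(rank % ncores for rank in range(nprocs))


load = bind_round_robin(nprocs=32, ncores=20)
overloaded = sorted(core for core, n in load.items() if n > 1)

print(overloaded)          # cores 0 through 11
print(max(load.values()))  # 2 processes on each overloaded core
```
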
Alternatively, we could use hardware threads to give the binding
mechanism a lower-level CPU to bind to without overloading.
If we run the following:
.. code::
prun --np 34 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname
This will run correctly, placing 32 processes on ``node01`` and 2
processes on ``node02``. On ``node01``, two processes are mapped to
each of cores 0-11 but bound to different hardware threads on those
cores (the logical first and second hardware thread). Thus no hardware
threads are overloaded at binding time.
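
The same simplified model can be extended to hardware threads. The sketch
below assumes that ranks wrap around the cores and that each wrap moves to
the next hardware thread on the core; this ordering is an illustrative
assumption, not PRRTE's documented algorithm.

```python
CORES = 20              # cores on the example node
HWTHREADS_PER_CORE = 8  # hardware threads per core


def bind_to_hwthread(rank):
    """Return the (core, hwthread) pair for a rank under this model."""
    return rank % CORES, rank // CORES


bindings = [bind_to_hwthread(rank) for rank in range(32)]

# Every process has its own hardware thread, so nothing is overloaded.
print(len(set(bindings)) == len(bindings))  # True
# Cores whose second hardware thread is in use.
print(sorted(core for core, thread in bindings if thread == 1))
```
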
In both of the examples above the node is not oversubscribed at
mapping time because the hostfile set the oversubscription limit to
``slots=32`` for each node. It is only after we exceed that limit that
PRRTE will throw an oversubscription error.
Consider next if we ran the following:
.. code::
prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname
This will return an error at mapping time indicating an
oversubscription scenario. The mapping mechanism will assign all of
the available slots (64 across 2 nodes) and be left with two processes
to map. The only way to map those processes is to exceed the number of
available slots, putting the job into an oversubscription scenario.
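
The oversubscription check itself is a comparison against the slot
counts, independent of how many CPUs the nodes have. A minimal sketch,
with the dictionary and flag names chosen for illustration only:

```python
slots = {"node01": 32, "node02": 32}  # from the example hostfile
nprocs = 66
oversubscribe_allowed = False  # stands in for the :OVERSUBSCRIBE qualifier

total_slots = sum(slots.values())
unmapped = max(0, nprocs - total_slots)

if unmapped and not oversubscribe_allowed:
    status = f"error: {unmapped} processes exceed the {total_slots} available slots"
else:
    status = "ok"

print(status)
```
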
You can force PRRTE to oversubscribe the nodes by using the
``:OVERSUBSCRIBE`` qualifier to the ``--map-by`` option as seen in the
example below:
.. code::
prun --np 66 --hostfile myhostfile \
--map-by core:HWTCPUS:OVERSUBSCRIBE --bind-to hwthread hostname
This will run correctly, placing 34 processes on ``node01`` and 32 on
``node02``. Each process is bound to a unique hardware thread.
Overloading vs. Oversubscription: Package Example
-------------------------------------------------
Let's extend these examples by considering the package level.
Consider the same node as before, but with the hostfile below:
.. code::
$ cat myhostfile
node01 slots=22
node02 slots=22
The lowest level CPUs are "cores" and we have 20 total (10 per
package).
If we run:
.. code::
prun --np 20 --hostfile myhostfile --map-by package \
--bind-to package:REPORT hostname
Then 10 processes are mapped to each package, and bound at the package
level. This is not overloading since we have 10 CPUs (cores)
available in the package at the hardware level.
However, if we run:
.. code::
prun --np 21 --hostfile myhostfile --map-by package \
--bind-to package:REPORT hostname
Then 11 processes are mapped to the first package and 10 to the second
package. At binding time we have an overloading scenario because
there are only 10 CPUs (cores) available in the package at the
hardware level. So the first package is overloaded.
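
The package-level check can be sketched as follows. The model assumes
processes alternate between the two packages in rank order, which
reproduces the 11/10 split described above; the function names are
hypothetical and do not correspond to PRRTE internals.

```python
def map_by_package(nprocs, npackages=2):
    """Distribute ranks across packages in round-robin order."""
    counts = [0] * npackages
    for rank in range(nprocs):
        counts[rank % npackages] += 1
    return counts


def overloaded_packages(counts, cpus_per_package):
    """Return the indices of packages holding more processes than CPUs."""
    return [i for i, n in enumerate(counts) if n > cpus_per_package]


CORES_PER_PACKAGE = 10

for nprocs in (20, 21):
    counts = map_by_package(nprocs)
    print(nprocs, counts, overloaded_packages(counts, CORES_PER_PACKAGE))
# 20 [10, 10] []
# 21 [11, 10] [0]
```
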
Overloading vs. Oversubscription: Hardware Threads Example
----------------------------------------------------------
We can apply the same reasoning to hardware threads.
Consider the same node as before, but with the hostfile below:
.. code::
$ cat myhostfile
node01 slots=165
node02 slots=165
The lowest level CPUs are "hwthreads" (because we are going to use the
``:HWTCPUS`` qualifier) and we have 160 total (80 per package).
If we re-run (from the package example) and add the ``:HWTCPUS``
qualifier:
.. code::
prun --np 21 --hostfile myhostfile --map-by package:HWTCPUS \
--bind-to package:REPORT hostname
Without the ``:HWTCPUS`` qualifier this would be overloading (as we
saw previously). The mapper places 11 processes on the first package
and 10 on the second package. The processes are still bound at the
package level. However, with the ``:HWTCPUS`` qualifier, it is not
overloading since we have 80 CPUs (hwthreads) available in the package
at the hardware level.
Alternatively, if we run:
.. code::
prun --np 161 --hostfile myhostfile --map-by package:HWTCPUS \
--bind-to package:REPORT hostname
Then 81 processes are mapped to the first package and 80 to the second
package. At binding time we have an overloading scenario because
there are only 80 CPUs (hwthreads) available in the package at the
hardware level. So the first package is overloaded.
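
The effect of the CPU definition on the overloading threshold can be
summarized in a short sketch. It assumes processes are spread as evenly
as possible across the two packages, so the busiest package holds the
ceiling of ``nprocs / 2``; the helper names are illustrative only.

```python
import math


def busiest_package_load(nprocs, npackages=2):
    """Processes on the fullest package when spread as evenly as possible."""
    return math.ceil(nprocs / npackages)


def is_overloaded(nprocs, cpus_per_package, npackages=2):
    """True when the fullest package holds more processes than CPUs."""
    return busiest_package_load(nprocs, npackages) > cpus_per_package


CORES_PER_PACKAGE = 10      # CPUs per package when counting cores
HWTHREADS_PER_PACKAGE = 80  # CPUs per package with :HWTCPUS

print(is_overloaded(21, CORES_PER_PACKAGE))       # True
print(is_overloaded(21, HWTHREADS_PER_PACKAGE))   # False
print(is_overloaded(160, HWTHREADS_PER_PACKAGE))  # False
print(is_overloaded(161, HWTHREADS_PER_PACKAGE))  # True
```
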