<!--#include virtual="header.txt"-->
<!-- Copyright (C) 2013 Bull S. A. S.
Bull, Rue Jean Jaures, B.P.68, 78340, Les Clayes-sous-Bois. -->
<h1>Profiling Using HDF5 User Guide</h1>
<h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2>
<ul>
<li><a href="#Overview">Overview</a></li>
<li><a href="#Administration">Administration</a></li>
<li><a href="#Profiling">Profiling Jobs</a></li>
<li><a href="#HDF5">HDF5</a></li>
<li><a href="#DataSeries">Data Structure</a></li>
</ul>
<h2 id="Overview">Overview<a class="slurm_link" ref="#Overview"></a></h2>
<p>The acct_gather_profile/hdf5 plugin allows Slurm to collect data on the
jobs it runs that is more detailed than is practical to include in its
database. The data comes from periodically sampling various
performance data either collected by Slurm, the operating system, or
component software. The plugin records the data from each source
as a <b>Time Series</b> and also accumulates totals for each statistic for
the job.</p>
<p>The time series include energy data collected by an acct_gather_energy
plugin; I/O data from a network interface collected by an acct_gather_interconnect
plugin; I/O data from parallel file systems such as Lustre collected by an
acct_gather_filesystem plugin; and task performance data such as local disk I/O,
CPU consumption, and memory use collected by a jobacct_gather plugin.
Data from other sources may be added in the future.</p>
<p>The data is collected into a file on a shared file system for each step on
each allocated node of a job, and these node-step files are then merged into
a single HDF5 file. Individual files on a shared file system were chosen
because the data can be voluminous, so solutions that pass data to the Slurm
control daemon via RPC may not scale to very large clusters or to jobs with
many allocated nodes.</p>
<h2 id="Administration">Administration
<a class="slurm_link" href="#Administration"></a>
</h2>
<h3 id="shared">Shared File System<a class="slurm_link" href="#shared"></a></h3>
<div style="margin-left: 20px;">
<p>The HDF5 Profile Plugin requires a common shared file system on all
the compute nodes. While a job is running, the plugin writes a
file into this file system for each step of the job on each node. When
the job ends, the merge process is launched and the node-step files
are combined into one HDF5 file for the job.</p>
<p>The root of the directory structure is declared in the <b>ProfileHDF5Dir</b>
option in the acct_gather.conf file. The directory will be created by
Slurm if it doesn't exist. Each user will have
their own directory created in the ProfileHDF5Dir which contains
the HDF5 files. All the directories and files are created by the
SlurmdUser, which is usually root. The user-specific directories, as well
as the files inside, are chowned to the user running the job so they
can access the files. Since these files and directories are usually
created by root, a root-squashed file system will not work for
the ProfileHDF5Dir.</p>
<p>Each user that creates a profile will have a subdirectory in the profile
directory that has read/write permission only for the user.</p>
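<p>As an illustration, a profile directory for two users might look like
the following (the path and layout are hypothetical; the actual file names
are generated by the plugin):</p>
<pre>
/app/slurm/profile_data/       # ProfileHDF5Dir, created by SlurmdUser
    alice/                     # owned by alice, read/write for alice only
        ...node-step and merged HDF5 files...
    bob/                       # owned by bob, read/write for bob only
        ...
</pre>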
</div>
<h3 id="config">Configuration parameters
<a class="slurm_link" href="#config"></a>
</h3>
<div style="margin-left: 20px;">
<p>The profile plugin is enabled in the
<a href="slurm.conf.html">slurm.conf</a> file and it is internally
configured in the
<a href="acct_gather.conf.html">acct_gather.conf</a> file.</p>
</div>
<div style="margin-left: 20px;">
<br>
<h3 id="slurm_conf">slurm.conf parameters
<a class="slurm_link" href="#slurm_conf"></a>
</h3>
<div style="margin-left: 20px;">
<dl>
<dt><b>AcctGatherProfileType</b>=acct_gather_profile/hdf5</dt>
<dd>Enables the HDF5 plugin.</dd>
<dt><b>JobAcctGatherFrequency</b>=&lt;seconds&gt;</dt>
<dd>Sets the sampling frequency for data types.</dd>
</dl>
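<p>A minimal slurm.conf excerpt enabling the plugin might look like the
following (the 30 second sampling interval is only an example):</p>
<pre>
AcctGatherProfileType=acct_gather_profile/hdf5
JobAcctGatherFrequency=30
</pre>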
</div>
</div>
<div style="margin-left: 20px;">
<br>
<h3 id="acct_gather">acct_gather.conf parameters
<a class="slurm_link" href="#acct_gather"></a>
</h3>
<div style="margin-left: 20px;">
<p>These parameters are directly used by the HDF5 Profile Plugin.</p>
<dl>
<dt><b>ProfileHDF5Dir</b>=&lt;path&gt;</dt>
<dd>This parameter is the path to the shared folder into which the
acct_gather_profile plugin will write detailed data as an HDF5 file.
The directory is assumed to be on a file system shared by the controller and
all compute nodes. This is a required parameter.</dd>
<dt><b>ProfileHDF5Default</b>=[options]</dt>
<dd>A comma-delimited list of data types to be collected for each job
submission. Use this option with caution: a node-step file will be created on
every node for every step of every job, and these files are not automatically
merged into job files. (Even merged job files for large numbers of small jobs
could fill the file system.) This option is intended for test environments
where you want to profile a series of jobs without having to add the
--profile option to the launch scripts.
The options are described below and in the man pages for acct_gather.conf and
the srun, salloc and sbatch commands.</dd>
</dl>
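<p>For example, an acct_gather.conf might contain the following (the path
is hypothetical, and ProfileHDF5Default is shown commented out because of
the caution above):</p>
<pre>
ProfileHDF5Dir=/app/slurm/profile_data
#ProfileHDF5Default=Energy,Task
</pre>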
</div>
</div>
<div style="margin-left: 20px;">
<br>
<h3 id="time">Time Series Control Parameters
<a class="slurm_link" href="#time"></a>
</h3>
<div style="margin-left: 20px;">
<p>Other plugins add time series data to the HDF5 collection. They typically
have a default polling frequency specified in slurm.conf via the
JobAcctGatherFrequency parameter. The polling frequency can be overridden
with the --acctg-freq option of
<a href="srun.html">srun</a>.
Both are of the form task=sec,energy=sec,filesystem=sec,network=sec.</p>
<p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set
in the acct_gather.conf file. This sets the rate at which the plugin samples
the external sensors. This value should be the same as the energy=sec in
either JobAcctGatherFrequency or --acctg-freq.</p>
<p>Note that the IPMI and profile sampling are not synchronous.
The profile sample simply takes the last available IPMI sample value.
If the profile energy sampling is more frequent than the IPMI sampling,
IPMI values will be repeated. If it is less frequent, some IPMI values
will be lost.</p>
<p>Also note that the smallest effective IPMI sample interval
(EnergyIPMIFrequency) for 2013-era Intel processors is 3 seconds.</p>
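<p>For example, to sample task data every 15 seconds and energy data every
5 seconds for a single run (the application name is a placeholder):</p>
<pre>
srun --acctg-freq=task=15,energy=5 ./my_app
</pre>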
</div>
</div>
<h2 id="Profiling">Profiling Jobs
<a class="slurm_link" href="#Profiling"></a>
</h2>
<h3 id="collection">Data Collection
<a class="slurm_link" href="#collection"></a>
</h3>
<p>The --profile option of salloc, sbatch and srun controls whether data is
collected and what type of data is collected. If --profile is not specified,
no data is collected unless the <B>ProfileHDF5Default</B>
option is used in acct_gather.conf. A --profile given on the command line
overrides any value specified in the configuration file. An example
invocation follows the option list below.</p>
<div style="margin-left: 20px;">
<dl>
<dt><b>--profile</b>=&lt;all|none|[energy[,|task[,|filesystem[,|network]]]]&gt;
</dt>
<dd>Enables detailed data collection by the acct_gather_profile plugin.
Detailed data are typically time series that are stored in an HDF5 file for
the job.</dd>
<dd>
<DL>
<dt><B>All</B></dt>
<DD>All data types are collected. (Cannot be combined with other values.)
</DD>
<dt><B>None</B></dt>
<DD>No data types are collected. This is the default. (Cannot be
combined with other values.)
</DD>
<dt><B>Energy</B></dt>
<DD>Energy data is collected.</DD>
<dt><B>Filesystem</B></dt>
<DD>Filesystem data is collected. Currently only
Lustre filesystem is supported.</DD>
<dt><B>Network</B></dt>
<DD>Network (InfiniBand) data is collected.</DD>
<dt><B>Task</B></dt>
<DD>Task (I/O, Memory, ...) data is collected.</DD>
</DL>
</dd>
</dl>
</div>
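<p>For example, to collect only energy and task data for a batch job
(the script name is a placeholder):</p>
<pre>
sbatch --profile=energy,task my_batch_script.sh
</pre>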
<h3 id="consolidation">Data Consolidation
<a class="slurm_link" href="#consolidation"></a>
</h3>
<p>The node-step files are merged into one HDF5 file for the job using the
<a href="sh5util.html">sh5util</a> program.</p>
<p>If the job is started with sbatch, the merge command may be added to the
normal launch script as a dependent job. For example:</p>
<pre>
sbatch -n1 -d afterany:$SLURM_JOB_ID --wrap="sh5util -j $SLURM_JOB_ID"
</pre>
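<p>Alternatively, sh5util may be run manually once the job has completed
(job ID 1234 is illustrative):</p>
<pre>
sh5util -j 1234
</pre>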
<h3 id="extraction">Data Extraction
<a class="slurm_link" href="#extraction"></a>
</h3>
<p>The <a href="sh5util.html">sh5util</a> program can also be used to extract
specific data from the HDF5 file and write it in <i>comma separated value (csv)</i>
form for importation into other analysis tools such as spreadsheets.</p>
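<p>For example, the following sketch extracts the Energy time series from a
merged job file into a csv file (the job ID and file names are illustrative;
see the sh5util man page for the full set of options):</p>
<pre>
sh5util -j 1234 --extract -i job_1234.h5 -s Energy -l Node:TimeSeries -o job_1234_energy.csv
</pre>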
<h2 id="HDF5">HDF5<a class="slurm_link" href="#HDF5"></a></h2>
<p>HDF5 is a well-known structured file format that allows heterogeneous but
related data to be stored in one file
(i.e. sections for energy statistics, network I/O, task data, etc.).
Its internal structure resembles a
file system, with <b>groups</b> being similar to <i>directories</i> and
<b>data sets</b> being similar to <i>files</i>. It also allows <b>attributes</b>
to be attached to groups to store application-defined properties.</p>
<p>There are commodity programs, notably
<a href="http://www.hdfgroup.org/hdf-java-html/hdfview/index.html">
HDFView</a>, for viewing and manipulating these files.</p>
<p>Below is a screenshot from HDFView, expanding the job tree and showing the
attributes for a specific task.</p>
<br>
<img src="hdf5_task_attr.png" width="275" height="275" >
<h2 id="DataSeries">Data Structure
<a class="slurm_link" href="#DataSeries"></a>
</h2>
<table>
<tr>
<td><img src="hdf5_job_outline.png" width="205" height="570"></td>
<td style="vertical-align: top;">
<div style="margin-left: 5px;">
<p>In the job file, there will be a group for each <b>step</b> of the job.
Within each step, there will be a group for nodes, and a group for tasks.</p>
</div>
<ul>
<li>
The <b>nodes</b> group will have a group for each node in the step allocation.
For each node group, there is a sub-group for Time Series and another
for Totals.
<ul>
<li>
The <b>Time Series</b> group
contains a group/dataset containing the time series for each collector.
</li>
<li>
The <b>Totals</b> group contains a group/dataset that has corresponding
Minimum, Average, Maximum, and Sum Total for each item in the time series.
</li>
</ul>
</li>
<li>
The <b>Tasks</b> group contains a subgroup for each task. Each task
subgroup primarily holds an attribute stating the node on which the task was
executed. This set of groups is essentially a cross-reference table.
</li>
</ul>
</td></tr>
</table>
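<p>An indicative outline of the hierarchy described above (the group names
are illustrative; see the screenshot above and HDFView for the exact names
used in a given file):</p>
<pre>
job file
 +-- Step_0
 |    +-- Nodes
 |    |    +-- node001
 |    |         +-- Time Series    (one group/dataset per collector)
 |    |         +-- Totals         (Min, Ave, Max, Total per item)
 |    +-- Tasks
 |         +-- Task_0              (attribute: node that ran the task)
 +-- Step_1
      +-- ...
</pre>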
<h3 id="energy">Energy Data<a class="slurm_link" href="#energy"></a></h3>
<p><b>AcctGatherEnergyType</b>=acct_gather_energy/ipmi
is required in slurm.conf to collect energy data.
Appropriately set energy=freq in either JobAcctGatherFrequency in slurm.conf
or in --acctg-freq on the command line.
Also appropriately set EnergyIPMIFrequency in acct_gather.conf.</p>
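<p>A combined configuration sketch for energy profiling, sampling every
5 seconds (the values are examples only):</p>
<pre>
# slurm.conf
AcctGatherEnergyType=acct_gather_energy/ipmi
JobAcctGatherFrequency=energy=5

# acct_gather.conf
EnergyIPMIFrequency=5
</pre>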
<p>Each data sample in the Energy Time Series contains the following data items.
</p>
<DL>
<dt><B>Date Time</B></dt>
<DD>Time of day at which the data sample was taken. This can be used to
correlate activity with other sources such as logs.</DD>
<dt><B>Time</B></dt>
<DD>Elapsed time since the beginning of the step.</DD>
<dt><B>Power</B></dt>
<DD>Power consumption during the interval.</DD>
<dt><B>CPU Frequency</B></dt>
<DD>CPU Frequency at time of sample in kilohertz.</DD>
</DL>
<h3 id="filesystem">Filesystem Data
<a class="slurm_link" href="#filesystem"></a>
</h3>
<p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre
is required in slurm.conf to collect filesystem data.
Appropriately set filesystem=freq in either JobAcctGatherFrequency in slurm.conf
or in --acctg-freq on the command line.</p>
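<p>For example, in slurm.conf (the 5 second interval is illustrative):</p>
<pre>
AcctGatherFilesystemType=acct_gather_filesystem/lustre
JobAcctGatherFrequency=filesystem=5
</pre>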
<p>Each data sample in the Filesystem Time Series contains the following data items.
</p>
<DL>
<dt><B>Date Time</B></dt>
<DD>Time of day at which the data sample was taken. This can be used to
correlate activity with other sources such as logs.</DD>
<dt><B>Time</B></dt>
<DD>Elapsed time since the beginning of the step.</DD>
<dt><B>Reads</B></dt>
<DD>Number of read operations.</DD>
<dt><B>Megabytes Read</B></dt>
<DD>Number of megabytes read.</DD>
<dt><B>Writes</B></dt>
<DD>Number of write operations.</DD>
<dt><B>Megabytes Write</B></dt>
<DD>Number of megabytes written.</DD>
</DL>
<h3 id="network">Network (Infiniband Data)
<a class="slurm_link" href="#network"></a>
</h3>
<p><b>AcctGatherInterconnectType</b>=acct_gather_interconnect/ofed
is required in slurm.conf to collect network data.
Appropriately set network=freq in either JobAcctGatherFrequency in slurm.conf
or in --acctg-freq on the command line.</p>
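<p>For example, in slurm.conf (the 30 second interval is illustrative):</p>
<pre>
AcctGatherInterconnectType=acct_gather_interconnect/ofed
JobAcctGatherFrequency=network=30
</pre>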
<p>Each data sample in the Network Time Series contains the following
data items.</p>
<DL>
<dt><B>Date Time</B></dt>
<DD>Time of day at which the data sample was taken. This can be used to
correlate activity with other sources such as logs.</DD>
<dt><B>Time</B></dt>
<DD>Elapsed time since the beginning of the step.</DD>
<dt><B>Packets In</B></dt>
<DD>Number of packets coming in.</DD>
<dt><B>Megabytes Read</B></dt>
<DD>Number of megabytes coming in through the interface.</DD>
<dt><B>Packets Out</B></dt>
<DD>Number of packets going out.</DD>
<dt><B>Megabytes Write</B></dt>
<DD>Number of megabytes going out through the interface.</DD>
</DL>
<h3 id="task">Task Data<a class="slurm_link" href="#task"></a></h3>
<p><b>JobAcctGatherType</b>=jobacct_gather/linux
is required in slurm.conf to collect task data.
Appropriately set task=freq in either JobAcctGatherFrequency in slurm.conf
or in --acctg-freq on the command line.</p>
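<p>For example, in slurm.conf (the 30 second interval is illustrative):</p>
<pre>
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30
</pre>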
<p>Each data sample in the Task Time Series contains the following data
items.</p>
<DL>
<dt><B>Date Time</B></dt>
<DD>Time of day at which the data sample was taken. This can be used to
correlate activity with other sources such as logs.</DD>
<dt><B>Time</B></dt>
<DD>Elapsed time since the beginning of the step.</DD>
<dt><B>CPU Frequency</B></dt>
<DD>CPU Frequency at time of sample.</DD>
<dt><B>CPU Time</B></dt>
<DD>Seconds of CPU time used during the sample.</DD>
<dt><B>CPU Utilization</B></dt>
<DD>CPU Utilization during the interval.</DD>
<dt><B>RSS</B></dt>
<DD>Value of RSS at time of sample.</DD>
<dt><B>VM Size</B></dt>
<DD>Value of VM Size at time of sample.</DD>
<dt><B>Pages</B></dt>
<DD>Pages used in sample.</DD>
<dt><B>Read Megabytes</B></dt>
<DD>Number of megabytes read from local disk.</DD>
<dt><B>Write Megabytes</B></dt>
<DD>Number of megabytes written to local disk.</DD>
</DL>
<p style="text-align:center;">Last modified 17 October 2022</p>
<!--#include virtual="footer.txt"-->