<!--#include virtual="header.txt"-->

<h1>Trackable RESources (TRES)</h1>

<p>A TRES is a resource whose usage can be tracked or limited.  A TRES is a
  combination of a Type and a Name.  Types are predefined.
  The current TRES Types are:
</p>
<ul>
<li>BB (burst buffers)</li>
<li>Billing</li>
<li>CPU</li>
<li>Energy</li>
<li>FS (filesystem)</li>
<li>GRES</li>
<li>IC (interconnect)</li>
<li>License</li>
<li>Mem (Memory)</li>
<li>Node</li>
<li>Pages</li>
<li>VMem (Virtual Memory/Size)</li>
</ul>

<p>
  The Billing TRES is calculated from a partition's TRESBillingWeights. Although
  TRES weights on a partition may be defined as doubles, the Billing TRES values
  for a job are stored as integers. The exception is the calculation of a job's
  fairshare, where the value is treated as a double.
</p>

<p>
  Valid 'FS' TRES are 'disk' (local disk) and 'lustre'.  These are primarily
  there for reporting usage, not limiting access.
</p>

<p>
  The only valid 'IC' TRES is 'ofed'.  It is primarily there for reporting
  usage, not limiting access.
</p>

<h2 id="conf">slurm.conf settings<a class="slurm_link" href="#conf"></a></h2>
<ul>
<li><b>AccountingStorageTRES</b>
<p>Used to define which TRES are
  to be tracked on the system. By default Billing, CPU, Energy, Memory, Node,
  FS/Disk, Pages and VMem are tracked. These default TRES cannot be disabled;
  the list can only be appended to. The following example:
</p>
  <pre>AccountingStorageTRES=gres/gpu,license/iop1</pre>
<p>
  will track billing, cpu, energy, memory, nodes, fs/disk, pages and vmem along
  with a GRES called gpu, as well as a license called iop1. Whenever these
  resources are used on the cluster they are recorded. TRES are automatically
  set up in the database on the start of the slurmctld.
</p>

<p> The TRES that require associated names are BB, GRES, and
  License.  As seen in the above example, GRES and License are typically
  different on each system.  The BB TRES is named the same as
  the burst buffer plugin being used. For instance, a system using the
  <i>Cray</i> burst buffer plugin names its BB TRES after that plugin.
</p>

<p> When including a specific GRES with a subtype, it is also recommended to
include its generic type; otherwise, a request that names only the generic type
won't be accounted for. For example, to account for gres/gpu:tesla, also
include gres/gpu so that GPUs are accounted for in requests like
<i>srun --gres=gpu:1</i>.
</p>
<pre>AccountingStorageTRES=gres/gpu,gres/gpu:tesla</pre>

<p><b>NOTE</b>: Setting gres/gpu will also set gres/gpumem and gres/gpuutil.
gres/gpumem and gres/gpuutil can be set individually when gres/gpu is not set.
</p>
</li>

<li><b>PriorityWeightTRES</b>
<p>A comma-separated list of TRES Types and weights that sets the
  degree to which each TRES Type contributes to the job's priority.</p>
<pre>PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000</pre>

<p>Applicable only if PriorityType=priority/multifactor and if
AccountingStorageTRES is configured with each TRES Type.
The default values are 0.</p>
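
<p>As a minimal sketch of the requirements above, the following slurm.conf
fragment reuses the weights from the example and assumes a GRES named gpu.
CPU and Mem are tracked by default, so only gres/gpu has to be appended to
AccountingStorageTRES:</p>
<pre>
PriorityType=priority/multifactor
AccountingStorageTRES=gres/gpu
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
</pre>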

<p>The Billing TRES is not available for priority calculations because the
number isn't generated until after the job has been allocated resources,
since the number can change for different partitions.</p>
</li>

<li><b>TRESBillingWeights</b>
<p>For each partition this option is used to define the billing
  weights of each TRES type that will be used in calculating the usage
  of a job.
</p>

<p>Billing weights are specified as a comma-separated list of
<i>TRES=Weight</i> pairs.
</p>

<p>Any TRES Type is available for billing. Note that the base unit for memory
and burst buffers is megabytes.
</p>

<p>By default the billing of TRES is calculated as the sum of all TRES types
multiplied by their corresponding billing weight.
</p>

<p>The weighted amount of a resource can be adjusted by adding a suffix of
K, M, G, T or P after the billing weight. For example, a memory weight of
"mem=.25" on a job allocated 8 GB will be billed 2048 (8192 MB * .25) units. A
memory weight of "mem=.25G" on the same job will be billed 2
(8192 MB * (.25/1024)) units.
</p>
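
<p>The same arithmetic for a hypothetical job allocated 16 GB of memory
(16384 MB), shown only to illustrate how the suffix scales the weight:</p>
<pre>
mem=.25  : 16384 * .25          = 4096
mem=.25G : 16384 * (.25 / 1024) = 4
</pre>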

<p>When a job is allocated 1 CPU and 8 GB of memory on a partition configured
with:
</p>

<pre>TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0,license/licA=1.5"</pre>

<p>the billable TRES will be:
</p>
<pre>(1*1.0) + (8*0.25) + (0*2.0) + (0*1.5) = 3.0</pre>

<p>If <i>PriorityFlags=MAX_TRES</i> is configured, the billable TRES is
  calculated as the MAX of individual TRESs on a node (e.g. cpus, mem,
  gres) plus the sum of all global TRESs (e.g. licenses). Using the
  same example above, the billable TRES will be:
</p>
<pre>MAX(1*1.0, 8*0.25, 0*2.0) + (0*1.5) = 2.0</pre>
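
<p>As a further illustration with the same weights, consider a hypothetical job
allocated 2 CPUs, 16 GB of memory, 1 GPU and 1 licA license. Following the two
rules described above, the default (sum) calculation and the MAX_TRES
calculation would give:</p>
<pre>
Default:  (2*1.0) + (16*0.25) + (1*2.0) + (1*1.5) = 9.5
MAX_TRES: MAX(2*1.0, 16*0.25, 1*2.0) + (1*1.5)    = 5.5
</pre>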

<p>If TRESBillingWeights is not defined then the job is billed against the total
number of allocated CPUs.
</p>

<p><b>NOTE</b>: TRESBillingWeights is only used when calculating fairshare and
doesn't affect job priority directly, as it is currently not used for the size
of the job. If you want TRES to play a role in the job's priority, refer to
the PriorityWeightTRES option.
</p>

<p><b>NOTE</b>: As with PriorityWeightTRES, only TRES defined in
  AccountingStorageTRES are available for TRESBillingWeights.
</p>

<p><b>NOTE</b>: Jobs can be limited based on the calculated TRES billing
value. See the <a href="resource_limits.html">Resource Limits</a> documentation
for more information.
</p>

<p><b>NOTE</b>: If a Billing TRES is defined as a weight, it is ignored.</p>
</li>

</ul>

<h2 id="sacct">sacct<a class="slurm_link" href="#sacct"></a></h2>
<p>sacct can be used to view the TRES of each job by adding "tres" to the
  --format option.
</p>
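<p>For instance, assuming the <i>AllocTRES</i> and <i>ReqTRES</i> field names
are among those listed by <i>sacct --helpformat</i> on your installation, and
using 1234 as a placeholder job ID, the requested and allocated TRES of a job
could be shown with:</p>
<pre>sacct -j 1234 --format=JobID,ReqTRES,AllocTRES</pre>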

<h2 id="sacctmgr">sacctmgr<a class="slurm_link" href="#sacctmgr"></a></h2>
<p>sacctmgr can be used to view the TRES available globally in the
  system by running <i>sacctmgr show tres</i>.
</p>

<h2 id="sreport">sreport<a class="slurm_link" href="#sreport"></a></h2>
<p>sreport can report on different TRES. Passing a comma-separated list to the
  <i>--tres=</i> option makes sreport generate reports for the requested
  TRES types.  More information about these reports
  can be found on the <a href="sreport.html">sreport manpage</a>.
</p>
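<p>A sketch of such a request, assuming the <i>cluster utilization</i> report
and that gres/gpu is tracked through AccountingStorageTRES:</p>
<pre>sreport --tres=cpu,gres/gpu cluster utilization</pre>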
<p>
  In <i>sreport</i>, the "Reported" Billing TRES is calculated from the largest
  Billing TRES of each node multiplied by the time frame. For example, if a node
  is part of multiple partitions and each has a different TRESBillingWeights
  defined, the Billing TRES for the node will be the highest among those
  partitions. If TRESBillingWeights is not defined on any partition for a node,
  the Billing TRES will be equal to the number of CPUs on the node.
</p>
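<p>As a hedged illustration, assume a hypothetical 32-CPU node that belongs to
two partitions, one with TRESBillingWeights="CPU=1.0" and one with
TRESBillingWeights="CPU=2.0", and assume the node's Billing TRES under each
partition is its CPU count multiplied by that partition's CPU weight:</p>
<pre>
Partition 1: 32 * 1.0 = 32
Partition 2: 32 * 2.0 = 64
Node Billing TRES = MAX(32, 64) = 64
</pre>
<p>Over a 24-hour time frame reported in hours, the "Reported" Billing TRES
for this node would then be 64 * 24 = 1536.</p>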
<p style="text-align:center;">Last modified 16 August 2024</p>

<!--#include virtual="footer.txt"-->