File: rdt.md

package info (click to toggle)
golang-github-intel-goresctrl 0.3.0-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, forky, sid, trixie
  • size: 2,064 kB
  • sloc: makefile: 19; sh: 15
file content (306 lines) | stat: -rw-r--r-- 11,333 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
# Intel RDT (Resource Director Technology)

## Background

Intel® RDT provides capabilities for cache and memory allocation and
monitoring. In Linux system the functionality is exposed to the user space via
the [resctrl](https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt)
filesystem. Cache and memory allocation in RDT is handled by using resource
control groups or classes of service (CLOSes). Resource allocation is specified
on the group level and each task (process/thread) is assigned to one group. In
the context of goresctrl the term 'RDT class' is used instead of 'resource
control group' or 'CLOS'.

Goresctrl supports all available RDT technologies, i.e. L2 and L3 Cache
Allocation (CAT) with Code and Data Prioritization (CDP) and Memory Bandwidth
Allocation (MBA) plus Cache Monitoring (CMT) and Memory Bandwidth Monitoring
(MBM).

## API

The API is described in
[pkg.go.dev](https://pkg.go.dev/github.com/intel/goresctrl/pkg/rdt).

# Configuration

## RDT Classes

Goresctrl provides hiearachical approach for managing RDT resources. The RDT
configuration is a two-level hierarchy consisting of partitions and classes: a
set of partitions each having a set of classes.

### Partitions

A partition consists of available resources and classes that share the
resources. Resources include portions of caches (L2 and L3) and memory
bandwidth (MB). Cache partitioning is exclusive: cache portions of two
partitions are not allowed to overlap. However, by design of the underlying
technology, MB allocations are not exclusive. Thus, it is possible to assign
all partitions 100% of memory bandwidth, for example.

### Classes

Classes represent the actual RDT classes processes are assigned to. In
contrast to partitions, cache allocation between classes under a specific
partition may overlap (and they usually do).

Requirements for class specifications:

- Names of classes must be unique accross all partitions
- Total number of classes (CLOSes) supported by the underlying hardware must
  not be exceeded.
  - **NOTE:** resctrl root and possible groups managed outside goresctrl are also
  accounted against this limit.
- Reserved name `DEFAULT` or an empty string refer to the resctrl root

## Configuration format

```yaml
# Common options
options:
  l2:
    # Set to false if L2 CAT must be available (Default is true).
    optional: [true|false]
  l3:
    # Set to false if L3 CAT must be available (Default is true).
    optional: [true|false]
  mb:
    # Set to false if MBA must be available (Default is true).
    optional: [true|false]
partitions:
  <partition-name>:
    # L2 CAT configuration of the partition
    l2Allocation:
      <cache-ids>:
        # L2 allocation spec used when CDP is not enabled, or, if CDP is
        # enabled but separate code and data specs are not specified
        unified: <cat-allocation-spec>
        # L2 allocation spec for the code path when CDP is enabled (optional)
        code: <cat-allocation-spec>
        # L2 allocation spec for the data path when CDP is enabled (optional)
        data: <cat-allocation-spec>
    # L3 CAT configuration of the partition
    l3Allocation:
      <cache-ids>:
        # L3 allocation spec used when CDP is not enabled, or, if CDP is
        # enabled but separate code and data specs are not specified
        unified: <cat-allocation-spec>
        # L3 allocation spec for the code path when CDP is enabled (optional)
        code: <cat-allocation-spec>
        # L3 allocation spec for the data path when CDP is enabled (optional)
        data: <cat-allocation-spec>
    # MBA configuration of the partition
    mbAllocation:
      # MB allocation spec
      <cache-ids>: <mb-allocation-spec>
    classes:
      <class-name>:
        l2Allocation:
          <cache-ids>:
            # L2 allocation spec used when CDP is not enabled, or, if CDP is
            # enabled but separate code and data specs are not specified
            unified: <cat-allocation-spec>
            # L2 allocation spec for the code path when CDP is enabled (optional)
            code: <cat-allocation-spec>
            # L2 allocation spec for the data path when CDP is enabled (optional)
            data: <cat-allocation-spec>
        l3Allocation:
          <cache-ids>:
            # L3 allocation spec used when CDP is not enabled, or, if CDP is
            # enabled but separate code and data specs are not specified
            unified: <cat-allocation-spec>
            # L3 allocation spec for the code path when CDP is enabled (optional)
            code: <cat-allocation-spec>
            # L3 allocation spec for the data path when CDP is enabled (optional)
            data: <cat-allocation-spec>
        mbAllocation:
          # MB allocation spec of the class
          <cache-ids>: <mb-allocation-spec>

        # Settings for the Kubernetes helper functions. Have no effect on the resctrl
        # configuration and control interface.
        kubernetes:
          # Set to true to deny assigning to this class via container annotation
          denyContainerAnnotation: [true|false]
          # Set to true to deny assigning to this class via pod annotation
          denyPodAnnotation: [true|false]
```

| Field | Format | Example | Description |
| ----- | ------ | ------- | ----------- |
| `<partition-name>` | string | `exclusive` | Name of a higher level RDT partition.
| `<class-name>`     | string | `guaranteed` | Name of an RDT class, mapping to a directory in the resctrl fs. Reserved name `DEFAULT` or an empty string can be used to refer to the root class.
| `<cache-ids>`      | cpuset (string) | `0,2,4,8-11` | Set of cache ids. Special value 'all' denotes a default used for cache "all the reset".
| `<cat-allocation-spec>` | percentage (string) | `"60%"` | Cache allocation spec, may be specified as relative (percentage) or absolute (bitmask). An absolute bitmask must be contiguous.
                          | hex bitmask (string) | `"0xf0"` |
                          | bit numbers (string) | `"0-3"` |
| `<mb-allocation-spec>` | list of strings | `[50%, 1000MBps]` | Memory bandwidth allocation spec, separarate values for percentage and MBps based allocation. The *MBps* value is in effect when resctrl is mounted with `-o mba_MBps`.

## Short forms

The configuration accepts shortforms in order to allow easier and more readable
configuration of the common and simple use cases.

1. Separate unified/code/data specs can be omitted, when no separate CDP config
   is desired, i.e.

      ```
          <cache-ids>: <cat-allocation-spec>

      ```

   is equal to

      ```
          <cache-ids>:
            unified: <cat-allocation-spec>
      ```

1. `<cache-ids>` may be omitted if no cache id specific configuration (and no
   CDP config for CAT) is desired, i.e.

      ```
        l3Allocation: "60%"
        mbAllocation: ["50%"]
      ```

   is equal to

      ```
        l3Allocation:
          all:
            unified: "60%"
        mbAllocation:
          all: ["50%"]
      ```

## Examples

Below is a config snippet that would allocate (ca.) 60% of the L3 cache lines
exclusively to the guaranteed class. The remaining 40% L3 is for burstable and
besteffort, Besteffort getting only 50% of this. guaranteed class gets full
memory bandwidth whereas the other classes are throttled to 50%.

```yaml
options:
  l2:
    optional: true
  l3:
    optional: true
  mb:
    optional: true
partitions:
  exclusive:
    # Allocate 80% of all L2 cache IDs to the "exclusive" partition
    l2Allocation: "80%"
    # Allocate 60% of all L3 cache IDs to the "exclusive" partition
    l3Allocation: "60%"
    mbAllocation: ["100%"]
    classes:
      guaranteed:
        # Allocate all of the partitions cache lines and memory bandwidth to "guaranteed"
        l2Allocation: "100%"
        l3Allocation: "100%"
        # The class will get 100% by default
        #mbAllocation: ["100%"]
  shared:
    # Allocate 20% of L2 and 40% L3 cache IDs to the "shared" partition
    # These will NOT overlap with the cache lines allocated for "exclusive" partition
    l2Allocation: "20%"
    l3Allocation: "40%"
    mbAllocation: ["50%"]
    classes:
      burstable:
        # Allow "burstable" to use all cache lines of the "shared" partition
        l2Allocation: "100%"
        l3Allocation: "100%"
        # The class will get 100% by default
        #mbAllocation: ["100%"]
      besteffort:
        # Allow "besteffort" to use all L2 but only half of the L3 cache
        # lines of the "shared" partition.
        # These will overlap with those used by "burstable"
        l2Allocation: "100%"
        l3Allocation: "50%"
        # The class will get 100% by default
        #mbAllocation: ["100%"]
      DEFAULT:
        # Also configure the resctrl root that all processes in the system are
        # placed in by default
        l2Allocation: "50%"
        l3Allocation: "30%"
        # The class will get 100% by default
        #mbAllocation: ["100%"]
```

The configuration also supports far more fine-grained control, e.g. per
cache-ID configuration (i.e. different cache ids, or sockets, having different
allocation) and Code and Data Prioritization (CDP) allowing different cache
allocation for code and data paths.

```yaml
...
    partitions:
      exclusive:
        l3Allocation: "60%"
        mbAllocation: ["100%"]
        classes:
          # Automatically gets 100% of what was allocated for the partition
          guaranteed:
      shared:
        l3Allocation:
          # 'all' denotes the default and must be specified
          all: "40%"
          # Specific cache allocation for cache-ids 2 and 3
          2-3: "20%"
        mbAllocation: ["100%"]
        classes:
          burstable:
            l3Allocation:
              all:
                unified: "100%"
                code: "100%"
                data: "80%"
              mbAllocation:
                all: ["80%"]
                2-3: ["50%"]
...
...
```

In addition, if the hardware details are known, raw bitmasks or bit numbers
(`0x1f` or '0-4`) can be used instead of percentages in order to be able to
configure cache allocations exactly as required. The bits in this case
correspond to those in /sys/fs/resctrl/ bitmasks. You can also mix relative
(percentage) and absolute (bitmask) allocations. For cases where the resctrl
filesystem is mounted with `-o mba_MBps` Memory bandwidth must be specifed in
MBps.

```yaml
...
    partitions:
      exclusive:
        # Specify bitmask in bit numbers
        l3Allocation: "8-19"
        # MBps value takes effect when resctrl mount option mba_MBps is used
        mbAllocation: ["100%", "100000MBps"]
        classes:
          # Automatically gets 100% of what was allocated for the partition
          guaranteed:
      shared:
        # Explicit bitmask
        l3Allocation: "0xff"
        mbAllocation: ["50%", "2000MBps"]
        classes:
          # burstable gets 100% of what was allocated for the partition
          burstable:
          besteffort:
            l3Allocation: "50%"
            # besteffort gets 50% of the 50% (i.e. 25% of total) or 1000MBps
            mbAllocation: ["50%", "1000MBps"]
```

## Dynamic Configuration

RDT supports dynamic configuration i.e. the parameters of existing classes may
changed on-the-fly.