File: Benchmark_O.test.md

<!--
REQUIRES: OS=macosx
REQUIRES: benchmark
REQUIRES: CMAKE_GENERATOR=Ninja
-->
# `Benchmark_O` Tests

The `Benchmark_O` binary is used directly from the command line and also as a
subcomponent invoked from higher-level scripts (e.g. [`Benchmark_Driver`][BD]).
These scripts therefore depend on its supported command line options and the
format of its console output. The following [`lit` tests][Testing] also serve
as verification of this public API to prevent its accidental breakage.

[BD]: https://github.com/apple/swift/blob/main/benchmark/scripts/Benchmark_Driver
[Testing]: https://github.com/apple/swift/blob/main/docs/Testing.md

Note: The following tests use *Existential.* as an example of benchmarks that
are excluded from the default "pre-commit" list because they are marked `skip`
and the default skip-tags (`unstable,skip`) exclude them. *Ackermann* and
*AngryPhonebook* are alphabetically the first two benchmarks in the test suite
(used to verify running by index). If these assumptions change, the tests must
be adapted.

## List Format
````
RUN: %Benchmark_O --list | %FileCheck %s \
RUN:                      --check-prefix LISTPRECOMMIT \
RUN:                      --check-prefix LISTTAGS
LISTPRECOMMIT: #,Test,[Tags]
LISTPRECOMMIT-NOT: Existential.
LISTPRECOMMIT: {{[0-9]+}},AngryPhonebook
LISTTAGS-SAME: ,[
LISTTAGS-NOT: TestsUtils.BenchmarkCategory.
LISTTAGS-SAME: String, api, validation
LISTTAGS-SAME: ]
````

Verify `Existential.` benchmarks are listed when skip-tags are explicitly empty
and that they are marked `skip`:

````
RUN: %Benchmark_O --list --skip-tags= | %FileCheck %s --check-prefix LISTALL
LISTALL: AngryPhonebook
LISTALL: Existential.
LISTALL-SAME: skip
````
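
As an illustration of the list format checked above, the following Python
sketch splits one row of `--list` output into its three columns. The helper
name `parse_list_row` and the sample row are hypothetical; only the
`#,Test,[Tags]` layout is taken from the checks.

```python
def parse_list_row(row, delim=","):
    """Split one `--list` row into (number, name, tags).

    Assumes the `#,Test,[Tags]` layout. The bracketed tag list may itself
    contain commas, so only the first two delimiters separate columns.
    """
    number, name, tags = row.split(delim, 2)
    tags = tags.strip()
    if not (tags.startswith("[") and tags.endswith("]")):
        raise ValueError("expected a bracketed tag list, got %r" % tags)
    tag_list = [t.strip() for t in tags[1:-1].split(",") if t.strip()]
    return int(number), name, tag_list


# Hypothetical row: the real test number depends on the suite contents.
print(parse_list_row("2,AngryPhonebook,[String, api, validation]"))
```

Limiting the split to two delimiters keeps the tag list intact even when the
column delimiter is also a comma.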

## Benchmark Selection
The logic for filtering tests based on specified names, indices and tags
is shared between the default "run" and `--list` commands. It is tested with
the `--list` command, which is much faster because it runs no benchmarks and
thus gives us the ability to do a "dry run".

Run benchmark by name (even if its tags match the skip-tags) or test number:

````
RUN: %Benchmark_O Existential.Mutating.Ref1 --list \
RUN:              | %FileCheck %s --check-prefix NAMEDSKIP
NAMEDSKIP: Existential.Mutating.Ref1

RUN: %Benchmark_O 1 --list | %FileCheck %s --check-prefix RUNBYNUMBER
RUNBYNUMBER: Ackermann
````

Composition of `tags` and `skip-tags`:

````
RUN: %Benchmark_O --list --tags=Dictionary,Array \
RUN:             | %FileCheck %s --check-prefix ANDTAGS
ANDTAGS: TwoSum
ANDTAGS-NOT: Array2D
ANDTAGS-NOT: DictionarySwap

RUN: %Benchmark_O --list --tags=algorithm --skip-tags=validation \
RUN:             | %FileCheck %s --check-prefix TAGSANDSKIPTAGS
TAGSANDSKIPTAGS: Ackermann
TAGSANDSKIPTAGS: DictOfArraysToArrayOfDicts
TAGSANDSKIPTAGS: Fibonacci
TAGSANDSKIPTAGS: RomanNumbers

RUN: %Benchmark_O --list --tags=algorithm \
RUN:              --skip-tags=validation,Dictionary,String \
RUN:             | %FileCheck %s --check-prefix ORSKIPTAGS
ORSKIPTAGS: Ackermann
ORSKIPTAGS-NOT: DictOfArraysToArrayOfDicts
ORSKIPTAGS: Fibonacci
ORSKIPTAGS-NOT: RomanNumbers
````
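
The composition rule exercised by these checks can be sketched in Python as
follows. This is an inference from the check prefixes (`--tags` combine with
AND, `--skip-tags` combine with OR), not a copy of the driver's code; the
function name `matches_tags` and the tag assignments in `suite` are
hypothetical.

```python
def matches_tags(benchmark_tags, tags=(), skip_tags=("unstable", "skip")):
    """Return True when a benchmark passes the tag filter.

    Assumed semantics, inferred from the checks above:
    - every tag listed in `tags` must be present (AND),
    - no tag listed in `skip_tags` may be present (OR).
    """
    benchmark_tags = set(benchmark_tags)
    return (benchmark_tags.issuperset(tags)
            and benchmark_tags.isdisjoint(skip_tags))


# Hypothetical tag assignments, for illustration only.
suite = {
    "Array2D": {"validation", "api", "Array"},
    "DictionarySwap": {"validation", "api", "Dictionary"},
    "TwoSum": {"validation", "api", "Dictionary", "Array", "algorithm"},
}

selected = [name for name, t in sorted(suite.items())
            if matches_tags(t, tags={"Dictionary", "Array"})]
print(selected)
```

With these made-up tags, only a benchmark carrying both `Dictionary` and
`Array` is selected, which mirrors the intent of the first check above.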

Alphabetic sorting of tests:

````
RUN: %Benchmark_O --list \
RUN:             | %FileCheck %s --check-prefix ALPHASORT
ALPHASORT: COWArrayGuaranteedParameterOverhead
ALPHASORT: COWTree
ALPHASORT: ChainedFilterMap
ALPHASORT: Chars
ALPHASORT: FatCompactMap

````

Substring filters using the `+` and `-` prefixes:

````
RUN: %Benchmark_O --list -.A +Angry -Small AngryPhonebook.ASCII2.Small \
RUN:             | %FileCheck %s --check-prefix FILTERS
FILTERS: AngryPhonebook.ASCII2.Small
FILTERS-NOT: AngryPhonebook.Armenian
FILTERS-NOT: AngryPhonebook.Cyrillic.Small
FILTERS: AngryPhonebook.Cyrillic
FILTERS: AngryPhonebook.Strasse
````
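
One reading of the name filters that is consistent with the checks above is
sketched below in Python. The rules are assumptions inferred from this test
alone: an exact name or a 1-based index always selects a benchmark, `+text`
selects names containing `text`, and `-text` rejects names containing `text`
unless they were named exactly. The function `select` is hypothetical, and the
behaviour when only `-` filters are given is not covered by the test (this
sketch then selects nothing).

```python
def select(names, filters):
    """Apply exact-name, index, +substring and -substring filters."""
    names = sorted(names)
    # Assumption: indices are 1-based positions in the sorted list.
    index_of = {name: str(i) for i, name in enumerate(names, start=1)}
    includes = [f[1:] for f in filters if f.startswith("+")]
    excludes = [f[1:] for f in filters if f.startswith("-")]
    exact = {f for f in filters if not f.startswith(("+", "-"))}

    def keep(name):
        if name in exact or index_of[name] in exact:
            return True  # explicit selection overrides the exclusions
        if any(text in name for text in excludes):
            return False
        return any(text in name for text in includes)

    return [name for name in names if keep(name)]


names = [
    "Ackermann",
    "AngryPhonebook",
    "AngryPhonebook.ASCII2.Small",
    "AngryPhonebook.Armenian",
    "AngryPhonebook.Cyrillic",
    "AngryPhonebook.Cyrillic.Small",
    "AngryPhonebook.Strasse",
]
for name in select(names, ["-.A", "+Angry", "-Small",
                           "AngryPhonebook.ASCII2.Small"]):
    print(name)
```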

## Running Benchmarks
By default, each real benchmark execution takes about a second per sample.
To minimise the test time, multiple checks are combined into one run.

````
RUN: %Benchmark_O AngryPhonebook --num-iters=1 \
RUN:                             --sample-time=0.000001 --min-samples=7 \
RUN:              | %FileCheck %s --check-prefix NUMITERS1 \
RUN:                              --check-prefix LOGHEADER \
RUN:                              --check-prefix LOGBENCH
LOGHEADER-LABEL: #,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
LOGBENCH: {{[0-9]+}},
NUMITERS1: AngryPhonebook,7
NUMITERS1-NOT: 0,0,0,0,0
LOGBENCH-SAME: ,{{[0-9]+}},{{[0-9]+}},{{[0-9]+}},{{[0-9]+}},{{[0-9]+}}
````

### Reporting Quantiles
The default benchmark result reports the statistics of a normal distribution:
mean and standard deviation. Unfortunately, the samples from our benchmarks are
*not normally distributed*. To get a better picture of the underlying
probability distribution, we support reporting
[quantiles](https://en.wikipedia.org/wiki/Quantile).

````
RUN: %Benchmark_O 0 --quantile=4 | %FileCheck %s --check-prefix FIVENUMSUMMARY
FIVENUMSUMMARY: #,TEST,SAMPLES,MIN(μs),Q1(μs),Q2(μs),Q3(μs),MAX(μs)
RUN: %Benchmark_O 0 --quantile=20 | %FileCheck %s --check-prefix VENTILES
VENTILES: #,TEST,SAMPLES,MIN(μs),V1(μs),V2(μs),V3(μs),V4(μs),V5(μs),V6(μs),
VENTILES: V7(μs),V8(μs),V9(μs),VA(μs),VB(μs),VC(μs),VD(μs),VE(μs),VF(μs),VG(μs),
VENTILES: VH(μs),VI(μs),VJ(μs),MAX(μs)
````
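
To make the `--quantile=4` header concrete, here is a small Python sketch of a
five-number summary (MIN, Q1, Q2, Q3, MAX) built with the standard library's
`statistics.quantiles` (Python 3.8+). The sample values are made up, and
`method="inclusive"` is only an illustrative choice: this document does not
state which quantile estimation method `Benchmark_O` uses.

```python
import statistics


def five_number_summary(samples):
    """Return (MIN, Q1, Q2, Q3, MAX) for a list of samples."""
    # n=4 yields the three cut points that split the data into quartiles.
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return min(samples), q1, q2, q3, max(samples)


samples = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up runtimes in microseconds
print(five_number_summary(samples))
```

Unlike mean and standard deviation, these values describe the sample
distribution without assuming that it is normal.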

### Reporting Measurement Metadata
The optional `--meta` argument controls logging of measurement metadata at the
end of the benchmark summary.

* PAGES – number of memory pages used
* ICS – number of involuntary context switches
* YIELD – number of voluntary yields

````
RUN: %Benchmark_O 0 --quantile=1 --meta | %FileCheck %s --check-prefix META
META: #,TEST,SAMPLES,MIN(μs),MAX(μs),PAGES,ICS,YIELD
RUN: %Benchmark_O 0 --quantile=1 --meta --memory \
RUN:              | %FileCheck %s --check-prefix MEMMETA
MEMMETA: #,TEST,SAMPLES,MIN(μs),MAX(μs),MAX_RSS(B),PAGES,ICS,YIELD
````

### Verbose Mode
Verbose mode reports detailed information during measurement, including
configuration details, environmental statistics (memory used and number of
context switches) and all individual samples. We reuse this test to check the
arguments that modify the reported columns: `--memory`, `--quantile` and
`--delta`. Together they produce *one less* number in the benchmark summary
than the normal format. Given that we are taking only 2 samples, the MEDIAN and
MAX will be the same number. With the `--delta` option this means that 𝚫MAX is
zero, so the penultimate number is omitted from the output, giving us 2
consecutive delimiters (,,).

````
RUN: %Benchmark_O 1 Ackermann 1 AngryPhonebook \
RUN:              --verbose --num-samples=2 --memory --quantile=2 --delta \
RUN:              | %FileCheck %s --check-prefix RUNJUSTONCE \
RUN:                              --check-prefix CONFIG \
RUN:                              --check-prefix LOGVERBOSE \
RUN:                              --check-prefix MEASUREENV \
RUN:                              --check-prefix LOGFORMAT \
RUN:                              --check-prefix YIELDCOUNT
CONFIG: NumSamples: 2
CONFIG: Tests Filter: ["1", "Ackermann", "1", "AngryPhonebook"]
CONFIG: Tests to run: Ackermann, AngryPhonebook
LOGFORMAT: #,TEST,SAMPLES,MIN(μs),𝚫MEDIAN,𝚫MAX,MAX_RSS(B)
LOGVERBOSE-LABEL: Running Ackermann
LOGVERBOSE: Collecting 2 samples.
LOGVERBOSE: Measuring with scale {{[0-9]+}}.
LOGVERBOSE: Sample 0,{{[0-9]+}}
LOGVERBOSE: Sample 1,{{[0-9]+}}
MEASUREENV: MAX_RSS {{[0-9]+}} - {{[0-9]+}} = {{[0-9]+}} ({{[0-9]+}} pages)
MEASUREENV: ICS {{[0-9]+}} - {{[0-9]+}} = {{[0-9]+}}
MEASUREENV: VCS {{[0-9]+}} - {{[0-9]+}} = {{[0-9]+}}
YIELDCOUNT: yieldCount 1
RUNJUSTONCE-LABEL: 1,Ackermann
RUNJUSTONCE-NOT: 1,Ackermann
LOGFORMAT: ,{{[0-9]+}},{{[0-9]+}},,{{[0-9]*}},{{[0-9]+}}
LOGVERBOSE-LABEL: Running AngryPhonebook
LOGVERBOSE: Collecting 2 samples.
````
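
The delta encoding described above can be sketched in Python as follows. This
follows the prose only: each value after MIN is written as the difference from
its predecessor, and a zero difference is written as an empty field. The
function name `delta_fields` and all numbers are hypothetical.

```python
def delta_fields(values):
    """Encode sorted statistics as MIN followed by deltas.

    A zero delta becomes an empty string, so two equal neighbours produce
    two consecutive delimiters once the fields are joined.
    """
    fields = [str(values[0])]
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        fields.append("" if delta == 0 else str(delta))
    return fields


# Made-up MIN, MEDIAN and MAX where MEDIAN equals MAX, plus a made-up
# memory figure in the last column.
row = ["1", "Ackermann", "2"] + delta_fields([715, 720, 720]) + ["36864"]
print(",".join(row))
```

Omitting zeros keeps rows short when many neighbouring quantiles are equal,
at the cost of requiring the reader to treat empty fields as zero.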

Verify the specified delimiter is used when logging to console. The non-verbose
variant of this invocation is used from [`Benchmark_Driver`][BD] to get the list
of all tests. That's why it is *crucial* to test this integration point.

````
RUN: %Benchmark_O --list --skip-tags= --delim=$'\t' --verbose \
RUN:              | %FileCheck %s --check-prefix LOGVERBOSEDELIM
LOGVERBOSEDELIM: Delimiter: "\t"
LOGVERBOSEDELIM: #	Test	[Tags]
````

## Error Handling

````
RUN: not %Benchmark_O --bogus 2>&1 \
RUN:              | %FileCheck %s --check-prefix ARGPARSE
ARGPARSE: error: unsupported argument '--bogus'

RUN: not %Benchmark_O --sample-time \
RUN:         2>&1 | %FileCheck %s --check-prefix NOVALUE
NOVALUE: error: missing value for '--sample-time'

RUN: not %Benchmark_O --sample-time= \
RUN:         2>&1 | %FileCheck %s --check-prefix EMPTYVAL
EMPTYVAL: error: missing value for '--sample-time'

RUN: not %Benchmark_O --sample-time=NaN \
RUN:         2>&1 | %FileCheck %s --check-prefix NANVALUE
NANVALUE: error: 'NaN' is not a valid 'Double' for '--sample-time'

RUN: not %Benchmark_O --num-iters \
RUN:         2>&1 | %FileCheck %s --check-prefix NUMITERS
NUMITERS: error: missing value for '--num-iters'

RUN: not %Benchmark_O --num-samples \
RUN:         2>&1 | %FileCheck %s --check-prefix NUMSAMPLES
NUMSAMPLES: error: missing value for '--num-samples'

RUN: not %Benchmark_O --sleep \
RUN:         2>&1 | %FileCheck %s --check-prefix SLEEP
SLEEP: error: missing value for '--sleep'

RUN: not %Benchmark_O --delim \
RUN:         2>&1 | %FileCheck %s --check-prefix DELIM
DELIM: error: missing value for '--delim'

RUN: not %Benchmark_O --tags=bogus \
RUN:         2>&1 | %FileCheck %s --check-prefix BADTAG
BADTAG: error: 'bogus' is not a valid 'BenchmarkCategory'

RUN: not %Benchmark_O --skip-tags=bogus \
RUN:         2>&1 | %FileCheck %s --check-prefix BADSKIPTAG
BADSKIPTAG: error: 'bogus' is not a valid 'BenchmarkCategory'

````
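
For readers unfamiliar with the `--argument=VALUE` convention, the Python
sketch below reproduces three of the message shapes checked above for a
numeric option. It is not the parser used by `Benchmark_O`: the function
`parse_double_option` is hypothetical, and rejecting every non-finite value is
an assumption made here so that `NaN` triggers the "not a valid" message.

```python
import math


def parse_double_option(arg, name):
    """Parse `name=VALUE` and return VALUE as a float.

    Raises ValueError with a message shaped like the ones in the checks.
    """
    flag, sep, value = arg.partition("=")
    if flag != name:
        raise ValueError("error: unsupported argument '%s'" % flag)
    if not sep or not value:
        raise ValueError("error: missing value for '%s'" % name)
    try:
        number = float(value)
    except ValueError:
        number = math.nan
    # Assumption of this sketch: only finite numbers are accepted.
    if not math.isfinite(number):
        raise ValueError("error: '%s' is not a valid 'Double' for '%s'"
                         % (value, name))
    return number


for arg in ["--bogus", "--sample-time", "--sample-time=", "--sample-time=NaN"]:
    try:
        parse_double_option(arg, "--sample-time")
    except ValueError as err:
        print(err)
```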

Measuring memory use of a test with our method is valid only for a single test.

````
RUN: %Benchmark_O 1 2 --memory --list \
RUN:         2>&1 | %FileCheck %s --check-prefix WARNMEMORY
WARNMEMORY: warning:
````

## Usage

````
RUN: %Benchmark_O --help | %FileCheck %s --check-prefix OPTIONS
OPTIONS: usage: Benchmark_O [--argument=VALUE] [TEST [TEST ...]]
OPTIONS: optional arguments:
OPTIONS: --help
OPTIONS-SAME: show this help message and exit
OPTIONS: --verbose
OPTIONS: --delim
OPTIONS: --tags
OPTIONS: --list
````