File: debugging-allocations.md

package info (click to toggle)
swiftlang 6.0.3-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,519,992 kB
  • sloc: cpp: 9,107,863; ansic: 2,040,022; asm: 1,135,751; python: 296,500; objc: 82,456; f90: 60,502; lisp: 34,951; pascal: 19,946; sh: 18,133; perl: 7,482; ml: 4,937; javascript: 4,117; makefile: 3,840; awk: 3,535; xml: 914; fortran: 619; cs: 573; ruby: 573
file content (435 lines) | stat: -rw-r--r-- 30,743 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
# How to debug allocation regressions

Usually, your journey will start by detecting that an allocation test regressed. So let's first start by looking at how to run the allocation counter tests.

## Running allocation counter tests

The allocation counter tests are usually run by NIO's integration test framework which lives in `IntegrationTests/`. In there, you first want to identify the test suite and test case that runs all the allocation counter tests. For the `swift-nio` repository that is

    IntegrationTests/tests_04_performance/test_01_allocation_counts.sh

Next to the test case (`test_01_allocation_counts.sh`) that runs the allocation counter tests should be a resource directory which contains the individual allocation counter tests. For `swift-nio` that is

    IntegrationTests/tests_04_performance/test_01_resources

From here on, this document expects that you are within this directory, so you might want to start with

    cd IntegrationTests/tests_04_performance/test_01_resources/

To run all the allocation tests, simply invoke the runner script in there

    ./run-nio-alloc-counter-tests.sh

The invocation of this script will take a while because it has to build all of SwiftNIO in release mode first, then it compiles the integration tests and runs them all, multiple times...

### Understanding the output of the allocation counter tests

The output of this script will look something like

```
- /Users/johannes/devel/swift-nio/IntegrationTests/tests_04_performance/test_01_resources/test_future_lots_of_callbacks.swift
test_future_lots_of_callbacks.remaining_allocations: 0
test_future_lots_of_callbacks.total_allocations: 75001
test_future_lots_of_callbacks.total_allocated_bytes: 4138056
DEBUG: [["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "remaining_allocations": 0, "total_allocated_bytes": 4138056], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "remaining_allocations": 0, "total_allocated_bytes": 4138056], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["total_allocations": 75001, "total_allocated_bytes": 4138056, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "total_allocated_bytes": 4138056, "remaining_allocations": 0]]
```

with this kind of block repeated for each allocation counter test. Let's go through and understand each line. The first line is the name of the specific allocation test. The most relevant part is the file name (`test_future_lots_of_callbacks.swift`). For this test, we seem to be testing how many allocations futures with many callbacks are doing.

```
- /Users/johannes/devel/swift-nio/IntegrationTests/tests_04_performance/test_01_resources/test_future_lots_of_callbacks.swift
```

Next, we see

```
test_future_lots_of_callbacks.remaining_allocations: 0
test_future_lots_of_callbacks.total_allocations: 75001
test_future_lots_of_callbacks.total_allocated_bytes: 4138056
```

which are the aggregate values. The first line says `remaining_allocations: 0`, which means that this allocation test didn't leak, which is good! Then, we see

    test_future_lots_of_callbacks.total_allocations: 75001

which means that by the end of the test, we saw 75001 allocations in total. As a rule, we usually run the workload of every allocation test 1000 times. That means you want to divide 75001 by the 1000 runs to get the number of allocations per run. In other words: each iteration of this allocation test allocated 75 times. The extra one allocation is just some noise that we have to ignore: these are usually inserted by operations in the Swift runtime that need to initialize some state. In many (especially multi-threaded test cases) there is some noise, which is why we run the workload repeatedly, which makes it easy to tell the signal from the noise.

Finally, we see

    test_future_lots_of_callbacks.total_allocated_bytes: 4138056

Which is the total number of bytes allocated by this test.


Last of all, we see

```
DEBUG: [["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "remaining_allocations": 0, "total_allocated_bytes": 4138056], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "remaining_allocations": 0, "total_allocated_bytes": 4138056], ["total_allocated_bytes": 4138056, "total_allocations": 75001, "remaining_allocations": 0], ["total_allocations": 75001, "total_allocated_bytes": 4138056, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocations": 75001, "total_allocated_bytes": 4138056], ["total_allocations": 75001, "total_allocated_bytes": 4138056, "remaining_allocations": 0]]
```

which is literally just the exact numbers of each of the 10 runs we considered. Note, the lines are prefixed with `DEBUG:` to make analysis of the output easier, it does _not_ mean we're running in debug mode. If you see a lot of fluctuation between the runs, then the allocation test doesn't allocate deterministically which would be something you want to fix. In the example above, we're totally stable to the last byte of allocated memory, so we have nothing to worry about with this test.

## Important notes

The exact allocation counts are _not_ directly comparable between different operating systems and even different versions of the same operating system. Every new Swift version will also introduce differences (hopefully decreasing over time).

This means that if you get hit by a CI failure, you will have to first reproduce the regression locally by running the allocation counter tests once on `main` and then again on your failing branch.

## Debugging a regression in a particular test

Because it takes quite a while to run the whole allocation counter test suite, there are some shortcuts you definitely want to know about. First, let's have a look at how we can run just single test.


### Running a single test

Running a single test is very straightforward, pass the test's `.swift` file to the runner script. For example:

    ./run-nio-alloc-counter-tests.sh test_future_lots_of_callbacks.swift

You will notice that it now runs faster but also that it still compiles SwiftNIO for every single time you run the program. So that's still rather slow. If you look closely at the output you will notice that it prints messages such as

```
Fetching /private/tmp/.nio_alloc_counter_tests_5jMMhk/swift-nio
Fetching /private/tmp/.nio_alloc_counter_tests_5jMMhk/HookedFunctions
Fetching /private/tmp/.nio_alloc_counter_tests_5jMMhk/AtomicCounter
```

If you change your working directory to common directory above (`/private/tmp/.nio_alloc_counter_tests_5jMMhk`), then you will find a regular Swift package there and you can just re-run the allocation counter tests using

```
cd /tmp/.nio_alloc_counter_tests_5jMMhk
swift run -c release
```

which should run very quickly. But _do not_ forget to pass `-c release`, otherwise the results will be meaningless.

### Debugging with other tools

Often, you will want to use external tools such as Instruments or `dtrace` to analyse why a performance test regressed. Unfortunately, that is a little harder than you might expect because to implement the allocation tests, the allocation counter test framework actually [hijacks](https://github.com/apple/swift-nio/blob/main/IntegrationTests/tests_04_performance/test_01_resources/README.md) functions like `malloc` and `free`. The counting of allocations happens in the hijacked or 'hooked' functions.
To use an external memory analysis tool, it's best to not do the hooking and fortunately it's very straightforward to switch the hooking off.

To disable the hooking, run the runner script with `-- -n` followed by the file name of the test you would like to debug. For example

    ./run-nio-alloc-counter-tests.sh -- -n test_future_lots_of_callbacks.swift

When you run this, you will notice that every allocation counter test now outputs

```
Fetching /private/tmp/.nio_alloc_counter_tests_0UDNXS/HookedFunctions
Fetching /private/tmp/.nio_alloc_counter_tests_0UDNXS/swift-nio
Fetching /private/tmp/.nio_alloc_counter_tests_0UDNXS/AtomicCounter
[...]
test_future_lots_of_callbacks.total_allocations: 0
test_future_lots_of_callbacks.remaining_allocations: 0
test_future_lots_of_callbacks.total_allocated_bytes: 0
DEBUG: [["total_allocations": 0, "remaining_allocations": 0, "total_allocated_bytes": 0], ["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["total_allocated_bytes": 0, "total_allocations": 0, "remaining_allocations": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["total_allocated_bytes": 0, "remaining_allocations": 0, "total_allocations": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["total_allocated_bytes": 0, "remaining_allocations": 0, "total_allocations": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0], ["total_allocated_bytes": 0, "total_allocations": 0, "remaining_allocations": 0]]
```

See all the zeroes? Those are all zero because the hooking is disabled. In this mode the output itself is not very useful but you can now switch to using other tools like Intruments or `dtrace` to debug your allocations.

As explained before, you can change your working directory in the temporary directory (for example `cd /private/tmp/.nio_alloc_counter_tests_0UDNXS`) and invoke `swift run -c release` from there to run the allocation counter benchmark. Note that if you disabled hooking (by passing `-- -n`) it will still print only zeroes.

### Debugging with Instruments

1. Start Instruments
2. Choose the "Allocations" instrument
3. Tap the executable selector in the toolbar
4. Click "Choose Target..."
5. Tick "Show Hidden Files"
6. Press Cmd+Shift+G
7. Enter the temporary directory path of a non-hooked version of the allocation counter tests (for example `/private/tmp/.nio_alloc_counter_tests_0UDNXS`)
8. Select `.build/release/`
9. Find the binary that is named like your allocation counter test (eg. `test_lots_of_future_callbacks`)
10. Click "Choose"
11. Press the record button

Now you should have a full Instruments allocations trace of the test run. To make sense of what's going on, switch the allocation lifespan from "Created & Persisted" to "Created & Destroyed". After that you should be able to see the type of the allocation and how many times it got allocated. Clicking on the little arrow next to the type name will reveal each individual allocation of this type and the responsible stack trace.

### Debugging with `malloc-aggregation.d` (macOS only)

SwiftNIO ships a `dtrace` script called `dev/malloc-aggregation.d` which can give you an aggregation of all allocations by stack trace. This means that you will see the stack trace of each allocation that happened and how many times this particular stack trace allocated.

Here's an example of how to use this script:

```
sudo ~/path/to/swift-nio/dev/malloc-aggregation.d -c .build/release/test_future_lots_of_callbacks
```

The output will look something like

```
[...]
              libsystem_malloc.dylib`malloc
              libswiftCore.dylib`swift_slowAlloc+0x19
              libswiftCore.dylib`swift_allocObject+0x27
              test_1_reqs_1000_conn`SelectableEventLoop.run()+0x231
              test_1_reqs_1000_conn`closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:selectorFactory:initializer:)+0x106
              test_1_reqs_1000_conn`partial apply for closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:selectorFactory:initializer:)+0x2d
              test_1_reqs_1000_conn`thunk for @escaping @callee_guaranteed (@guaranteed NIOThread) -> ()+0xf
              test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed (@guaranteed NIOThread) -> ()+0x11
              test_1_reqs_1000_conn`closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:)+0x1e4
              libsystem_pthread.dylib`_pthread_start+0x94
              libsystem_pthread.dylib`thread_start+0xf
           231000

```

repeated many times. Each block means that the specific stack trace is responsible for some number of allocations: in the example above, `231000` allocations. Remember that the allocation counter tests run each test 11 times (1 warm up run and 10 measured runs), so this means that in each test run, the above stack trace allocated 21000 times. Also remember that we usually run the workload we want to test 1000 times. That means that in the real world, the above stack trace accounts for 21 out of the total number of allocations (231000 allocations / 11 runs / 1000 executions per run = 21).

When you are looking at allocations in the setup of the test the numbers may be split into one allocation set of 10000 and another of 1000 - for measured code vs warm up run.

The output from `malloc-aggreation.d` can also be diffed using the `stackdiff-dtrace.py` script. This can be helpful to track down where additional allocations were made. The `stdout` from `malloc-aggregation.d` for the two runs to compare should be written to files, and then passsed to `stackdiff-dtrace.py`:

```bash
~/path/to/swift-nio/dev/stackdiff-dtrace.py stack_aggregation.old stack_aggregation.new
```

Where `stack_aggregation.old` and `stack_aggregation.new` are the two outputs from `malloc-aggregation.d` to compare. The script will output the aggregated stacks which appear: 

- only in `stack_aggregation.old`,
- only in `stack_aggregation.new`, and
- any differences in numbers for stacks that appear in both.

Similar stack traces will be aggregated into a headline number for matching but where different the full stack traces which made up the collection will be printed indented.

Allocations of less than 1000 in number are ignored - as the tests all run multiples of 1000 runs any values below this are almost certainly from setup in libraries we are using rather than as a consequence of our measured run.

Let's look at some output.

```
### only in AFTER
11000
libsystem_malloc.dylib`malloc
libswiftCore.dylib`swift_slowAlloc
libswiftCore.dylib`swift_allocObject
test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TR
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA.18
test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR

    11000
        libsystem_malloc.dylib`malloc
        libswiftCore.dylib`swift_slowAlloc+0x19
        libswiftCore.dylib`swift_allocObject+0x27
        test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR+0x54
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TR+0x20
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA+0x11
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA.18+0x9
        test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR+0x2e
        test_1_reqs_1000_conn`specialized applyNext #1 () in ChannelOptions.Storage.applyAllChannelOptions(to:)+0x1a9
        test_1_reqs_1000_conn`closure #2 in ServerBootstrap.bind0(makeServerChannel:_:)+0xf8
        test_1_reqs_1000_conn`partial apply for closure #2 in ServerBootstrap.bind0(makeServerChannel:_:)+0x39
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> (@owned EventLoopFuture<Channel>, @error @owned Error)+0x14
        test_1_reqs_1000_conn`thunk for @escaping @callee_guaranteed () -> (@owned EventLoopFuture<Channel>, @error @owned Error)partial apply+0x9
        test_1_reqs_1000_conn`closure #1 in EventLoop.submit<A>(_:)+0x3c
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> ()+0x11
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> (@out ())+0x11
        test_1_reqs_1000_conn`partial apply for closure #3 in SelectableEventLoop.run()+0x11
        test_1_reqs_1000_conn`thunk for @callee_guaranteed () -> (@error @owned Error)+0xc
        test_1_reqs_1000_conn`partial apply for thunk for @callee_guaranteed () -> (@error @owned Error)+0x11
        test_1_reqs_1000_conn`thunk for @callee_guaranteed () -> (@error @owned Error)partial apply+0x9
[...]

### only in BEFORE
11000
libsystem_malloc.dylib`malloc
libswiftCore.dylib`swift_slowAlloc
libswiftCore.dylib`swift_allocObject
test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TR
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA
test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA.33
test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR

    11000
        libsystem_malloc.dylib`malloc
        libswiftCore.dylib`swift_slowAlloc+0x19
        libswiftCore.dylib`swift_allocObject+0x27
        test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR+0x54
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TR+0x20
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA+0x11
        test_1_reqs_1000_conn`$s3NIO7Channel_pypypAA15EventLoopFutureCyytGIegnno_Ieggo_AaB_pxq_q0_r1_lyypypAEIsegnnr_Iegnr_TRTA.33+0x9
        test_1_reqs_1000_conn`$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR+0x2e
        test_1_reqs_1000_conn`specialized applyNext #1 () in ChannelOptions.Storage.applyAllChannelOptions(to:)+0x1a9
        test_1_reqs_1000_conn`closure #2 in ServerBootstrap.bind0(makeServerChannel:_:)+0xf8
        test_1_reqs_1000_conn`partial apply for closure #2 in ServerBootstrap.bind0(makeServerChannel:_:)+0x39
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> (@owned EventLoopFuture<Channel>, @error @owned Error)+0x14
        test_1_reqs_1000_conn`thunk for @escaping @callee_guaranteed () -> (@owned EventLoopFuture<Channel>, @error @owned Error)partial apply+0x9
        test_1_reqs_1000_conn`closure #1 in EventLoop.submit<A>(_:)+0x3c
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> ()+0x11
        test_1_reqs_1000_conn`partial apply for thunk for @escaping @callee_guaranteed () -> (@out ())+0x11
        test_1_reqs_1000_conn`partial apply for closure #3 in SelectableEventLoop.run()+0x11
        test_1_reqs_1000_conn`thunk for @callee_guaranteed () -> (@error @owned Error)+0xc
        test_1_reqs_1000_conn`partial apply for thunk for @callee_guaranteed () -> (@error @owned Error)+0x11
        test_1_reqs_1000_conn`thunk for @callee_guaranteed () -> (@error @owned Error)partial apply+0x9
[...]
```

These stacks are only *slighly* different; differing only in the name similar to `$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR`. They're otherwise the same so we can ignore them and look at more output.

```
### only in AFTER
[... (previously dismissed stacktrace)]

2000
libsystem_malloc.dylib`malloc
libswiftCore.dylib`swift_slowAlloc
libswiftCore.dylib`swift_allocObject
test_1_reqs_1000_conn`doRequests(group:number:)
test_1_reqs_1000_conn`closure #1 in run(identifier:)
test_1_reqs_1000_conn`partial apply for closure #1 in measureOne #1 (throwAway:_:) in measureAll(_:)
test_1_reqs_1000_conn`partial apply for thunk for @callee_guaranteed () -> (@error @owned Error)
test_1_reqs_1000_conn`thunk for @callee_guaranteed () -> (@error @owned Error)partial apply

    1000
        libsystem_malloc.dylib`malloc
        libswiftCore.dylib`swift_slowAlloc+0x19
        libswiftCore.dylib`swift_allocObject+0x27
        test_1_reqs_1000_conn`doRequests(group:number:)+0x66
[...]
        
    1000
        libsystem_malloc.dylib`malloc
        libswiftCore.dylib`swift_slowAlloc+0x19
        libswiftCore.dylib`swift_allocObject+0x27
        test_1_reqs_1000_conn`doRequests(group:number:)+0x1c4
[...]

### only in BEFORE
[... (previously dismissed stacktrace)]

### different numbers
before: 10000, after: 20000
  AFTER
    10000
        libsystem_malloc.dylib`malloc
        libswiftCore.dylib`swift_slowAlloc+0x19
        libswiftCore.dylib`swift_allocObject+0x27
        test_1_reqs_1000_conn`doRequests(group:number:)+0x66
[...]
```

Now we see there's another stacktrace in the `AFTER` section which has no corresponding stacktrace in `BEFORE`. From the stack we can see it's originating from `doRequests(group:number:)`. In this instance we were working on options applied in this function so it appears we have added allocations.  We have also increased the number of allocations which are reported in the different numbers section where stack traces have been paired but with different numbers of allocations.

### Debugging with 'heaptrack' (Linux)

Unfortunately we don't have dtrace or Instruments on Linux, but there is the [heaptrack](https://github.com/KDE/heaptrack) tool which supports diff analysis of two versions.

Tested on Ubuntu 20.04 (and likely working on other distributions) Install it with:

```
sudo apt-get install heaptrack
```

Then using the instructions previously, you need to build the test without hooks such that heaptrack can hook instead, e.g.:

```
cd IntegrationTests/tests_04_performance/test_01_resources/
./run-nio-alloc-counter-tests.sh -- -n test_1000_autoReadGetAndSet.swift 
```

and then cd to the temp directory from the output (find the tmp directory as above), e.g.:

```
cd /tmp/.nio_alloc_counter_tests_5jMMhk
```

Finally, run the binary with heaptrack two times — first we do it for `main` to get a baseline:
```
ubuntu@ip-172-31-25-161 /t/.nio_alloc_counter_tests_GRusAy> SWIFTNIO_URING_DISABLED=1 heaptrack .build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
heaptrack output will be written to "/tmp/.nio_alloc_counter_tests_GRusAy/heaptrack.test_1000_autoReadGetAndSet.84341.gz"
starting application, this might take some time...
SWIFTNIO_URING_DISABLED set, disabling liburing.
test_1000_autoReadGetAndSet.total_allocations: 0
test_1000_autoReadGetAndSet.total_allocated_bytes: 0
test_1000_autoReadGetAndSet.remaining_allocations: 0
DEBUG: [["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0], ["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["total_allocated_bytes": 0, "total_allocations": 0, "remaining_allocations": 0], ["total_allocated_bytes": 0, "total_allocations": 0, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["total_allocations": 0, "remaining_allocations": 0, "total_allocated_bytes": 0]]
free(): invalid pointer
Aborted (core dumped)
heaptrack stats:
    allocations:              319347
    leaked allocations:       107
    temporary allocations:    68
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze "/tmp/.nio_alloc_counter_tests_GRusAy/heaptrack.test_1000_autoReadGetAndSet.84341.gz"
```
Then run it a second time for the feature branch:
```
ubuntu@ip-172-31-25-161 /t/.nio_alloc_counter_tests_ARusXy> heaptrack .build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
heaptrack output will be written to "/tmp/.nio_alloc_counter_tests_GRusAy/heaptrack.test_1000_autoReadGetAndSet.84372.gz"
starting application, this might take some time...
test_1000_autoReadGetAndSet.remaining_allocations: 0
test_1000_autoReadGetAndSet.total_allocations: 0
test_1000_autoReadGetAndSet.total_allocated_bytes: 0
DEBUG: [["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["total_allocated_bytes": 0, "total_allocations": 0, "remaining_allocations": 0], ["total_allocated_bytes": 0, "remaining_allocations": 0, "total_allocations": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["total_allocations": 0, "remaining_allocations": 0, "total_allocated_bytes": 0], ["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["total_allocations": 0, "total_allocated_bytes": 0, "remaining_allocations": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0], ["remaining_allocations": 0, "total_allocations": 0, "total_allocated_bytes": 0], ["remaining_allocations": 0, "total_allocated_bytes": 0, "total_allocations": 0]]
free(): invalid pointer
Aborted (core dumped)
heaptrack stats:
    allocations:              673989
    leaked allocations:       117
    temporary allocations:    341011
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze "/tmp/.nio_alloc_counter_tests_GRusAy/heaptrack.test_1000_autoReadGetAndSet.84372.gz"
ubuntu@ip-172-31-25-161 /t/.nio_alloc_counter_tests_GRusAy> 
```
Here we could see that we had 673989 allocations in the feature branch version and 319347 in `main`, so clearly a regression.

Finally, we can analyze the output as a diff from these runs using `heaptrack_print`:

```
heaptrack_print -T -d heaptrack.test_1000_autoReadGetAndSet.84341.gz heaptrack.test_1000_autoReadGetAndSet.84372.gz | swift demangle
```
`-T` gives us the temporary allocations (as it in this case was not a leak, but a transient alloaction - if you have leaks remove `-T`).

The output can be quite long, but in this case as we look for transient allocations, scroll down to:

```
MOST TEMPORARY ALLOCATIONS
307740 temporary allocations of 290324 allocations in total (106.00%) from
swift_slowAlloc
  in /home/ubuntu/bin/usr/lib/swift/linux/libswiftCore.so
43623 temporary allocations of 44553 allocations in total (97.91%) from:
    swift_allocObject
      in /home/ubuntu/bin/usr/lib/swift/linux/libswiftCore.so
    NIO.ServerBootstrap.(bind0 in _C131C0126670CF68D8B594DDFAE0CE57)(makeServerChannel: (NIO.SelectableEventLoop, NIO.EventLoopGroup) throws -> NIO.ServerSocketChannel, _: (NIO.EventLoop, NIO.ServerSocketChannel) -> NIO.EventLoopFuture<()>) -> NIO.EventLoopFuture<NIO.Channel>
      at /home/ubuntu/swiftnio/swift-nio/Sources/NIO/Bootstrap.swift:295
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
    merged NIO.ServerBootstrap.bind(host: Swift.String, port: Swift.Int) -> NIO.EventLoopFuture<NIO.Channel>
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
    NIO.ServerBootstrap.bind(host: Swift.String, port: Swift.Int) -> NIO.EventLoopFuture<NIO.Channel>
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
    Test_test_1000_autoReadGetAndSet.run(identifier: Swift.String) -> ()
      at /tmp/.nio_alloc_counter_tests_GRusAy/Sources/Test_test_1000_autoReadGetAndSet/file.swift:24
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
    main
      at Sources/bootstrap_test_1000_autoReadGetAndSet/main.c:18
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
22208 temporary allocations of 22276 allocations in total (99.69%) from:
    swift_allocObject
      in /home/ubuntu/bin/usr/lib/swift/linux/libswiftCore.so
    generic specialization <Swift.UnsafeBufferPointer<Swift.Int8>> of Swift._copyCollectionToContiguousArray<A where A: Swift.Collection>(A) -> Swift.ContiguousArray<A.Element>
      in /home/ubuntu/bin/usr/lib/swift/linux/libswiftCore.so
    Swift.String.utf8CString.getter : Swift.ContiguousArray<Swift.Int8>
      in /home/ubuntu/bin/usr/lib/swift/linux/libswiftCore.so
    NIO.URing.getEnvironmentVar(Swift.String) -> Swift.String?
      at /home/ubuntu/swiftnio/swift-nio/Sources/NIO/LinuxURing.swift:291
      in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
    NIO.URing._debugPrint(@autoclosure () -> Swift.String) -> ()
      at /home/ubuntu/swiftnio/swift-nio/Sources/NIO/LinuxURing.swift:297
...
22196 temporary allocations of 22276 allocations in total (99.64%) from:
```

And here we could fairly quickly see that the transient extra allocations was due to extra debug printing and querying of environment variables:

```
NIO.URing.getEnvironmentVar(Swift.String) -> Swift.String?
  at /home/ubuntu/swiftnio/swift-nio/Sources/NIO/LinuxURing.swift:291
  in /tmp/.nio_alloc_counter_tests_GRusAy/.build/x86_64-unknown-linux-gnu/release/test_1000_autoReadGetAndSet
NIO.URing._debugPrint(@autoclosure () -> Swift.String) -> ()
```

And this code will be removed before final integration of the feature branch, so the diff will go away.