File: kernel_delay.rst

package info (click to toggle)
openvswitch 3.5.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 97,848 kB
  • sloc: sh: 1,643,930; ansic: 313,386; python: 27,939; xml: 21,526; makefile: 546; javascript: 191
file content (629 lines) | stat: -rw-r--r-- 31,483 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
Troubleshooting Open vSwitch: Is the kernel to blame?
=====================================================
Often, when troubleshooting Open vSwitch (OVS) in the field, you might be left
wondering if the issue is really OVS-related, or if it's a problem with the
kernel being overloaded. Messages in the log like
``Unreasonably long XXXXms poll interval`` might suggest it's OVS, but from
experience, these are mostly related to an overloaded Linux Kernel.
The kernel_delay.py tool can help you quickly identify if the focus of your
investigation should be OVS or the Linux kernel.


Introduction
------------
``kernel_delay.py`` consists of a Python script that uses the BCC [#BCC]_
framework to install eBPF probes. The data the eBPF probes collect will be
analyzed and presented to the user by the Python script. Some of the presented
data can also be captured by the individual scripts included in the BBC [#BCC]_
framework.

kernel_delay.py has two modes of operation:

- In **time mode**, the tool runs for a specific time and collects the
  information.
- In **trigger mode**, event collection can be started and/or stopped based on
  a specific eBPF probe. Currently, the following probes are supported:
  - USDT probes
  - Kernel tracepoints
  - kprobe
  - kretprobe
  - uprobe
  - uretprobe


In addition, the option, ``--sample-count``, exists to specify how many
iterations you would like to do. When using triggers, you can also ignore
samples if they are less than a number of nanoseconds with the
``--trigger-delta`` option. The latter might be useful when debugging Linux
syscalls which take a long time to complete. More on this later. Finally, you
can configure the delay between two sample runs with the ``--sample-interval``
option.

Before getting into more details, you can run the tool without any options
to see what the output looks like. Notice that it will try to automatically
get the process ID of the running ``ovs-vswitchd``. You can overwrite this
with the ``--pid`` option.

.. code-block:: console

  $ sudo ./kernel_delay.py
  # Start sampling @2023-06-08T12:17:22.725127 (10:17:22 UTC)
  # Stop sampling @2023-06-08T12:17:23.224781 (10:17:23 UTC)
  # Sample dump @2023-06-08T12:17:23.224855 (10:17:23 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
       27090 ovs-vswitchd     [SYSCALL STATISTICS]
                  <EDIT: REMOVED DATA FOR ovs-vswitchd THREAD>

       31741 revalidator122   [SYSCALL STATISTICS]
                  NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                  poll                      7           5       184,193,176       184,191,520
                  recvmsg                  47         494       125,208,756           310,331
                  futex                   202           8        18,768,758         4,023,039
                  sendto                   44          10           375,861           266,867
                  sendmsg                  46           4            43,294            11,213
                  write                     1           1             5,949             5,949
                  getrusage                98           1             1,424             1,424
                  read                      0           1             1,292             1,292
                  TOTAL( - poll):                     519       144,405,334

                  [THREAD RUN STATISTICS]
                  SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                       6       136,764,071             1,480       115,146,424

                  [THREAD READY STATISTICS]
                  SCHED_CNT           TOTAL ns            MAX ns
                       7            11,334             6,636

                  [THREAD STOPPED STATISTICS]
                  STOP_CNT            TOTAL ns            MAX ns
                       3         3,045,728,323     1,015,739,474

                  [HARD IRQ STATISTICS]
                  NAME                       COUNT          TOTAL ns            MAX ns
                  eno8303-rx-1                   1             3,586             3,586
                  TOTAL:                         1             3,586

                  [SOFT IRQ STATISTICS]
                  NAME                 VECT_NR       COUNT          TOTAL ns            MAX ns
                  net_rx                     3           1            17,699            17,699
                  sched                      7           6            13,820             3,226
                  rcu                        9          16            13,586             1,554
                  timer                      1           3            10,259             3,815
                  TOTAL:                                26            55,364


By default, the tool will run for half a second in `time mode`. To extend this
you can use the ``--sample-time`` option.


What will it report
-------------------
The above sample output separates the captured data on a per-thread basis.
For this, it displays the thread's id (``TID``) and name (``THREAD``),
followed by resource-specific data. Which are:

- ``SYSCALL STATISTICS``
- ``THREAD RUN STATISTICS``
- ``THREAD READY STATISTICS``
- ``THREAD STOPPED STATISTICS``
- ``HARD IRQ STATISTICS``
- ``SOFT IRQ STATISTICS``
- ``UPCALL STATISTICS``

The following sections will describe in detail what statistics they report.


``SYSCALL STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~
``SYSCALL STATISTICS`` tell you which Linux system calls got executed during
the measurement interval. This includes the number of times the syscall was
called (``COUNT``), the total time spent in the system calls (``TOTAL ns``),
and the worst-case duration of a single call (``MAX ns``).

It also shows the total of all system calls, but it excludes the poll system
call, as the purpose of this call is to wait for activity on a set of sockets,
and usually, the thread gets swapped out.

Note that it only counts calls that started and stopped during the
measurement interval!


``THREAD RUN STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~~~~
``THREAD RUN STATISTICS`` tell you how long the thread was running on a CPU
during the measurement interval.

Note that these statistics only count events where the thread started and
stopped running on a CPU during the measurement interval. For example, if
this was a PMD thread, you should see zero ``SCHED_CNT`` and ``TOTAL_ns``.
If not, there might be a misconfiguration.


``THREAD READY STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~~~~~~
``THREAD READY STATISTICS`` tell you the time between the thread being ready
to run and it actually running on the CPU.

Note that these statistics only count events where the thread was getting
ready to run and started running during the measurement interval.


``THREAD STOPPED STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``THREAD STOPPED STATISTICS`` reveal the number of instances where the thread
has been scheduled out while in the running state due to its transition to
the TASK_STOPPED state.

This behavior can be replicated by manually placing the thread in the stopped
state and subsequently resuming it. For instance:

.. code-block:: console

  # kill -STOP $(pidof ovs-vswitchd); \
    sleep 1; \
    kill -CONT $(pidof ovs-vswitchd);

Note that these statistics only count events where the thread was running at
the time it was put to stopped state.


``HARD IRQ STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~~
``HARD IRQ STATISTICS`` tell you how much time was spent servicing hard
interrupts during the threads run time.

It shows the interrupt name (``NAME``), the number of interrupts (``COUNT``),
the total time spent in the interrupt handler (``TOTAL ns``), and the
worst-case duration (``MAX ns``).


``SOFT IRQ STATISTICS``
~~~~~~~~~~~~~~~~~~~~~~~
``SOFT IRQ STATISTICS`` tell you how much time was spent servicing soft
interrupts during the threads run time.

It shows the interrupt name (``NAME``), vector number (``VECT_NR``), the
number of interrupts (``COUNT``), the total time spent in the interrupt
handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).


``UPCALL STATISTICS``
~~~~~~~~~~~~~~~~~~~~~
The ``UPCALL STATISTICS`` section provides information on the number of
upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
the specific return codes are recorded. Statistics are presented both as
a total and on a per-CPU basis.


The ``--syscall-events`` option
-------------------------------
In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
the tool can also report each individual syscall. This can be a usefull
second step if the ``SYSCALL_STATISTICS`` show high latency numbers.

All you need to do is add the ``--syscall-events`` option, with or without
the additional ``DURATION_NS`` parameter. The ``DUTATION_NS`` parameter
allows you to exclude events that take less than the supplied time.

The ``--skip-syscall-poll-events`` option allows you to exclude poll
syscalls from the report.

Below is an example run, note that the resource-specific data is removed
to highlight the syscall events:

.. code-block:: console

  $ sudo ./kernel_delay.py  --syscall-events 50000 --skip-syscall-poll-events
  # Start sampling @2023-06-13T17:10:46.460874 (15:10:46 UTC)
  # Stop sampling @2023-06-13T17:10:46.960727 (15:10:46 UTC)
  # Sample dump @2023-06-13T17:10:46.961033 (15:10:46 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
     3359686 ipf_clean2       [SYSCALL STATISTICS]
     ...
     3359635 ovs-vswitchd     [SYSCALL STATISTICS]
     ...
     3359697 revalidator12    [SYSCALL STATISTICS]
     ...
     3359698 revalidator13    [SYSCALL STATISTICS]
     ...
     3359699 revalidator14    [SYSCALL STATISTICS]
     ...
     3359700 revalidator15    [SYSCALL STATISTICS]
     ...

  # SYSCALL EVENTS:
         ENTRY (ns)           EXIT (ns)        TID COMM             DELTA (us)  SYSCALL
    ------------------- ------------------- ---------- ---------------- ----------  ----------------
       2161821694935486    2161821695031201    3359699 revalidator14            95  futex
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode+0x9 [kernel]
        do_syscall_64+0x68 [kernel]
        entry_SYSCALL_64_after_hwframe+0x72 [kernel]
        __GI___lll_lock_wait+0x30 [libc.so.6]
        ovs_mutex_lock_at+0x18 [ovs-vswitchd]
        [unknown] 0x696c003936313a63
       2161821695276882    2161821695333687    3359698 revalidator13            56  futex
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode+0x9 [kernel]
        do_syscall_64+0x68 [kernel]
        entry_SYSCALL_64_after_hwframe+0x72 [kernel]
        __GI___lll_lock_wait+0x30 [libc.so.6]
        ovs_mutex_lock_at+0x18 [ovs-vswitchd]
        [unknown] 0x696c003134313a63
       2161821695275820    2161821695405733    3359700 revalidator15           129  futex
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode+0x9 [kernel]
        do_syscall_64+0x68 [kernel]
        entry_SYSCALL_64_after_hwframe+0x72 [kernel]
        __GI___lll_lock_wait+0x30 [libc.so.6]
        ovs_mutex_lock_at+0x18 [ovs-vswitchd]
        [unknown] 0x696c003936313a63
       2161821695964969    2161821696052021    3359635 ovs-vswitchd             87  accept
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode_prepare+0x161 [kernel]
        syscall_exit_to_user_mode+0x9 [kernel]
        do_syscall_64+0x68 [kernel]
        entry_SYSCALL_64_after_hwframe+0x72 [kernel]
        __GI_accept+0x4d [libc.so.6]
        pfd_accept+0x3a [ovs-vswitchd]
        [unknown] 0x7fff19f2bd00
        [unknown] 0xe4b8001f0f

As you can see above, the output also shows the stackback trace. You can
disable this using the ``--stack-trace-size 0`` option.

As you can see above, the backtrace does not show a lot of useful information
due to the BCC [#BCC]_ toolkit not supporting dwarf decoding. So to further
analyze system call backtraces, you could use perf. The following perf
script can do this for you (refer to the embedded instructions):

https://github.com/chaudron/perf_scripts/blob/master/analyze_perf_pmd_syscall.py


Using triggers
--------------
The tool supports start and, or stop triggers. This will allow you to capture
statistics triggered by a specific event. The following combinations of
stop-and-start triggers can be used.

If you only use ``--start-trigger``, the inspection start when the trigger
happens and runs until the ``--sample-time`` number of seconds has passed.
The example below shows all the supported options in this scenario.

.. code-block:: console

  $ sudo ./kernel_delay.py --start-trigger up:bridge_run --sample-time 4 \
                           --sample-count 4 --sample-interval 1


If you only use ``--stop-trigger``, the inspection starts immediately and
stops when the trigger happens.  The example below shows all the supported
options in this scenario.

.. code-block:: console

  $ sudo ./kernel_delay.py --stop-trigger upr:bridge_run \
                           --sample-count 4 --sample-interval 1


If you use both ``--start-trigger`` and ``--stop-trigger`` triggers, the
statistics are captured between the two first occurrences of these events.
The example below shows all the supported options in this scenario.

.. code-block:: console

  $ sudo ./kernel_delay.py --start-trigger up:bridge_run \
                           --stop-trigger upr:bridge_run \
                           --sample-count 4 --sample-interval 1 \
                           --trigger-delta 50000

What triggers are supported? Note that what ``kernel_delay.py`` calls triggers,
BCC [#BCC]_, calls events; these are eBPF tracepoints you can attach to.
For more details on the supported tracepoints, check out the BCC
documentation [#BCC_EVENT]_.

The list below shows the supported triggers and their argument format:

**USDT probes:**
  [u|usdt]:{provider}:{probe}
**Kernel tracepoint:**
  [t:trace]:{system}:{event}
**kprobe:**
  [k:kprobe]:{kernel_function}
**kretprobe:**
  [kr:kretprobe]:{kernel_function}
**uprobe:**
  [up:uprobe]:{function}
**uretprobe:**
  [upr:uretprobe]:{function}

Here are a couple of trigger examples, more use-case-specific examples can be
found in the *Examples* section.

.. code-block:: console

  --start|stop-trigger u:udpif_revalidator:start_dump
  --start|stop-trigger t:openvswitch:ovs_dp_upcall
  --start|stop-trigger k:ovs_dp_process_packet
  --start|stop-trigger kr:ovs_dp_process_packet
  --start|stop-trigger up:bridge_run
  --start|stop-trigger upr:bridge_run


Examples
--------
This section will give some examples of how to use this tool in real-world
scenarios. Let's start with the issue where Open vSwitch reports
``Unreasonably long XXXXms poll interval`` on your revalidator threads. Note
that there is a blog available explaining how the revalidator process works
in OVS [#REVAL_BLOG]_.

First, let me explain this log message. It gets logged if the time delta
between two ``poll_block()`` calls is more than 1 second. In other words,
the process was spending a lot of time processing stuff that was made
available by the return of the ``poll_block()`` function.

Do a run with the tool using the existing USDT revalidator probes as a start
and stop trigger (Note that the resource-specific data is removed from the none
revalidator threads):

.. code-block:: console

  $ sudo ./kernel_delay.py --start-trigger u:udpif_revalidator:start_dump --stop-trigger u:udpif_revalidator:sweep_done
  # Start sampling (trigger@791777093512008) @2023-06-14T14:52:00.110303 (12:52:00 UTC)
  # Stop sampling (trigger@791778281498462) @2023-06-14T14:52:01.297975 (12:52:01 UTC)
  # Triggered sample dump, stop-start delta 1,187,986,454 ns @2023-06-14T14:52:01.298021 (12:52:01 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
     1457761 handler24        [SYSCALL STATISTICS]
                              NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                              sendmsg                  46        6110       123,274,761            41,776
                              recvmsg                  47      136299        99,397,508            49,896
                              futex                   202          51         7,655,832         7,536,776
                              poll                      7        4068         1,202,883             2,907
                              getrusage                98        2034           586,602             1,398
                              sendto                   44           9           213,682            27,417
                              TOTAL( - poll):                  144503       231,128,385

                              [THREAD RUN STATISTICS]
                              SCHED_CNT           TOTAL ns            MIN ns            MAX ns

                              [THREAD READY STATISTICS]
                              SCHED_CNT           TOTAL ns            MAX ns
                                       1             1,438             1,438

                              [SOFT IRQ STATISTICS]
                              NAME                 VECT_NR       COUNT          TOTAL ns            MAX ns
                              sched                      7          21            59,145             3,769
                              rcu                        9          50            42,917             2,234
                              TOTAL:                                71           102,062
     1457733 ovs-vswitchd     [SYSCALL STATISTICS]
     ...
     1457792 revalidator55    [SYSCALL STATISTICS]
                              NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                              futex                   202          73       572,576,329        19,621,600
                              recvmsg                  47         815       296,697,618           405,338
                              sendto                   44           3            78,302            26,837
                              sendmsg                  46           3            38,712            13,250
                              write                     1           1             5,073             5,073
                              TOTAL( - poll):                     895       869,396,034

                              [THREAD RUN STATISTICS]
                              SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                      48       394,350,393             1,729       140,455,796

                              [THREAD READY STATISTICS]
                              SCHED_CNT           TOTAL ns            MAX ns
                                      49            23,650             1,559

                              [SOFT IRQ STATISTICS]
                              NAME                 VECT_NR       COUNT          TOTAL ns            MAX ns
                              sched                      7          14            26,889             3,041
                              rcu                        9          28            23,024             1,600
                              TOTAL:                                42            49,913


Above you see from the start of the output that the trigger took more than a
second (1,187,986,454 ns), which is already know, by looking at the output of
the ``ovs-vsctl upcall/show`` command.

From the *revalidator55*'s ``SYSCALL STATISTICS`` statistics you can see it
spent almost 870ms handling syscalls, and there were no poll() calls being
executed. The ``THREAD RUN STATISTICS`` statistics here are a bit misleading,
as it looks like OVS only spent 394ms on the CPU. But earlier, it was mentioned
that this time does not include the time being on the CPU at the start or stop
of an event. What is exactly the case here, because USDT probes were used.

From the above data and maybe some ``top`` output, it can be determined that
the *revalidator55* thread is taking a lot of CPU time, probably because it
has to do a lot of revalidator work by itself. The solution here is to increase
the number of revalidator threads, so more work could be done in parallel.

Here is another run of the same command in another scenario:

.. code-block:: console

  $ sudo ./kernel_delay.py --start-trigger u:udpif_revalidator:start_dump --stop-trigger u:udpif_revalidator:sweep_done
  # Start sampling (trigger@795160501758971) @2023-06-14T15:48:23.518512 (13:48:23 UTC)
  # Stop sampling (trigger@795160764940201) @2023-06-14T15:48:23.781381 (13:48:23 UTC)
  # Triggered sample dump, stop-start delta 263,181,230 ns @2023-06-14T15:48:23.781414 (13:48:23 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
     1457733 ovs-vswitchd     [SYSCALL STATISTICS]
                              ...
     1457792 revalidator55    [SYSCALL STATISTICS]
                              NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                              recvmsg                  47         284       193,422,110        46,248,418
                              sendto                   44           2            46,685            23,665
                              sendmsg                  46           2            24,916            12,703
                              write                     1           1             6,534             6,534
                              TOTAL( - poll):                     289       193,500,245

                              [THREAD RUN STATISTICS]
                              SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                       2        47,333,558           331,516        47,002,042

                              [THREAD READY STATISTICS]
                              SCHED_CNT           TOTAL ns            MAX ns
                                       3        87,000,403        45,999,712

                              [SOFT IRQ STATISTICS]
                              NAME                 VECT_NR       COUNT          TOTAL ns            MAX ns
                              sched                      7           2             9,504             5,109
                              TOTAL:                                 2             9,504


Here you can see the revalidator run took about 263ms, which does not look
odd, however, the ``THREAD READY STATISTICS`` information shows that OVS was
waiting 87ms for a CPU to be run on. This means the revalidator process could
have finished 87ms faster. Looking at the ``MAX ns`` value, a worst-case delay
of almost 46ms can be seen, which hints at an overloaded system.

One final example that uses a ``uprobe`` to get some statistics on a
``bridge_run()`` execution that takes more than 1ms.

.. code-block:: console

  $ sudo ./kernel_delay.py --start-trigger up:bridge_run --stop-trigger ur:bridge_run --trigger-delta 1000000
  # Start sampling (trigger@2245245432101270) @2023-06-14T16:21:10.467919 (14:21:10 UTC)
  # Stop sampling (trigger@2245245432414656) @2023-06-14T16:21:10.468296 (14:21:10 UTC)
  # Sample dump skipped, delta 313,386 ns @2023-06-14T16:21:10.468419 (14:21:10 UTC)
  # Start sampling (trigger@2245245505301745) @2023-06-14T16:21:10.540970 (14:21:10 UTC)
  # Stop sampling (trigger@2245245506911119) @2023-06-14T16:21:10.542499 (14:21:10 UTC)
  # Triggered sample dump, stop-start delta 1,609,374 ns @2023-06-14T16:21:10.542565 (14:21:10 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
     3371035 <unknown:3366258/3371035> [SYSCALL STATISTICS]
     ... <REMOVED 7 MORE unknown THREADS>
     3371102 handler66        [SYSCALL STATISTICS]
     ... <REMOVED 7 MORE HANDLER THREADS>
     3366258 ovs-vswitchd     [SYSCALL STATISTICS]
                              NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                              futex                   202          43           403,469           199,312
                              clone3                  435          13           174,394            30,731
                              munmap                   11           8           115,774            21,861
                              poll                      7           5            92,969            38,307
                              unlink                   87           2            49,918            35,741
                              mprotect                 10           8            47,618            13,201
                              accept                   43          10            31,360             6,976
                              mmap                      9           8            30,279             5,776
                              write                     1           6            27,720            11,774
                              rt_sigprocmask           14          28            12,281               970
                              read                      0           6             9,478             2,318
                              recvfrom                 45           3             7,024             4,024
                              sendto                   44           1             4,684             4,684
                              getrusage                98           5             4,594             1,342
                              close                     3           2             2,918             1,627
                              recvmsg                  47           1             2,722             2,722
                              TOTAL( - poll):                     144           924,233

                              [THREAD RUN STATISTICS]
                              SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                      13           817,605             5,433           524,376

                              [THREAD READY STATISTICS]
                              SCHED_CNT           TOTAL ns            MAX ns
                                      14            28,646            11,566

                              [SOFT IRQ STATISTICS]
                              NAME                 VECT_NR       COUNT          TOTAL ns            MAX ns
                              rcu                        9           1             2,838             2,838
                              TOTAL:                                 1             2,838

     3371110 revalidator74    [SYSCALL STATISTICS]
     ... <REMOVED 7 MORE NEW revalidator THREADS>
     3366311 urcu3            [SYSCALL STATISTICS]
     ...


OVS removed some of the threads and their resource-specific data, but based
on the ``<unknown:3366258/3371035>`` thread name, you can determine that some
threads no longer exist. In the ``ovs-vswitchd`` thread, you can see some
``clone3`` syscalls, indicating threads were created. In this example, it was
due to the deletion of a bridge, which resulted in the recreation of the
revalidator and handler threads.


Use with Openshift
------------------
This section describes how you would use the tool on a node in an OpenShift
cluster. It assumes you have console access to the node, either directly or
through a debug container.

A base fedora38 container will be used through podman, as this will allow the
use of some additional tools and packages needed.

First the containers need to be started:

.. code-block:: console

  [core@localhost ~]$ sudo podman run -it --rm \
     -e PS1='[(DEBUG)\u@\h \W]\$ ' \
     --privileged --network=host --pid=host \
     -v /lib/modules:/lib/modules:ro \
     -v /sys/kernel/debug:/sys/kernel/debug \
     -v /proc:/proc \
     -v /:/mnt/rootdir \
     quay.io/fedora/fedora:38-x86_64

  [(DEBUG)root@localhost /]#


Next add the ``linux_delay.py`` dependencies:

.. code-block:: console

  [(DEBUG)root@localhost /]# dnf install -y bcc-tools perl-interpreter \
       python3-pytz  python3-psutil


You need to install the devel, debug and source RPMs for your OVS and kernel
version:

.. code-block:: console

  [(DEBUG)root@localhost home]# rpm -i \
      openvswitch2.17-debuginfo-2.17.0-67.el8fdp.x86_64.rpm \
      openvswitch2.17-debugsource-2.17.0-67.el8fdp.x86_64.rpm \
      kernel-devel-4.18.0-372.41.1.el8_6.x86_64.rpm


Now the tool can be started. Here the above ``bridge_run()`` example is used:

.. code-block:: console

  [(DEBUG)root@localhost home]# ./kernel_delay.py --start-trigger up:bridge_run --stop-trigger ur:bridge_run
  # Start sampling (trigger@75279117343513) @2023-06-15T11:44:07.628372 (11:44:07 UTC)
  # Stop sampling (trigger@75279117443980) @2023-06-15T11:44:07.628529 (11:44:07 UTC)
  # Triggered sample dump, stop-start delta 100,467 ns @2023-06-15T11:44:07.628569 (11:44:07 UTC)
  TID        THREAD           <RESOURCE SPECIFIC>
  ---------- ---------------- ----------------------------------------------------------------------------
        1246 ovs-vswitchd     [SYSCALL STATISTICS]
                              NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                              getdents64              217           2             8,560             8,162
                              openat                  257           1             6,951             6,951
                              accept                   43           4             6,942             3,763
                              recvfrom                 45           1             3,726             3,726
                              recvmsg                  47           2             2,880             2,188
                              stat                      4           2             1,946             1,384
                              close                     3           1             1,393             1,393
                              fstat                     5           1             1,324             1,324
                              TOTAL( - poll):                      14            33,722

                              [THREAD RUN STATISTICS]
                              SCHED_CNT           TOTAL ns            MIN ns            MAX ns

                              [THREAD READY STATISTICS]
                              SCHED_CNT           TOTAL ns            MAX ns


.. rubric:: Footnotes

.. [#BCC] https://github.com/iovisor/bcc
.. [#BCC_EVENT] https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#events--arguments
.. [#REVAL_BLOG] https://developers.redhat.com/articles/2022/10/19/open-vswitch-revalidator-process-explained