File: stackcount_example.txt

package info (click to toggle)
bpfcc 0.18.0%2Bds-2
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 12,368 kB
  • sloc: ansic: 132,727; python: 36,226; cpp: 26,973; sh: 710; yacc: 525; makefile: 141; lex: 94
file content (963 lines) | stat: -rw-r--r-- 21,965 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
Demonstrations of stackcount, the Linux eBPF/bcc version.


This program traces functions and frequency counts them with their entire
stack trace, summarized in-kernel for efficiency. For example, counting
stack traces that led to the submit_bio() kernel function, which creates
block device I/O:

# ./stackcount submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
  submit_bio
  submit_bh
  journal_submit_commit_record.isra.13
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    1

  submit_bio
  __block_write_full_page.constprop.39
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
    2

  submit_bio
  __block_write_full_page.constprop.39
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_fdatawrite
  fdatawrite_one_bdev
    36

  submit_bio
  submit_bh
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
  mb_cache_list
    38

  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_rename
  ext4_rename2
  vfs_rename
  sys_rename
  entry_SYSCALL_64_fastpath
    79

Detaching...

The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, ext4_rename(), and filemap_flush(): looks like an
application issued file rename has caused back end disk I/O due to ext4
block allocation and a filemap_flush().


Now adding the -P option to display stacks separately for each process:

# ./stackcount -P submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_release_file
  __fput
  ____fput
  task_work_run
  exit_to_usermode_loop
  syscall_return_slowpath
  entry_SYSCALL_64_fastpath
  [unknown]
  [unknown]
    tar [15069]
    5

  submit_bio
  ext4_bio_write_page
  mpage_submit_page
  mpage_map_and_submit_buffers
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_release_file
  __fput
  ____fput
  task_work_run
  exit_to_usermode_loop
  syscall_return_slowpath
  entry_SYSCALL_64_fastpath
  [unknown]
  [unknown]
    tar [15069]
    15

  submit_bio
  ext4_readpages
  __do_page_cache_readahead
  ondemand_readahead
  page_cache_async_readahead
  generic_file_read_iter
  __vfs_read
  vfs_read
  sys_read
  entry_SYSCALL_64_fastpath
  [unknown]
    tar [15069]
    113

Detaching...

The last stack trace in the above output shows syscall handling, sys_read(),
vfs_read(), and then "readahead" functions: looks like an application issued
file read has triggered read ahead. With "-P", the application can be seen
after the stack trace, in this case, "tar [15069]" for the "tar" command,
PID 15069.

The order of printed stack traces is from least to most frequent. The most
frequent in this case, the ext4_readpages() stack, was taken 113 times during
tracing.

The "[unknown]" frames are from user-level, since this simple workload is
the tar command, which apparently has been compiled without frame pointers.
It's a common compiler optimization, but it breaks frame pointer-based stack
walkers. Similar broken stacks will be seen by other profilers and debuggers
that use frame pointers. Hopefully your application preserves them so that
the user-level stack trace is visible. So how does one get frame pointers, if
your application doesn't have them to start with? For the current bcc (until
it supports other stack walkers), you need to be running an application binaries
that preserves frame pointers, eg, using gcc's -fno-omit-frame-pointer. That's
about all I'll say here: this is a big topic that is not bcc/BPF specific.

It can be useful to trace the path to submit_bio to explain unusual rates of
disk IOPS. These could have in-kernel origins (eg, background scrub).


Now adding the -d option to delimit kernel and user stacks:

# ./stackcount -P -d submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
  submit_bio
  submit_bh
  journal_submit_commit_record
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
    --
    jbd2/xvda1-8 [405]
    1

  submit_bio
  submit_bh
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
    --
    jbd2/xvda1-8 [405]
    2

  submit_bio
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_release_file
  __fput
  ____fput
  task_work_run
  exit_to_usermode_loop
  syscall_return_slowpath
  entry_SYSCALL_64_fastpath
    --
  [unknown]
  [unknown]
    tar [15187]
    5

  submit_bio
  ext4_bio_write_page
  mpage_submit_page
  mpage_map_and_submit_buffers
  ext4_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_flush
  ext4_alloc_da_blocks
  ext4_release_file
  __fput
  ____fput
  task_work_run
  exit_to_usermode_loop
  syscall_return_slowpath
  entry_SYSCALL_64_fastpath
    --
  [unknown]
  [unknown]
    tar [15187]
    15

  submit_bio
  ext4_readpages
  __do_page_cache_readahead
  ondemand_readahead
  page_cache_async_readahead
  generic_file_read_iter
  __vfs_read
  vfs_read
  sys_read
  entry_SYSCALL_64_fastpath
    --
  [unknown]
  [unknown]
  [unknown]
    tar [15187]
    171

Detaching...

A "--" is printed between the kernel and user stacks.


As a different example, here is the kernel function hrtimer_init_sleeper():

# ./stackcount.py -P -d hrtimer_init_sleeper
Tracing 1 functions for "hrtimer_init_sleeper"... Hit Ctrl-C to end.
^C
  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  [unknown]
    containerd [16020]
    1

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  __pthread_cond_timedwait
  Monitor::IWait(Thread*, long)
  Monitor::wait(bool, long, bool)
  CompileQueue::get()
  CompileBroker::compiler_thread_loop()
  JavaThread::thread_main_inner()
  JavaThread::run()
  java_start(Thread*)
  start_thread
    java [4996]
    1

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  [unknown]
  [unknown]
    containerd [16020]
    1

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  __pthread_cond_timedwait
  VMThread::loop()
  VMThread::run()
  java_start(Thread*)
  start_thread
    java [4996]
    3

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  [unknown]
    dockerd [16008]
    4

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  [unknown]
  [unknown]
    dockerd [16008]
    4

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  __pthread_cond_timedwait
  Lio/netty/util/ThreadDeathWatcher$Watcher;::run
  Interpreter
  Interpreter
  call_stub
  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
  thread_entry(JavaThread*, Thread*)
  JavaThread::thread_main_inner()
  JavaThread::run()
  java_start(Thread*)
  start_thread
    java [4996]
    4

  hrtimer_init_sleeper
  do_futex
  SyS_futex
  entry_SYSCALL_64_fastpath
    --
  __pthread_cond_timedwait
  clock_gettime
  [unknown]
    java [4996]
    79

Detaching...

I was just trying to find a more interesting example. This output includes
some Java stacks where user-level has been walked correctly (even includes a
JIT symbol translation). dockerd and containerd don't have frame pointers
(grumble), but Java does (which is running with -XX:+PreserveFramePointer).


Here's another kernel function, ip_output():

# ./stackcount.py -P -d ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_push
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
  vfs_write
  SyS_write
  entry_SYSCALL_64_fastpath
    --
  __write_nocancel
  [unknown]
    sshd [15015]
    5

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_push
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
  vfs_write
  SyS_write
  entry_SYSCALL_64_fastpath
    --
  __write_nocancel
  [unknown]
  [unknown]
    sshd [8234]
    5

  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_push
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
  vfs_write
  SyS_write
  entry_SYSCALL_64_fastpath
    --
  __write_nocancel
    sshd [15015]
    7

Detaching...

This time just sshd is triggering ip_output() calls.


Watch what happens if I filter on kernel stacks only (-K) for ip_output():

# ./stackcount.py -K ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
  ip_output
  ip_queue_xmit
  tcp_transmit_skb
  tcp_write_xmit
  __tcp_push_pending_frames
  tcp_push
  tcp_sendmsg
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
  vfs_write
  SyS_write
  entry_SYSCALL_64_fastpath
    13

Detaching...

They have grouped together as a single unique stack, since the kernel part
was the same.


Here is just the user stacks, fetched during the kernel function ip_output():

# ./stackcount.py -P -U ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
  [unknown]
    snmpd [1645]
    1

  __write_nocancel
  [unknown]
  [unknown]
    sshd [8234]
    3

  __write_nocancel
    sshd [15015]
    4

I should really run a custom sshd with frame pointers so we can see its
stack trace...


User-space functions can also be traced if a library name is provided. For
example, to quickly identify code locations that allocate heap memory for
PID 4902 (using -p), by tracing malloc from libc ("c:malloc"):

# ./stackcount -p 4902 c:malloc
Tracing 1 functions for "malloc"... Hit Ctrl-C to end.
^C
  malloc
  rbtree_new
  main
  [unknown]
    12

  malloc
  _rbtree_node_new_internal
  _rbtree_node_insert
  rbtree_insert
  main
  [unknown]
    1189

Detaching...

Kernel stacks are absent as this didn't enter kernel code.

Note that user-space uses of stackcount can be somewhat more limited because
a lot of user-space libraries and binaries are compiled without frame-pointers
as discussed earlier (the -fomit-frame-pointer compiler default) or are used
without debuginfo.


In addition to kernel and user-space functions, kernel tracepoints and USDT
tracepoints are also supported.

For example, to determine where threads are being created in a particular
process, use the pthread_create USDT tracepoint:

# ./stackcount -P -p $(pidof parprimes) u:pthread:pthread_create
Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
^C

    parprimes [11923]
  pthread_create@@GLIBC_2.2.5
  main
  __libc_start_main
  [unknown]
    7

You can use "readelf -n file" to see if it has USDT tracepoints.


Similarly, to determine where context switching is happening in the kernel,
use the sched:sched_switch kernel tracepoint:

# ./stackcount -P t:sched:sched_switch
  __schedule
  schedule
  worker_thread
  kthread
  ret_from_fork
    kworker/0:2 [25482]
    1

  __schedule
  schedule
  schedule_hrtimeout_range_clock
  schedule_hrtimeout_range
  ep_poll
  SyS_epoll_wait
  entry_SYSCALL_64_fastpath
  epoll_wait
  Lsun/nio/ch/SelectorImpl;::lockAndDoSelect
  Lsun/nio/ch/SelectorImpl;::select
  Lio/netty/channel/nio/NioEventLoop;::select
  Lio/netty/channel/nio/NioEventLoop;::run
  Interpreter
  Interpreter
  call_stub
  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
  thread_entry(JavaThread*, Thread*)
  JavaThread::thread_main_inner()
  JavaThread::run()
  java_start(Thread*)
  start_thread
    java [4996]
    1

... (omitted for brevity)

  __schedule
  schedule
  schedule_preempt_disabled
  cpu_startup_entry
  xen_play_dead
  arch_cpu_idle_dead
  cpu_startup_entry
  cpu_bringup_and_idle
    swapper/1 [0]
    289


A -i option can be used to set an output interval, and -T to include a
timestamp. For example:

# ./stackcount.py -P -Tdi 1 submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.

06:05:13

06:05:14
  submit_bio
  xfs_do_writepage
  write_cache_pages
  xfs_vm_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
  wb_writeback
  wb_workfn
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:1 [15686]
    1

  submit_bio
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:0 [16007]
    1

  submit_bio
  xfs_buf_submit
  xlog_bdstrat
  xlog_sync
  xlog_state_release_iclog
  _xfs_log_force
  xfs_log_force
  xfs_fs_sync_fs
  sync_fs_one_sb
  iterate_supers
  sys_sync
  entry_SYSCALL_64_fastpath
    --
  [unknown]
    sync [16039]
    1

  submit_bio
  submit_bh
  journal_submit_commit_record
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
    --
    jbd2/xvda1-8 [405]
    1

  submit_bio
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/0:2 [25482]
    2

  submit_bio
  ext4_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
  wb_writeback
  wb_workfn
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:0 [16007]
    4

  submit_bio
  xfs_vm_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
  wb_writeback
  wb_workfn
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:1 [15686]
    5

  submit_bio
  __block_write_full_page
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  blkdev_writepages
  do_writepages
  __filemap_fdatawrite_range
  filemap_fdatawrite
  fdatawrite_one_bdev
  iterate_bdevs
  sys_sync
  entry_SYSCALL_64_fastpath
    --
  [unknown]
    sync [16039]
    7

  submit_bio
  submit_bh
  jbd2_journal_commit_transaction
  kjournald2
  kthread
  ret_from_fork
    --
    jbd2/xvda1-8 [405]
    8

  submit_bio
  ext4_bio_write_page
  mpage_submit_page
  mpage_map_and_submit_buffers
  ext4_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
  wb_writeback
  wb_workfn
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:0 [16007]
    8

  submit_bio
  __block_write_full_page
  block_write_full_page
  blkdev_writepage
  __writepage
  write_cache_pages
  generic_writepages
  blkdev_writepages
  do_writepages
  __writeback_single_inode
  writeback_sb_inodes
  __writeback_inodes_wb
  wb_writeback
  wb_workfn
  process_one_work
  worker_thread
  kthread
  ret_from_fork
    --
    kworker/u16:0 [16007]
    60


06:05:15

06:05:16

Detaching...

This only included output for the 06:05:14 interval. The other internals
did not span block device I/O.


The -s output prints the return instruction offset for each function (aka
symbol offset). Eg:

# ./stackcount.py -P -s tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
  tcp_sendmsg+0x1
  sock_sendmsg+0x38
  sock_write_iter+0x85
  __vfs_write+0xe3
  vfs_write+0xb8
  SyS_write+0x55
  entry_SYSCALL_64_fastpath+0x1e
  __write_nocancel+0x7
    sshd [15015]
    3

  tcp_sendmsg+0x1
  sock_sendmsg+0x38
  sock_write_iter+0x85
  __vfs_write+0xe3
  vfs_write+0xb8
  SyS_write+0x55
  entry_SYSCALL_64_fastpath+0x1e
  __write_nocancel+0x7
    sshd [8234]
    3

Detaching...

If it wasn't clear how one function called another, knowing the instruction
offset can help you locate the lines of code from a disassembly dump.


The -v output is verbose, and shows raw addresses:

./stackcount.py -P -v tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
  ffffffff817b05c1 tcp_sendmsg
  ffffffff8173ea48 sock_sendmsg
  ffffffff8173eae5 sock_write_iter
  ffffffff81232b33 __vfs_write
  ffffffff812331b8 vfs_write
  ffffffff81234625 SyS_write
  ffffffff818739bb entry_SYSCALL_64_fastpath
  7f120511e6e0     __write_nocancel
    sshd [8234]
    3

  ffffffff817b05c1 tcp_sendmsg
  ffffffff8173ea48 sock_sendmsg
  ffffffff8173eae5 sock_write_iter
  ffffffff81232b33 __vfs_write
  ffffffff812331b8 vfs_write
  ffffffff81234625 SyS_write
  ffffffff818739bb entry_SYSCALL_64_fastpath
  7f919c5a26e0     __write_nocancel
    sshd [15015]
    11

Detaching...


A wildcard can also be used. Eg, all functions beginning with "tcp_send",
kernel stacks only (-K) with offsets (-s):

# ./stackcount -Ks 'tcp_send*'
Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end.
^C
  tcp_send_delayed_ack0x1
  tcp_rcv_established0x3b1
  tcp_v4_do_rcv0x130
  tcp_v4_rcv0x8e0
  ip_local_deliver_finish0x9f
  ip_local_deliver0x51
  ip_rcv_finish0x8a
  ip_rcv0x29d
  __netif_receive_skb_core0x637
  __netif_receive_skb0x18
  netif_receive_skb_internal0x23
    1

  tcp_send_delayed_ack0x1
  tcp_rcv_established0x222
  tcp_v4_do_rcv0x130
  tcp_v4_rcv0x8e0
  ip_local_deliver_finish0x9f
  ip_local_deliver0x51
  ip_rcv_finish0x8a
  ip_rcv0x29d
  __netif_receive_skb_core0x637
  __netif_receive_skb0x18
  netif_receive_skb_internal0x23
    4

  tcp_send_mss0x1
  inet_sendmsg0x67
  sock_sendmsg0x38
  sock_write_iter0x78
  __vfs_write0xaa
  vfs_write0xa9
  sys_write0x46
  entry_SYSCALL_64_fastpath0x16
    7

  tcp_sendmsg0x1
  sock_sendmsg0x38
  sock_write_iter0x78
  __vfs_write0xaa
  vfs_write0xa9
  sys_write0x46
  entry_SYSCALL_64_fastpath0x16
    7

Detaching...

Use -r to allow regular expressions.


The -f option will emit folded output, which can be used as input to other
tools including flame graphs. For example, with delimiters as well:

# ./stackcount.py -P -df t:sched:sched_switch
^Csnmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
kworker/7:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
watchdog/0;-;ret_from_fork;kthread;smpboot_thread_fn;schedule;__schedule 1
snmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
svscan;[unknown];-;entry_SYSCALL_64_fastpath;SyS_nanosleep;hrtimer_nanosleep;do_nanosleep;schedule;__schedule 1
python;[unknown];__select_nocancel;-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
kworker/2:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
[...]

Flame graphs visualize stack traces. For information about them and links
to open source software, see http://www.brendangregg.com/flamegraphs.html .
This folded output can be piped directly into flamegraph.pl (the Perl version).


USAGE message:

# ./stackcount -h
usage: stackcount [-h] [-p PID] [-c CPU] [-i INTERVAL] [-D DURATION] [-T] [-r]
                  [-s] [-P] [-K] [-U] [-v] [-d] [-f] [--debug]
                  pattern

Count events and their stack traces

positional arguments:
  pattern               search expression for events

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace this PID only
  -c CPU, --cpu CPU     trace this CPU only
  -i INTERVAL, --interval INTERVAL
                        summary interval, seconds
  -D DURATION, --duration DURATION
                        total duration of trace, seconds
  -T, --timestamp       include timestamp on output
  -r, --regexp          use regular expressions. Default is "*" wildcards
                        only.
  -s, --offset          show address offsets
  -P, --perpid          display stacks separately for each process
  -K, --kernel-stacks-only
                        kernel stack only
  -U, --user-stacks-only
                        user stack only
  -v, --verbose         show raw addresses
  -d, --delimited       insert delimiter between kernel/user stacks
  -f, --folded          output folded format
  --debug               print BPF program before starting (for debugging
                        purposes)

examples:
    ./stackcount submit_bio         # count kernel stack traces for submit_bio
    ./stackcount -d ip_output       # include a user/kernel stack delimiter
    ./stackcount -s ip_output       # show symbol offsets
    ./stackcount -sv ip_output      # show offsets and raw addresses (verbose)
    ./stackcount 'tcp_send*'        # count stacks for funcs matching tcp_send*
    ./stackcount -r '^tcp_send.*'   # same as above, using regular expressions
    ./stackcount -Ti 5 ip_output    # output every 5 seconds, with timestamps
    ./stackcount -p 185 ip_output   # count ip_output stacks for PID 185 only
    ./stackcount -p 185 c:malloc    # count stacks for malloc in PID 185
    ./stackcount t:sched:sched_fork # count stacks for sched_fork tracepoint
    ./stackcount -p 185 u:node:*    # count stacks for all USDT probes in node
    ./stackcount -c 1 put_prev_entity   # count put_prev_entity stacks for CPU 1 only
    ./stackcount -K t:sched:sched_switch   # kernel stacks only
    ./stackcount -U t:sched:sched_switch   # user stacks only