Demonstrations of perf-stat-hist, the Linux perf_events version.


Tracing the net:net_dev_xmit tracepoint, and building a power-of-4 histogram
for the "len" variable, for 10 seconds:

# ./perf-stat-hist net:net_dev_xmit len 10
Tracing net:net_dev_xmit, power-of-4, max 1048576, for 10 seconds...

            Range           : Count    Distribution
               0            : 0        |                                      |
               1 -> 3       : 0        |                                      |
               4 -> 15      : 0        |                                      |
              16 -> 63      : 2        |#                                     |
              64 -> 255     : 30       |###                                   |
             256 -> 1023    : 3        |#                                     |
            1024 -> 4095    : 446      |######################################|
            4096 -> 16383   : 0        |                                      |
           16384 -> 65535   : 0        |                                      |
           65536 -> 262143  : 0        |                                      |
          262144 -> 1048575 : 0        |                                      |
         1048576 ->         : 0        |                                      |

This showed that most of the network transmits were between 1024 and 4095 bytes,
with a handful between 64 and 255 bytes.

Cat the format file for the tracepoint to see what other variables are available
to trace. Eg:

# cat /sys/kernel/debug/tracing/events/net/net_dev_xmit/format
name: net_dev_xmit
ID: 1078
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:void * skbaddr;   offset:8;       size:8; signed:0;
        field:unsigned int len; offset:16;      size:4; signed:0;
        field:int rc;   offset:20;      size:4; signed:1;
        field:__data_loc char[] name;   offset:24;      size:4; signed:1;

print fmt: "dev=%s skbaddr=%p len=%u rc=%d", __get_str(name), REC->skbaddr, REC->len, REC->rc

That's where "len" came from.
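
Any numeric field from the format file can be used as the histogram variable.
For example, a histogram of the rc (return code) field could be built with the
same usage (a hypothetical invocation; output not shown):

# ./perf-stat-hist net:net_dev_xmit rc 10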

This works by creating a series of tracepoint and filter pairs, one pair for
each histogram bucket, and doing in-kernel counts. The overhead should in many
cases be better than user-space post-processing; however, this approach is
still not ideal. I've called it a "perf hacktogram". The overhead is
proportional to the frequency of events multiplied by the number of buckets.
You can use power-of-2 instead (-P 2), or whatever bucket ranges you like (-b),
but the overhead for more buckets will be higher.
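
To illustrate the mechanism, the command this script builds resembles the
following sketch (simplified to four buckets here; the real script generates a
pair for every bucket, then post-processes the counts into the histogram):

# perf stat -e net:net_dev_xmit --filter 'len < 1' \
    -e net:net_dev_xmit --filter 'len >= 1 && len < 4' \
    -e net:net_dev_xmit --filter 'len >= 4 && len < 16' \
    -e net:net_dev_xmit --filter 'len >= 16 && len < 64' \
    -a -- sleep 10

Each -e and --filter pair is counted in kernel context, so only the final
per-bucket totals need to be read from user space.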

Histogram of the returned read() syscall sizes:

# ./perf-stat-hist syscalls:sys_exit_read ret 10
Tracing syscalls:sys_exit_read, power-of-4, max 1048576, for 10 seconds...

            Range           : Count    Distribution
               0            : 90       |#                                     |
               1 -> 3       : 9587     |######################################|
               4 -> 15      : 69       |#                                     |
              16 -> 63      : 590      |###                                   |
              64 -> 255     : 250      |#                                     |
             256 -> 1023    : 389      |##                                    |
            1024 -> 4095    : 296      |##                                    |
            4096 -> 16383   : 183      |#                                     |
           16384 -> 65535   : 12       |#                                     |
           65536 -> 262143  : 0        |                                      |
          262144 -> 1048575 : 0        |                                      |
         1048576 ->         : 0        |                                      |

Most of our read()s were tiny, between 1 and 3 bytes.

Using power-of-2, and a max of 1024:

# ./perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, power-of-2, max 1024, until Ctrl-C...
^C

            Range           : Count    Distribution
                 -> -1      : 29       |##                                    |
               0 -> 0       : 1        |#                                     |
               1 -> 1       : 959      |######################################|
               2 -> 3       : 1        |#                                     |
               4 -> 7       : 0        |                                      |
               8 -> 15      : 2        |#                                     |
              16 -> 31      : 14       |#                                     |
              32 -> 63      : 1        |#                                     |
              64 -> 127     : 0        |                                      |
             128 -> 255     : 0        |                                      |
             256 -> 511     : 0        |                                      |
             512 -> 1023    : 1        |#                                     |
            1024 ->         : 1        |#                                     |

The first bucket, "-> -1", counts negative return values: read()s that
returned an error.

Specifying custom bucket sizes:

# ./perf-stat-hist -b "10 50 100 5000" syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C

            Range           : Count    Distribution
                 -> 9       : 989      |######################################|
              10 -> 49      : 5        |#                                     |
              50 -> 99      : 0        |                                      |
             100 -> 4999    : 2        |#                                     |
            5000 ->         : 0        |                                      |

Specifying a single value to bifurcate statistics:

# ./perf-stat-hist -b 10 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C

            Range           : Count    Distribution
                 -> 9       : 2959     |######################################|
              10 ->         : 7        |#                                     |

This has the lowest collection overhead, since only two tracepoint and filter
pairs are used.
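
For reference, this bifurcated mode boils down to roughly the following perf
invocation (a sketch, using a fixed 10 second duration; the interactive
example above runs until Ctrl-C instead):

# perf stat -e syscalls:sys_exit_read --filter 'ret < 10' \
    -e syscalls:sys_exit_read --filter 'ret >= 10' \
    -a -- sleep 10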

Use -h to print the USAGE message:

# ./perf-stat-hist -h
USAGE: perf-stat-hist [-h] [-b buckets|-P power] [-m max] tracepoint
                      variable [seconds]
               -b buckets      # specify histogram bucket points
               -P power        # power-of (default is 4)
               -m max          # max value for power-of
               -h              # this usage message
   eg,
       perf-stat-hist syscalls:sys_enter_read count 5
                                 # read() request histogram, 5 seconds
       perf-stat-hist syscalls:sys_exit_read ret 5
                                 # read() return histogram, 5 seconds
       perf-stat-hist -P 10 syscalls:sys_exit_read ret 5
                                 # ... use power-of-10
       perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret 5
                                 # ... use power-of-2, max 1024
       perf-stat-hist -b "10 50 100 500" syscalls:sys_exit_read ret 5
                                 # ... histogram based on these bucket ranges
       perf-stat-hist -b 10 syscalls:sys_exit_read ret 5
                                 # ... bifurcate by the value 10 (lowest overhead)

See the man page and example file for more info.