1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341
|
Demonstrations of functrace, the Linux ftrace version.
A (usually) good example to start with is do_nanosleep(), since it is not called
frequently, and easily triggered. Here's tracing it using functrace:
# ./functrace 'do_nanosleep'
Tracing "do_nanosleep"... Ctrl-C to end.
svscan-1678 [000] .... 6412438.703521: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412443.703678: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412448.703865: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412453.216241: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412453.704049: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412454.216524: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412455.216816: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412456.217093: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412457.217378: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412458.217660: do_nanosleep <-hrtimer_nanosleep
^C
Ending tracing...
While tracing, I ran a "vmstat 1" in another window. vmstat and its process ID
can be seen as the 1st column, and the timestamp and one second intervals can
be seen as the 4th column.
This is basic details: who was on-CPU (process name and PID), flags, timestamp,
and calling function. Treat this as the next step, after funccount, for getting
a little more information on kernel function execution, before using more
capabilities to dig further.
This is Linux 3.16, and the output is the ftrace text buffer format, which has
changed slightly between kernel versions.
To see the column headers, use -H. This is Linux 3.16:
# ./functrace -H do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
# tracer: function
#
# entries-in-buffer/entries-written: 0/0 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
svscan-1678 [001] .... 6413283.729520: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [001] .... 6413288.729679: do_nanosleep <-hrtimer_nanosleep
For comparison, here's Linux 3.2:
# ./functrace -H do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
vmstat-11789 [000] 1763207.021204: do_nanosleep <-hrtimer_nanosleep
vmstat-11789 [000] 1763208.022970: do_nanosleep <-hrtimer_nanosleep
vmstat-11789 [000] 1763209.023267: do_nanosleep <-hrtimer_nanosleep
For documentation on the exact format, see the Linux kernel source under
Documentation/trace/ftrace.txt.
This error:
# ./functrace 'ext4_z*'
Tracing "ext4_z*"... Ctrl-C to end.
./functrace: line 136: echo: write error: Invalid argument
ERROR: enabling "ext4_z*". Exiting.
Is because there were no functions beginning with "ext4_z". You can check
available functions in the /sys/kernel/debug/tracing/available_filter_functions
file.
You might want to use funccount to check the frequency of events before using
functrace. For example, counting ext3 events on a system:
# ./funccount -d 10 'ext3*'
Tracing "ext3*" for 10 seconds...
FUNC COUNT
ext3_journal_dirty_data 1
ext3_ordered_write_end 1
ext3_write_begin 1
ext3_writepage_trans_blocks 1
ext3_dirty_inode 2
ext3_do_update_inode 2
ext3_get_group_desc 2
ext3_get_inode_block.isra.20 2
ext3_get_inode_flags 2
ext3_get_inode_loc 2
ext3_mark_iloc_dirty 2
ext3_mark_inode_dirty 2
ext3_reserve_inode_write 2
ext3_journal_start_sb 3
ext3_block_to_path.isra.22 6
ext3_bmap 6
ext3_get_block 6
ext3_get_blocks_handle 6
ext3_get_branch 6
ext3_discard_reservation 11
ext3_ioctl 11
ext3_release_file 11
Ending tracing...
During 10 seconds, there weren't many ext3 calls. I might consider tracing
them all (warnings about dynamic tracing many kernel functions apply: test
before use, as in the past there have been bugs causing panics).
# ./functrace 'ext3_*'
Tracing "ext3_*"... Ctrl-C to end.
register_start.-17008 [000] 1763557.577985: ext3_release_file <-__fput
register_start.-17008 [000] 1763557.577987: ext3_discard_reservation <-ext3_release_file
register_start.-17026 [000] 1763558.163620: ext3_ioctl <-file_ioctl
register_start.-17026 [000] 1763558.481081: ext3_release_file <-__fput
register_start.-17026 [000] 1763558.481083: ext3_discard_reservation <-ext3_release_file
register_start.-17041 [000] 1763559.186984: ext3_ioctl <-file_ioctl
register_start.-17041 [000] 1763559.511267: ext3_release_file <-__fput
[...]
For comparison, here's a different system and ext4:
# ./funccount -d 10 'ext4*'
Tracing "ext4*" for 10 seconds...
FUNC COUNT
ext4_journal_commit_callback 2
ext4_htree_fill_tree 6
ext4_htree_free_dir_info 6
ext4_release_dir 6
ext4_readdir 12
ext4fs_dirhash 29
ext4_htree_store_dirent 29
ext4_follow_link 36
ext4_file_mmap 42
ext4_free_data_callback 44
ext4_getattr 45
ext4_bmap 62
ext4_get_block 62
ext4_add_entry 280
ext4_add_nondir 280
ext4_alloc_da_blocks 280
ext4_alloc_inode 280
ext4_bio_write_page 280
ext4_can_truncate 280
ext4_claim_free_clusters 280
ext4_clear_inode 280
ext4_create 280
ext4_da_get_block_prep 280
ext4_da_invalidatepage 280
ext4_da_update_reserve_space 280
ext4_da_write_begin 280
ext4_da_write_end 280
ext4_dec_count.isra.22 280
ext4_delete_entry 280
ext4_destroy_inode 280
ext4_drop_inode 280
ext4_end_bio 280
ext4_es_init_tree 280
ext4_es_lru_del 280
ext4_evict_inode 280
ext4_ext_calc_metadata_amount 280
ext4_ext_correct_indexes 280
ext4_ext_find_goal 280
ext4_ext_insert_extent 280
ext4_ext_remove_space 280
ext4_ext_tree_init 280
ext4_ext_truncate 280
ext4_ext_truncate_extend_resta 280
ext4_ext_try_to_merge 280
ext4_ext_try_to_merge_right 280
ext4_file_write_iter 280
ext4_find_dest_de 280
ext4_finish_bio 280
ext4_free_blocks 280
ext4_free_inode 280
ext4_generic_delete_entry 280
ext4_has_free_clusters 280
ext4_i_callback 280
ext4_init_acl 280
ext4_init_security 280
ext4_inode_attach_jinode 280
ext4_inode_to_goal_block 280
ext4_insert_dentry 280
ext4_invalidatepage 280
ext4_io_submit_init 280
ext4_itable_unused_count 280
ext4_lookup 280
ext4_mb_complex_scan_group 280
ext4_mb_find_by_goal 280
ext4_mb_free_metadata 280
ext4_mb_initialize_context 280
ext4_mb_mark_diskspace_used 280
ext4_mb_new_blocks 280
ext4_mb_normalize_request 280
ext4_mb_regular_allocator 280
ext4_mb_release_context 280
ext4_mb_use_best_found 280
ext4_mb_use_preallocated 280
ext4_nonda_switch 280
ext4_orphan_del 280
ext4_put_io_end_defer 280
ext4_releasepage 280
ext4_rename 280
ext4_set_aops 280
ext4_setent 280
ext4_set_inode_flags 280
ext4_truncate 280
ext4_writepages 280
ext4_writepage_trans_blocks 280
ext4_xattr_delete_inode 280
ext4_xattr_get 285
ext4_xattr_ibody_get 285
ext4_xattr_security_get 285
ext4_bread 286
ext4_release_file 288
ext4_file_open 305
ext4_superblock_csum_set 494
ext4_block_bitmap_csum_set 560
ext4_es_free_extent 560
ext4_es_insert_extent 560
ext4_es_remove_extent 560
ext4_ext_find_extent 560
ext4_ext_map_blocks 560
ext4_free_group_clusters_set 560
ext4_free_inodes_set 560
ext4_get_group_no_and_offset 560
ext4_get_reserved_space 560
ext4_init_io_end 560
ext4_inode_bitmap_csum_set 560
ext4_io_submit 560
ext4_mb_good_group 560
ext4_orphan_add 560
ext4_put_io_end 560
ext4_read_block_bitmap 560
ext4_read_block_bitmap_nowait 560
ext4_read_inode_bitmap 560
ext4_release_io_end 560
ext4_set_bits 560
ext4_validate_block_bitmap 560
ext4_wait_block_bitmap 560
ext4_mb_load_buddy 604
ext4_mb_unload_buddy.isra.24 604
ext4_block_bitmap 840
ext4_discard_preallocations 840
ext4_ext_drop_refs 840
ext4_ext_get_access.isra.30 840
ext4_ext_index_trans_blocks 840
ext4_find_entry 840
ext4_free_group_clusters 840
ext4_handle_dirty_dirent_node 840
ext4_inode_bitmap 840
ext4_meta_trans_blocks 840
ext4_dirty_inode 845
ext4_free_inodes_count 1120
ext4_group_desc_csum 1120
ext4_group_desc_csum_set 1120
ext4_getblk 1126
ext4_map_blocks 1468
ext4_es_lookup_extent 1748
ext4_mb_check_limits 1875
ext4_es_lru_add 2028
ext4_data_block_valid 2308
ext4_journal_check_start 3085
ext4_mark_inode_dirty 5325
ext4_get_inode_flags 5951
ext4_get_inode_loc 5951
ext4_mark_iloc_dirty 5951
ext4_reserve_inode_write 5951
ext4_inode_table 7071
ext4_get_group_desc 8471
ext4_has_inline_data 9486
Ending tracing...
There are many functions called frequently. Tracing them all may cost
significant performance overhead. I may read through this list and look for
the most interesting functions to trace, reducing overheads by only selecting
a few.
For example, ext4_create() looks interesting:
# ./functrace ext4_create
Tracing "ext4_create"... Ctrl-C to end.
supervise-1681 [000] .... 6414396.700163: ext4_create <-vfs_create
supervise-1684 [001] .... 6414396.700287: ext4_create <-vfs_create
supervise-1681 [000] .... 6414396.700598: ext4_create <-vfs_create
supervise-1684 [001] .... 6414396.700636: ext4_create <-vfs_create
supervise-1687 [001] .... 6414396.701577: ext4_create <-vfs_create
supervise-1688 [000] .... 6414396.702590: ext4_create <-vfs_create
supervise-1693 [001] .... 6414396.702829: ext4_create <-vfs_create
supervise-1693 [001] .... 6414396.703592: ext4_create <-vfs_create
supervise-1688 [000] .... 6414396.703598: ext4_create <-vfs_create
supervise-1687 [001] .... 6414396.703988: ext4_create <-vfs_create
supervise-1685 [001] .... 6414396.704126: ext4_create <-vfs_create
supervise-1685 [001] .... 6414396.704458: ext4_create <-vfs_create
supervise-1682 [001] .... 6414396.704577: ext4_create <-vfs_create
supervise-1683 [000] .... 6414396.704984: ext4_create <-vfs_create
supervise-1682 [001] .... 6414396.704985: ext4_create <-vfs_create
[...]
Now I know that different PIDs of the supervise program are calling ext4_create,
of around the same time, and from vfs_create().
The duration mode uses buffering, instead of printing events as they occur.
This greatly reduces overheads. For example:
# ./functrace -d 10 ext4_create > out.ext4_create
# wc out.ext4_create
283 1687 21059 out.ext4_create
Note that the buffer has a limited size. Check the timestamps to see if the
range does not match your duration, as one clue that the buffer was exhausted
and events were missed.
Use -h to print the USAGE message:
# ./functrace -h
USAGE: functrace [-hH] [-p PID] [-L TID] [-d secs] funcstring
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
-L TID # trace when this thread is on-CPU
eg,
functrace do_nanosleep # trace the do_nanosleep() function
functrace '*sleep' # trace functions ending in "sleep"
functrace -p 198 'vfs*' # trace "vfs*" funcs for PID 198
functrace 'tcp*' > out # trace all "tcp*" funcs to out file
functrace -d 1 'tcp*' > out # trace 1 sec, then write out file
See the man page and example file for more info.
|