# Simpleperf
Simpleperf is a native profiling tool for Android. It can profile both Android
applications and native processes running on Android, covering both Java and
C++ code, and can be used on Android L and above.
Simpleperf is part of the Android Open Source Project. The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/README.md).
Bugs and feature requests can be submitted at http://github.com/android-ndk/ndk/issues.
## Table of Contents
- [Simpleperf introduction](#simpleperf-introduction)
- [Why simpleperf](#why-simpleperf)
- [Tools in simpleperf](#tools-in-simpleperf)
- [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
- [Main simpleperf commands](#main-simpleperf-commands)
- [Simpleperf list](#simpleperf-list)
- [Simpleperf stat](#simpleperf-stat)
- [Simpleperf record](#simpleperf-record)
- [Simpleperf report](#simpleperf-report)
- [Android application profiling](#android-application-profiling)
- [Prepare an Android application](#prepare-an-android-application)
    - [Record and report profiling data (using command-lines)](#record-and-report-profiling-data-using-command-lines)
- [Record and report profiling data (using python scripts)](#record-and-report-profiling-data-using-python-scripts)
- [Record and report call graph](#record-and-report-call-graph)
- [Visualize profiling data](#visualize-profiling-data)
- [Annotate source code](#annotate-source-code)
- [Answers to common issues](#answers-to-common-issues)
- [The correct way to pull perf.data on host](#the-correct-way-to-pull-perfdata-on-host)
## Simpleperf introduction
### Why simpleperf
Simpleperf works similarly to linux-tools-perf, but it has some specific features for
Android profiling:
1. It is aware of the Android environment
    a. It can profile embedded shared libraries in apk files.
    b. It reads symbols and debug information from the .gnu_debugdata section.
    c. It gives suggestions when errors occur.
    d. When recording with the -g option, it unwinds the stack before writing to
       file, to save storage space.
    e. It supports adding additional information (like symbols) to perf.data, to
       support recording on device and reporting on host.
2. It uses python scripts for profiling tasks.
3. It is easy to release
    a. Simpleperf executables on device are built as static binaries. They can be
       pushed to any Android device and run.
    b. Simpleperf executables on host are built as static binaries, and support
       different hosts: mac, linux and windows.
### Tools in simpleperf
Simpleperf is periodically released with the Android NDK, located at `simpleperf/`.
The latest release can be found [here](https://android.googlesource.com/platform/prebuilts/simpleperf/).
Simpleperf tools contain executables, shared libraries and python scripts.
**Simpleperf executables running on Android device**
Simpleperf executables running on Android device are located at `bin/android/`.
Each architecture has one executable, like `bin/android/arm64/simpleperf`. It
can record and report profiling data. It provides a command-line interface
broadly the same as the linux-tools perf, and also supports some additional
features for Android-specific profiling.
**Simpleperf executables running on hosts**
Simpleperf executables running on hosts are located at `bin/darwin`, `bin/linux`
and `bin/windows`. Each host and architecture has one executable, like
`bin/linux/x86_64/simpleperf`. It provides a command-line interface for
reporting profiling data on hosts.
**Simpleperf report shared libraries used on host**
Simpleperf report shared libraries used on host are located at `bin/darwin`,
`bin/linux` and `bin/windows`. Each host and architecture has one library, like
`bin/linux/x86_64/libsimpleperf_report.so`. It is a library for parsing
profiling data.
**Python scripts**
Python scripts are written to help with different profiling tasks.
- `annotate.py` annotates source files based on profiling data.
- `app_profiler.py` profiles Android applications.
- `binary_cache_builder.py` pulls libraries from Android devices.
- `pprof_proto_generator.py` converts profiling data to the format used by pprof.
- `report.py` provides a GUI interface to report profiling results.
- `report_sample.py` generates flamegraphs.
- `simpleperf_report_lib.py` provides a python interface for parsing profiling data.
### Simpleperf's profiling principle
Modern CPUs have a hardware component called the performance monitoring unit
(PMU). The PMU has several hardware counters, counting events like how many cpu
cycles have happened, how many instructions have executed, or how many cache
misses have happened.
The Linux kernel wraps these hardware counters into hardware perf events. In
addition, the Linux kernel also provides hardware independent software events
and tracepoint events. The Linux kernel exposes all this to userspace via the
perf_event_open system call, which simpleperf uses.
Simpleperf has three main functions: stat, record and report.
The stat command gives a summary of how many events have happened in the
profiled processes in a time period. Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to
linux kernel.
2. Linux kernel enables counters while scheduling on the profiled processes.
3. After profiling, simpleperf reads counters from linux kernel, and reports a
counter summary.
The record command records samples of the profiled process in a time period.
Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to
linux kernel.
2. Simpleperf creates mapped buffers between simpleperf and linux kernel.
3. Linux kernel enables counters while scheduling on the profiled processes.
4. Each time a given number of events happen, linux kernel dumps a sample to a
mapped buffer.
5. Simpleperf reads samples from the mapped buffers and generates perf.data.
The report command reads a "perf.data" file and any shared libraries used by
the profiled processes, and outputs a report showing where the time was spent.
### Main simpleperf commands
Simpleperf supports several subcommands, including list, stat, record and report.
Each subcommand supports different options. This section only covers the most
important subcommands and options. To see all subcommands and options,
use --help.

    # List all subcommands.
    $ simpleperf --help
    # Print help message for record subcommand.
    $ simpleperf record --help

#### Simpleperf list
simpleperf list is used to list all events available on the device. Different
devices may support different events because of differences in hardware and
kernel.

    $ simpleperf list
    List of hw-cache events:
      branch-loads
      ...
    List of hardware events:
      cpu-cycles
      instructions
      ...
    List of software events:
      cpu-clock
      task-clock
      ...

#### Simpleperf stat
simpleperf stat is used to get raw event counter information about the profiled
program or the whole system. By passing options, we can select which events to
use, which processes/threads to monitor, how long to monitor and the print interval.
Below is an example.

    # Stat using default events (cpu-cycles,instructions,...), and monitor
    # process 7394 for 10 seconds.
    $ simpleperf stat -p 7394 --duration 10
    Performance counter statistics:

     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
               753  context-switches   # 75.121 /sec                      (100%)
               870  page-faults        # 86.793 /sec                      (100%)

    Total test time: 10.023829 seconds.

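The annotation after each counter can be reproduced from the raw numbers. Below is a minimal sketch in Python using the sample output above; the formulas are inferred from that output (cycles over total wall-clock time, cycles per instruction, and task-clock over total time), not taken from simpleperf's source.

```python
# Derive the annotation columns of `simpleperf stat` output from raw counts.
# Formulas are inferred from the sample output above, not from simpleperf's code.
cpu_cycles = 1_320_496_145
instructions = 510_426_028
task_clock_ms = 886.008130
total_time_s = 10.023829

ghz = cpu_cycles / total_time_s / 1e9            # cycles per wall-clock second
cpi = cpu_cycles / instructions                  # cycles per instruction
cpus_used = (task_clock_ms / 1000) / total_time_s  # fraction of one cpu busy

print(f"{ghz:.6f} GHz, {cpi:.6f} cycles per instruction, {cpus_used:.6f} cpus used")
```

Note that the GHz figure is averaged over the whole monitoring period, so a mostly idle process shows a rate far below the CPU's actual clock speed.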
**Select events**
We can select which events to use via -e option. Below are examples:

    # Stat event cpu-cycles.
    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10
    # Stat event cache-references and cache-misses.
    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10

When running the stat command, if the number of hardware events is larger than
the number of hardware counters available in the PMU, the kernel shares hardware
counters between events, so each event is only monitored for part of the total
time. In the example below, there is a percentage at the end of each row,
showing the percentage of the total time that each event was actually monitored.

    # Stat using event cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
    Performance counter statistics:

      4,331,018  cache-references     # 4.861 M/sec     (87%)
      3,064,089  cache-references:u   # 3.439 M/sec     (87%)
      1,364,959  cache-references:k   # 1.532 M/sec     (87%)
         91,721  cache-misses         # 102.918 K/sec   (87%)
         45,735  cache-misses:u       # 51.327 K/sec    (87%)
         38,447  cache-misses:k       # 43.131 K/sec    (87%)
      9,688,515  instructions         # 10.561 M/sec    (89%)

    Total test time: 1.026802 seconds.

In the example above, each event is monitored about 87% of the total time. But
there is no guarantee that any pair of events are always monitored at the same
time. If we want to have some events monitored at the same time, we can use
--group option. Below is an example.

    # Stat using event cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
    Performance counter statistics:

      3,638,900  cache-references     # 4.786 M/sec          (74%)
         65,171  cache-misses         # 1.790953% miss rate  (74%)
      2,390,433  cache-references:u   # 3.153 M/sec          (74%)
         32,280  cache-misses:u       # 1.350383% miss rate  (74%)
        879,035  cache-references:k   # 1.251 M/sec          (68%)
         30,303  cache-misses:k       # 3.447303% miss rate  (68%)
      8,921,161  instructions         # 10.070 M/sec         (86%)

    Total test time: 1.029843 seconds.

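When events are grouped, simpleperf can report derived metrics such as the miss rate, because both events in a group are guaranteed to be counted over the same intervals. The miss rate appears to be simply cache-misses divided by cache-references within the same group, as a quick check against the numbers above shows (an inference from the output, not from simpleperf's source):

```python
# Reproduce the "miss rate" annotation from the grouped counts above.
# The formula is inferred from the example output.
cache_references = 3_638_900
cache_misses = 65_171

miss_rate_percent = cache_misses / cache_references * 100
print(f"{miss_rate_percent:.6f}% miss rate")
```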
**Select target to monitor**
We can select which processes or threads to monitor via -p option or -t option.
Monitoring a process is the same as monitoring all threads in the process.
Simpleperf can also fork a child process to run the new command and then monitor
the child process. Below are examples.

    # Stat process 11904 and 11905.
    $ simpleperf stat -p 11904,11905 --duration 10
    # Stat thread 11904 and 11905.
    $ simpleperf stat -t 11904,11905 --duration 10
    # Start a child process running `ls`, and stat it.
    $ simpleperf stat ls

**Decide how long to monitor**
When monitoring existing threads, we can use the --duration option to decide how
long to monitor. When monitoring a child process running a new command, simpleperf
monitors until the child process ends. In either case, we can press Ctrl-C to stop
monitoring at any time. Below are examples.

    # Stat process 11904 for 10 seconds.
    $ simpleperf stat -p 11904 --duration 10
    # Stat until the child process running `ls` finishes.
    $ simpleperf stat ls
    # Stop monitoring using Ctrl-C.
    $ simpleperf stat -p 11904 --duration 10
    ^C

**Decide the print interval**
When monitoring perf counters, we can also use --interval option to decide the print
interval. Below are examples.

    # Print stat for process 11904 every 300ms.
    $ simpleperf stat -p 11904 --duration 10 --interval 300
    # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
    # System wide profiling needs root privilege.
    $ su 0 simpleperf stat -a --duration 10 --interval 300

**Display counters in systrace**
simpleperf can also work with systrace to dump counters in the collected trace.
Below is an example of doing a system-wide stat.

    # Capture instructions (kernel only) and cache misses with an interval of 300 milliseconds for 15 seconds.
    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
    # On host, launch systrace to collect a trace for 10 seconds.
    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
    # Open the collected new.html in a browser and the perf counters will be shown.

#### Simpleperf record
simpleperf record is used to dump records of the profiled program. By passing
options, we can select which events to use, which processes/threads to monitor,
what frequency to dump records, how long to monitor, and where to store records.

    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
    # using default sample frequency (4000 samples per second), writing records
    # to perf.data.
    $ simpleperf record -p 7394 --duration 10
    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.

**Select events**
In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
As a hardware event, it is both accurate and efficient. We can also use other
events via -e option. Below is an example.

    # Record using event instructions.
    $ simpleperf record -e instructions -p 11904 --duration 10

**Select target to monitor**
The way to select target in record command is similar to that in stat command.
Below are examples.

    # Record process 11904 and 11905.
    $ simpleperf record -p 11904,11905 --duration 10
    # Record thread 11904 and 11905.
    $ simpleperf record -t 11904,11905 --duration 10
    # Record a child process running `ls`.
    $ simpleperf record ls

**Set the frequency to record**
We can set the frequency to dump records via the -f or -c options. For example,
-f 4000 means dumping approximately 4000 records every second when the monitored
thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
records every second. Another way is using -c option. For example, -c 10000
means dumping one record whenever 10000 events happen. Below are examples.

    # Record with sample frequency 1000: sample 1000 times every second running.
    $ simpleperf record -f 1000 -p 11904,11905 --duration 10
    # Record with sample period 100000: sample 1 time every 100000 events.
    $ simpleperf record -c 100000 -t 11904,11905 --duration 10

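The relationship between frequency, running time, and expected sample count described above can be sketched as follows; the event rate in the -c case is a hypothetical number chosen for illustration.

```python
# -f: the sample frequency applies only while the monitored thread is running,
# so the expected samples per second scale with the running fraction.
freq = 4000              # -f 4000: about 4000 samples per second of running time
running_fraction = 0.2   # thread runs 0.2s out of each 1.0s (example from the text)
samples_per_sec = freq * running_fraction
print(samples_per_sec)   # 800.0

# -c: one sample per `period` events, so the sample rate depends on how fast
# events occur. The event rate below is hypothetical.
period = 10_000              # -c 10000
events_per_sec = 8_000_000   # hypothetical event rate
samples_per_sec_c = events_per_sec / period
print(samples_per_sec_c)     # 800.0
```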
**Decide how long to monitor**
The way to decide how long to monitor in record command is similar to that in
stat command. Below are examples.

    # Record process 11904 for 10 seconds.
    $ simpleperf record -p 11904 --duration 10
    # Record until the child process running `ls` finishes.
    $ simpleperf record ls
    # Stop monitoring using Ctrl-C.
    $ simpleperf record -p 11904 --duration 10
    ^C

**Set the path to store records**
By default, simpleperf stores records in perf.data in the current directory. We
can use the -o option to set the path used to store records. Below is an example.

    # Write records to data/perf2.data.
    $ simpleperf record -p 11904 -o data/perf2.data --duration 10

#### Simpleperf report
simpleperf report is used to report based on perf.data generated by simpleperf
record command. Report command groups records into different sample entries,
sorts sample entries based on how many events each sample entry contains, and
prints out each sample entry. By passing options, we can select where to find
perf.data and executable binaries used by the monitored program, filter out
uninteresting records, and decide how to group records.
Below is an example. Records are grouped into 4 sample entries, each entry being
a row. There are several columns, each showing a piece of information
belonging to a sample entry. The first column is Overhead, which shows the
percentage of total events that fall inside the current sample entry. As the
perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
time used in each function.

    # Reports perf.data, using only records sampled in libsudo-game-jni.so,
    # grouping records using thread name(comm), process id(pid), thread id(tid),
    # function name(symbol), and showing sample count for each row.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
    Arch: arm64
    Event: cpu-cycles (type 0, config 0)
    Samples: 28235
    Event count: 546356211

    Overhead  Sample  Command   Pid   Tid   Symbol
    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
    6.24%     1756    sudogame  7394  7394  @plt

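For quick ad-hoc processing, a row of this report can be split into its fields. This is a sketch based on the example's column layout (whitespace-separated fields with the symbol last); for real programmatic access, `simpleperf_report_lib.py` is the supported interface.

```python
# Split one row of `simpleperf report` output into fields.
# The column layout is assumed from the example output above; the symbol may
# contain spaces, so we cap the number of splits.
row = "59.25% 16680 sudogame 7394 7394 checkValid(Board const&, int, int)"
overhead, samples, command, pid, tid, symbol = row.split(maxsplit=5)
print(overhead, samples, symbol)
```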
**Set the path to read records**
By default, simpleperf reads perf.data in the current directory. We can use the
-i option to select another file to read records from.

    $ simpleperf report -i data/perf2.data

**Set the path to find executable binaries**
If reporting function symbols, simpleperf needs to read executable binaries
used by the monitored processes to get symbol table and debug information. By
default, the paths are the executable binaries used by monitored processes while
recording. However, these binaries may not exist when reporting or not contain
symbol table and debug information. So we can use --symfs to redirect the paths.
Below is an example.

    $ simpleperf report
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it reads file in /A/b.
    $ simpleperf report --symfs /debug_dir
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it prefers file in /debug_dir/A/b to file in /A/b.

**Filter records**
When reporting, often not all records are of interest. Simpleperf
supports five filters for selecting records of interest. Below are examples.

    # Report records in threads having name sudogame.
    $ simpleperf report --comms sudogame
    # Report records in process 7394 or 7395.
    $ simpleperf report --pids 7394,7395
    # Report records in thread 7394 or 7395.
    $ simpleperf report --tids 7394,7395
    # Report records in libsudo-game-jni.so.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
    # Report records in function checkValid or canFindSolution_r.
    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"

**Decide how to group records into sample entries**
Simpleperf uses --sort option to decide how to group sample entries. Below are
examples.

    # Group records based on their process id: records having the same process
    # id are in the same sample entry.
    $ simpleperf report --sort pid
    # Group records based on their thread id and thread comm: records having
    # the same thread id and thread name are in the same sample entry.
    $ simpleperf report --sort tid,comm
    # Group records based on their binary and function: records in the same
    # binary and function are in the same sample entry.
    $ simpleperf report --sort dso,symbol
    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
    # thread, and belonging to the same function in the same binary.
    $ simpleperf report

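The grouping behavior of --sort can be illustrated with a toy sketch: records sharing the same key tuple fall into one sample entry, and entries are sorted by event count. This mirrors the behavior described above but is not simpleperf's actual implementation; the records below are hypothetical.

```python
# Toy illustration of how --sort keys group records into sample entries.
from collections import defaultdict

# Hypothetical records: (pid, tid, comm, dso, symbol, event_count)
records = [
    (7394, 7394, "sudogame", "libsudo-game-jni.so", "checkValid", 500),
    (7394, 7394, "sudogame", "libsudo-game-jni.so", "checkValid", 300),
    (7394, 7394, "sudogame", "libsudo-game-jni.so", "canFindSolution_r", 200),
]

def group(records, sort_keys):
    """Group records by the chosen sort keys, summing event counts,
    then sort entries by descending event count."""
    key_index = {"pid": 0, "tid": 1, "comm": 2, "dso": 3, "symbol": 4}
    entries = defaultdict(int)
    for r in records:
        key = tuple(r[key_index[k]] for k in sort_keys)
        entries[key] += r[5]
    return sorted(entries.items(), key=lambda kv: -kv[1])

print(group(records, ["dso", "symbol"]))
```

With `["dso", "symbol"]` as the key, the two checkValid records collapse into one entry with 800 events, just as records in the same binary and function share a row in the report.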
## Android application profiling
This section shows how to profile an Android application.
[Here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/README.md) are examples, and we use the
[SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) project to show the profiling results.
Simpleperf only supports profiling native instructions in binaries in ELF
format. If the Java code is executed by the interpreter or from the JIT cache,
it can’t be profiled by simpleperf. Since Android supports ahead-of-time
compilation, it can compile Java bytecode into native instructions with debug
information. On devices running Android M or earlier, we need root privilege to
compile Java bytecode with debug information; on devices running Android N or
later, root privilege is not needed.
Profiling an Android application involves three steps:
1. Prepare the application.
2. Record profiling data.
3. Report profiling data.
To profile, we can use either command lines or python scripts. Below shows both.
### Prepare an Android application
Before profiling, we need to install the application to be profiled on an Android device.
To get valid profiling results, please check following points:
**1. The application should be debuggable.**
It means [android:debuggable](https://developer.android.com/guide/topics/manifest/application-element.html#debug)
should be true. So we need to use debug [build type](https://developer.android.com/studio/build/build-variants.html#build-types)
instead of release build type. It is understandable because we can't profile others' apps.
However, on a rooted Android device, the application doesn't need to be debuggable.
**2. Run on an Android device >= L.**
Profiling on emulators is not yet supported. And to profile Java code, we need
the jvm running in oat mode, which is only available on Android L and above.
**3. On Android O, add `wrap.sh` in the apk.**
To profile Java code, we need the jvm running in oat mode. But on Android O,
debuggable applications are forced to run in jit mode. To work around this,
we need to add a `wrap.sh` in the apk. If you are running on an Android O device,
check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to add `wrap.sh` in the apk.
**4. Make sure C++ code is compiled with optimizing flags.**
If the application contains C++ code, it can be compiled with -O0 flag in debug build type.
This makes C++ code slow. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to avoid that.
**5. Use native libraries with debug info in the apk when possible.**
If the application contains C++ code or pre-compiled native libraries, try to use
unstripped libraries in the apk. This helps simpleperf generate better profiling
results. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to use unstripped libraries.
Here we use [SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) as an example.
It builds an app-profiling.apk for profiling.

    $ git clone https://android.googlesource.com/platform/system/extras
    $ cd extras/simpleperf/demo
    # Open the SimpleperfExamplePureJava project with Android studio, and build
    # it successfully; otherwise the `./gradlew` command below will fail.
    $ cd SimpleperfExamplePureJava
    # On windows, use "gradlew" instead.
    $ ./gradlew clean assemble
    $ adb install -r app/build/outputs/apk/app-profiling.apk

### Record and report profiling data (using command-lines)
We recommend using python scripts for profiling because they are more convenient.
But using command-line will give us a better understanding of the profile process
step by step. So we first show how to use command lines.
**1. Enable profiling**

    $ adb shell setprop security.perf_harden 0

**2. Fully compile the app**
We need to compile Java bytecode into native instructions to profile Java code
in the application. This needs different commands on different Android versions.
On Android >= N:

    $ adb shell setprop debug.generate-debug-info true
    $ adb shell cmd package compile -f -m speed com.example.simpleperf.simpleperfexamplepurejava
    # Restart the app to make the compilation take effect.
    $ adb shell am force-stop com.example.simpleperf.simpleperfexamplepurejava

On Android M devices, we need root privilege to force Android to fully compile
Java code into native instructions in ELF binaries with debug information. We
also need root privilege to read the compiled native binaries (because installd
writes them to a directory whose uid/gid is system:install). So profiling Java
code can only be done on rooted devices.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags -g
    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk

On Android L devices, we also need root privilege to compile the app with debug info
and access the native binaries.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols
    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk

**3. Find the app process**

    # Start the app if needed.
    $ adb shell am start -n com.example.simpleperf.simpleperfexamplepurejava/.MainActivity
    # Run `ps` in the app's context. On Android >= O devices, run `ps -e` instead.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ps | grep simpleperf
    u0_a151  6885  3346  1590504  53980  SyS_epoll_  6fc2024b6c  S  com.example.simpleperf.simpleperfexamplepurejava

So the pid of the app process is `6885`. We will use this number in the command
lines below; please replace it with the pid you get from running the `ps` command.
**4. Download simpleperf to the app's data directory**

    # Find which architecture the app is using.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cat /proc/6885/maps | grep boot.oat
    708e6000-70e33000 r--p 00000000 103:09 1214  /system/framework/arm64/boot.oat
    # The app uses /arm64/boot.oat, so push simpleperf in bin/android/arm64/ to the device.
    $ cd ../../scripts/
    $ adb push bin/android/arm64/simpleperf /data/local/tmp
    $ adb shell chmod a+x /data/local/tmp/simpleperf
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cp /data/local/tmp/simpleperf .

**5. Record perf.data**

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -p 6885 --duration 10
    simpleperf I 04-27 20:41:11 6940 6940 cmd_record.cpp:357] Samples recorded: 40008. Samples lost: 0.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ls -lh perf.data
    simpleperf I 04-27 20:31:40 5999 5999 cmd_record.cpp:357] Samples recorded: 39949. Samples lost: 0.

The profiling data is recorded at perf.data.
Normally we need to use the app when profiling, otherwise we may record no samples.
But in this case, the MainActivity starts a busy thread. So we don't need to use
the app while profiling.
There are many options to record profiling data, check [record command](#simpleperf-record) for details.
**6. Report perf.data**

    # Pull perf.data on host.
    $ adb shell "run-as com.example.simpleperf.simpleperfexamplepurejava cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data
    # Report samples using corresponding simpleperf executable on host.
    # On windows, use "bin\windows\x86_64\simpleperf" instead.
    $ bin/linux/x86_64/simpleperf report
    ...
    Overhead  Command   Pid   Tid   Shared Object                                                                     Symbol
    83.54%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
    16.11%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)

See [here](#the-correct-way-to-pull-perfdata-on-host) for why we use tee rather than just >.
There are many ways to show reports, check [report command](#simpleperf-report) for details.
### Record and report profiling data (using python scripts)
Besides command lines, we can use `app_profiler.py` to profile Android applications.
It downloads simpleperf to the device, records perf.data, and collects profiling
results and native binaries on the host. It is configured by `app_profiler.config`.
**1. Fill `app_profiler.config`**

    Change `app_package_name` line to: app_package_name="com.example.simpleperf.simpleperfexamplepurejava"
    Change `apk_file_path` line to: apk_file_path = "../SimpleperfExamplePureJava/app/build/outputs/apk/app-profiling.apk"
    Change `android_studio_project_dir` line to: android_studio_project_dir = "../SimpleperfExamplePureJava/"
    Change `record_options` line to: record_options = "--duration 10"

`apk_file_path` is needed to fully compile the application on Android L/M. It is
not necessary on Android >= N.
`android_studio_project_dir` is used to search native libraries in the
application. It is not necessary for profiling.
`record_options` can be set to any option accepted by simpleperf record command.
**2. Run `app_profiler.py`**

    $ python app_profiler.py

If it runs successfully, it will collect profiling data in perf.data in the
current directory, and related native binaries in binary_cache/.
**3. Report perf.data**
We can use `report.py` to report perf.data.

    $ python report.py

We can add any option accepted by `simpleperf report` command to `report.py`.
### Record and report call graph
A call graph is a tree showing function call relations. Below is an example.
    main() {
        FunctionOne();
        FunctionTwo();
    }
    FunctionOne() {
        FunctionTwo();
        FunctionThree();
    }

    callgraph:
        main-> FunctionOne
           |    |
           |    |-> FunctionTwo
           |    |-> FunctionThree
           |
           |-> FunctionTwo
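A minimal Python sketch (not part of simpleperf) of how such a tree can be built by merging sampled callstacks, each recorded from root to leaf:

```python
# Merge sampled callstacks into a call tree, the way a call-graph report
# aggregates samples: every prefix of every stack gets a sample count.
from collections import defaultdict

def build_call_tree(callstacks):
    """callstacks: list of stacks, each a list of function names ordered
    from root to leaf. Returns {path_tuple: sample_count}."""
    tree = defaultdict(int)
    for stack in callstacks:
        for depth in range(1, len(stack) + 1):
            tree[tuple(stack[:depth])] += 1
    return dict(tree)

# Each sample records the whole stack at the moment of sampling.
samples = [
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionOne", "FunctionThree"],
    ["main", "FunctionTwo"],
]
tree = build_call_tree(samples)
# tree[("main",)] counts every sample; deeper paths count only the
# samples whose stacks pass through that call chain.
```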
#### Record dwarf based call graph
When using command lines, add `-g` option like below:
$ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -g -p 6685 --duration 10
When using python scripts, change `app_profiler.config` as below:
Change the `record_options` line to: `record_options = "--duration 10 -g"`
Recording a dwarf based call graph needs debug information in native
binaries. So if using native libraries in the application,
it is better to include non-stripped native libraries in the apk.
#### Record stack frame based call graph
When using command lines, add `--call-graph fp` option like below:
$ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record --call-graph fp -p 6685 --duration 10
When using python scripts, change `app_profiler.config` as below:
Change the `record_options` line to: `record_options = "--duration 10 --call-graph fp"`
Recording stack frame based call graphs needs support of the stack frame
register. Notice that on the arm architecture, the stack frame register
is not well supported, even when compiling with -O0 -g -fno-omit-frame-pointer,
because the kernel can't unwind a user stack containing both
arm and thumb code. **So please consider using dwarf based call graphs on the arm
architecture, or profiling in an arm64 environment.**
#### Report call graph
To report call graph using command lines, add `-g` option.
$ bin/linux/x86_64/simpleperf report -g
...
Children Self Command Pid Tid Shared Object Symbol
99.97% 0.00% Thread-2 10859 10876 /system/framework/arm64/boot.oat java.lang.Thread.run
|
-- java.lang.Thread.run
|
-- void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
|--83.66%-- [hit in function]
|
|--16.22%-- int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)
| |--99.97%-- [hit in function]
To report call graph using python scripts, add `-g` option.
$ python report.py -g
Double-click an item starting with '+' to show its callgraph.
### Visualize profiling data
`simpleperf_report_lib.py` provides an interface for reading samples from perf.data.
Using it, you can write python scripts to read perf.data or convert perf.data
to other formats. Below are two examples.
#### Show flamegraph
$ python report_sample.py >out.perf
$ stackcollapse-perf.pl out.perf >out.folded
$ ./flamegraph.pl out.folded >a.svg
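The intermediate "folded" format produced by stackcollapse-perf.pl is simple enough to sketch: one line per unique callstack, frames joined by `;` from root to leaf, followed by the sample count. A minimal illustration in Python (not the real script, which parses the textual sample output):

```python
# Collapse sampled callstacks into the folded format that flamegraph.pl
# consumes: "frame1;frame2;frame3 <count>" per unique stack. The width
# of each flame is proportional to these counts.
from collections import Counter

def fold_stacks(callstacks):
    """callstacks: list of stacks, each ordered from root to leaf."""
    counts = Counter(";".join(stack) for stack in callstacks)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

samples = [
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionTwo"],
]
folded = fold_stacks(samples)
# Identical stacks collapse into one line with an aggregated count.
```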
#### Visualize using pprof
pprof is a tool for visualization and analysis of profiling data. It can
be obtained from https://github.com/google/pprof. pprof_proto_generator.py can
generate profiling data in a format accepted by pprof.
$ python pprof_proto_generator.py
$ pprof -pdf pprof.profile
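Both scripts above build on `simpleperf_report_lib.py`. Here is a rough sketch of using its `ReportLib` class directly (method names follow `simpleperf_report_lib.py`; check that script for the authoritative API), with the aggregation logic factored out so it works on any sample stream:

```python
# Sketch: read (symbol, period) pairs from perf.data via ReportLib and
# aggregate them into per-symbol event counts.
from collections import Counter

def count_symbols(samples):
    """samples: iterable of (symbol_name, period) pairs.
    Returns a Counter mapping symbol name -> total event count."""
    counts = Counter()
    for symbol_name, period in samples:
        counts[symbol_name] += period
    return counts

def read_perf_data(path="perf.data"):
    # Requires simpleperf_report_lib.py (and its native library) to be
    # importable; only meaningful on a host with a recorded perf.data.
    from simpleperf_report_lib import ReportLib
    lib = ReportLib()
    lib.SetRecordFile(path)
    while True:
        sample = lib.GetNextSample()
        if sample is None:
            break
        symbol = lib.GetSymbolOfCurrentSample()
        yield (symbol.symbol_name, sample.period)
    lib.Close()

# On a host with perf.data available:
#   counts = count_symbols(read_perf_data())
```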
### Annotate source code
`annotate.py` reads perf.data, the binaries in `binary_cache/` (collected by `app_profiler.py`)
and the source code, and generates annotated source code in `annotated_files/`.
**1. Run annotate.py**
$ python annotate.py -s ../SimpleperfExamplePureJava
`addr2line` is needed to annotate source code. It can be found in the Android NDK,
in paths like toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/aarch64-linux-android-addr2line.
Please use the `--addr2line` option to set the path of `addr2line` if annotate.py
can't find it.
**2. Read annotated code**
The annotated source code is located at `annotated_files/`.
`annotated_files/summary` shows how each source file is annotated.
One annotated source file is `annotated_files/java/com/example/simpleperf/simpleperfexamplepurejava/MainActivity.java`.
Its content is similar to below:
// [file] shows how much time is spent in current file.
/* [file] acc_p: 99.966552%, p: 99.837438% */package com.example.simpleperf.simpleperfexamplepurejava;
...
// [func] shows how much time is spent in current function.
/* [func] acc_p: 16.213395%, p: 16.209250% */ private int callFunction(int a) {
...
// This shows how much time is spent in current line.
// acc_p field means how much time is spent in current line and functions called by current line.
// p field means how much time is spent just in current line.
/* acc_p: 99.966552%, p: 83.628188% */ i = callFunction(i);
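To make the two metrics concrete, here is a minimal sketch (not annotate.py itself) of how `p` and `acc_p` can be computed from sampled callstacks, where each frame is attributed to a source line:

```python
# p     = percentage of samples whose leaf frame is on this line.
# acc_p = percentage of samples that have this line anywhere on the
#         stack (i.e. time in the line plus in functions it calls).
def line_percentages(sample_stacks, line):
    """sample_stacks: list of stacks, each a list of source-line ids
    ordered from root to leaf. Returns (acc_p, p) as percentages."""
    total = len(sample_stacks)
    p = sum(1 for s in sample_stacks if s[-1] == line)
    acc = sum(1 for s in sample_stacks if line in s)
    return 100.0 * acc / total, 100.0 * p / total

# Hypothetical samples: "run:line10" calls into "callFunction:line20"
# in half of the samples, and is the leaf itself in the other half.
stacks = [
    ["run:line10", "callFunction:line20"],
    ["run:line10"],
    ["run:line10", "callFunction:line20"],
    ["run:line10"],
]
acc_p, p = line_percentages(stacks, "run:line10")
```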
## Answers to common issues
### The correct way to pull perf.data on host
As perf.data is generated in the app's context, it can't be pulled directly to the host.
One way is `adb shell run-as xxx cat perf.data >perf.data`. However, it
doesn't work well on Windows, because the content can be modified as it goes
through the pipe. So we first copy it from the app's context to the shell's context,
then pull it to the host. The commands are as below:
$ adb shell "run-as xxx cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
$ adb pull /data/local/tmp/perf.data
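The modification mentioned above can be illustrated: on Windows, a text-mode channel may rewrite each LF byte (0x0a) in the binary stream as CRLF, silently corrupting perf.data. A small stand-in for what the pipe does (illustration only, not simpleperf code):

```python
# Simulate Windows text-mode newline translation on binary data:
# every LF byte (0x0a) becomes CRLF (0x0d 0x0a).
def text_mode_translate(data: bytes) -> bytes:
    return data.replace(b"\n", b"\r\n")

# perf.data is a binary file (its magic is "PERFILE2"); any 0x0a byte
# in it is data, not a newline, so translation corrupts the file.
perf_data = b"PERFILE2\x00\x0a\x01"
corrupted = text_mode_translate(perf_data)
```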
## Inferno

### Description
Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve surfaceflinger performance
(Android compositor), but it can be used for any native Android application.
You can see a sample report generated with Inferno
[here](./inferno/report.html). Reports are self-contained HTML files, so they can be
exchanged easily.
Notice there is no concept of time in a flame graph, since all callstacks are
merged together. As a result, the width of a flamegraph represents 100% of
the number of samples, and the height is related to the number of functions on
the stack when sampling occurred.

In the flamegraph featured above you can see the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing messages
in `android::SurfaceFlinger::onMessageReceived`. The most expensive task is asking
for the screen to be refreshed, as `android::DisplayDevice::prepare` shows in
orange. This graphic division helps to see which parts of the program are costly and
where a developer's effort to improve performance should go.
### Example of bottleneck
A flamegraph gives you an instant view of the CPU cycle cost centers, but
it can also be used to find specific offenders. To find them, look for
plateaus. It is easiest to see with an example:

In the previous flamegraph, two
plateaus (due to `android::BufferQueueCore::validateConsistencyLocked`)
are immediately apparent.
### How it works
Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack,
using either frame pointers (recommended) or dwarf. At the end of the recording,
simpleperf also symbolizes all IPs automatically. The records are aggregated and
dumped to a file, `perf.data`. This file is pulled from the Android device
and processed on the host by Inferno. The callstacks are merged together to
visualize in which parts of an app the CPU cycles are spent.
### How to use it
Open a terminal and from `simpleperf` directory type:
```
./inferno.sh (on Linux/Mac)
./inferno.bat (on Windows)
```
Inferno will collect data, process it and automatically open your web browser
to display the HTML report.
### Parameters
You can select how long to sample for, the color of the nodes and many other
things. Use `-h` to get a list of all supported parameters.
```
./inferno.sh -h
```
### Troubleshooting
#### Messy flame graph
A healthy flame graph features a single call site at its base
(see `inferno/report.html`).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may have
failed to reach the root call site. Such incomplete
callstacks are impossible to merge properly. By default, Inferno asks
simpleperf to unwind the stack via the kernel and frame pointers. Try to
perform unwinding with dwarf instead (`-du`); you can further tune this setting.
#### No flames
If you see no flames at all, or a mess of one-level flames without a common base,
it may be because you compiled without frame pointers. Make sure there is no
`-fomit-frame-pointer` in your build config. Alternatively, ask simpleperf to
collect data with dwarf unwinding (`-du`).
#### High percentage of lost samples
If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with dwarf. Dwarf unwinding involves copying the stack before it is
processed. Try to use frame pointer unwinding, which can be done by the kernel
and is much faster.
The cost of maintaining the frame pointer is negligible on arm64 but considerable
on the 32-bit arm architecture (due to register pressure). Use a 64-bit build for better
profiling.
#### run-as: package not debuggable
If you cannot run as root, make sure the app is debuggable, otherwise simpleperf
will not be able to profile it.