REF:
====
https://www.bfilipek.com/2019/11/perfguidecpu.html
https://easyperf.net/notes/
https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux
PERF
====
# perf is a sampling profiler, so results are not 100% precise; measurement artifacts cause some precision loss.
perf stat <exe> # coarse-grained summary
# By default perf shows callee data. Expanding the tree shows the callers,
# but it does not show a proper call graph.
perf record <exe>
perf report
# For a call graph use -g; this requires frame pointers,
# so make sure to compile/link with -fno-omit-frame-pointer.
# This tells the compiler to keep the frame pointer, allowing perf to walk up/down the call stack,
# at the cost of one register.
perf record -g <exe>
perf report -g
# Can have a flags file for compiler options:
-O3
-std=c++14
-lc++abi
-fno-exceptions
-fno-rtti
-pedantic
-fno-omit-frame-pointer
clang++ $(< flags) -o <test-name> <src file> -lbenchmark && ./<test-name>
# For proper (top-down) call graphs use:
# 0.5 is the minimum percentage a call chain needs in order to be displayed; caller puts callers on top, then callees
perf record -g <exe>
perf report -g "graph,0.5,caller"
==================================================================
- http://www.brendangregg.com/USEmethod/use-linux.html - Good collection of performance debugging commands
- https://easyperf.net/notes/
- https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux
- https://easyperf.net/blog/2018/08/26/Basics-of-profiling-with-perf - Basic perf understanding
Use of perf: http://www.brendangregg.com/perf.html
===============================================================
cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Show system call overhead as a summary.
strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
strace -e trace=open,openat Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def # show which files are opened (recent glibc uses openat)
# Track library calls. ltrace intercepts and records the dynamic library calls made
# by the executed process and the signals received by that process.
# With -S it also intercepts and prints the system calls executed by the program.
ltrace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
ltrace -S -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Show page faults plus L1 and last-level data cache loads and misses (repeat -d for more detailed events):
perf stat -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Use -r N to run the same test workload N times; for each counter perf stat prints the mean and the deviation from it.
perf stat -r 5 -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
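# Where perf is not available, the same "repeat and look at the spread" idea can be roughly
# approximated in Python. This is only a sketch: it measures wall-clock time, not hardware
# counters, and the helper names time_runs and summarize are made up for this note. The
# command in the demo is a placeholder so the sketch runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Placeholder command (assumption). Substitute the real workload, e.g.
    # ["Base/bin/gcc-7.3.0/release/perf_job_gen", "./metabuilder.def"]
    cmd = [sys.executable, "-c", "pass"]
    mean, stdev = summarize(time_runs(cmd, repeats=5))
    print("mean %.4fs  stdev %.4fs (%.1f%% of mean)" % (mean, stdev, 100 * stdev / mean))
```

# A large stdev relative to the mean suggests the machine is noisy and more repeats are needed.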
# For more detail and a high-level overview, use a debug build; -g records stack traces
perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
perf report --sort comm,dso # high level overview
# Print the report to stdout; -F sets the sampling frequency (99 Hz here), avoid very high values
perf record -F 99 -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
perf report -n --stdio
# For a graphical display you can use flame graphs. This *ONLY* works properly with debug builds,
# and you must use -g (collect stack traces).
# http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
# Assumes you have downloaded FlameGraph: git clone https://github.com/brendangregg/FlameGraph
# The wider a frame in the graph, the more time is spent in it.
perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
perf script | $HOME/FlameGraph/stackcollapse-perf.pl > out.perf-folded
cat out.perf-folded | $HOME/FlameGraph/flamegraph.pl > my_perf.svg
display my_perf.svg
*Alternatively*, view the SVG in a browser, as this will also show:
- *full* function names
- number of samples
- percentage of CPU time
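# The folded file written by stackcollapse-perf.pl can also be summarized as text. Each line is
# a semicolon-separated stack (root first) followed by a sample count. The sketch below adds up
# inclusive samples per frame; the function name inclusive_samples and the frame names in the
# sample data are made up for illustration and are not real perf_job_gen output.

```python
from collections import Counter

# Hypothetical folded stacks in the "frame;frame;frame count" layout.
SAMPLE_FOLDED = [
    "perf_job_gen;main;parse 30",
    "perf_job_gen;main;generate;write 60",
    "perf_job_gen;main;generate 10",
]


def inclusive_samples(folded_lines):
    """Return (Counter of samples per frame, total samples).

    A frame is credited with every stack it appears in, so the numbers
    correspond to the frame widths in the flame graph.
    """
    totals = Counter()
    grand_total = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last whitespace-separated token; frame names
        # themselves may contain spaces (e.g. C++ signatures).
        stack, count = line.rsplit(None, 1)
        n = int(count)
        grand_total += n
        # set() so a recursive function is counted once per stack.
        for frame in set(stack.split(";")):
            totals[frame] += n
    return totals, grand_total


if __name__ == "__main__":
    totals, grand_total = inclusive_samples(SAMPLE_FOLDED)
    for frame, n in totals.most_common():
        print("%6.2f%%  %6d  %s" % (100.0 * n / grand_total, n, frame))
```

# For a real run, pass open("out.perf-folded") instead of SAMPLE_FOLDED.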
The metabuilder.def used by the perf tests needs the following preparation:
o/ module load ecflow/5new
o/ mb; git checkout ci; ./generate -a # switch to metabuilder ci, and generate all, so job generate can find all includes and scripts
o/ ecflow_client --port 3142 --host ecflow-metab --get > metabuilder.def
o/ Edit metabuilder.def and replace
edit ECF_HOME '/home/ma/deploy/servers/ecflow-metab.5062/metabuilder/...'
with
edit ECF_HOME '/var/tmp/ma0/workspace/metabuilder/...'
o/ Edit metabuilder.def and replace:
edit REMOTE_HOST 'ecflow-metab' | edit WSHOST 'ecflow-metab'
with
edit REMOTE_HOST 'polonius' | edit WSHOST 'polonius'
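# The two manual edits above could be scripted. A minimal sketch, assuming the part of the
# ECF_HOME path after .../metabuilder is kept unchanged and that the host name appears in
# single quotes as shown. The function name rewrite_def is made up, and the sample text
# keeps the elided '...' exactly as written in these notes.

```python
OLD_HOME = "/home/ma/deploy/servers/ecflow-metab.5062/metabuilder"
NEW_HOME = "/var/tmp/ma0/workspace/metabuilder"
OLD_HOST = "ecflow-metab"
NEW_HOST = "polonius"

# Sample lines copied from the notes above (the path is elided there too).
SAMPLE_DEF = (
    "  edit ECF_HOME '/home/ma/deploy/servers/ecflow-metab.5062/metabuilder/...'\n"
    "  edit REMOTE_HOST 'ecflow-metab'\n"
    "  edit WSHOST 'ecflow-metab'\n"
)


def rewrite_def(text):
    """Return text with ECF_HOME, REMOTE_HOST and WSHOST pointed at the local setup."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("edit ECF_HOME "):
            # Only the path prefix changes; whatever follows is preserved.
            line = line.replace(OLD_HOME, NEW_HOME)
        elif stripped.startswith(("edit REMOTE_HOST ", "edit WSHOST ")):
            line = line.replace("'" + OLD_HOST + "'", "'" + NEW_HOST + "'")
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(rewrite_def(SAMPLE_DEF), end="")
```

# For the real file: read metabuilder.def, pass its contents through rewrite_def, and write the result back.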
Use of perf: http://www.brendangregg.com/perf.html
===============================================================
cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
time Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Show system call overhead as a summary.
strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Show page faults plus L1 and last-level data cache loads and misses (repeat -d for more detailed events):
perf stat -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# Use -r N to run the same test workload N times; for each counter perf stat prints the mean and the deviation from it.
perf stat -r 10 -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
# For more detail and a high-level overview, use a debug build; -g records stack traces
perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
perf report --sort comm,dso # high level overview
# For a graphical display you can use flame graphs. This *ONLY* works properly with debug builds,
# and you must use -g (collect stack traces).
# Assumes you have downloaded FlameGraph: git clone https://github.com/brendangregg/FlameGraph
# The wider a frame in the graph, the more time is spent in it.
perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
perf script | $HOME/FlameGraph/stackcollapse-perf.pl > out.perf-folded
cat out.perf-folded | $HOME/FlameGraph/flamegraph.pl > my_perf.svg
display my_perf.svg
Using valgrind
==========================================================================
# To see the callgraph make sure you use valgrind --tool=callgrind
# and *NOT* valgrind --tool=cachegrind
valgrind --tool=callgrind Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
kcachegrind callgrind.out.18473 # the suffix is the PID of the profiled run
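# Because the PID suffix changes on every run, a small helper can pick the newest output file
# instead of typing the number by hand. A sketch only; the function name latest_callgrind_out
# is made up for this note, and it assumes the default callgrind.out.<pid> file naming.

```python
import glob
import os


def latest_callgrind_out(directory="."):
    """Return the most recently modified callgrind.out.* file, or None if there is none."""
    candidates = glob.glob(os.path.join(directory, "callgrind.out.*"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = latest_callgrind_out()
    if path is None:
        print("no callgrind.out.* file in the current directory")
    else:
        print("kcachegrind " + path)
```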
test job creation
==========================================================================
- Note: writing to scratch can be slow. This can be overridden by the user specifying
  their own directory:
export PYTHONPATH=/var/tmp/ma0/workspace/ecflow/Pyext/ecflow
cat > tmp.py << EOF
import shutil
from ecflow import *
defs = Defs("metabuilder.def")
job_ctrl = JobCreationCtrl()
job_ctrl.set_dir_for_job_creation("/var/tmp/ma0/tmp/ecflow") # generate job files under this directory
#job_ctrl.set_verbose(True)
defs.check_job_creation(job_ctrl)
print(job_ctrl.get_error_msg())
#print("removing job generation directory tree " + job_ctrl.get_dir_for_job_creation())
#shutil.rmtree(job_ctrl.get_dir_for_job_creation())
EOF
strace -c python tmp.py
strace
===========================================================================
# strace -c prints a table of system calls with the percentage of time spent in each
cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
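# The summary table goes to stderr; with -o <file> strace writes it to a file instead, which
# makes it easy to post-process. A sketch of a parser, assuming the usual column layout
# (% time, seconds, usecs/call, calls, errors, syscall). The function name
# parse_strace_summary is made up, and the numbers in the sample are invented, not measured.

```python
# Invented sample in the layout printed by `strace -c`.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000600           6       100           read
 40.00    0.000400          10        40         4 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.001000                   140         4 total
"""


def parse_strace_summary(text):
    """Parse the `strace -c` table into a list of dicts, one per system call."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            pct = float(fields[0])
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # separator line
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append({
            "syscall": fields[-1],
            "pct_time": pct,
            "seconds": seconds,
            "calls": calls,
            "errors": errors,
        })
    return rows


if __name__ == "__main__":
    for row in sorted(parse_strace_summary(SAMPLE), key=lambda r: -r["pct_time"]):
        print("%-10s %6.2f%% %8d calls %4d errors"
              % (row["syscall"], row["pct_time"], row["calls"], row["errors"]))
```

# For a real run: strace -c -o strace.out <exe> <args>, then parse the contents of strace.out.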