File: PERF.txt

REF:
====
https://www.bfilipek.com/2019/11/perfguidecpu.html
https://easyperf.net/notes/
https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux

PERF 
====
# perf is a sampling profiler (perf record/report). It is not 100% precise: measurement artifacts cause some loss of precision.
  perf stat <exe>  # coarse grained: event counts for the whole run
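Because perf samples, a function's percentage is an estimate whose precision depends on how many samples were taken. A back-of-the-envelope sketch, assuming independent samples on one busy CPU (a simplification); the frequency, duration and share used below are hypothetical:

```python
import math


def sample_count(freq_hz: float, seconds: float, cpus: int = 1) -> int:
    """Approximate number of samples `perf record -F freq_hz` collects while busy."""
    return int(freq_hz * seconds * cpus)


def share_stderr(share: float, samples: int) -> float:
    """Binomial standard error of a share (0..1) estimated from `samples` samples.

    Assumes samples are independent, which is only an approximation.
    """
    if samples <= 0:
        raise ValueError("need at least one sample")
    return math.sqrt(share * (1.0 - share) / samples)


if __name__ == "__main__":
    n = sample_count(99, 30)      # hypothetical: -F 99, 30 s CPU-bound run on one CPU
    err = share_stderr(0.05, n)   # hypothetical function seen in 5% of the samples
    print(f"{n} samples: 5% +- {2 * 100 * err:.2f}% (roughly 2 standard errors)")
```

With about 3,000 samples a 5% entry is only known to within roughly +-0.8 percentage points, so small differences between two profiles may just be sampling noise.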
  
# By default perf report shows callee data; expanding the tree shows the callers.
# However, this does not give a proper call graph.
  perf record <exe>
  perf report
  
# For call graphs use -g. This requires frame pointers, so make sure to compile/link with -fno-omit-frame-pointer.
# This tells the compiler to keep the frame pointer instead of optimising it away, which allows perf to walk
# up/down the call stack, at the cost of one register.
  perf record -g <exe>
  perf report -g
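Why frame pointers matter: with -fno-omit-frame-pointer each function prologue saves the caller's frame pointer and points rbp at it, so the frames form a linked list that an unwinder (such as perf's default frame-pointer mode for -g) can follow. A toy model of that walk, assuming the usual x86-64 layout after `push rbp; mov rbp, rsp` (saved rbp at [rbp], return address at [rbp+8]); the stack contents and addresses are hypothetical:

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory: Dict[int, int], rbp: int, max_depth: int = 64) -> List[int]:
    """Follow the saved-frame-pointer chain and return the return addresses found.

    Toy model: `memory` maps stack addresses to 8-byte values. A frame set up by
    `push rbp; mov rbp, rsp` holds the caller's rbp at [rbp] and the return
    address at [rbp + 8]. A saved rbp of 0 marks the outermost frame.
    """
    return_addresses: List[int] = []
    while rbp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[rbp + WORD])  # where this frame returns to
        rbp = memory[rbp]                            # hop to the caller's frame
    return return_addresses


if __name__ == "__main__":
    # Hypothetical stack for main -> f -> g (g is currently executing).
    stack = {
        0x7FF0: 0x0,    0x7FF8: 0x401000,  # main's frame: outermost, saved rbp 0
        0x7FD0: 0x7FF0, 0x7FD8: 0x401100,  # f's frame: returns into main
        0x7FB0: 0x7FD0, 0x7FB8: 0x401200,  # g's frame: returns into f
    }
    print([hex(a) for a in walk_frame_pointers(stack, rbp=0x7FB0)])
```

This prints ['0x401200', '0x401100', '0x401000'], innermost first. When the frame pointer is omitted, rbp holds arbitrary data, so a walk like this typically stops early or yields bogus frames, which is why call graphs from such builds look truncated.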
 
# Compiler options can be kept in a flags file (here called 'flags', read by $(< flags) below):
-O3
-std=c++14
-lc++abi
-fno-exceptions
-fno-rtti
-pedantic
-fno-omit-frame-pointer

  clang++ $(< flags) -o <test-name> <src file> -lbenchmark && ./<test-name>
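`$(< flags)` is a bash feature that substitutes the file's contents, split on whitespace. A sketch of the same build step driven from Python, roughly equivalent (shlex.split also honours quotes), assuming clang++ and the Google Benchmark library are installed; the file names `bench.cpp` and `bench` and the helper `build_command` are hypothetical:

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_command(flags_file: str, src: str, exe: str) -> List[str]:
    """argv for: clang++ $(< flags) -o <exe> <src> -lbenchmark"""
    flags = shlex.split(Path(flags_file).read_text())  # newlines count as whitespace
    return ["clang++", *flags, "-o", exe, src, "-lbenchmark"]


if __name__ == "__main__":
    cmd = build_command("flags", "bench.cpp", "bench")  # hypothetical file names
    subprocess.run(cmd, check=True)                     # compile and link
    subprocess.run(["./bench"], check=True)             # only reached if the build succeeded
```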

# For proper call graphs (top down) use the following. In "graph,0.5,caller", 0.5 is the threshold:
# call chains below 0.5% of the samples are not shown; caller means callers are shown on top, then callees.
  perf record -g <exe>
  perf report -g "graph,0.5,caller"
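The callee and caller (top-down) views differ in how samples are attributed along each call chain. A sketch, assuming the folded stacks written by FlameGraph's stackcollapse-perf.pl (one `outer;...;leaf count` line per unique chain): *self* counts samples where a function is the leaf (roughly what the callee view ranks by), *total* counts samples where it appears anywhere in the chain (where a top-down view starts). The hypothetical `above_threshold` only loosely mirrors the 0.5 threshold, which perf applies to call-graph entries rather than to whole functions:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def self_and_total(folded: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (self %, total %) per function from folded stacks 'outer;inner;leaf N'."""
    self_samples: Counter = Counter()
    total_samples: Counter = Counter()
    grand_total = 0
    for line in folded:
        line = line.strip()
        if not line:
            continue
        stack, count_text = line.rsplit(" ", 1)  # the sample count is the last field
        count = int(count_text)
        frames = stack.split(";")                # outermost caller first, leaf last
        grand_total += count
        self_samples[frames[-1]] += count        # the leaf is where the CPU actually was
        for func in dict.fromkeys(frames):       # unique frames, so recursion counts once
            total_samples[func] += count

    def to_pct(counts: Counter) -> Dict[str, float]:
        return {f: 100.0 * n / grand_total for f, n in counts.items()} if grand_total else {}

    return to_pct(self_samples), to_pct(total_samples)


def above_threshold(pct: Dict[str, float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep entries at or above `threshold` percent, largest first."""
    kept = [(f, p) for f, p in pct.items() if p >= threshold]
    return dict(sorted(kept, key=lambda item: (-item[1], item[0])))


if __name__ == "__main__":
    with open("out.perf-folded") as fh:  # produced by stackcollapse-perf.pl
        self_pct, total_pct = self_and_total(fh)
    for func, pct in above_threshold(total_pct).items():
        print(f"{pct:6.2f}% total {self_pct.get(func, 0.0):6.2f}% self  {func}")
```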

==================================================================
- http://www.brendangregg.com/USEmethod/use-linux.html                - Good collection of performance debug
- https://easyperf.net/blog/2018/08/26/Basics-of-profiling-with-perf  - Basic perf understanding

Use of perf:  http://www.brendangregg.com/perf.html
===============================================================
  cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
  Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # Show system call overhead as a summary.
  strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  strace -e trace=open,openat Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def  # show which files are opened (modern glibc uses openat)
  
  # Track library calls: ltrace intercepts and records the dynamic library calls made by the
  # executed process and the signals it receives. With -S it also traces the system calls
  # executed by the program.
  ltrace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  ltrace -S -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # Show page faults and data cache misses (-d); use -dd to add instruction cache and TLB events:
  perf stat -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # perf stat -r runs the same workload several times and prints, for each count, the average and its standard deviation.
  perf stat -r 5 -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
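`perf stat -r` repeats the run and reports, per counter, the average and its spread. The same idea for wall-clock time alone, as a sketch (`repeat_stats` is a hypothetical helper; perf reads hardware and software counters directly and may compute its spread differently):

```python
import statistics
import subprocess
import time
from typing import Callable, Dict


def repeat_stats(workload: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Time `workload` `runs` times; report mean and sample standard deviation of wall time."""
    if runs < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)  # sample standard deviation
    return {
        "runs": float(runs),
        "mean_s": mean,
        "stdev_s": stdev,
        "rel_stdev_pct": 100.0 * stdev / mean if mean > 0 else 0.0,
    }


if __name__ == "__main__":
    job_gen = ["Base/bin/gcc-7.3.0/release/perf_job_gen", "./metabuilder.def"]
    print(repeat_stats(lambda: subprocess.run(job_gen, check=True), runs=5))
```

A large relative deviation suggests a noisy machine (other load, frequency scaling); differences between builds smaller than it are unlikely to be meaningful.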
  
  # For more detail and a higher-level overview, use a debug build; -g records stack traces.
  perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
  perf report --sort comm,dso  # high level overview
  
  # Print the report to stdout (--stdio) with sample counts (-n); use -F to lower the sampling frequency (99 Hz here), avoid high values.
  perf record -F 99 -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
  perf report -n --stdio

  # For a graphical display you can use flame graphs. This *ONLY* works properly with debug builds,
  # and you must use -g (collect stack traces).
  # http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
  # This assumes you have downloaded FlameGraph into $HOME: git clone https://github.com/brendangregg/FlameGraph
  # The wider a frame in the graph, the more time is spent in it.
  perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
  perf script | $HOME/FlameGraph/stackcollapse-perf.pl > out.perf-folded
  cat out.perf-folded | $HOME/FlameGraph/flamegraph.pl > my_perf.svg
  display my_perf.svg
  
  *Alternatively*, open my_perf.svg in a browser, as this also shows:
    - *full* function names
    - number of samples
    - percentage of CPU time
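flamegraph.pl merges stacks that share a common prefix into one box per frame and makes each box's width proportional to its sample count. A minimal text-only sketch of that merge, assuming the out.perf-folded file produced above (`Frame`, `build_tree` and `print_tree` are illustrative names, not part of FlameGraph):

```python
from typing import Dict, Iterable


class Frame:
    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0                        # samples in this frame and everything it called
        self.children: Dict[str, "Frame"] = {}


def build_tree(folded: Iterable[str]) -> Frame:
    """Merge folded stacks ('a;b;c N') into a prefix tree, one node per distinct path."""
    root = Frame("all")
    for line in folded:
        line = line.strip()
        if not line:
            continue
        stack, count_text = line.rsplit(" ", 1)
        count = int(count_text)
        root.count += count
        node = root
        for name in stack.split(";"):
            child = node.children.get(name)
            if child is None:
                child = node.children[name] = Frame(name)
            child.count += count
            node = child
    return root


def print_tree(node: Frame, total: int, depth: int = 0, width: int = 40) -> None:
    """Print each frame with a bar whose length is proportional to its share of samples."""
    bar = "#" * max(1, round(width * node.count / total)) if total else ""
    print(f"{'  ' * depth}{bar} {node.name} ({node.count})")
    for child in sorted(node.children.values(), key=lambda f: -f.count):
        print_tree(child, total, depth + 1, width)


if __name__ == "__main__":
    with open("out.perf-folded") as fh:
        tree = build_tree(fh)
    print_tree(tree, tree.count)
```

Children are listed largest first here for readability; flamegraph.pl itself orders boxes alphabetically within each level, which is why the x-axis of a flame graph is not a timeline.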


The metabuilder.def used by the perf tests below needs the following preparation:
o/ module load ecflow/5new
o/ mb; git checkout ci; ./generate -a  # switch to the metabuilder ci branch and generate everything, so job generation can find all includes and scripts
o/ ecflow_client --port 3142 --host ecflow-metab --get > metabuilder.def
o/ Edit metabuilder.def and replace 
      edit ECF_HOME '/home/ma/deploy/servers/ecflow-metab.5062/metabuilder/...'
   with
      edit ECF_HOME '/var/tmp/ma0/workspace/metabuilder/...'
o/ Edit metabuilder.def and replace:
      edit REMOTE_HOST 'ecflow-metab' | edit WSHOST 'ecflow-metab'
   with 
      edit REMOTE_HOST 'polonius' | edit WSHOST 'polonius'
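The two edits above can also be scripted. A sketch in Python, assuming metabuilder.def is in the current directory; only the path prefix shown above is rewritten, so whatever follows it in each ECF_HOME value is preserved (`prepare_def` is a hypothetical helper):

```python
from pathlib import Path

# Old -> new substitutions from the preparation steps above.
REPLACEMENTS = [
    ("edit ECF_HOME '/home/ma/deploy/servers/ecflow-metab.5062/metabuilder/",
     "edit ECF_HOME '/var/tmp/ma0/workspace/metabuilder/"),
    ("edit REMOTE_HOST 'ecflow-metab'", "edit REMOTE_HOST 'polonius'"),
    ("edit WSHOST 'ecflow-metab'", "edit WSHOST 'polonius'"),
]


def prepare_def(text: str) -> str:
    """Apply each substitution to every occurrence in the definition text."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    path = Path("metabuilder.def")
    path.write_text(prepare_def(path.read_text()))
```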


Use of perf:  http://www.brendangregg.com/perf.html
===============================================================
  cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
  time Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # Show system call overhead as a summary.
  strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # Show page faults and data cache misses (-d); use -dd to add instruction cache and TLB events:
  perf stat -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # perf stat -r runs the same workload several times and prints, for each count, the average and its standard deviation.
  perf stat -r 10 -d Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def
  
  # For more detail and a higher-level overview, use a debug build; -g records stack traces.
  perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
  perf report --sort comm,dso  # high level overview
  
  # For a graphical display you can use flame graphs. This *ONLY* works properly with debug builds,
  # and you must use -g (collect stack traces).
  # This assumes you have downloaded FlameGraph into $HOME: git clone https://github.com/brendangregg/FlameGraph
  # The wider a frame in the graph, the more time is spent in it.
  perf record -g Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
  perf script | $HOME/FlameGraph/stackcollapse-perf.pl > out.perf-folded
  cat out.perf-folded | $HOME/FlameGraph/flamegraph.pl > my_perf.svg
  display my_perf.svg
  
  
Using valgrind
==========================================================================
# To see the callgraph make sure you use valgrind --tool=callgrind
# and *NOT* valgrind --tool=cachegrind
valgrind --tool=callgrind Base/bin/gcc-7.3.0/debug/perf_job_gen ./metabuilder.def
kcachegrind callgrind.out.18473   # callgrind writes callgrind.out.<pid>; 18473 was the pid of this run


test job creation
==========================================================================
 - Note: writing to scratch can be slow; this can be overridden by specifying
   your own directory:
 
export PYTHONPATH=/var/tmp/ma0/workspace/ecflow/Pyext/ecflow
cat > tmp.py << EOF
import shutil
from ecflow import *

defs = Defs("metabuilder.def")

job_ctrl = JobCreationCtrl()
job_ctrl.set_dir_for_job_creation("/var/tmp/ma0/tmp/ecflow")  # generate jobs file under this directory
#job_ctrl.set_verbose(True)
defs.check_job_creation(job_ctrl)
print(job_ctrl.get_error_msg())

#print("removing job generation directory tree " + job_ctrl.get_dir_for_job_creation())
#shutil.rmtree(job_ctrl.get_dir_for_job_creation())     
EOF

strace -c python tmp.py


strace
===========================================================================

# strace -c: table of system calls with call counts and percentage of time
cd $WK ; cp /var/tmp/ma0/DEFS/metabuilder.def .
strace -c Base/bin/gcc-7.3.0/release/perf_job_gen ./metabuilder.def