File: profiling.txt

package info (click to toggle)
coq 8.16.1%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 40,596 kB
  • sloc: ml: 219,376; sh: 3,545; python: 3,231; ansic: 2,529; makefile: 767; lisp: 279; javascript: 63; xml: 24; sed: 2
file content (128 lines) | stat: -rw-r--r-- 3,882 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# How to profile Coq?

I (Pierre-Marie Pédrot) mainly use two OCaml branches to profile Coq, whether I
want to profile time or memory consumption. AFAIK, this only works for Linux.

## Time

In Coq source folder:

opam switch 4.09.0+trunk+fp
./configure -profile devel -debug
make
perf record -g bin/coqc file.v
perf report -g fractal,callee --no-children

To profile only part of a file, first load it using

bin/coqtop -l file.v

and plug into the process

perf record -g -p PID

### Per-component [flame graphs](https://github.com/brendangregg/FlameGraph)

I (Andres Erbsen) have found it useful to look at library-wide flame graphs of
coq time consumption.  As the Ltac interpreter stack is reflected in the OCaml
stack, calls to the same primitive can appear on top of multiple essentially
equivalent stacks. To make the profiles more readable, one could either try to
edit the stack trace to merge "equivalent" frames, or simply look at the
aggregate profile on a component-by-component basis. Here is how to do the
second for the standard library ([example output](https://cdn.rawgit.com/andres-erbsen/b29b29cb6480dfc6a662062e4fcd0ae3/raw/304fc3fea9630c8e453929aa7920ca8a2a570d0b/stdlib_categorized_outermost.svg)).

~~~~~
#!/bin/bash
make -f Makefile.dune clean
make -f Makefile.dune states
perf record -F99  `# ~1GB of data` --call-graph=dwarf -- make -f Makefile.dune world
perf script --time '0%-100%'  |
        stackcollapse-perf.pl |
        grep Coqtop__compile |
        sed -rf <(cat <<'EOF'
                s/;caml/;/g
                s/_[0-9]*;/;/g
                s/Logic_monad__fun;//g
                s/_apply[0-9];//g
                s/;System/@&@/
                s/;Hashcons/@&@/
                s/;Grammar/@&@/
                s/;Declaremods/@&@/
                s/;Tactics/@&@/
                s/;Pretyping/@&@/
                s/;Typeops/@&@/
                s/;Reduction/@&@/
                s/;Unification/@&@/
                s/;Evarutil/@&@/
                s/;Evd/@&@/
                s/;EConstr/@&@/
                s/;Constr/@&@/
                s/;Univ/@&@/
                s/;Ugraph/@&@/
                s/;UState/@&@/
                s/;Micromega/@&@/
                s/;Omega/@&@/
                s/;Auto/@&@/
                s/;Ltac_plugin__Tacinterp/@&@/
                s/;Ltac_plugin__Rewrite/@&@/
                s/[^@]*@;([^@]*)@/\1;\1/
                s/@//g
                :a; s/;([^;]+);\1;/;\1;/g;ta
EOF
        ) |
        flamegraph.pl
~~~~~

## Memory

You first need a few commits atop trunk for this to work.

git remote add ppedrot https://github.com/ppedrot/coq.git
git fetch ppedrot
git checkout ppedrot/allocation-profiling
git rebase master

Then:

opam switch 4.00.1+alloc-profiling
./configure -profile devel -debug
make

Note that linking the coqtop binary takes quite an amount of time with this
branch, so do not worry too much. There are more recent branches of
alloc-profiling on mshinwell's repo which can be found at:

https://github.com/mshinwell/opam-repo-dev

### For memory dump:

CAMLRUNPARAM=T,mj bin/coqc file.v

In another terminal:

pkill -SIGUSR1 $COQTOPPID
...
pkill -SIGUSR1 $COQTOPPID
dev/decode-major-heap.sh heap.$COQTOPPID.$N bin/coqtop

where $COQTOPPID is coqtop pid and $N the index of the call to pkill.

First column is the memory taken by the objects (in words), second one is the
number of objects and third is the place where the objects where allocated.

### For complete memory graph:

CAMLRUNPARAM=T,gr bin/coqc file.v

In another terminal:

pkill -SIGUSR1 $COQTOPPID
...
pkill -SIGUSR1 $COQTOPPID
ocaml dev/decodegraph.ml edge.$COQTOPPID.$N bin/coqtop > memory.dot
dot -Tpdf -o memory.pdf memory.dot

where $COQTOPPID is coqtop pid and $N the index of the call to pkill.

The pdf produced by the last command gives a compact graphical representation of
the various objects allocated.