# How to profile Coq? I (Pierre-Marie Pédrot) mainly use two OCaml branches to profile Coq, whether I want to profile time or memory consumption. AFAIK, this only works for Linux. ## Time In Coq source folder: opam switch 4.09.0+trunk+fp ./configure -profile devel -debug make perf record -g bin/coqc file.v perf report -g fractal,callee --no-children To profile only part of a file, first load it using bin/coqtop -l file.v and plug into the process perf record -g -p PID ### Per-component [flame graphs](https://github.com/brendangregg/FlameGraph) I (Andres Erbsen) have found it useful to look at library-wide flame graphs of coq time consumption. As the Ltac interpreter stack is reflected in the OCaml stack, calls to the same primitive can appear on top of multiple essentially equivalent stacks. To make the profiles more readable, one could either try to edit the stack trace to merge "equivalent" frames, or simply look at the aggregate profile on a component-by-component basis. Here is how to do the second for the standard library ([example output](https://cdn.rawgit.com/andres-erbsen/b29b29cb6480dfc6a662062e4fcd0ae3/raw/304fc3fea9630c8e453929aa7920ca8a2a570d0b/stdlib_categorized_outermost.svg)). ~~~~~ #!/bin/bash make -f Makefile.dune clean make -f Makefile.dune states perf record -F99 `# ~1GB of data` --call-graph=dwarf -- make -f Makefile.dune world perf script --time '0%-100%' | stackcollapse-perf.pl | grep Coqtop__compile | sed -rf <(cat <<'EOF' s/;caml/;/g s/_[0-9]*;/;/g s/Logic_monad__fun;//g s/_apply[0-9];//g s/;System/@&@/ s/;Hashcons/@&@/ s/;Grammar/@&@/ s/;Declaremods/@&@/ s/;Tactics/@&@/ s/;Pretyping/@&@/ s/;Typeops/@&@/ s/;Reduction/@&@/ s/;Unification/@&@/ s/;Evarutil/@&@/ s/;Evd/@&@/ s/;EConstr/@&@/ s/;Constr/@&@/ s/;Univ/@&@/ s/;Ugraph/@&@/ s/;UState/@&@/ s/;Micromega/@&@/ s/;Omega/@&@/ s/;Auto/@&@/ s/;Ltac_plugin__Tacinterp/@&@/ s/;Ltac_plugin__Rewrite/@&@/ s/[^@]*@;([^@]*)@/\1;\1/ s/@//g :a; s/;([^;]+);\1;/;\1;/g;ta EOF ) | flamegraph.pl ~~~~~ ## Memory You first need a few commits atop trunk for this to work. git remote add ppedrot https://github.com/ppedrot/coq.git git fetch ppedrot git checkout ppedrot/allocation-profiling git rebase master Then: opam switch 4.00.1+alloc-profiling ./configure -profile devel -debug make Note that linking the coqtop binary takes quite an amount of time with this branch, so do not worry too much. There are more recent branches of alloc-profiling on mshinwell's repo which can be found at: https://github.com/mshinwell/opam-repo-dev ### For memory dump: CAMLRUNPARAM=T,mj bin/coqc file.v In another terminal: pkill -SIGUSR1 $COQTOPPID ... pkill -SIGUSR1 $COQTOPPID dev/decode-major-heap.sh heap.$COQTOPPID.$N bin/coqtop where $COQTOPPID is coqtop pid and $N the index of the call to pkill. First column is the memory taken by the objects (in words), second one is the number of objects and third is the place where the objects where allocated. ### For complete memory graph: CAMLRUNPARAM=T,gr bin/coqc file.v In another terminal: pkill -SIGUSR1 $COQTOPPID ... pkill -SIGUSR1 $COQTOPPID ocaml dev/decodegraph.ml edge.$COQTOPPID.$N bin/coqtop > memory.dot dot -Tpdf -o memory.pdf memory.dot where $COQTOPPID is coqtop pid and $N the index of the call to pkill. The pdf produced by the last command gives a compact graphical representation of the various objects allocated.