== pprof integration

:reproducible:

gperftools was the original home of the pprof program, which is used
to visualize and analyze profiles (CPU profiles, heap profiles, heap
samples, sets of thread stacks, etc.). The original pprof was written
in Perl, and as of this writing, Linux distros still ship that
version. Meanwhile, pprof was completely modernized and rewritten in
Go, and the Go version is much better. We have been recommending that
people switch to the Go version for a number of years, and starting
with gperftools 2.17 we no longer ship the original pprof.

You can get the Go pprof binary by running:

  % go install github.com/google/pprof@latest

The binary will normally appear in `$HOME/go/bin`, so you may want to
add that directory to your `$PATH`.

The main documentation of pprof can be found at
https://github.com/google/pprof/blob/main/doc/README.md

On this page, I'll point out some helpful integration aspects.

Here are the kinds of "profiles" that gperftools can feed into pprof.

=== CPU profiling

The CPU profiler is provided in a distinct library: libprofiler. Its
C++ API is in `gperftools/profiler.h`. You can call
`ProfilerStart()`/`ProfilerStop()` to control it, or you can have
libprofiler automagically profile the full run of your program by
setting the `CPUPROFILE` environment variable.
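The explicit API can be sketched like this (the output path and the
workload are placeholders; link with `-lprofiler`):

[source,cpp]
----
#include <gperftools/profiler.h>

int main() {
  ProfilerStart("myprog.prof");  // CPU samples go to this file
  volatile double x = 0;
  for (int i = 0; i < 100000000; i++) x += i;  // placeholder workload
  ProfilerStop();                // flush and close the profile
  return 0;
}
----

Afterwards the profile can be opened with, e.g.,
`pprof -http=: ./myprog myprog.prof`.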

See link:cpuprofile.html[documentation of CPU profiler] for full
details.

A general description of how statistical sampling profilers work can be
found in this nice blog post: https://research.swtch.com/pprof.

We produce a "legacy" CPU profile format. The format is described
here: link:cpuprofile-fileformat.html[].

=== Heap sample

libtcmalloc supports very low-overhead sampling of allocations. If
this feature is enabled, you can call:

  std::string sample_profile;
  MallocExtension::instance()->GetHeapSample(&sample_profile);

and you'll get a statistical estimate of all currently in-use memory
allocations, with backtraces showing where that memory was
allocated. A heap sample can be saved and fed to the pprof program for
visualization and analysis.

At Google, this feature is enabled fleet-wide (and by default), but in
gperftools, our default is off. You can turn it on by setting the
environment variable `TCMALLOC_SAMPLE_PARAMETER`. However, please note
that libtcmalloc_minimal doesn't have this feature. In order to use
heap sampling, you need to link to "full" libtcmalloc.

Reasonable values of the sample parameter range from 524288 (512 KiB;
the original default) to a few megabytes (the current default at
Google). A lower value gives you more samples and thus higher
statistical precision, but it also causes higher overhead and more
lock contention.
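A minimal sketch of saving a heap sample to a file (the function name
and path are hypothetical; this requires linking against the full
libtcmalloc and running with, e.g., `TCMALLOC_SAMPLE_PARAMETER=524288`):

[source,cpp]
----
#include <fstream>
#include <string>

#include <gperftools/malloc_extension.h>

// Dump the current heap sample in a form pprof can read.
void DumpHeapSample(const char* path) {
  std::string profile;
  MallocExtension::instance()->GetHeapSample(&profile);
  std::ofstream out(path, std::ios::binary);
  out << profile;
}
----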

Our sibling project, "abseil" tcmalloc, also supports heap
sampling. Its implementation has evolved a bit, but it is
fundamentally the same logic. In addition to sampling, it also offers
allocation and deallocation profiling powered by the same sampling
facility. Its docs are at:
https://github.com/google/tcmalloc/blob/master/docs/sampling.md.

Go has a similar feature called heap profiling. Go's heap profiles
combine information about in-use memory and all the allocations ever
made. It is similar to gperftools' link:heapprofile.html[heap
profiler], but it works via sampling, so it is low overhead and runs
by default. You can read about it here:
https://pkg.go.dev/runtime/pprof. Approximately every 512 KiB (the
value of runtime.MemProfileRate) of memory allocated, Go's runtime
triggers heap sampling. Heap sampling grabs a backtrace and then
updates per-call-site allocation counters. The heap profile is a
collection of call sites (identified by the backtrace chain) and the
relevant statistics.

=== Heap Growth stacks

Every time tcmalloc extends its heap, it grabs a stack trace. A
collection of those stacks can be obtained by:

  std::string growth_stacks;
  MallocExtension::instance()->GetHeapGrowthStacks(&growth_stacks);

and fed to pprof for visualization and analysis. This kind of profile
shows you the locations in your code that extended the heap (whether
due to regular usage, leaks, or fragmentation).

Heap growth tracking is always enabled in the full libtcmalloc and is
not included in libtcmalloc_minimal.

=== Heap Profiler

See link:heapprofile.html[Heap Profiler documentation]. Note that the
heap profiler intercepts every allocation and deallocation call, so it
runs with a much higher overhead than normal malloc and is not
suitable for production.
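The heap profiler also has an explicit C++ API, declared in
`gperftools/heap-profiler.h`. A minimal sketch (the prefix and dump
reason are arbitrary; this requires the full libtcmalloc):

[source,cpp]
----
#include <gperftools/heap-profiler.h>

int main() {
  HeapProfilerStart("myprog");         // prefix for dumped .heap files
  char* p = new char[64 << 20];        // placeholder allocation
  HeapProfilerDump("after big alloc"); // optional intermediate dump
  delete[] p;
  HeapProfilerStop();                  // final dump, then stop profiling
  return 0;
}
----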

=== HTTP interfaces

The pprof integration point most commonly used at Google is HTTP
endpoints. The Go standard library provides a great example of how
this is done and how to use it; https://pkg.go.dev/net/http/pprof
documents it.

gperftools doesn't provide any HTTP handlers, but we do give you raw
profiling data, which you can serve by whatever HTTP-serving APIs you
like. Each profile kind (with the partial exception of heap profiler)
has an API to obtain profile data, which can be returned from an HTTP
handler.
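As a sketch, a handler body could look like this (the function name
and the way you wire it into your server are hypothetical; gperftools
itself only provides the profile bytes):

[source,cpp]
----
#include <string>

#include <gperftools/malloc_extension.h>

// Hypothetical handler: returns the bytes to serve as the HTTP
// response body for a /pprof/heap-style endpoint.
std::string HandleHeapSampleRequest() {
  std::string profile;
  MallocExtension::instance()->GetHeapSample(&profile);
  return profile;
}
----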