File: README.md

package info (click to toggle)
ruby-stackprof 0.2.26-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 552 kB
  • sloc: python: 2,494; ruby: 1,264; perl: 920; ansic: 761; javascript: 735; makefile: 4
file content (345 lines) | stat: -rw-r--r-- 12,429 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
# Stackprof

A sampling call-stack profiler for Ruby.

Inspired heavily by [gperftools](https://code.google.com/p/gperftools/), and written as a replacement for [perftools.rb](https://github.com/tmm1/perftools.rb).

## Requirements

* Ruby 2.2+
* Linux-based OS

## Getting Started

### Install

In your Gemfile add:

```ruby
gem 'stackprof'
```

Then run `$ bundle install`. Alternatively you can run `$ gem install stackprof`.


### Run

in ruby:

``` ruby
StackProf.run(mode: :cpu, out: 'tmp/stackprof-cpu-myapp.dump') do
  #...
end
```

via rack:

``` ruby
use StackProf::Middleware, enabled: true,
                           mode: :cpu,
                           interval: 1000,
                           save_every: 5
```

reporting:

```
$ stackprof tmp/stackprof-cpu-*.dump --text --limit 1
  ==================================
    Mode: cpu(1000)
    Samples: 60395 (1.09% miss rate)
    GC: 2851 (4.72%)
  ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
        1660   (2.7%)        1595   (2.6%)     String#blank?

$ stackprof tmp/stackprof-cpu-*.dump --method 'String#blank?'
  String#blank? (gems/activesupport-2.3.14.github30/lib/active_support/core_ext/object/blank.rb:80)
    samples:  1595 self (2.6%)  /   1660 total (2.7%)
    callers:
       373  (   41.0%)  ApplicationHelper#current_user
       192  (   21.1%)  ApplicationHelper#current_repository
    callers:
       803  (   48.4%)  Object#present?
    code:
                                    |    80  |   def blank?
   1225    (2.0%) /  1225   (2.0%)  |    81  |     self !~ /[^[:space:]]/
                                    |    82  |   end

$ stackprof tmp/stackprof-cpu-*.dump --method 'Object#present?'
  Object#present? (gems/activesupport-2.3.14.github30/lib/active_support/core_ext/object/blank.rb:20)
    samples:    59 self (0.1%)  /    910 total (1.5%)
    callees (851 total):
       803  (   94.4%)  String#blank?
        32  (    3.8%)  Object#blank?
        16  (    1.9%)  NilClass#blank?
    code:
                                    |    20  |   def present?
    910    (1.5%) /    59   (0.1%)  |    21  |     !blank?
                                    |    22  |   end
```

For an experimental version of WebUI reporting of stackprof, see [stackprof-webnav](https://github.com/alisnic/stackprof-webnav)

To generate flamegraphs with Stackprof, additional data must be collected using the `raw: true` flag. Once you've collected results with this flag enabled, generate a flamegraph with:

```
$ stackprof --flamegraph tmp/stackprof-cpu-myapp.dump > tmp/flamegraph
```

After the flamegraph has been generated, you can generate a viewer command with:

```
$ stackprof --flamegraph-viewer=tmp/flamegraph
```

The `--flamegraph-viewer` command will output the exact shell command you need to run in order to open the `tmp/flamegraph` you generated with the built-in stackprof flamegraph viewer:

![Flamegraph Viewer](http://i.imgur.com/EwndrgD.png)

Alternatively, you can generate a flamegraph that uses [d3-flame-graph](https://github.com/spiermar/d3-flame-graph):

```
$ stackprof --d3-flamegraph tmp/stackprof-cpu-myapp.dump > flamegraph.html
```

And just open the result by your browser.

## Sampling

Four sampling modes are supported:

  - `:wall` (using `ITIMER_REAL` and `SIGALRM`) [default mode]
  - `:cpu` (using `ITIMER_PROF` and `SIGPROF`)
  - `:object` (using `RUBY_INTERNAL_EVENT_NEWOBJ`)
  - `:custom` (user-defined via `StackProf.sample`)

Samplers have a tuneable interval which can be used to reduce overhead or increase granularity:

  - Wall time: sample every _interval_ microseconds of wallclock time (default: 1000)

```ruby
StackProf.run(mode: :wall, out: 'tmp/stackprof.dump', interval: 1000) do
  #...
end
```

  - CPU time: sample every _interval_ microseconds of CPU activity (default: 1000 = 1 millisecond)

```ruby
StackProf.run(mode: :cpu, out: 'tmp/stackprof.dump', interval: 1000) do
  #...
end
```

  - Object allocation: sample every _interval_ allocations (default: 1)


```ruby
StackProf.run(mode: :object, out: 'tmp/stackprof.dump', interval: 1) do
  #...
end
```

By default, samples taken during garbage collection will show as garbage collection frames
including both mark and sweep phases. For longer traces, these can leave gaps in a flamegraph
that are hard to follow. They can be disabled by setting the `ignore_gc` option to true.
Garbage collection time will still be present in the profile but not explicitly marked with
its own frame.

Samples are taken using a combination of three new C-APIs in ruby 2.1:

  - Signal handlers enqueue a sampling job using `rb_postponed_job_register_one`.
    this ensures callstack samples can be taken safely, in case the VM is garbage collecting
    or in some other inconsistent state during the interruption.

  - Stack frames are collected via `rb_profile_frames`, which provides low-overhead C-API access
    to the VM's call stack. No object allocations occur in this path, allowing stackprof to collect
    callstacks in allocation mode.

  - In allocation mode, samples are taken via `rb_tracepoint_new(RUBY_INTERNAL_EVENT_NEWOBJ)`,
    which provides a notification every time the VM allocates a new object.

## Aggregation

Each sample consists of N stack frames, where a frame looks something like `MyClass#method` or `block in MySingleton.method`.
For each of these frames in the sample, the profiler collects a few pieces of metadata:

  - `samples`: Number of samples where this was the topmost frame
  - `total_samples`: Samples where this frame was in the stack
  - `lines`: Samples per line number in this frame
  - `edges`: Samples per callee frame (methods invoked by this frame)

The aggregation algorithm is roughly equivalent to the following pseudo code:

``` ruby
trap('PROF') do
  top, *rest = caller

  top.samples += 1
  top.lines[top.lineno] += 1
  top.total_samples += 1

  prev = top
  rest.each do |frame|
    frame.edges[prev] += 1
    frame.total_samples += 1
    prev = frame
  end
end
```

This technique builds up an incremental call graph from the samples. On any given frame,
the sum of the outbound edge weights is equal to total samples collected on that frame
(`frame.total_samples == frame.edges.values.sum`).

## Reporting

Multiple reporting modes are supported:
  - Text
  - Dotgraph
  - Source annotation

### `StackProf::Report.new(data).print_text`

```
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
        91  (48.4%)          91  (48.4%)     A#pow
        58  (30.9%)          58  (30.9%)     A.newobj
        34  (18.1%)          34  (18.1%)     block in A#math
       188 (100.0%)           3   (1.6%)     block (2 levels) in <main>
       185  (98.4%)           1   (0.5%)     A#initialize
        35  (18.6%)           1   (0.5%)     A#math
       188 (100.0%)           0   (0.0%)     <main>
       188 (100.0%)           0   (0.0%)     block in <main>
       188 (100.0%)           0   (0.0%)     <main>
```

### `StackProf::Report.new(data).print_graphviz`

```
digraph profile {
  70346498324780 [size=23.5531914893617] [fontsize=23.5531914893617] [shape=box] [label="A#pow\n91 (48.4%)\r"];
  70346498324680 [size=18.638297872340424] [fontsize=18.638297872340424] [shape=box] [label="A.newobj\n58 (30.9%)\r"];
  70346498324480 [size=15.063829787234042] [fontsize=15.063829787234042] [shape=box] [label="block in A#math\n34 (18.1%)\r"];
  70346498324220 [size=10.446808510638299] [fontsize=10.446808510638299] [shape=box] [label="block (2 levels) in <main>\n3 (1.6%)\rof 188 (100.0%)\r"];
  70346498324220 -> 70346498324900 [label="185"];
  70346498324900 [size=10.148936170212766] [fontsize=10.148936170212766] [shape=box] [label="A#initialize\n1 (0.5%)\rof 185 (98.4%)\r"];
  70346498324900 -> 70346498324780 [label="91"];
  70346498324900 -> 70346498324680 [label="58"];
  70346498324900 -> 70346498324580 [label="35"];
  70346498324580 [size=10.148936170212766] [fontsize=10.148936170212766] [shape=box] [label="A#math\n1 (0.5%)\rof 35 (18.6%)\r"];
  70346498324580 -> 70346498324480 [label="34"];
  70346497983360 [size=10.0] [fontsize=10.0] [shape=box] [label="<main>\n0 (0.0%)\rof 188 (100.0%)\r"];
  70346497983360 -> 70346498325080 [label="188"];
  70346498324300 [size=10.0] [fontsize=10.0] [shape=box] [label="block in <main>\n0 (0.0%)\rof 188 (100.0%)\r"];
  70346498324300 -> 70346498324220 [label="188"];
  70346498325080 [size=10.0] [fontsize=10.0] [shape=box] [label="<main>\n0 (0.0%)\rof 188 (100.0%)\r"];
  70346498325080 -> 70346498324300 [label="188"];
}
```

### `StackProf::Report.new(data).print_method(/pow|newobj|math/)`

```
A#pow (/Users/tmm1/code/stackprof/sample.rb:11)
                         |    11  |   def pow
   91  (48.4% / 100.0%)  |    12  |     2 ** 100
                         |    13  |   end
A.newobj (/Users/tmm1/code/stackprof/sample.rb:15)
                         |    15  |   def self.newobj
   33  (17.6% /  56.9%)  |    16  |     Object.new
   25  (13.3% /  43.1%)  |    17  |     Object.new
                         |    18  |   end
A#math (/Users/tmm1/code/stackprof/sample.rb:20)
                         |    20  |   def math
    1   (0.5% / 100.0%)  |    21  |     2.times do
                         |    22  |       2 + 3 * 4 ^ 5 / 6
block in A#math (/Users/tmm1/code/stackprof/sample.rb:21)
                         |    21  |     2.times do
   34  (18.1% / 100.0%)  |    22  |       2 + 3 * 4 ^ 5 / 6
                         |    23  |     end
```

## Usage

The profiler is compiled as a C-extension and exposes a simple api: `StackProf.run(mode: [:cpu|:wall|:object])`.
The `run` method takes a block of code and returns a profile as a simple hash.

``` ruby
# sample after every 1ms of cpu activity
profile = StackProf.run(mode: :cpu, interval: 1000) do
  MyCode.execute
end
```

This profile data structure is part of the public API, and is intended to be saved
(as json/marshal for example) for later processing. The reports above can be generated
by passing this structure into `StackProf::Report.new`.

The format itself is very simple. It contains a header and a list of frames. Each frame has a unique ID and
identifying information such as its name, file, and line. The frame also contains sampling data, including per-line
samples, and a list of relationships to other frames represented as weighted edges.

``` ruby
{:version=>1.0,
 :mode=>:cpu,
 :inteval=>1000,
 :samples=>188,
 :missed_samples=>0,
 :frames=>
  {70346498324780=>
    {:name=>"A#pow",
     :file=>"/Users/tmm1/code/stackprof/sample.rb",
     :line=>11,
     :total_samples=>91,
     :samples=>91,
     :lines=>{12=>91}},
   70346498324900=>
    {:name=>"A#initialize",
     :file=>"/Users/tmm1/code/stackprof/sample.rb",
     :line=>5,
     :total_samples=>185,
     :samples=>1,
     :edges=>{70346498324780=>91, 70346498324680=>58, 70346498324580=>35},
     :lines=>{8=>1}},
```

Above, `A#pow` was involved in 91 samples, and in all cases it was at the top of the stack on line 12.

`A#initialize` was in 185 samples, but it was at the top of the stack in only 1 sample. The rest of the samples are
divided up between its callee edges. All 91 calls to `A#pow` came from `A#initialize`, as seen by the edge numbered
`70346498324780`.

## Advanced usage

The profiler can be started and stopped manually. Results are accumulated until retrieval, across
multiple `start`/`stop` invocations.

``` ruby
StackProf.running? # => false
StackProf.start(mode: :cpu)
StackProf.running? # => true
StackProf.stop
StackProf.results('/tmp/some.file')
```

## All options

`StackProf.run` accepts an options hash. Currently, the following options are recognized:

Option      | Meaning
-------     | ---------
`mode`      | Mode of sampling: `:cpu`, `:wall`, `:object`, or `:custom` [c.f.](#sampling)
`out`       | The target file, which will be overwritten
`interval`  | Mode-relative sample rate [c.f.](#sampling)
`ignore_gc` | Ignore garbage collection frames
`aggregate` | Defaults: `true` - if `false` disables [aggregation](#aggregation)
`raw`       | Defaults `false` - if `true` collects the extra data required by the `--flamegraph` and `--stackcollapse` report types
`metadata`  | Defaults to `{}`. Must be a `Hash`. metadata associated with this profile
`save_every`| (Rack middleware only) write the target file after this many requests

## Todo

* file/iseq blacklist
* restore signal handlers on stop