File: investigating_heap_dump_example.md

package info (click to toggle)
chromium 73.0.3683.75-1~deb9u1
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 1,792,156 kB
  • sloc: cpp: 13,473,466; ansic: 1,577,080; python: 898,539; javascript: 655,737; xml: 341,883; asm: 306,070; java: 289,969; perl: 80,911; objc: 67,198; sh: 43,184; cs: 27,853; makefile: 12,092; php: 11,064; yacc: 10,373; tcl: 8,875; ruby: 3,941; lex: 1,800; pascal: 1,473; lisp: 812; awk: 41; jsp: 39; sed: 19; sql: 3
file content (301 lines) | stat: -rw-r--r-- 13,228 bytes parent folder | download | duplicates (9)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
# Example Investigation of a Heap Dump

This document describes the steps taken to investigate a real memory leak
discovered by heap profiling in the wild. For investigators less familiar with
the code base, `Navigating the Stack Trace` should be enough information to
determine the relevant component, and to forward the bug to a component OWNER.

## Understanding the heap dump summary

The opening comment of [Issue
834033](https://bugs.chromium.org/p/chromium/issues/detail?id=834033) contains a
heap dump summary. The highlights are:

* 315723 calls to malloc without corresponding call to free.
* 806MB of memory.
* The common stacktrace for all 315723 allocations.

Usually, anything that uses over 10MB of memory is a red flag. With the
exception of large image resources, most code in Chrome should use much less
than 10MB. Anything that has over 100k allocations is also a red flag.

### Navigating the Stack Trace - Detailed Breakdown

Let's take a look at the stack trace:

```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::mac::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
__71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke
base::mac::CallWithEHFrame(void () block_pointer)
-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
<???>
base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*)
base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*)
<name omitted>
ChromeBrowserMainParts::MainMessageLoopRun(int*)
content::BrowserMainLoop::RunMainMessageLoopParts()
content::BrowserMainRunnerImpl::Run()
content::BrowserMain(content::MainFunctionParams const&)
content::ContentMainRunnerImpl::Run()
service_manager::Main(service_manager::MainParams const&)
content::ContentMain(content::ContentMainParams const&)
ChromeMain
main
<???>
```

The first step is to divide the stack trace into smaller segments to get a
better understanding of what's happening at the time of allocations. The best
way to do this is to segment by name space and/or function prefixes.

```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
```

The top of each stack will always contain some `base` and/or `profiling`
code. This is the code responsible for allocating and recording the memory.

```
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
```

Next, we three 3 frames with the prefix `sk`. Searching for
`sk_malloc_flags` on
[codesearch](https://cs.chromium.org/search/?q=sk_malloc_flags&sq=package:chromium&type=cs)
reveals that the component is `third_party/skia`. Looking at the
[README](https://cs.chromium.org/chromium/src/third_party/skia/README) reveals
that Skia is a 2D graphics library.

```
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
```

Next we see a templated function called `Read` in the namespace `IPC`.
`IPC` stands for inter-process communication. This suggests that the
function is responsible for reading an IPC Message, perhaps concerning an
`SkBitmap`.

```
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
```

Next, we see many frames with the `extension` prefix. Extensions are exactly
what they sound like - Chrome extensions like AdBlock are used to modify the
behavior of the browser.

```
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
```

`content` is the name of code that glues together web code [like extensions] and
the rest of Chrome.

```
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
```

More `IPC` code.

```
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::mac::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
```

More `base` code. The bottom of most stack traces should go back to
`MessageLoop`, a primitive Chrome construct used to run tasks.

### Navigating the Stack Trace - Summary

* The top and bottom of the stack should generally be the same and are not very
  interesting.
* The prefixes of frames can be used to get a rough idea of the components
  involved.
* Function names can be used to get a rough idea of what's going on.

In this case, extension code is calling `ParseIconFromCanvasDictionary` - so
it's probably trying to parse an icon. This calls into Skia code. Given that
Skia is a 2D drawing library, and the function is `tryAllocPixels`, Skia is
allocating some pixels for the icon. This process is being repeated 315 thousand
times, and the icon is being leaked every time.


## Diving into the code

Now that we have a rough idea of what's happening, let's look at the code for
ParseIconFromCanvasDictionary.

```cpp
bool ExtensionAction::ParseIconFromCanvasDictionary(
    const base::DictionaryValue& dict,
    gfx::ImageSkia* icon) {
  for (base::DictionaryValue::Iterator iter(dict); !iter.IsAtEnd();
       iter.Advance()) {
    std::string binary_string64;
    IPC::Message pickle;
    if (iter.value().is_blob()) {
      pickle = IPC::Message(iter.value().GetBlob().data(),
                            iter.value().GetBlob().size());
    } else if (iter.value().GetAsString(&binary_string64)) {
      std::string binary_string;
      if (!base::Base64Decode(binary_string64, &binary_string))
        return false;
      pickle = IPC::Message(binary_string.c_str(), binary_string.length());
    } else {
      continue;
    }
    base::PickleIterator pickle_iter(pickle);
    SkBitmap bitmap;
    if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
      return false;
    CHECK(!bitmap.isNull());

    // Chrome helpfully scales the provided icon(s), but let's not go overboard.
    const int kActionIconMaxSize = 10 * ActionIconSize();
    if (bitmap.drawsNothing() || bitmap.width() > kActionIconMaxSize)
      continue;

    float scale = static_cast<float>(bitmap.width()) / ActionIconSize();
    icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
  }
  return true;
}
```

There's a lot going on here, but we can use the information we have to focus.
The leak happens in IPC::ReadParam, so the relevant lines are:

```
SkBitmap bitmap;
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
  return false;
```

The `IPC` message is being decoded into `bitmap`.

```
 icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
```
Looking at subsequent consumers of `bitmap`, we see that it is being added as a
representation to `icon`. `icon` is an output parameter of this function, so we
have to look at the calling frame,
`ExtensionActionSetIconFunction::RunExtensionAction`.

```
ExtensionFunction::ResponseAction
ExtensionActionSetIconFunction::RunExtensionAction() {
...
    EXTENSION_FUNCTION_VALIDATE(
        ExtensionAction::ParseIconFromCanvasDictionary(*canvas_set, &icon));

    if (icon.isNull())
      return RespondNow(Error("Icon invalid."));

    extension_action_->SetIcon(tab_id_, gfx::Image(icon));
...
}
```

In this case, I've already focused on the code that calls
`ParseIconFromCanvasDictionary`. Let's look at `SetIcon`.

```
void ExtensionAction::SetIcon(int tab_id, const gfx::Image& image) {
  SetValue(&icon_, tab_id, image);
}
```

```
template<class T>
void SetValue(std::map<int, T>* map, int tab_id, const T& val) {
  (*map)[tab_id] = val;
}
```

The icon is being added to a map `icon_`, with `tab_id` as the key. Ah ha!
Adding elements to a container [and never removing them] is one of the most
common sources of memory issues.

There are two ways for this memory to be released - the container `icon_` can be
destroyed, or the element can be removed from the container.

`icon_` is a member of `ExtensionAction`, whose documentation reads:
```
// ExtensionAction encapsulates the state of a browser action or page action.
// Instances can have both global and per-tab state. If a property does not have
// a per-tab value, the global value is used instead.
```

This suggests that the lifetime of `icon_` is tied to the lifetime of the
ExtensionAction, which we can guess is tied to the lifetime of the Extension. As
long as the extension stays installed and enabled, `icon_` will not be
destroyed.

Next, we use codesearch to look at all code that removes elements from `icon_`.
The only place that performs removal is

```
void ExtensionAction::ClearAllValuesForTab(int tab_id) {
...
  icon_.erase(tab_id);
...
}
```

This is called by `ExtensionActionAPI::ClearAllValuesForTab`, which is called by
`TabHelper::DidFinishNavigation`. The name of this method suggests that each
time a tab is navigated, the previous tab-specific icon is cleared. However,
that means that if a tab is closed, then the icon is leaked forever.