<?xml version="1.0" encoding="utf-8"?>
<chapter>
<title>Using <application>Massif</application> for Profiling Memory Use in GNOME Software</title>
<para>
This article describes how to use the <application>Massif</application> heap profiler with GNOME applications. We describe how to invoke, interpret, and act on the output of <application>Massif</application>. The <application>Swell Foop</application> game is used as an example.
</para>
<sect1 id="optimization-massif-TBL-intro">
<title>Introduction</title>
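<para><application>Massif</application> is a member of the <ulink type="http" url="http://valgrind.org/">valgrind</ulink> suite of memory-profiling tools. Its purpose is to give a detailed view of dynamic memory usage during the lifetime of the program. Specifically, it records the memory use of the heap and the stack.</para>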
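<para>The heap is the region of memory that is allocated with functions like malloc. It grows on demand and is usually the largest region of memory in a program. The stack is where all the local data for functions is stored, including the "automatic" variables in C and the return addresses of subroutines. The stack is typically much smaller and much more active than the heap. We won't consider the stack explicitly, since <application>Massif</application> treats it as though it were just another part of the heap. <application>Massif</application> also reports how much memory is used to manage the heap.</para>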
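<para>To make the heap/stack distinction concrete, here is a minimal C sketch; the function names are hypothetical and do not come from Swell Foop. The first function allocates its result with malloc, so the bytes stay on the heap, attributed to that call site, until the caller frees them. The second formats into an automatic array on the stack, which is never a malloc-style allocation and is released automatically when the function returns.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Heap: the label outlives this call, so the allocation stays live
   (and attributed to this call site) until the caller calls free(). */
char *score_label_heap(int score)
{
    int len = snprintf(NULL, 0, "Score: %d", score);
    assert(len >= 0);
    char *label = malloc((size_t)len + 1);
    if (label != NULL)
        snprintf(label, (size_t)len + 1, "Score: %d", score);
    return label;
}

/* Stack: `label` is an automatic variable that vanishes when the
   function returns; no heap allocation takes place. */
int score_label_length_stack(int score)
{
    char label[32];
    return snprintf(label, sizeof label, "Score: %d", score);
}
]]></programlisting>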
<para><application>Massif</application> produces two output files: a graphical overview in a PostScript file and a detailed breakdown in a text file.</para>
</sect1>
<sect1 id="optimization-massif-TBL-using-massif">
<title>Using <application>Massif</application> with GNOME</title>
<para>
<application>Massif</application> has very few options, and many programs do not need any of them. For GNOME applications, however, where memory allocation may be buried deep in either glib or GTK, you need to increase the number of levels Massif descends down the call stack. This is done with the --depth parameter. The default is 3; increasing it to 5 is usually enough for the call stack to reach down to your own code. One or two more levels may also be desirable to give your code some context. Since the amount of detail quickly becomes overwhelming, it is best to start with a small depth and increase it only when it becomes apparent that it is not sufficient.
</para>
<para>
It is also useful to tell <application>Massif</application> which glib functions allocate memory. Doing so removes an unnecessary layer of function calls from the reports and gives you a clearer idea of which code is allocating memory. The allocating functions in glib are g_malloc, g_malloc0, g_realloc, g_try_malloc, and g_mem_chunk_alloc. Use the --alloc-fn option to tell Massif about them.
</para>
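<para>The same reasoning applies to any wrapper around malloc in your own code. The sketch below uses a hypothetical wrapper, xmalloc_checked, written in the style of g_malloc (it aborts instead of returning NULL), and a hypothetical caller, make_board. If the wrapper is not named with --alloc-fn, every allocation made through it is reported as coming from the wrapper itself rather than from callers such as make_board.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper in the style of g_malloc(): it never returns
   NULL and aborts on failure. Unless Massif is told about it with
   --alloc-fn=xmalloc_checked, every allocation made through it is
   reported as coming from this function. */
void *xmalloc_checked(size_t n_bytes)
{
    void *p = malloc(n_bytes > 0 ? n_bytes : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes)\n", n_bytes);
        abort();
    }
    return p;
}

/* Hypothetical caller: this is the site we want to see in the report. */
int *make_board(size_t cols, size_t rows)
{
    assert(cols > 0 && rows > 0);
    int *cells = xmalloc_checked(cols * rows * sizeof *cells);
    for (size_t i = 0; i < cols * rows; i++)
        cells[i] = 0;
    return cells;
}
]]></programlisting>
<para>With such a wrapper the command line would gain one more option, for example valgrind --tool=massif --depth=5 --alloc-fn=xmalloc_checked ./my-program, where ./my-program is a placeholder for your own executable.</para>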
<para>So your command line should look something like this:</para>
<programlisting>
valgrind --tool=massif --depth=5 --alloc-fn=g_malloc --alloc-fn=g_realloc --alloc-fn=g_try_malloc \
--alloc-fn=g_malloc0 --alloc-fn=g_mem_chunk_alloc swell-foop
</programlisting>
<para>
<application>Swell Foop</application> is the program we will be using as an example. Be warned that, since valgrind emulates the CPU, it will run <emphasis>very</emphasis> slowly. You will also need a lot of memory.
</para>
</sect1>
<sect1 id="optimization-massif-TBL-interpreting-results">
<title>对结果的解释</title>
<para>
The graphical output of <application>Massif</application> is largely self-explanatory. Each band represents the memory allocated by one function over time. Once you have identified which bands use the most memory (usually the big thick ones at the top), consult the text file for the details.
</para>
<para>
The text file is arranged as a hierarchy of sections. At the top is a list of the worst memory users, arranged in order of decreasing spacetime. Below this are further sections, each breaking the results down into finer detail as you proceed down the call stack. To illustrate this, we will use the output of the command above.
</para>
<figure id="optimization-massif-FIG-output-unoptimized">
<title><application>Massif</application> output for the unoptimized version of the <application>Swell Foop</application> program.</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/massif-before.png" format="PNG"/>
</imageobject>
</mediaobject>
</figure>
<para>
<xref linkend="optimization-massif-FIG-output-unoptimized"/> shows a typical PostScript output from <application>Massif</application>. This is the result you would get from playing a single game of <application>Swell Foop</application> (version 2.8.0) and then quitting. The PostScript file will have a name like <filename>massif.12345.ps</filename> and the text file will be called <filename>massif.12345.txt</filename>. The number in the middle is the process ID of the program that was examined. If you actually try this example, you will find two versions of each file with slightly different numbers; this is because <application>Swell Foop</application> starts a second process and <application>Massif</application> follows that too. We will ignore this second process, since it consumes very little memory.
</para>
<para>
At the top of the graph we see a large yellow band labelled gdk_pixbuf_new. This seems like an ideal candidate for optimization, but we will need to use the text file to find out what is calling gdk_pixbuf_new. The top of the text file will look something like this:
</para>
<programlisting>
Command: ./swell-foop
== 0 ===========================
Heap allocation functions accounted for 90.4% of measured spacetime
Called from:
28.8% : 0x6BF83A: gdk_pixbuf_new (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
6.1% : 0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
5.9% : 0x510B3C: (within /usr/lib/libfreetype.so.6.3.7)
3.5% : 0x2A4A6B: __gconv_open (in /lib/tls/libc-2.3.3.so)
</programlisting>
<para>
The line with the '=' signs indicates how far down the stack trace we are; in this case we are at the top. After this line comes a list of the heaviest users of memory in order of decreasing spacetime. Spacetime is the product of the amount of memory used and how long it was used for, so it corresponds to the area of the bands in the graph. This part of the file tells us what we already know: most of the spacetime is dedicated to gdk_pixbuf_new. To find out what called gdk_pixbuf_new, we need to search further down the text file:
</para>
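<para>As a worked illustration of this definition, the following sketch computes the spacetime of one allocation site from a series of samples, assuming memory use stays constant between samples, and expresses one site's spacetime as a percentage of the total. It illustrates the idea of "memory multiplied by time"; it is not Massif's own sampling code, and the sample values in the example below are made up.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Spacetime of one allocation site from n samples, treating its heap
   usage as a step function: bytes[i] is assumed to stay constant from
   times[i] until times[i + 1]. With times in milliseconds the result is
   in byte-milliseconds. */
double spacetime(const double *times, const double *bytes, size_t n)
{
    assert(n >= 1);
    double total = 0.0;
    for (size_t i = 0; i + 1 < n; i++) {
        assert(times[i + 1] >= times[i]);
        total += bytes[i] * (times[i + 1] - times[i]);
    }
    return total;
}

/* One site's share of the total, in percent, as printed in the
   "Called from" lists. */
double spacetime_percent(double part, double whole)
{
    assert(whole > 0.0);
    return 100.0 * part / whole;
}
]]></programlisting>
<para>For example, a site that holds 100 bytes for 10 ms and then 200 bytes for another 10 ms has a spacetime of 100 × 10 + 200 × 10 = 3000 byte-ms; a site with 750 byte-ms would then be listed at 25%.</para>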
<programlisting>
== 4 ===========================
Context accounted for 28.8% of measured spacetime
0x6BF83A: gdk_pixbuf_new (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
0x3A998998: (within /usr/lib/gtk-2.0/2.4.0/loaders/libpixbufloader-png.so)
0x6C2760: (within /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
0x6C285E: gdk_pixbuf_new_from_file (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
Called from:
27.8% : 0x804C1A3: load_scenario (swell-foop.c:463)
0.9% : 0x3E8095E: (within /usr/lib/libgnomeui-2.so.0.792.0)
and 1 other insignificant place
</programlisting>
<para>
The first line tells us we are now four levels deep into the stack. Below it is a listing of the function calls that lead from here to gdk_pixbuf_new. Finally, there is a list of the functions at the next level down that call these functions. There are, of course, also entries for levels 1, 2, and 3, but this is the first level that reaches right down through the GDK code to the <application>Swell Foop</application> code. From this listing, we can see instantly that the problem code is load_scenario.
</para>
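<para>If you want to post-process these reports, for example to track one function across several runs, the "Called from" entries are regular enough to parse. The sketch below assumes exactly the line format shown in the listings above, as written by the Massif version this article uses; later Massif releases write a different output format. The MassifEntry type and parse_massif_entry function are hypothetical names.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "Called from" entry, e.g.
   "27.8% : 0x804C1A3: load_scenario (swell-foop.c:463)". */
typedef struct {
    double percent;          /* share of measured spacetime */
    unsigned long address;   /* code address printed by Massif */
    char symbol[128];        /* function name; "" for "(within ...)" */
} MassifEntry;

/* Returns 1 if `line` is such an entry and fills *out, otherwise 0
   (header lines, "Called from:", "and N other insignificant places"). */
int parse_massif_entry(const char *line, MassifEntry *out)
{
    char word[128];
    assert(line != NULL && out != NULL);
    if (sscanf(line, " %lf%% : %lx: %127s",
               &out->percent, &out->address, word) != 3)
        return 0;
    if (word[0] == '(')      /* no symbol, only the containing library */
        out->symbol[0] = '\0';
    else
        memcpy(out->symbol, word, strlen(word) + 1);
    return 1;
}
]]></programlisting>
<para>Reading a report line by line (for instance with fgets) and passing each line to this function yields the entries in the order Massif printed them, that is, in decreasing spacetime within each section.</para>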
<para>
Now that we know which part of our code is using all the spacetime, we can look at it and find out why. It turns out that load_scenario loads a pixbuf from a file and then never frees that memory. Having identified the problem code, we can start to fix it.
</para>
</sect1>
<sect1 id="optimization-massif-TBL-acting-on-results">
<title>Acting on the Results</title>
<para>
Reducing spacetime consumption is good, but there are two ways of reducing it, and they are not equal. You can either reduce the amount of memory allocated, or reduce the amount of time it is allocated for. Consider for a moment a model system with only two processes running. Both processes use up almost all the physical RAM, and if they overlap at all, the system will swap and everything will slow down. Obviously, if we reduce the memory usage of each process by a factor of two, they can peacefully coexist without the need for swapping. If instead we reduce the time the memory is allocated by a factor of two, the two programs can coexist, but only as long as their periods of high memory use don't overlap. So it is better to reduce the amount of memory allocated.
</para>
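<para>The argument can be checked with a deliberately simplified model, assuming each process has a single high-memory phase during which it holds a fixed amount of memory, and that nothing else competes for RAM. The Phase type and the functions below are hypothetical.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model: a process holds `bytes` of memory
   during its high-memory phase [start, end) and little otherwise. */
typedef struct {
    double start;
    double end;
    double bytes;
} Phase;

static bool phases_overlap(Phase a, Phase b)
{
    return a.start < b.end && b.start < a.end;
}

/* Peak combined memory of two processes with one phase each. */
double combined_peak(Phase a, Phase b)
{
    assert(a.start <= a.end && b.start <= b.end);
    if (phases_overlap(a, b))
        return a.bytes + b.bytes;
    return a.bytes > b.bytes ? a.bytes : b.bytes;
}

/* True if the pair never needs more than `ram`, i.e. no swapping. */
bool runs_without_swapping(Phase a, Phase b, double ram)
{
    return combined_peak(a, b) <= ram;
}
]]></programlisting>
<para>With 100 units of RAM and two overlapping phases of 90 units each, halving the memory always fits (45 + 45 = 90), whereas halving the duration fits only if the shortened phases happen not to overlap.</para>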
<para>
Unfortunately, the choice of optimization is also constrained by the needs of the program. The size of the pixbuf data in <application>Swell Foop</application> is determined by the size of the game's graphics and cannot be easily reduced. However, the amount of time it spends loaded into memory can be drastically reduced. <xref linkend="optimization-massif-FIG-output-optimized"/> shows the <application>Massif</application> analysis of <application>Swell Foop</application> after being altered to dispose of the pixbufs once the images have been loaded into the X server.
</para>
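<para>The pattern behind this change is "upload, then release the client-side copy". The sketch below models it without GTK+: ClientImage stands in for a pixbuf, upload_to_server stands in for creating a server-side pixmap, and a counter tracks the live client-side bytes that a heap profile would see. All of these names are hypothetical. In GTK+ 2 the real calls would typically be gdk_pixbuf_render_pixmap_and_mask() followed by g_object_unref() on the pixbuf, but the article does not show the actual <application>Swell Foop</application> patch.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>

/* Live client-side image bytes, i.e. what a heap profile would see. */
static size_t live_client_bytes = 0;

/* Hypothetical stand-in for a client-side pixbuf (32-bit RGBA). */
typedef struct {
    size_t size;
    unsigned char *pixels;
} ClientImage;

ClientImage *image_load(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    ClientImage *img = malloc(sizeof *img);
    if (img == NULL)
        abort();
    img->size = width * height * 4;
    img->pixels = calloc(img->size, 1);
    if (img->pixels == NULL)
        abort();
    live_client_bytes += img->size;
    return img;
}

void image_free(ClientImage *img)
{
    live_client_bytes -= img->size;
    free(img->pixels);
    free(img);
}

/* Stand-in for copying the pixels into the X server; returns a fake
   non-zero handle for the server-side copy. */
int upload_to_server(const ClientImage *img)
{
    return img->size > 0;
}

/* Before: the client copy is returned and kept for the whole game. */
ClientImage *load_scenario_before(size_t width, size_t height, int *handle)
{
    ClientImage *img = image_load(width, height);
    *handle = upload_to_server(img);
    return img;
}

/* After: the client copy is released as soon as the upload is done. */
int load_scenario_after(size_t width, size_t height)
{
    ClientImage *img = image_load(width, height);
    int handle = upload_to_server(img);
    image_free(img);
    return handle;
}

size_t client_bytes_in_use(void)
{
    return live_client_bytes;
}
]]></programlisting>
<para>After load_scenario_after returns, the client-side bytes have already been released, so the allocation shows up only as a brief spike; with load_scenario_before they stay live until the caller frees them.</para>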
<figure id="optimization-massif-FIG-output-optimized">
<title><application>Massif</application> output for the optimized <application>Swell Foop</application> program.</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/massif-after.png" format="PNG"/>
</imageobject>
</mediaobject>
</figure>
<para>
The spacetime use of gdk_pixbuf_new is now a thin band that only spikes briefly (it is now the sixteenth band down and shaded magenta). As a bonus, the peak memory use has dropped by 200 kB, since the spike occurs before other memory is allocated. If two processes like this were run together, the chances of their peak memory usage coinciding, and hence the risk of swapping, would be quite low.
</para>
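<para>The drop in peak usage follows from the fact that the peak is the maximum of the combined usage over time, not the sum of each component's own maximum. The sketch below computes that maximum for two usage profiles sampled at the same instants; the function name is hypothetical and the numbers in the example are made up for illustration.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Peak of the combined heap usage of two profiles sampled at the same
   instants (for example, image data and everything else). */
double peak_of_sum(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double peak = 0.0;
    for (size_t i = 0; i < n; i++) {
        double total = a[i] + b[i];
        if (total > peak)
            peak = total;
    }
    return peak;
}
]]></programlisting>
<para>If the other allocations grow to 120 units and the image data takes 50 units, holding the images for the whole run gives a peak of 170 units, while a brief spike at the very start leaves the peak at 120 units.</para>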
<para>Can we do better? A quick examination of <application>Massif</application>'s text output reveals g_strdup to be the new major offender.</para>
<programlisting>
Command: ./swell-foop
== 0 ===========================
Heap allocation functions accounted for 87.6% of measured spacetime
Called from:
7.7% : 0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
7.6% : 0x43BC9F: (within /usr/lib/libgdk-x11-2.0.so.0.400.9)
6.9% : 0x510B3C: (within /usr/lib/libfreetype.so.6.3.7)
5.2% : 0x2A4A6B: __gconv_open (in /lib/tls/libc-2.3.3.so)
</programlisting>
<para>If we look closer, though, we see that it is called from many, many places.</para>
<programlisting>
== 1 ===========================
Context accounted for 7.7% of measured spacetime
0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
Called from:
1.8% : 0x8BF606: gtk_icon_source_copy (in /usr/lib/libgtk-x11-2.0.so.0.400.9)
1.1% : 0x67AF6B: g_param_spec_internal (in /usr/lib/libgobject-2.0.so.0.400.6)
0.9% : 0x91FCFC: (within /usr/lib/libgtk-x11-2.0.so.0.400.9)
0.8% : 0x57EEBF: g_quark_from_string (in /usr/lib/libglib-2.0.so.0.400.6)
and 155 other insignificant places
</programlisting>
<para>
We now face diminishing returns for our optimization efforts. The graph hints at another possible approach: both the "other" and "heap admin" bands are quite large. This tells us that many small allocations are being made from a variety of places. Eliminating these will be difficult, but if they can be grouped, the individual allocations can be larger and the "heap admin" overhead can be reduced.
</para>
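<para>One way to group small allocations is an arena: copy many short strings into a single large block and free them all at once. The sketch below is a minimal, hypothetical version; GLib's GStringChunk (g_string_chunk_new(), g_string_chunk_insert(), g_string_chunk_free()) provides a similar facility. The trade-off is that individual strings cannot be freed, so this only suits strings that share a lifetime.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal, hypothetical string arena: strings are packed into one
   block, so the heap sees one allocation (and roughly one allocation's
   worth of "heap admin" overhead) instead of one per string. */
typedef struct {
    char  *block;
    size_t used;
    size_t capacity;
} StringArena;

/* Returns 1 on success, 0 if the block could not be allocated. */
int arena_init(StringArena *a, size_t capacity)
{
    a->block = malloc(capacity);
    a->used = 0;
    a->capacity = a->block != NULL ? capacity : 0;
    return a->block != NULL;
}

/* Copies `s` into the arena; returns NULL if it does not fit. */
const char *arena_strdup(StringArena *a, const char *s)
{
    assert(a->block != NULL && s != NULL);
    size_t len = strlen(s) + 1;
    if (len > a->capacity - a->used)
        return NULL;
    char *dst = a->block + a->used;
    memcpy(dst, s, len);
    a->used += len;
    return dst;
}

/* Releases every string in the arena with a single free(). */
void arena_destroy(StringArena *a)
{
    free(a->block);
    a->block = NULL;
    a->used = 0;
    a->capacity = 0;
}
]]></programlisting>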
</sect1>
<sect1 id="optimization-massif-TBL-caveats">
<title>Caveats</title>
<para>
There are a couple of things to watch out for. Firstly, spacetime is only reported as a percentage; you have to compare it to the overall size of the program to decide whether the amount of memory is worth pursuing. The graph, with its kilobyte vertical axis, is good for this.
</para>
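<para>Because a site's spacetime and the program's total spacetime are measured over the same run, a site's share of spacetime is also its share of the time-averaged heap size. A percentage can therefore be turned into an approximate size by reading the program's average heap size off the graph. A minimal sketch, with a hypothetical function name; the 800 kB in the example is a made-up figure, not a value from the <application>Swell Foop</application> graphs.</para>
<programlisting><![CDATA[
#include <assert.h>

/* Time-averaged size (in kB) attributable to one allocation site, given
   its share of spacetime in percent and the program's time-averaged heap
   size in kB. Both spacetimes cover the same run, so the share of
   spacetime equals the share of the average size. */
double site_average_kb(double percent, double average_heap_kb)
{
    assert(percent >= 0.0 && percent <= 100.0);
    assert(average_heap_kb >= 0.0);
    return percent / 100.0 * average_heap_kb;
}
]]></programlisting>
<para>For example, a site with 25% of the spacetime in a program whose heap averages 800 kB accounts for roughly 200 kB on average.</para>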
<para>
Secondly, <application>Massif</application> only takes into account the memory used by your own program. Resources like pixmaps are stored in the X server and aren't considered by <application>Massif</application>. In the <application>Swell Foop</application> example we have actually only moved the memory consumption from client-side pixbufs to server-side pixmaps. Even though we cheated, there are performance gains. Keeping the image data in the X server makes the graphics routines quicker and removes a lot of inter-process communication. Also, the pixmaps will be stored in a native graphics format, which is often more compact than the 32-bit RGBA format used by gdk_pixbuf. To measure the effect of pixmaps and other X resources, use the <ulink type="http" url="http://www.freedesktop.org/Software/xrestop">xrestop</ulink> program.
</para>
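<para>The size difference is simple arithmetic: an uncompressed image needs width × height × (bits per pixel ÷ 8) bytes. The sketch below ignores row padding, which real pixbufs and pixmaps may add. The 16-bit pixel format in the example is an assumption about the display, not a measurement; on a 24-bit display the X server commonly stores 32 bits per pixel, so the saving depends on the screen.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Bytes for an uncompressed image, ignoring any per-row padding. */
size_t image_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}
]]></programlisting>
<para>A 100 × 100 image, for instance, takes 40000 bytes as 32-bit RGBA and 20000 bytes as a 16-bit pixmap.</para>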
</sect1>
</chapter>