1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346
|
\sec{Profiling}{profiling}{Profiling.html}
With {\mlton} and {\tt mlprof}, you can profile your program along
different measures of performance. To profile you program, compile
with {\tt -profile {\em kind}}. Then, run the executable, which, when
it finishes, will write an {\tt mlmon.out} file. You can then run
{\tt mlprof} on the executable and the {\tt mlmon.out} file to see the
gathered performance data.
%
\subsec{Profiling time}{proftime}
You can compile your program with {\tt -profile time} to find out how
much time is spent by each source function over an entire run of the
program. Here is an example of time profiling, run from within the
{\tt examples/profiling} directory.
\begin{verbatim}
% mlton -profile time tak.sml
% ./tak
% mlprof tak mlmon.out
6.00 seconds of CPU time (0.00 seconds GC)
function cur
------------- -----
Tak.tak1.tak2 75.8%
Tak.tak1 24.2%
\end{verbatim}
This is a contrived example with two mutually recursive copies of the
{\tt tak} function, defined within the {\tt Tak} structure. This
examples shows how {\tt mlprof} indicates lexical nesting via a
sequence of period-separated names indicating the structures and
functions in which a function definition is nested. The profiling
data shows that roughly three-quarters of the time is spent in the
{\tt Tak.tak1.tak2} function, while the rest is spent in {\tt
Tak.tak1}.
You can display raw counts in addition to percentages with {\tt -raw
true}.
\begin{verbatim}
% mlprof -raw true tak mlmon.out
6.00 seconds of CPU time (0.00 seconds GC)
function cur raw
------------- ----- -------
Tak.tak1.tak2 75.8% (4.55s)
Tak.tak1 24.2% (1.45s)
\end{verbatim}
%
You can display the file name and line number for each function in
addition to its name with {\tt -show-line true}.
\begin{verbatim}
% mlprof -show-line true tak mlmon.out
6.00 seconds of CPU time (0.00 seconds GC)
function cur
------------------------- -----
Tak.tak1.tak2 tak.sml: 5 75.8%
Tak.tak1 tak.sml: 3 24.2%
\end{verbatim}
Time profiling is designed to have a very small performance impact.
You can also compile with {\tt -profile time -profile-branch true} to
find out how much time is spent in each branch of a function; see
\secref{profcounts} for more details.
\subsec{Profiling allocation}{profalloc}
You can compile your program with {\tt -profile alloc} to find out how
many bytes each function allocates. Here is an example run from
within the {\tt examples/profiling} directory.
\begin{verbatim}
% mlton -profile alloc list-rev.sml
% ./list-rev
% mlprof -show-line true list-rev mlmon.out
6,030,136 bytes allocated (108,336 bytes by GC)
function cur
----------------------- -----
append list-rev.sml: 1 97.6%
<gc> 1.8%
<main> 0.4%
rev list-rev.sml: 6 0.2%
\end{verbatim}
%
The data shows that most of the allocation is done by the {\tt append}
function defined on line 1 of {\tt list-rev.sml}. The table also
shows how special functions like {\tt gc} and {\tt main} are handled:
they are printed with surrounding brackets. C functions are displayed
similarly. Here, as is usually the case, the allocation done by the
garbage collector is due to stack growth.
The performance impact of allocation profiling is noticeable, because
it inserts additional C calls for object allocation. You can also
compile with {\tt -profile alloc -profile-branch true} to find out how
much allocation is done in each branch of a function; see
\secref{profcounts} for more details.
%
\subsec{Profiling counts}{profcounts}
You can compile your program with {\tt -profile count -profile-branch
true} to find out how many times each function is called and how many
times each branch is taken. Here's an example.
\begin{verbatim}
% mlton -profile count -profile-branch true tak.sml
% ./tak
% mlprof -raw true -show-line true tak mlmon.out
623,610,002 ticks
function cur raw
--------------------------------- ----- -------------
Tak.tak1.tak2 tak.sml: 5 38.2% (238,530,000)
Tak.tak1.tak2.<true> tak.sml: 7 27.5% (171,510,000)
Tak.tak1 tak.sml: 3 10.7% (67,025,000)
Tak.tak1.<true> tak.sml: 14 10.7% (67,025,000)
Tak.tak1.tak2.<false> tak.sml: 9 10.7% (67,020,000)
Tak.tak1.<false> tak.sml: 16 2.0% (12,490,000)
f tak.sml: 23 0.0% (5,001)
f.<branch> tak.sml: 25 0.0% (5,000)
f.<branch> tak.sml: 23 0.0% (1)
uncalled tak.sml: 29 0.0% (0)
f.<branch> tak.sml: 24 0.0% (0)
\end{verbatim}
Branches are displayed with the usual lexical nesting followed by {\tt
<branch>} where the function name would normally be, or {\tt <true>}
or {\tt <false>} for if-expressions. Hence, it is best to run {\tt
mlprof} with {\tt -show-line true} to help identify the branch.
One use of {\tt -profile count} is as a code-coverage tool, to help
find code in your program that hasn't been tested. For this reason,
{\tt mlprof} displays functions and branches that have a count of
zero. As the above output shows, the branch on line 24 was never
taken and the function defined on line 29 was never called. To see
zero counts, it is best to run {\tt mlprof} with {\tt -raw true},
since some code (e.g. the branch on line 23 above) will show up with
{\tt 0.0\%} but may still have been executed and hence have a nonzero
raw count.
%
\subsec{Profiling the stack}{profstack}
For all forms of profiling, you can use {\tt -profile-stack true} to
count the time spent (or bytes allocated) while a function is on the
stack. Here is an example.
\begin{verbatim}
% mlton -profile alloc -profile-stack true list-rev.sml
% ./list-rev
% mlprof -show-line true list-rev mlmon.out
6,030,136 bytes allocated (108,336 bytes by GC)
function cur stack GC
----------------------- ----- ----- ----
append list-rev.sml: 1 97.6% 97.6% 1.4%
<gc> 1.8% 0.0% 1.8%
<main> 0.4% 98.2% 1.8%
rev list-rev.sml: 6 0.2% 97.6% 1.8%
\end{verbatim}
%
In the above table, we see that {\tt rev}, defined on line 6 of {\tt
list-rev.sml}, is only responsible for 0.2\% of the allocation, but is
on the stack while 97.6\% of the allocation is done by the user
program and while 1.8\% of the allocation is done by the garbage
collector.
The performance impact of {\tt -profile-stack true} can be noticeable
since there is some extra bookkeeping at every nontail call and
return.
%
\subsection{Call graphs}
For easier visualization of profiling data, {\tt mlprof} can create a
call graph of the program in dot format, from which you can use the
\htmladdnormallink{{\tt graphviz}}
{http://www.research.att.com/sw/tools/graphviz/}
software package to create a postscript graph. For example, {\tt
mlprof -call-graph foo.dot foo mlmon.out} will create {\tt foo.dot}
with a complete call graph. For each source function, there will be
one node in the graph that contains the function name (and source
position with {\tt -show-line true}), as well as the percentage of
ticks. If you want to create a call graph for your program without
any profiling data, you can simply call {\tt mlprof} without any {\tt
mlmon.out} files, e.g. {\tt mlprof -call-graph foo.dot foo}.
Because SML has higher-order functions, the call graph is is dependent
on {\mlton}'s analysis of which functions call each other. This
analysis depends on many implementation details and might display
spurious edges that a human could conclude are impossible. However,
in practice, the call graphs tend to be very accurate.
Because call graphs can get big, {\tt mlprof} provides the {\tt -keep}
option to specify the nodes that you would like to see. This
option also controls which functions appear in the table that {\tt
mlprof} prints. The argument to {\tt -keep} is an expression
describing a set of source functions (i.e. graph nodes), taken from
the following grammar.
\begin{latexonly}
\begin{center}
\begin{tabular}{lcl}
{\it e} & ::= & {\tt all} \\
& $\alt$ & {\tt "{\it s}"} \\
& $\alt$ & {\tt (and {\it e} ...)} \\
& $\alt$ & {\tt (from {\it e})} \\
& $\alt$ & {\tt (not {\it e})} \\
& $\alt$ & {\tt (or {\it e} ...)} \\
& $\alt$ & {\tt (pred {\it e})} \\
& $\alt$ & {\tt (succ {\it e})} \\
& $\alt$ & {\tt (to {\it e})} \\
& $\alt$ & {\tt (thresh {\it x})} \\
& $\alt$ & {\tt (thresh-gc {\it x})} \\
& $\alt$ & {\tt (thresh-stack {\it x})} \\
\end{tabular}
\end{center}
\end{latexonly}
\begin{htmlonly}
\begin{center}
\begin{tabular}{lcl}
{\it e} & ::= & {\tt all} \\
& | & {\tt "{\it s}"} \\
& | & {\tt (and {\it e} ...)} \\
& | & {\tt (from {\it e})} \\
& | & {\tt (not {\it e})} \\
& | & {\tt (or {\it e} ...)} \\
& | & {\tt (pred {\it e})} \\
& | & {\tt (succ {\it e})} \\
& | & {\tt (to {\it e})} \\
& | & {\tt (thresh {\it x})} \\
& | & {\tt (thresh-gc {\it x})} \\
& | & {\tt (thresh-stack {\it x})} \\
\end{tabular}
\end{center}
\end{htmlonly}
In the grammar, {\tt all} denotes the set of all nodes. {\tt "{\it
s}"} is a regular expression denoting the set of functions whose name
(followed by a space and the source position) has a prefix matching
the regexp. The {\tt and}, {\tt not}, and {\tt or} expressions denote
intersection, complement, and union, respectively. The {\tt pred} and
{\tt succ} expressions add the set of immediate predecessors or
successors to their argument, respectively. The {\tt from} and {\tt
to} expressions denote the set of nodes that have paths from or to the
set of nodes denoted by their arguments, respectively. Finally, {\tt
thresh}, {\tt thresh-gc}, and {\tt thresh-stack} denote the set of
nodes whose percentage of ticks, gc ticks, or stack ticks,
respectively, is greater than or equal to the real number {\it x}.
For example, if you want to see the entire call graph for a program,
you can use {\tt -keep all} (this is the default). If you want to see
all nodes reachable from function {\tt foo} in your program, you would
use {\tt -keep '(from "foo")'}. Or, if you want to see all the
functions defined in subdirectory {\tt bar} of your project that used
at least 1\% of the ticks, you would use {\tt -keep '(and ".*/bar/"
(thresh 1.0))'}. To see all functions with ticks above a threshold,
you can also use {\tt -thresh x}, which is an abbreviation for {\tt
-keep '(thresh x)'}. You can not use multiple {\tt -keep} arguments
or both {\tt -keep} and {\tt -thresh}. When you use {\tt -keep} to
display a subset of the functions, {\tt mlprof} will add dashed edges
to the call graph to indicate a path in the original call graph from
one function to another.
When compiling with {\tt -profile-stack true}, you can use {\tt mlprof
-gray true} to make the nodes darker or lighter depending on whether
their stack percentage is higher or lower.
{\mlton}'s optimizer may duplicate source functions for any of a
number of reasons (functor duplication, monomorphisation,
polyvariance, inlining). By default, all duplicates of a function are
treated as one. If you would like to treat the duplicates separately,
you can use {\tt mlprof -split} {\it regexp}, which will cause all
duplicates of functions whose name has a prefix matching the regular
expression to be treated separately. This can be especially useful
for higher-order utility functions like {\tt General.o}.
%
\subsection{Using {\tt MLton.Profile}}
To profile individual portions of your program, you can use the {\tt
MLton.Profile} structure (see \secref{profile-structure}). This
allows you to create many units of profiling data (essentially,
mappings from functions to counts) during a run of a program, to
switch between them while the program is running, and to output
multiple {\tt mlmon.out} files. Executing {\tt mlprof} with multiple
{\tt mlmon.out} files sums the profiling data to produce the output
profiling information.
Here is an example, run from within the {\tt examples/profiling}
directory, showing how to profile the executions of the {\tt fib} and
{\tt tak} functions separately.
\begin{verbatim}
% mlton -profile time fib-tak.sml
% ./fib-tak
% mlprof fib-tak mlmon.fib.out
5.77 seconds of CPU time (0.00 seconds GC)
function cur
--------- -----
fib 96.9%
<unknown> 3.1%
% mlprof fib-tak mlmon.tak.out
0.68 seconds of CPU time (0.00 seconds GC)
function cur
-------- ------
tak 100.0%
% mlprof fib-tak mlmon.fib.out mlmon.tak.out mlmon.out
6.45 seconds of CPU time (0.00 seconds GC)
function cur
--------- -----
fib 86.7%
tak 10.5%
<unknown> 2.8%
\end{verbatim}
%
\subsection{Profiling details}
Technically speaking, {\tt mlprof} produces a call-stack graph rather
than a call graph, because it describes the set of possible call
stacks. The difference is in how tail calls are displayed. For
example if {\tt f} nontail calls {\tt g} and {\tt g} tail calls {\tt
h}, then the call-stack graph has edges from {\tt f} to {\tt g} and
{\tt f} to {\tt h}, while the call graph has edges from {\tt f} to {\tt
g} and {\tt g} to {\tt h}. That is, a tail call from {\tt g} to {\tt
h} removes {\tt g} from the call stack and replaces it with {\tt h}.
Conceptually, both allocation and time profiling work in the same way.
The compiler produces information that maps machine code positions to
source functions that the profiler uses while the program is running
to find out the current source function. With {\tt -profile-stack
true}, the profiler also keeps track of which functions are on the
stack. The profiler associates counters (either clock ticks or byte
counts) with source functions. For allocation profiling, the compiler
inserts code whenever an object is allocated to call a C function to
increment the appropriate counter. For time profiling, the profiler
catches the {\tt SIGPROF} signal 100 times per second and increments
the appropriate counter. When the program finishes, the profiler
writes the counts out to the {\tt mlmon.out} file. Then, {\tt mlprof}
uses source information stored in the executable to associate the
counts in the {\tt mlmon.out} file with source functions.
With {\tt -profile time}, use of the following in your program will
cause a run-time error, since they would interfere with the profiler
signal handler.\\
\begin{tabular}{l}
\tt MLton.Itimer.set (MLton.Itimer.Prof, ...)\}\\
\tt MLton.Signal.setHandler (MLton.Signal.prof, ...)
\end{tabular}\\
Also, because of the random sampling used to implement {\tt -profile
time}, it is best to have a long running program (at least tens of
seconds) in order to get reasonable time data.
There may be a few missed clock ticks or bytes allocated at the
very end of the program after the data is written.
Profiling has not been tested with signals or threads. In particular,
stack profiling may behave strangely.
|