1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303
|
Profiling Implementation Notes -- June/July/Sept 1994
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Simon and Will
Pre-code-generator-ish
~~~~~~~~~~~~~~~~~~~~~~
* Automagic insertion of _sccs_ on...
- If -fprof-auto-exported is specified, add _scc_ on each *exported* top-level definition.
NB this includes CAFs. Done by addAutoCostCentres (Core-to-Core pass).
- If -fprof-auto-top is specified, add _scc_ on *all* top-level definitions.
Done by same pass.
- If -fprof-auto is specified, add _scc_ on *all* definitions.
- Always: just before code generation of module M, onto any CAF
which hasn't already got an explicit cost centre attached, pin
"AllCAFs-M".
Done by finalStgMassageForProfiling (final STG-to-STG pass)
Only the one-off costs of evaluating the CAFs will be attributed
to the AllCAFs-M cost centre. We hope that these costs will be
small; since the _scc_s are introduced automatically it's
confusing to attribute any significant costs to them. However if
there *are* significant one-off costs we'd better know about it.
Why so late in the compilation process? We aren't *absolutely*
sure what is and isn't a CAF until *just* before code generation.
So we don't want to mark them as such until then.
- Individual DICTs
We do it in the desugarer, because that's the *only* point at
which we *know* exactly what bindings are introduced by
overloading. NB should include bindings for selected methods, eg
f d = let op = _scc_ DICT op_sel d in
...op...op...op
The DICT CC ensures that:
(a) [minor] that the selection cost is separately attributed
(b) [major] that the cost of executing op is attributed to
its call site, eg
...(scc "a" op)...(scc "b" op)...(scc "c" op)...
* Automagic "boxing" of higher-order args:
finalStgMassageForProfiling (final STG-to-STG pass)
This (as well as CAF stuff above) is really quite separate
from the other business of finalStgMassageForProfiling
(collecting up CostCentres that need to be
declared/registered).
But throwing it all into the pot together means that we don't
have to have Yet Another STG Syntax Walker.
Furthermore, these "boxes" are really just let-bindings that
many other parts of the compiler will happily substitute away!
Doing them at the very last instant prevents this.
A down side of doing these so late is that we get lots of
"let"s, which if generated earlier and not substituted away,
could be floated outwards. Having them floated outwards would
lessen the chance of skewing profiling results (because of
gratuitous "let"s added by the compiler into the inner loop of
some program...). The allocation itself will be attributed to
profiling overhead; the only thing which'll be skewed is time measurement.
So if we have, post-boxing-higher-order-args...
_scc_ "foo" ( let f' = [f] \ [] f
in
map f' xs )
... we want "foo" to be put in the thunk for "f'", but we want the
allocation cost (heap census stuff) to be attr to OVERHEAD.
As an example of what could be improved
f = _scc_ "f" (g h)
To save dynamic allocation, we could have a static closure for h:
h_inf = _scc_ "f" h
f = _scc_ "f" (g h_inf)
Code generator-ish
~~~~~~~~~~~~~~~~~~
(1) _Entry_ code for a closure *usually* sets CC from the closure,
at the fast entry point
Exceptions:
(a) Top-level subsumed functions (i.e., w/ no _scc_ on them)
Refrain from setting CC from the closure
(b) Constructors
Again, refrain. (This is *new*)
Reasons: (i) The CC will be zapped very shortly by the restore
of the enclosing CC when we return to the eval'ing "case".
(ii) Any intervening updates will indirect to this existing
constructor (...mumble... new update mechanism... mumble...)
(2) "_scc_ cc expr"
Set current CC to "cc".
No later "restore" of the previous CC is reqd.
(3) "case e of { ...alts... }" expression (eval)
Save CC before eval'ing scrutinee
Restore CC at the start of the case-alternative(s)
(4) _Updates_ : updatee gets current CC
(???? not sure this is OK yet 94/07/04)
Reasons:
* Constructors : want to be insensitive to return-in-heap vs
return-in-regs. For example,
f x = _scc_ "f" (x, x)
The pair (x,x) would get CC of "f" if returned-in-heap;
therefore, updatees should get CC of "f".
* PAPs : Example:
f x = _scc_ "f" (let g = \ y -> ... in g)
At the moment of update (updatePAP?), CC is "f", which
is what we want to set it to if the "updatee" is entered
When we enter the PAP ("please put the arguments back so I can
use them"), we restore the setup as at the moment the
arg-satisfaction check failed.
Be careful! UPDATE_PAP is called from the arg-satis check,
which is before the fast entry point. So the cost centre
won't yet have been set from the closure which has just
been entered. Solution: in UPDATE_PAP see if the cost centre inside
the function closure which is being entered is "SUB"; if so, use
the current cost centre to update the updatee; otherwise use that
inside the function closure. (See the computation of cc_pap
in rule 16_l for lexical semantics.)
(5) CAFs
CAFs get their own cost centre. Ie
x = e
is transformed to
x = _scc_ "CAF:x" e
Or sometimes we lump all the CAFs in a module together.
(Reporting issue or code-gen issue?)
Hybrid stuff
~~~~~~~~~~~~
The problem:
f = _scc_ "CAF:f" (let g = \xy -> ...
in (g,g))
Now, g has cost-centre "CAF:f", and is returned as part of
the result. So whenever the function embedded in the result
is called, the costs will accumulate to "CAF:f". This is
particularly (de)pressing for dictionaries, which contain lots
of functions.
Solution:
A. Whenever in case (1) above we would otherwise "set the CC from the
closure", we *refrain* from doing so if
(a) the closure is a function, not a thunk; and
(b) the cost-centre in the closure is a CAF cost centre.
B. Whenever we enter a thunk [at least, one which might return a function]
we save the current cost centre in the update frame. Then, UPDATE_PAP
restores the saved cost centre from the update frame iff the cost
centre at the point of update (cc_pap in (4) above) is a CAF cost centre.
It isn't necessary to save and possibly-restore the cost centre for
thunks which will certainly return a constructor, because the
cost centre is about to be restored anyway by the enclosing case.
Both A and B are runtime tests. For A, consider:
f = _scc_ "CAF:f" (g 2)
h y = _scc_ "h" g (y+y)
g x = let w = \p -> ...
in (w,w)
Now, in the call to g from h, the cost-centre on w will be "h", and
indeed all calls to the result of the call should be attributed to
"h".
... _scc_ "x1" (let (t,_) = h 2 in t 3) ...
Costs of executing (w 3) attributed to "h".
But in the call to g from f, the cost-centre on w will be
"CAF:f", and calls to w should be attributed to the call site.
..._scc_ "x2" (let (t,_) = f in t 3)...
Costs of executing (w 3) attributed to "x2".
Remaining problem
Consider
_scc_ "CAF:f" (if expensive then g 2 else g 3)
where g is a function with arity 2. In theory we should
restore the enclosing cost centre once we've reduced to
(g 2) or (g 3). In practice this is pretty tiresome; and pretty rare.
A quick fix: given (_scc_ "CAF" e) where e might be function-valued
(in practice we usually know, because CAF sccs are top level), transform to
_scc_ "CAF" (let f = e in f)
============
scc cc x ===> x
UNLESS
(a) cc is a user-defined, non-dup'd cost
centre (so we care about entry counts)
OR
(b) cc is not a CAF/DICT cost centre and x is top-level subsumed
function.
[If x is lambda/let bound it'll have a cost centre
attached dynamically.]
To repeat, the transformation is OK if
x is a not top-level subsumed function
OR
cc is a CAF/DICT cost centre and x is a top-level
subsumed function
(scc cc e) x ===> (scc cc e x)
OK????? IFF
cc is not CAF/DICT --- remains to be proved!!!!!!
True for lex
False for eval
Can we tell which in hybrid?
eg Is this ok?
(scc "f" (scc "CAF" (\x.b))) y ==> (scc "f" (scc "CAF" (\x.b) y))
\x -> (scc cc e) ===> (scc cc \x->e)
OK IFF cc is not CAF/DICT
scc cc1 (scc cc2 e)) ===> scc cc2 e
IFF not interested in cc1's entry count
AND cc2 is not CAF/DICT
(scc cc1 ... (scc cc2 e) ...) ===> (scc cc1 ... e ...)
IFF cc2 is CAF/DICT
AND e is a lambda not appearing as the RHS of a let
OR
e is a variable not bound to SUB
|