File: README.optimization (bobcat 2.08.01-1)

The `build' script uses -O3 as one of its compiler flags. Why -O3 and not -O2? 

Compared to -O2, -O3 makes the compiler apply the following additional
optimizations:

       -finline-functions
           Integrate all simple functions into their callers.  The compiler
           heuristically decides which functions are simple enough to be worth
           integrating in this way.

           If all calls to a given function are integrated, and the function
           is declared "static", then the function is normally not output as
           assembler code in its own right.

This is what we want for all class member functions defined inside their
classes. In this library these functions are always simple, consisting of at
most one line of code. Apart from these inline members, there are probably no
other places in this code where the compiler will find it worthwhile to
integrate functions. But for the inline members (especially the accessors) the
overhead of an additional function call seems needlessly wasteful. After all,
C++ explicitly offers in-class function definitions for exactly these kinds of
functions, so integrating their code rather than calling them seems like the
right thing to do.
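A minimal sketch of the kind of member meant here. The class `Counter` and the helper `countTo` are hypothetical, made up for illustration and not taken from bobcat. Because `value()` and `increment()` are defined inside the class body they are implicitly inline, so the compiler may replace each call with the member's single line of code instead of emitting a function call.

```cpp
// Hypothetical class, not part of bobcat. Both members are defined
// inside the class body, which makes them implicitly inline: the
// compiler may integrate their code at each call site.
class Counter
{
    int d_value = 0;

    public:
        int value() const       // accessor: one line of code
        {
            return d_value;
        }

        void increment()        // modifier: also one line
        {
            ++d_value;
        }
};

// Hypothetical helper: each member call below is a candidate for
// integration, after which the loop merely updates and reads an int.
int countTo(int limit)
{
    Counter counter;
    for (int idx = 0; idx != limit; ++idx)
        counter.increment();
    return counter.value();
}
```

For example, `countTo(3)` returns 3; with the members integrated, no calls to `increment()` or `value()` need to remain in the generated code.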

       -funswitch-loops
           Move branches with loop invariant conditions out of the loop, with
           duplicates of the loop on both branches (modified according to
           result of the condition).

Since the conditions are loop invariant, the branches can safely be moved out
of the loop, even though this enlarges the code somewhat (because the loop is
duplicated). It prevents the code from testing the same condition again and
again on every iteration of a loop when that is not required.
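The transformation can be illustrated at the source level. The two functions below are hypothetical and show by hand roughly what -funswitch-loops may do on the compiler's internal representation; there is no need to write the second form yourself, and whether GCC actually unswitches a given loop depends on its own heuristics.

```cpp
#include <cstddef>
#include <vector>

// Before unswitching: `negate` does not change inside the loop,
// yet it is tested on every iteration.
long sumPlain(std::vector<int> const &values, bool negate)
{
    long sum = 0;
    for (std::size_t idx = 0; idx != values.size(); ++idx)
    {
        if (negate)
            sum -= values[idx];
        else
            sum += values[idx];
    }
    return sum;
}

// After unswitching (done by hand here): the invariant test is moved
// out of the loop and the loop is duplicated on both branches, each
// copy specialized for the outcome of the test.
long sumUnswitched(std::vector<int> const &values, bool negate)
{
    long sum = 0;
    if (negate)
    {
        for (std::size_t idx = 0; idx != values.size(); ++idx)
            sum -= values[idx];
    }
    else
    {
        for (std::size_t idx = 0; idx != values.size(); ++idx)
            sum += values[idx];
    }
    return sum;
}
```

Both functions compute the same result; the second trades a larger body (two loops) for testing `negate` only once.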

       -fgcse-after-reload
           When -fgcse-after-reload is enabled, a redundant load elimination
           pass is performed after reload.  The purpose of this pass is to
           cleanup redundant spilling.

This refers to things like removing dead code and, where possible, reusing
values that can be proven to be already available in some register. The
following paragraphs are found in `Contributions to the GNU Compiler
Collection':
          
     The first case, where redundant loads appear before register allocation,
     is handled by redundancy elimination optimization. Redundancy elimination
     removes redundant calculations of expressions by reusing previously
     calculated values that are stored in some register. The redundancy
     elimination pass of GCC did consider loading a calculation of an
     expression from memory, but did not consider store operations as
     expressions. Thus, GCC did replace a load following another load from the
     same memory location by a register copy, but did not replace a load
     following a store to the same location. We enhanced the redundancy
     elimination pass so that it would also consider stores as expressions,
     and hence replace subsequent loads from the same location with register
     copies.
 
     The second case of load-hit-store events was due to poor register
     spilling (i.e., the reload pass in GCC). (43) We handled this case in two
     ways.  First, we added a "cleanup" pass after the reload that removed
     such redundancies, similar to the first case. However, this solution is
     limited because it works with hard (that is, allocated) registers. We
     reused the existing redundancy elimination infrastructure and added a
     special consideration of register availability for the register moves
     that we generate. We also took care of partial redundancy elimination by
     adding loads on basic blocks that are less critical (according to
     profiling), provided we can replace loads from critical blocks by
     register moves.
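The "load following a store" case from the first quoted paragraph can be sketched at the source level. The functions below are hypothetical illustrations only: the after-reload pass itself works on loads that the register allocator introduces when spilling, which cannot be written out in C++, so this merely shows the shape of the redundancy being removed.

```cpp
// Hypothetical functions illustrating a load that follows a store
// to the same memory location.

// Before: the value stored in *target is immediately loaded back.
int storeThenLoad(int *target, int value)
{
    *target = value * 2;    // store to memory
    return *target + 1;     // load from the same location
}

// After: the load is replaced by reusing the value that is still
// available (typically in a register), as a register copy would.
int storeThenReuse(int *target, int value)
{
    int const stored = value * 2;
    *target = stored;       // the store itself must remain
    return stored + 1;      // no reload from memory needed
}
```

Both versions leave the same value in `*target` and return the same result; only the redundant memory access differs.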

Again, this optimization is considered a desirable one and thus it was decided
to use -O3 rather than -O2.

Frank.