Mono Ahead Of Time Compiler
===========================
The Ahead of Time compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase code sharing across multiple
running Mono applications.
To precompile an assembly use the following command:
mono --aot -O=all assembly.exe
The `--aot' flag instructs Mono to ahead-of-time compile your
assembly, while the `-O=all' flag instructs Mono to use all the
available optimizations.
* Caching metadata
------------------
Besides code, the AOT file also contains cached metadata information which allows
the runtime to avoid certain computations at runtime, like the computation of
generic vtables. This reduces both startup time and memory usage. It is possible
to create an AOT image which contains only this cached information and no code by
using the 'metadata-only' option during compilation:
mono --aot=metadata-only assembly.exe
This works even on platforms where AOT is not normally supported.
* Position Independent Code
---------------------------
On x86 and x86-64 the code generated by Ahead-of-Time compiled
images is position-independent code. This allows the same
precompiled image to be reused across multiple applications
without having different copies. This is the same way in which
ELF shared libraries work: the code produced can be relocated
to any address.
The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster with AOT-compiled images
than with JIT-compiled code, especially with all the new
optimizations provided by the Mono engine.
* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------
Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to do an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT in the ELF
spec. When the runtime saves the AOT image, it saves some
information for each method describing the GOT entries
used by that method. When loading a method from an AOT image,
the runtime will fill out the GOT entries needed by the
method.
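The loading step described above can be sketched in C. This is only an illustration under stated assumptions: the GotPatch record, the PatchKind values, the fixed-size got array and the function names are all hypothetical and are not taken from the Mono sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of runtime items a GOT slot can refer to. */
typedef enum { PATCH_ICALL, PATCH_VTABLE } PatchKind;

/* Hypothetical per-method record saved in the AOT image: which GOT
 * slot the method uses and what that slot should contain. */
typedef struct {
    size_t    got_slot;
    PatchKind kind;
    int       target_id;
} GotPatch;

#define GOT_SIZE 8
static void *got[GOT_SIZE];          /* all slots start out empty (NULL) */

/* Stand-ins for runtime structures whose addresses are only known at
 * run time. */
static int runtime_icalls[4];
static int runtime_vtables[4];

/* Resolve a patch to the run-time address it describes. */
static void *resolve_patch(const GotPatch *p)
{
    assert(p->target_id >= 0 && p->target_id < 4);
    return p->kind == PATCH_ICALL ? (void *)&runtime_icalls[p->target_id]
                                  : (void *)&runtime_vtables[p->target_id];
}

/* Called when a method is loaded: fill only the slots this method
 * needs, skipping slots that an earlier method already filled. */
static void load_method_got_entries(const GotPatch *patches, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(patches[i].got_slot < GOT_SIZE);
        if (got[patches[i].got_slot] == NULL)
            got[patches[i].got_slot] = resolve_patch(&patches[i]);
    }
}
```

Slots not named by any loaded method stay empty, so in this sketch the cost of filling the table is only paid for methods that are actually used.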
* Computing the address of the GOT
Methods which need to access the GOT first need to compute its
address. On the x86 it is done by code like this:
call <IP + 5>
pop ebx
add <OFFSET TO GOT>, ebx
<save got addr to a register>
The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
prevent some problems with branches + basic blocks.
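The arithmetic behind the call/pop/add sequence can be modelled in C. This is a sketch, not code from the backend: compute_got_address and its parameters are hypothetical names, and it assumes the 5-byte `call rel32` encoding, so the value popped is the address of the call plus 5.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of the x86 `call rel32` instruction
 * (1 opcode byte + 4 displacement bytes). */
#define X86_CALL_REL32_LEN 5

/* call_addr:  address of the call instruction.
 * got_offset: distance from the instruction after the call to the GOT,
 *             i.e. the <OFFSET TO GOT> operand of the add. */
static uintptr_t compute_got_address(uintptr_t call_addr, intptr_t got_offset)
{
    /* `call` pushes the address of the next instruction; `pop` retrieves it. */
    uintptr_t popped = call_addr + X86_CALL_REL32_LEN;

    /* `add` turns that code address into the GOT address. */
    return popped + (uintptr_t)got_offset;
}
```

Because only the distance between the code and the GOT is encoded, the result stays correct wherever the image is mapped.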
* Referencing GOT entries
Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of:
call <ABSOLUTE ADDR>
it needs to do:
call *<OFFSET>(<GOT REG>)
Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.
For more examples on the changes required, see
svn diff -r 37739:38213 mini-x86.c
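The difference between the two call forms can be expressed at the C level. This is a rough analogy rather than what the backend emits: runtime_helper, HELPER_SLOT and the got array are hypothetical, and the slot is assumed to have been filled by the loader beforehand.

```c
#include <assert.h>

typedef void (*GotEntry)(void);      /* generic slot type for this sketch */
typedef int  (*HelperFn)(int);

/* Stand-in for a runtime helper whose address is only known at run time. */
static int runtime_helper(int x) { return x + 1; }

/* Hypothetical GOT; slot 3 is assumed to be assigned to runtime_helper. */
#define HELPER_SLOT 3
static GotEntry got[8];

/* JIT style: the target address is embedded directly in the call,
 * like `call <ABSOLUTE ADDR>`. */
static int jit_style_call(int x)
{
    return runtime_helper(x);
}

/* AOT style: load the target from the GOT, then call through it,
 * like `call *<OFFSET>(<GOT REG>)`. */
static int aot_style_call(GotEntry *got_reg, int x)
{
    HelperFn target = (HelperFn)got_reg[HELPER_SLOT];
    assert(target != 0);             /* the loader must have filled the slot */
    return target(x);
}
```

Both functions return the same result; the AOT form pays for one extra memory load per call.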
* The Program Linkage Table
As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
most architectures, call instructions use a displacement instead of an absolute address, so
they are already position independent. A PLT entry is usually a jump instruction, which
initially points to some trampoline code which transfers control to the AOT loader, which
will compile the called method, and patch the PLT entry so that further calls are made
directly to the called method.
If the called method is in the same assembly, and does not need initialization (i.e. it
doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
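The lazy patching of PLT entries can be modelled with function pointers. This is a simplified sketch: a real PLT entry is a jump instruction, not a C pointer, and every name here (plt, trampoline, aot_load_method, pending_slot) is hypothetical. The pending_slot variable stands in for whatever mechanism tells the trampoline which entry was called.

```c
#include <assert.h>

typedef int (*MethodFn)(int);

#define PLT_SIZE 2

static int resolve_count;            /* how many times the loader ran */
static int pending_slot;             /* slot the trampoline is resolving */
static int trampoline(int arg);

/* Every PLT entry initially points at the trampoline. */
static MethodFn plt[PLT_SIZE] = { trampoline, trampoline };

static int method_square(int x) { return x * x; }
static int method_negate(int x) { return -x; }

/* Stand-in for the AOT loader: find the method behind a PLT slot. */
static MethodFn aot_load_method(int slot)
{
    resolve_count++;
    return slot == 0 ? method_square : method_negate;
}

/* First call through an entry lands here: resolve the target, patch
 * the entry, then forward the call. */
static int trampoline(int arg)
{
    MethodFn target = aot_load_method(pending_slot);
    plt[pending_slot] = target;
    return target(arg);
}

/* A call site: always goes through the PLT entry. */
static int call_via_plt(int slot, int arg)
{
    assert(slot >= 0 && slot < PLT_SIZE);
    pending_slot = slot;
    return plt[slot](arg);
}
```

After the first call through a slot, later calls reach the method directly and the loader is not entered again for that slot.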
* The Precompiled File Format
-----------------------------
We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler, i.e. we
write out a text file which is then passed to gas (the gnu
assembler) to generate the object file.
The precompiled image is stored in a file next to the original
assembly. Its name is the assembly's file name with the native
shared library extension appended (on Linux this is ".so").
For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
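The naming rule can be written as a small C helper. This is a sketch only: aot_image_name and AOT_SHARED_LIB_EXT are hypothetical names, and the extension is hard-coded to the Linux value mentioned in the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shared library extension; ".so" on Linux. */
#define AOT_SHARED_LIB_EXT ".so"

/* Writes the AOT image name for `assembly` into `out`.
 * Returns 0 on success, -1 if the buffer is too small. */
static int aot_image_name(const char *assembly, char *out, size_t out_size)
{
    assert(assembly != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", assembly, AOT_SHARED_LIB_EXT);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The extension is appended to the full file name and does not replace the existing ".exe" or ".dll" suffix.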
The following things are saved in the object file and can be
looked up using the equivalent to dlsym:
mono_assembly_guid
A copy of the assembly GUID.
mono_aot_version
The version of the AOT file format.
mono_aot_opt_flags
The optimization flags used to build this
precompiled image.
method_infos
Contains additional information needed by the runtime for using the
precompiled method, like the GOT entries it uses.
method_info_offsets
Maps method indexes to offsets in the method_infos array.
mono_icall_table
A table that lists all the internal calls
referenced by the precompiled image.
mono_image_table
A list of assemblies referenced by this AOT
module.
methods
The precompiled code itself.
method_offsets
Maps method indexes to offsets in the methods array.
ex_info
Contains information about methods which is rarely used during normal execution,
like exception and debug info.
ex_info_offsets
Maps method indexes to offsets in the ex_info array.
class_info
Contains precomputed metadata used to speed up various runtime functions.
class_info_offsets
Maps class indexes to offsets in the class_info array.
class_name_table
A hash table mapping class names to class indexes. Used to speed up
mono_class_from_name ().
plt
The Program Linkage Table
plt_info
Contains information needed to find the method belonging to a given PLT entry.
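The pairing of a data blob with an offsets table (methods/method_offsets, ex_info/ex_info_offsets and so on) suggests a lookup like the following C sketch. The AotImageView struct, the 32-bit offset width and the NO_METHOD sentinel are assumptions made for illustration; the actual on-disk encoding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of two symbols from an AOT image. */
typedef struct {
    const uint8_t  *methods;         /* blob holding all precompiled code */
    const uint32_t *method_offsets;  /* method index -> offset into methods */
    size_t          method_count;
} AotImageView;

/* Sentinel assumed here to mean "this method was not precompiled". */
#define NO_METHOD UINT32_MAX

/* Returns the start of the code for method `index`, or NULL if the
 * index is out of range or the method is absent from the image. */
static const uint8_t *aot_find_method_code(const AotImageView *img, size_t index)
{
    assert(img != NULL);
    if (index >= img->method_count)
        return NULL;
    uint32_t off = img->method_offsets[index];
    return off == NO_METHOD ? NULL : img->methods + off;
}
```

Keeping the offsets in a separate table makes each lookup a constant-time array access, whatever the size of the individual entries.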
* Performance considerations
----------------------------
Using AOT code is a trade-off which might lead to higher or
lower performance, depending on a lot of circumstances. Some
of these are:
- AOT code needs to be loaded from disk before being used, so
cold startup of an application using AOT code MIGHT be
slower than using JITed code. Warm startup (when the code is
already in the machine's cache) should be faster. Also,
JITing code takes time, and the JIT compiler also needs to
load additional metadata for the method from the disk, so
startup can be faster even in the cold startup case.
- AOT code is usually compiled with all optimizations turned
on, while JITted code is usually compiled with default
optimizations, so the generated code in the AOT case should
be faster.
- JITted code can directly access runtime data structures and
helper functions, while AOT code needs to go through an
indirection (the GOT) to access them, so it will be slower
and somewhat bigger as well.
- When JITting code, the JIT compiler needs to load a lot of
metadata about methods and types into memory. AOT code avoids
much of this work, which reduces memory usage.
- JITted code has better locality, meaning that if method A
calls method B, then the native code for A and B is usually quite
close in memory, leading to better cache behaviour and thus
improved performance. In contrast, the native code of
methods inside the AOT file is in a somewhat random order.
* Future Work
-------------
- Currently, when an AOT module is loaded, all of its
dependent assemblies are also loaded eagerly, and these
assemblies need to be exactly the same as the ones loaded
when the AOT module was created ('hard binding'). Non-hard
binding should be allowed.
- On x86, the generated code uses call 0, pop REG, add
GOTOFFSET, REG to materialize the GOT address. Newer
versions of gcc use a separate function to do this; we may
need to do the same.
- Currently, we get vtable addresses from the GOT. Another
solution would be to store the data from the vtables in the
.bss section, so accessing them would involve less
indirection.