<section id="shlibs">
<title>Supporting Shared Libraries</title>
<para>
When an executable is added to the image, we want any required shared
libraries to be added automatically. The <code>SharedLibraries</code>
module determines which files are required. This section discusses
the features of kernel and compiler we need to be aware of in order
to do this reliably.
</para>
<para>
Linux executables today are in ELF format; it is defined in
<ulink url="http://www.linuxbase.org/spec/book/ELF-generic/ELF-generic.html">
<citetitle>
Generic ELF Specification ELFVERSION</citetitle></ulink>,
part of the Linux Standard Base. This is based on part of the System
V ABI: Tool Interface Standard (TIS), Executable and Linking Format
(ELF) Specification.
</para>
<para>
ELF has consequences in different parts of the system: in
the link editor, which needs to merge ELF object files into ELF
executables; in the kernel (<filename>fs/binfmt_elf.c</filename>),
which has to place the executable in RAM and transfer control to it;
and in the runtime loader, which is invoked when starting the
application to load the necessary shared libraries into RAM.
The idea is as follows.
</para>
<itemizedlist>
<listitem>
<para>
Executables are in ELF format, with a type of either
<code>ET_EXEC</code> (executable) or <code>ET_DYN</code> (shared
library; yes, you can execute those). There are other types of
ELF file (core files, for example), but you can't execute them.
</para>
</listitem>
<listitem>
<para>
These files contain two kinds of headers: program headers and
section headers. Program headers define segments of the file that
the kernel should store consecutively in RAM; section headers define
parts of the file that should be treated by the link editor
as a single unit. Program headers normally point to a group
of adjacent sections.
</para>
</listitem>
<listitem>
<para>
The program may be statically or dynamically linked (the latter
with shared libraries).
If it's statically linked, the kernel loads the relevant segments,
then transfers control to <code>main()</code> in userland.
</para>
</listitem>
<listitem>
<para>
If it's dynamically linked, one of the program headers has type
<code>PT_INTERP</code>. It points to a segment that contains
the name of a (static) executable; this executable is loaded in
RAM together with the segments of the dynamic executable.
</para>
</listitem>
<listitem>
<para>
The kernel then transfers control to the userland
interpreter, passing program headers and related info in a
fourth argument to <code>main()</code>, after <code>envp</code>.
</para>
</listitem>
<listitem>
<para>
There's one interesting twist: one of the segments loaded
into RAM (<filename>linux-gate.so</filename>) does not
come from the executable, but is a piece of the kernel mapped
into user space. It contains a subroutine that the kernel
provides to do a system call; the idea is that this way,
the C library does not have to know which calling convention
for system calls is supported by the kernel and optimal for
the current hardware. The link editor knows nothing about
this, only the interpreter knows that the kernel can pass the
address of this subroutine together with the program headers.
<footnote>
<para>
For more info on the kernel-supplied shared library for
system calls, see
<ulink url="http://lwn.net/Articles/18411/">
<citetitle>LWN: How to speed up system calls</citetitle></ulink>,
<ulink url="http://lwn.net/Articles/30258/">
<citetitle>LWN: Patch: i386 vsyscall DSO implementation</citetitle></ulink>,
<ulink url="http://www.uwsg.iu.edu/hypermail/linux/kernel/0306.2/0674.html">
<citetitle>LKML: common name for the kernel DSO</citetitle></ulink>.
</para>
</footnote>
</para>
</listitem>
<listitem>
<para>
The interpreter interprets the <code>.dynamic</code> section of
the dynamic executable. This is a table containing various types
of info; if the type is <code>DT_NEEDED</code>, the info is the
name of a shared library that is needed to run the executable.
Normally, it's the basename.
</para>
</listitem>
<listitem>
<para>
The interpreter searches <code>LD_LIBRARY_PATH</code> for the
library and loads the first working version it finds, using a
breadth-first search. Once everything is loaded, the interpreter
hands over control to <code>main()</code> in the executable.
</para>
</listitem>
<listitem>
<para>
Except that that's not how it really works: the path that glibc
uses depends on whether threads are supported, and klibc can
function as a <code>PT_INTERP</code> but will not load additional
libraries.
</para>
</listitem>
</itemizedlist>
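<para>
The first item in the list can be checked with a few lines of C.
What follows is a minimal sketch, not code taken from
<application>yaird</application>: it assumes the constants from
glibc's <filename>elf.h</filename>, assumes the file uses the byte
order of the host, and uses a made-up function name,
<code>elf_type_is_runnable()</code>.
</para>
<programlisting><![CDATA[
```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if buf starts with an ELF header of
 * type ET_EXEC or ET_DYN, 0 otherwise.
 * Assumption: the file has the same byte order as the host.
 */
int elf_type_is_runnable(const unsigned char *buf, size_t len)
{
    Elf64_Half type;

    /* e_type follows the e_ident bytes in both 32- and 64-bit ELF. */
    if (len < EI_NIDENT + sizeof type)
        return 0;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;               /* magic number missing: not ELF */
    memcpy(&type, buf + EI_NIDENT, sizeof type);
    return type == ET_EXEC || type == ET_DYN;
}
```
]]></programlisting>
<para>
A caller would read the first bytes of the file into a buffer and
pass that buffer in; a core file (<code>ET_CORE</code>) or a
relocatable object (<code>ET_REL</code>) yields 0.
</para>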
<para>
The <application>ldd</application> command finds the pathnames
of shared libraries used by an executable. This works
only for glibc: it invokes the interpreter
with the executable as argument plus an environment variable that
tells it to print the pathnames rather than load them. For other
C libraries, there's no guaranteed correct way to find the path of
shared libraries.
</para>
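<para>
The mechanism can be sketched in C. This is an illustration under
stated assumptions, not the actual <application>ldd</application>
script: the environment variable glibc looks at is
<code>LD_TRACE_LOADED_OBJECTS</code>, the caller has to supply the
pathname of the interpreter, and the function name
<code>trace_loaded_objects()</code> is made up for this example.
</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "interp exe" with LD_TRACE_LOADED_OBJECTS=1,
 * so that the glibc loader prints the libraries it would load instead
 * of starting the program.
 * Returns the exit status of the child, 127 if the interpreter could
 * not be executed, or -1 if fork() or waitpid() failed.
 */
int trace_loaded_objects(const char *interp, const char *exe)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: set the variable, then become the interpreter. */
        setenv("LD_TRACE_LOADED_OBJECTS", "1", 1);
        execl(interp, interp, exe, (char *)NULL);
        _exit(127);             /* only reached if execl() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
]]></programlisting>
<para>
The output appears on the standard output of the child; a real
caller would capture it through a pipe and parse the pathnames.
</para>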
<para>
Update: <application>ldd</application> also works for another
C library, uclibc, unless you disable that support while building
the library by unsetting <code>LDSO_LDD_SUPPORT</code>.
</para>
<para>
Thus, to figure out what goes on the initial RAM image, first try
<application>ldd</application>. If that gives an answer, good.
Otherwise, use a helper program to find <code>PT_INTERP</code> and
<code>DT_NEEDED</code>. If there's only <code>PT_INTERP</code>, good,
add it to the image. If there are <code>DT_NEEDED</code> libraries
as well, and they have relative rather than absolute pathnames,
we can't determine the full path, so don't generate an image.
</para>
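<para>
The fallback rule for the case where <application>ldd</application>
gives no answer can be written down as a small function. This is a
sketch only: the enum, its values and the function name are
hypothetical, and the treatment of absolute
<code>DT_NEEDED</code> names (add them as they are) is an assumption
that the text above implies but does not state.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical outcomes of the fallback rule. */
enum image_decision {
    IMAGE_INTERP_ONLY,          /* only PT_INTERP needs to be added */
    IMAGE_INTERP_AND_LIBS,      /* all DT_NEEDED names are absolute  */
    IMAGE_REFUSE                /* a relative name: path is unknown  */
};

/* needed[] holds the n DT_NEEDED strings found in the executable. */
enum image_decision decide_image(const char *const *needed, size_t n)
{
    size_t i;

    if (n == 0)
        return IMAGE_INTERP_ONLY;
    for (i = 0; i < n; i++)
        if (needed[i][0] != '/')
            return IMAGE_REFUSE;
    /* Assumption: absolute names can be added without a search. */
    return IMAGE_INTERP_AND_LIBS;
}
```
]]></programlisting>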
<para>
There are a number of options to build a helper to extract the relevant
information from the executable:
<itemizedlist>
<listitem>
<para>
Build it in Perl. The problem here is that unpacking 64-bit
integers is an optional part of the language.
</para>
</listitem>
<listitem>
<para>
Build a wrapper around <application>objdump</application> or
<application>readelf</application>. The drawback is that
these programs are not part of a minimal Linux distribution:
depending on them in <application>yaird</application> would
increase the footprint.
</para>
</listitem>
<listitem>
<para>
Build a C program using libbfd. This is a library
intended to simplify working with object files. Drawbacks
are that it adds complexity that is not necessary in our
context since it supports multiple executable formats;
furthermore, at least in Debian it is treated as internal
to the gcc tool chain, complicating packaging the tool.
</para>
</listitem>
<listitem>
<para>
Build a C program based on <filename>elf.h</filename>.
This turns out to be easy to do.
</para>
</listitem>
</itemizedlist>
</para>
<para>
<application>Yaird</application> uses the last approach listed.
</para>
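<para>
To give an impression of what an <filename>elf.h</filename> based
helper involves, here is a minimal sketch. It is not the helper that
ships with <application>yaird</application>: the function names are
made up, the whole file is assumed to be in memory already, and only
64-bit ELF files in the byte order of the host are handled. It looks
up <code>PT_INTERP</code>, then walks the <code>PT_DYNAMIC</code>
segment and resolves each <code>DT_NEEDED</code> entry through the
string table that <code>DT_STRTAB</code> points to.
</para>
<programlisting><![CDATA[
```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the NUL-terminated string at file offset off, or NULL if it
 * does not lie completely inside the buffer. */
static const char *str_at(const unsigned char *buf, size_t len, size_t off)
{
    if (off >= len || memchr(buf + off, '\0', len - off) == NULL)
        return NULL;
    return (const char *)(buf + off);
}

/* Copy program header i out of the image; memcpy avoids unaligned access. */
static int get_phdr(const unsigned char *buf, size_t len,
                    const Elf64_Ehdr *eh, size_t i, Elf64_Phdr *out)
{
    size_t off;

    if (eh->e_phoff > len)
        return -1;
    off = eh->e_phoff + i * sizeof *out;
    if (off > len || len - off < sizeof *out)
        return -1;
    memcpy(out, buf + off, sizeof *out);
    return 0;
}

/* Map a virtual address to a file offset via the PT_LOAD segments;
 * returns (size_t)-1 if no segment covers the address. */
static size_t vaddr_to_off(const unsigned char *buf, size_t len,
                           const Elf64_Ehdr *eh, Elf64_Addr va)
{
    Elf64_Phdr ph;
    size_t i;

    for (i = 0; i < eh->e_phnum; i++) {
        if (get_phdr(buf, len, eh, i, &ph) != 0)
            break;
        if (ph.p_type == PT_LOAD && va >= ph.p_vaddr &&
            va - ph.p_vaddr < ph.p_filesz)
            return (size_t)(ph.p_offset + (va - ph.p_vaddr));
    }
    return (size_t)-1;
}

/*
 * Hypothetical helper: store the PT_INTERP path in *interp (NULL if
 * there is none) and up to max DT_NEEDED names in needed[].
 * Returns the number of names stored, or -1 if the image is unusable.
 */
int elf_deps(const unsigned char *buf, size_t len,
             const char **interp, const char **needed, int max)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph, dynph = {0};
    Elf64_Dyn dyn;
    size_t i, ndyn, strtab = (size_t)-1;
    int pass, have_dyn = 0, count = 0;

    *interp = NULL;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_phentsize != sizeof ph)
        return -1;
    for (i = 0; i < eh.e_phnum; i++) {
        if (get_phdr(buf, len, &eh, i, &ph) != 0)
            return -1;
        if (ph.p_type == PT_INTERP)
            *interp = str_at(buf, len, ph.p_offset);
        if (ph.p_type == PT_DYNAMIC) {
            dynph = ph;
            have_dyn = 1;
        }
    }
    if (!have_dyn)
        return 0;               /* no dynamic segment: statically linked */
    if (dynph.p_offset > len || len - dynph.p_offset < dynph.p_filesz)
        return -1;
    ndyn = dynph.p_filesz / sizeof dyn;
    /* Pass 0 locates the string table, pass 1 resolves DT_NEEDED. */
    for (pass = 0; pass < 2; pass++) {
        for (i = 0; i < ndyn; i++) {
            memcpy(&dyn, buf + dynph.p_offset + i * sizeof dyn, sizeof dyn);
            if (dyn.d_tag == DT_NULL)
                break;
            if (pass == 0 && dyn.d_tag == DT_STRTAB)
                strtab = vaddr_to_off(buf, len, &eh, dyn.d_un.d_ptr);
            if (pass == 1 && dyn.d_tag == DT_NEEDED && count < max) {
                if (strtab >= len || dyn.d_un.d_val >= len - strtab)
                    return -1;
                needed[count] = str_at(buf, len, strtab + dyn.d_un.d_val);
                if (needed[count] == NULL)
                    return -1;
                count++;
            }
        }
    }
    return count;
}
```
]]></programlisting>
<para>
A caller would read or map the executable into memory, call
<code>elf_deps()</code>, and then apply the rule given earlier:
the interpreter goes into the image, and any relative name among the
needed libraries means the full path cannot be determined. A complete
helper would also have to cover 32-bit files and foreign byte orders.
</para>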
</section>