=head1 NAME
guestfs-internals - architecture and internals of libguestfs
=head1 DESCRIPTION
This manual page is for hackers who want to understand how libguestfs
works internally. This is just a description of how libguestfs works
now, and it may change at any time in the future.
=head1 ARCHITECTURE
Internally, libguestfs is implemented by running an appliance (a
special type of small virtual machine) using L<qemu(1)>. Qemu runs as
a child process of the main program.
  ┌───────────────────┐
  │ main program      │
  │                   │
  │                   │           child process / appliance
  │                   │          ┌──────────────────────────┐
  │                   │          │ qemu                     │
  ├───────────────────┤   RPC    │      ┌─────────────────┐ │
  │ libguestfs  ◀╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍▶ guestfsd        │ │
  │                   │          │      ├─────────────────┤ │
  └───────────────────┘          │      │ Linux kernel    │ │
                                 │      └────────┬────────┘ │
                                 └───────────────│──────────┘
                                                 │
                                                 │ virtio-scsi
                                          ┌──────┴──────┐
                                          │  Device or  │
                                          │  disk image │
                                          └─────────────┘
The library, linked to the main program, creates the child process and
hence the appliance in the L<guestfs(3)/guestfs_launch> function.
Inside the appliance is a Linux kernel and a complete stack of
userspace tools (such as LVM and ext2 programs) and a small
controlling daemon called L</guestfsd>. The library talks to
L</guestfsd> using remote procedure calls (RPC). There is a mostly
one-to-one correspondence between libguestfs API calls and RPC calls
to the daemon. Lastly the disk image(s) are attached to the qemu
process which translates device access by the appliance’s Linux kernel
into accesses to the image.
A common misunderstanding is that the appliance "is" the virtual
machine. Although the disk image you are attached to might also be
used by some virtual machine, libguestfs doesn't know or care about
this. (But you will care if both libguestfs’s qemu process and your
virtual machine are trying to update the disk image at the same time,
since this usually results in massive disk corruption.)
=head1 STATE MACHINE
libguestfs uses a state machine to model the child process:
                          |
           guestfs_create / guestfs_create_flags
                          |
                          |
                      ____V_____
                     /          \
                     |  CONFIG  |
                     \__________/
                        ^   ^  \
                        |    \  \ guestfs_launch
                        |    _\__V______
                        |   /           \
                        |   | LAUNCHING |
                        |   \___________/
                        |       /
                        |  guestfs_launch
                        |     /
                      __|____V
                     /        \
                     | READY  |
                     \________/
The normal transitions are (1) CONFIG (when the handle is created, but
there is no child process), (2) LAUNCHING (when the child process is
booting up), and (3) READY (when the appliance is up and actions can be
issued to, and carried out by, the child process).
The guest may be killed by L<guestfs(3)/guestfs_kill_subprocess>, or
may die asynchronously at any time (eg. due to some internal error),
and that causes the state to transition back to CONFIG.
Configuration commands for qemu such as L<guestfs(3)/guestfs_set_path>
can only be issued when in the CONFIG state.
The API offers one call that goes from CONFIG through LAUNCHING to
READY. L<guestfs(3)/guestfs_launch> blocks until the child process is
READY to accept commands (or until some failure or timeout).
L<guestfs(3)/guestfs_launch> internally moves the state from CONFIG to
LAUNCHING while it is running.
API actions such as L<guestfs(3)/guestfs_mount> can only be issued
when in the READY state. These API calls block waiting for the
command to be carried out. There are no non-blocking versions, and no
way to issue more than one command per handle at the same time.
Finally, the child process sends asynchronous messages back to the
main program, such as kernel log messages. You can register a
callback to receive these messages.
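The rules above can be summarised as a toy model. This is only a
sketch for illustration, not the library's own code: the class
C<HandleModel> and its method names are hypothetical, and treating a
failed boot as a return to CONFIG is an assumption of the model.

```python
from enum import Enum


class State(Enum):
    CONFIG = "config"
    LAUNCHING = "launching"
    READY = "ready"


class HandleModel:
    """Toy model of the handle states (hypothetical, for illustration)."""

    def __init__(self):
        # Handle created, but there is no child process yet.
        self.state = State.CONFIG

    def configure(self):
        # Configuration commands are only valid in the CONFIG state.
        if self.state is not State.CONFIG:
            raise RuntimeError("configuration is only allowed in CONFIG")

    def launch(self, boot_ok=True):
        # One blocking call goes from CONFIG through LAUNCHING to READY.
        if self.state is not State.CONFIG:
            raise RuntimeError("launch is only allowed in CONFIG")
        self.state = State.LAUNCHING  # the child process is booting
        # Assumption of this model: a failed boot falls back to CONFIG.
        self.state = State.READY if boot_ok else State.CONFIG
        return self.state

    def action(self):
        # API actions need a running appliance.
        if self.state is not State.READY:
            raise RuntimeError("actions are only allowed in READY")

    def child_died(self):
        # Killed, or died asynchronously: back to CONFIG.
        self.state = State.CONFIG
```

Calling C<configure> after a successful C<launch> raises an error in
this model, which mirrors the restriction on configuration commands
described above.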
=head1 INTERNALS
=head2 APPLIANCE BOOT PROCESS
This process has evolved and continues to evolve. The description
here corresponds only to the current version of libguestfs and is
provided for information only.
In order to follow the stages involved below, enable libguestfs
debugging (set the environment variable C<LIBGUESTFS_DEBUG=1>).
=over 4
=item Create the appliance
C<supermin --build> is invoked to create the kernel, a small initrd
and the appliance.
The appliance is cached in F</var/tmp/.guestfs-E<lt>UIDE<gt>> (or in
another directory if C<LIBGUESTFS_CACHEDIR> or C<TMPDIR> is set).
For a complete description of how the appliance is created and cached,
read the L<supermin(1)> man page.
=item Start qemu and boot the kernel
qemu is invoked to boot the kernel.
=item Run the initrd
C<supermin --build> builds a small initrd. The initrd is not the
appliance. The purpose of the initrd is to load enough kernel modules
in order that the appliance itself can be mounted and started.
The initrd is a cpio archive called
F</var/tmp/.guestfs-E<lt>UIDE<gt>/appliance.d/initrd>.
When the initrd has started you will see messages showing that kernel
modules are being loaded, similar to this:
 supermin: ext2 mini initrd starting up
 supermin: mounting /sys
 supermin: internal insmod libcrc32c.ko
 supermin: internal insmod crc32c-intel.ko
=item Find and mount the appliance device
The appliance is a sparse file containing an ext2 filesystem which
contains a familiar (although reduced in size) Linux operating system.
It would normally be called
F</var/tmp/.guestfs-E<lt>UIDE<gt>/appliance.d/root>.
The regular disks being inspected by libguestfs are the first
devices exposed by qemu (eg. as F</dev/vda>).
The last disk added to qemu is the appliance itself (eg. F</dev/vdb>
if there was only one regular disk).
Thus the final job of the initrd is to locate the appliance disk,
mount it, switch root into the appliance, and run F</init> from
the appliance.
If this works successfully you will see messages such as:
 supermin: picked /sys/block/vdb/dev as root device
 supermin: creating /dev/root as block special 252:16
 supermin: mounting new root on /root
 supermin: chroot
 Starting /init script ...
Note that C<Starting /init script ...> indicates that the appliance's
init script is now running.
=item Initialize the appliance
The appliance now initializes itself. This involves starting
certain processes like C<udev>, possibly printing some debug
information, and finally running the daemon (C<guestfsd>).
=item The daemon
Finally the daemon (C<guestfsd>) runs inside the appliance. If it
runs you should see:
 verbose daemon enabled
The daemon expects to see a named virtio-serial port exposed by qemu
and connected on the other end to the library.
The daemon connects to this port (and hence to the library) and sends
a four byte message C<GUESTFS_LAUNCH_FLAG>, which initiates the
communication protocol (see below).
=back
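Two details of the boot process can be sketched in a few lines: where
the cached appliance lives, and which device the initrd should expect
the appliance on. This is an illustration only. The function names are
hypothetical, the precedence of C<LIBGUESTFS_CACHEDIR> over C<TMPDIR>
is an assumption, and the F</dev/vd> prefix is just the example naming
used above.

```python
import posixpath


def appliance_cache_dir(env, uid):
    """Directory in which the built appliance would be cached."""
    # Assumption: LIBGUESTFS_CACHEDIR is preferred over TMPDIR,
    # and /var/tmp is used when neither is set.
    base = env.get("LIBGUESTFS_CACHEDIR") or env.get("TMPDIR") or "/var/tmp"
    return posixpath.join(base, ".guestfs-%d" % uid)


def drive_suffix(index):
    """Map a disk index to a suffix: 0 -> 'a', 25 -> 'z', 26 -> 'aa'."""
    suffix = ""
    index += 1
    while index > 0:
        index -= 1
        suffix = chr(ord("a") + index % 26) + suffix
        index //= 26
    return suffix


def appliance_device(num_regular_disks, prefix="/dev/vd"):
    """The appliance is the last disk, after all the regular disks."""
    return prefix + drive_suffix(num_regular_disks)


cache = appliance_cache_dir({}, 1000)
initrd = posixpath.join(cache, "appliance.d", "initrd")
root = posixpath.join(cache, "appliance.d", "root")
```

With one regular disk, C<appliance_device(1)> gives F</dev/vdb>, which
matches the example in the list above.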
=head2 COMMUNICATION PROTOCOL
Don’t rely on using this protocol directly. This section documents
how it currently works, but it may change at any time.
The protocol used to talk between the library and the daemon running
inside the qemu virtual machine is a simple RPC mechanism built on top
of XDR (RFC 1014, RFC 1832, RFC 4506).
The detailed format of structures is in F<common/protocol/guestfs_protocol.x>
(note: this file is automatically generated).
There are two broad cases. Ordinary functions, which have no
C<FileIn> or C<FileOut> parameters, are handled with very simple
request/reply messages. Functions that have any C<FileIn> or
C<FileOut> parameters use the same request and reply messages, but
these may also be followed by files sent using a chunked encoding.
=head3 ORDINARY FUNCTIONS (NO FILEIN/FILEOUT PARAMS)
For ordinary functions, the request message is:
 total length (header + arguments,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_args (encoded as XDR)
The total length field allows the daemon to allocate a fixed size
buffer into which it slurps the rest of the message. As a result, the
total length is limited to C<GUESTFS_MESSAGE_MAX> bytes (currently
4MB), which means the effective size of any request is limited to
somewhere under this size.
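The framing can be sketched as follows. This is a minimal illustration,
assuming the length word is a 4 byte XDR unsigned integer (and so
big-endian); the header and arguments are treated as already-encoded
byte strings, and the helper names are hypothetical.

```python
import io
import struct

GUESTFS_MESSAGE_MAX = 4 * 1024 * 1024  # 4MB, as stated in the text


def frame_message(header, args=b""):
    """Prefix an XDR-encoded header (+ optional args) with its length."""
    payload = header + args
    if len(payload) > GUESTFS_MESSAGE_MAX:
        raise ValueError("message exceeds GUESTFS_MESSAGE_MAX")
    # The length word does not count itself.
    return struct.pack(">I", len(payload)) + payload


def read_exact(stream, n):
    """Read exactly n bytes, or fail if the stream ends early."""
    parts = []
    while n > 0:
        data = stream.read(n)
        if not data:
            raise EOFError("stream closed in the middle of a message")
        parts.append(data)
        n -= len(data)
    return b"".join(parts)


def read_message(stream):
    """Read one length-prefixed message and return its payload."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    if length > GUESTFS_MESSAGE_MAX:
        raise ValueError("length word exceeds GUESTFS_MESSAGE_MAX")
    # Knowing the length up front lets the receiver size its buffer.
    return read_exact(stream, length)
```

Because the receiver checks the length word before reading the payload,
an oversized message is rejected without allocating a buffer for it.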
Note also that many functions don’t take any arguments, in which case
the C<guestfs_I<foo>_args> is completely omitted.
The header contains the procedure number (C<guestfs_proc>) which is
how the receiver knows what type of args structure to expect, or none
at all.
For functions that take optional arguments, the optional arguments are
encoded in the C<guestfs_I<foo>_args> structure in the same way as
ordinary arguments. A bitmask in the header indicates which optional
arguments are meaningful. The bitmask is also checked to see if it
contains bits set which the daemon does not know about (eg. if more
optional arguments were added in a later version of the library), and
this causes the call to be rejected.
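The bitmask check can be illustrated like this. It is a sketch only:
the function name is hypothetical, and the mapping of bit I<i> to the
I<i>-th optional argument is an assumption made for the example.

```python
def check_optargs(bitmask, known_mask):
    """Return the indexes of the optional arguments that are set.

    Raises ValueError if the caller set a bit the receiver does not
    know about, which corresponds to the call being rejected.
    """
    unknown = bitmask & ~known_mask
    if unknown:
        raise ValueError("unknown optional arguments: 0x%x" % unknown)
    return [bit for bit in range(bitmask.bit_length())
            if (bitmask >> bit) & 1]
```

For a function with three optional arguments the known mask would be
C<0b111> in this sketch; a caller that sets bit 3 is rejected rather
than having the extra argument silently ignored.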
The reply message for ordinary functions is:
 total length (header + ret,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_ret (encoded as XDR)
As above, the C<guestfs_I<foo>_ret> structure may be completely omitted
for functions that return no formal return values.
The total length of the reply is likewise limited to
C<GUESTFS_MESSAGE_MAX>.
In the case of an error, a flag is set in the header, and the reply
message is slightly changed:
 total length (header + error,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_message_error (encoded as XDR)
The C<guestfs_message_error> structure contains the error message as a
string.
=head3 FUNCTIONS THAT HAVE FILEIN PARAMETERS
A C<FileIn> parameter indicates that we transfer a file I<into> the
guest. The normal request message is sent (see above). However this
is followed by a sequence of file chunks.
 total length (header + arguments,
      but not including the length word itself,
      and not including the chunks)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_args (encoded as XDR)
 sequence of chunks for FileIn param #0
 sequence of chunks for FileIn param #1 etc.
The "sequence of chunks" is:
 length of chunk (not including length word itself)
 struct guestfs_chunk (encoded as XDR)
 length of chunk
 struct guestfs_chunk (encoded as XDR)
   ...
 length of chunk
 struct guestfs_chunk (with data.data_len == 0)
The final chunk has the C<data_len> field set to zero. Additionally a
flag is set in the final chunk to indicate either successful
completion or early cancellation.
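The sending side of this encoding might look like the sketch below. It
assumes that C<struct guestfs_chunk> is a cancel flag (an XDR int)
followed by variable-length opaque data, and that length words are
big-endian; check F<common/protocol/guestfs_protocol.x> for the real
definition. The function names and the C<max_chunk_size> parameter are
hypothetical.

```python
import io
import struct


def xdr_opaque(data):
    """XDR variable-length opaque: length, bytes, zero padding to 4."""
    return struct.pack(">I", len(data)) + data + b"\0" * (-len(data) % 4)


def encode_chunk(data, cancel=0):
    """One chunk on the wire: length word, then the encoded struct."""
    # Assumed layout of guestfs_chunk: cancel flag, then opaque data.
    body = struct.pack(">i", cancel) + xdr_opaque(data)
    return struct.pack(">I", len(body)) + body


def file_chunks(stream, max_chunk_size, cancelled=lambda: False):
    """Yield wire chunks for a stream whose size need not be known."""
    while True:
        if cancelled():
            # Early cancellation: empty final chunk with the flag set.
            yield encode_chunk(b"", cancel=1)
            return
        data = stream.read(max_chunk_size)
        if not data:
            # Successful completion: empty final chunk, flag clear.
            yield encode_chunk(b"", cancel=0)
            return
        yield encode_chunk(data)
```

Only one chunk is held in memory at a time, so the sender never needs
to know the total size of the file in advance.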
At time of writing there are no functions that have more than one
FileIn parameter. However this is (theoretically) supported, by
sending the sequence of chunks for each FileIn parameter one after
another (from left to right).
Both the library (sender) I<and> the daemon (receiver) may cancel the
transfer. The library does this by sending a chunk with a special
flag set to indicate cancellation. When the daemon sees this, it
cancels the whole RPC, does I<not> send any reply, and goes back to
reading the next request.
The daemon may also cancel. It does this by writing a special word
C<GUESTFS_CANCEL_FLAG> to the socket. The library listens for this
during the transfer, and if it gets it, it will cancel the transfer
(it sends a cancel chunk). The special word is chosen so that even if
cancellation happens right at the end of the transfer (after the
library has finished writing and has started listening for the reply),
the "spurious" cancel flag will not be confused with the reply
message.
This protocol allows the transfer of arbitrary sized files (no 32 bit
limit), and also files where the size is not known in advance
(eg. from pipes or sockets). However the chunks are rather small
(C<GUESTFS_MAX_CHUNK_SIZE>), so that neither the library nor the
daemon need to keep much in memory.
=head3 FUNCTIONS THAT HAVE FILEOUT PARAMETERS
The protocol for FileOut parameters is exactly the same as for FileIn
parameters, but with the roles of daemon and library reversed.
 total length (header + ret,
      but not including the length word itself,
      and not including the chunks)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_ret (encoded as XDR)
 sequence of chunks for FileOut param #0
 sequence of chunks for FileOut param #1 etc.
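The receiving side of a chunk sequence can be sketched as follows. As
an assumption for illustration, each chunk body is taken to be a cancel
flag (an XDR int) followed by XDR variable-length opaque data, with
big-endian length words; the function names are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes, or fail if the stream ends early."""
    parts = []
    while n > 0:
        data = stream.read(n)
        if not data:
            raise EOFError("stream closed in the middle of a chunk")
        parts.append(data)
        n -= len(data)
    return b"".join(parts)


def receive_file(stream, out):
    """Copy chunk data to 'out' until the final chunk arrives.

    Returns True on successful completion, False on cancellation.
    """
    while True:
        (chunk_len,) = struct.unpack(">I", read_exact(stream, 4))
        body = read_exact(stream, chunk_len)
        # Assumed layout: cancel flag, data length, data (+ padding).
        cancel, data_len = struct.unpack(">iI", body[:8])
        if cancel:
            return False
        if data_len == 0:
            return True
        out.write(body[8:8 + data_len])
```

The receiver writes each chunk out as it arrives, so the size of the
transferred file is limited only by where the data is being written.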
=head3 INITIAL MESSAGE
When the daemon launches it sends an initial word
(C<GUESTFS_LAUNCH_FLAG>) which indicates that the guest and daemon are
alive. This is what L<guestfs(3)/guestfs_launch> waits for.
=head3 PROGRESS NOTIFICATION MESSAGES
The daemon may send progress notification messages at any time. These
are distinguished by the normal length word being replaced by
C<GUESTFS_PROGRESS_FLAG>, followed by a fixed size progress message.
The library turns them into progress callbacks (see
L<guestfs(3)/GUESTFS_EVENT_PROGRESS>) if there is a callback
registered, or discards them if not.
The daemon self-limits the frequency of progress messages it sends
(see C<daemon/proto.c:notify_progress>). Not all calls generate
progress messages.
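One way to picture how the special words coexist with ordinary
messages is a classifier for the first four bytes read from the
daemon. This is a sketch under stated assumptions: the three flag
values below are given for illustration and should be checked against
the generated F<common/protocol/guestfs_protocol.x>, the words are
assumed to be big-endian, and the idea that a flag can never be
mistaken for a length because it exceeds C<GUESTFS_MESSAGE_MAX> is an
inference, not something the protocol description states.

```python
import struct

GUESTFS_MESSAGE_MAX = 4 * 1024 * 1024  # 4MB, as stated in the text

# Assumed values, for illustration only; see guestfs_protocol.x.
GUESTFS_LAUNCH_FLAG = 0xF5F55FF5
GUESTFS_CANCEL_FLAG = 0xFFFFEEEE
GUESTFS_PROGRESS_FLAG = 0xFFFF5555


def classify_word(word_bytes):
    """Decide what a 4 byte word from the daemon introduces.

    Returns (kind, length), where length is only set for messages.
    """
    (word,) = struct.unpack(">I", word_bytes)
    if word == GUESTFS_LAUNCH_FLAG:
        return ("launch", None)      # guest and daemon are alive
    if word == GUESTFS_CANCEL_FLAG:
        return ("cancel", None)      # daemon cancelled a transfer
    if word == GUESTFS_PROGRESS_FLAG:
        return ("progress", None)    # fixed size progress message follows
    if word <= GUESTFS_MESSAGE_MAX:
        return ("message", word)     # ordinary length word
    raise ValueError("unexpected word 0x%08x" % word)
```

A caller would read the fixed size progress message after a
C<"progress"> result, or the given number of payload bytes after a
C<"message"> result.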
=head2 FIXED APPLIANCE
When libguestfs (or libguestfs tools) are run, they search a path
looking for an appliance. The path is built into libguestfs, or can
be set using the C<LIBGUESTFS_PATH> environment variable.
Normally a supermin appliance is located on this path (see
L<supermin(1)/SUPERMIN APPLIANCE>). libguestfs reconstructs this
into a full appliance by running C<supermin --build>.
However, a simpler "fixed appliance" can also be used. libguestfs
detects this by looking for a directory on the path containing all
the following files:
=over 4
=item * F<kernel>
=item * F<initrd>
=item * F<root>
=item * F<README.fixed> (note that it B<must> be present as well)
=back
If the fixed appliance is found, libguestfs skips supermin entirely
and just runs the virtual machine (using qemu or the current backend,
see L<guestfs(3)/BACKEND>) with the kernel, initrd and root disk from
the fixed appliance.
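The detection rule can be sketched as below. This is not the library's
own search code: the function name is hypothetical, and treating the
path as a colon-separated list of directories is an assumption.

```python
import os
import tempfile

FIXED_APPLIANCE_FILES = ("kernel", "initrd", "root", "README.fixed")


def find_fixed_appliance(search_path):
    """Return the first directory holding a complete fixed appliance."""
    # Assumption: the path is a colon-separated list of directories.
    for directory in search_path.split(":"):
        if not directory:
            continue
        if all(os.path.isfile(os.path.join(directory, name))
               for name in FIXED_APPLIANCE_FILES):
            return directory
    return None


# Usage: one directory is missing README.fixed, the other is complete.
with tempfile.TemporaryDirectory() as top:
    partial = os.path.join(top, "partial")
    complete = os.path.join(top, "complete")
    for directory, names in ((partial, FIXED_APPLIANCE_FILES[:3]),
                             (complete, FIXED_APPLIANCE_FILES)):
        os.mkdir(directory)
        for name in names:
            open(os.path.join(directory, name), "w").close()
    found = find_fixed_appliance(partial + ":" + complete)
    found_name = os.path.basename(found)
    found_partial_only = find_fixed_appliance(partial)
```

In the usage example the directory without F<README.fixed> is skipped,
which reflects the note that this file must be present as well.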
Thus the fixed appliance can be used when a platform or a Linux
distribution does not support supermin. You build the fixed appliance
on a platform that does support supermin using
L<libguestfs-make-fixed-appliance(1)>, copy it over, and use that
to run libguestfs.
=head1 SEE ALSO
L<guestfs(3)>,
L<guestfs-hacking(1)>,
L<guestfs-examples(3)>,
L<libguestfs-test-tool(1)>,
L<libguestfs-make-fixed-appliance(1)>,
L<http://libguestfs.org/>.
=head1 AUTHORS
Richard W.M. Jones (C<rjones at redhat dot com>)
=head1 COPYRIGHT
Copyright (C) 2009-2023 Red Hat Inc.