File: dumperapi.txt

package info (click to toggle)
amanda 1%3A2.5.2p1-4
links: PTS
area: main
in suites: lenny
size: 10,280 kB
ctags: 7,298
sloc: ansic: 84,433; sh: 14,160; xml: 8,467; perl: 2,821; makefile: 1,205; lex: 459; awk: 423; yacc: 240; tcl: 118; sql: 19
file content (325 lines) | stat: -rw-r--r-- 15,802 bytes

        Chapter 23. Amanda dumper API
Prev  Part V. Technical Background  Next

-------------------------------------------------------------------------------

Chapter 23. Amanda dumper API


Alexandre Oliva

Original text
AMANDA Core Team
<oliva@dcc.unicamp.br>>

Stefan G. Weichinger

XML-conversion;Updates
AMANDA Core Team
<sgw@amanda.org>
Table of Contents


  Introduction

  The_Problem

  Overview_of_the_API


        The_`support'_command


  The_`selfcheck'_command

  The_`estimate'_and_`estimate-parse'_commands

  The_`backup'_and_`backup-parse'_commands

  The_`index-from-output'_and_`index-from-image'_commands

  The_`restore'_command

  The_`print-command'_command

  Conclusion


 Introduction

This is a proposal of a mechanism for Amanda to support arbitrary backup
programs, that relies on a generic backup driver and scripts or programs that
interface with backup programs such as dump, tar, smbclient, and others. It can
also be used to introduce pre- and post-backup commands.
The interface is simple, but supports everything that is currently supported by
Amanda, and it can be consistently extended to support new abstractions that
may be introduced in the backup driver in the future.
This proposal does not imply any modification in the Amanda protocol or in
Amanda servers; only Amanda clients have to be modified. By Amanda clients, we
refer to hosts whose disks are to be backed up; an Amanda server is a host
connected to a tape unit.
Currently (as of release 2.4.1 of Amanda), Amanda clients support three
operations: selfcheck, estimate and backup.
Selfcheck is used by the server program amcheck, to check whether a client is
responding or if there are configuration or permission problems in the client
that might prevent the backup from taking place.
Estimates are requested by the Amanda planner, that runs on the server and
collects information about the expected sizes of backups of each disk at
several levels. Given this information and the amount of available tape space,
the planner can select which disks and which levels it should tell dumper to
run.
dumper is yet another server-side program; it requests clients to perform
dumps, as determined by planner, and stores these dumps in holding disks or
sends them directly to the taper program. The interaction between dumper and
taper is beyond the scope of this text.
We are going to focus on the interaction between the Amanda client program and
wrappers of dump programs. These wrappers must implement the DUMPER API. The
dumptype option `program' should name the wrapper that will be used to back up
filesystems of that dumptype. One wrapper may call another, so as to extend its
functionality.

 The Problem

Different backup programs present distinct requirements; some must be run as
super-user, whereas others can be run under other user-ids. Some require a
directory name, the root of the tree to be backed up; others prefer a raw
device name; some don't even refer to local disks (SAMBA). Some wrappers may
need to know a filesystem type in order to decide which particular backup
program to use (dump, vdump, vxdump, xfsdump, backup).
Some provide special options for estimates, whereas others must be started as
if a complete dump were to be performed, and must be killed as soon as they
print an estimate.
Furthermore, the output formats of these backup programs vary wildly. Some will
print estimates and total sizes in bytes, in 512-byte tape blocks units, in
Kbytes, Mbytes, Gbytes, and possibly Tbytes in the near future. Some will print
a timestamp for the backup; some won't.
There are also restrictions related with possible scheduling policies. For
example, some backup programs only support full backups or incrementals based
on the last full backup (0-1). Some support full backups or incrementals based
on the last backup, be it a full or an incremental backup (0-inf++). Some
support incrementals based on a timestamp (incr/date); whereas others are based
on a limited number of incremental levels, but incrementals of the same level
can be repeated, such as dump (0-9).
Amanda was originally built upon DUMP incremental levels, so this is the only
model it currently supports. Backup programs that use other incremental
management mechanisms had to be adapted to this policy. Wrapper scripts are
responsible for this adaptation.
Another important issue has to do with index generation. Some backup programs
can generate indexes, but each one lists files in its own particular format,
but they must be stored in a common format, so that the Amanda server can
manipulate them.
The DUMPER API must accomodate for all these variations.

 Overview of the API

We are going to define a standard format of argument lists that the backup
driver will provide to wrapper programs, and the expected result of the
execution of these wrappers.
The first argument to a wrapper should always be a command name. If no
arguments are given, or an unsupported command is requested, an error message
should be printed to stderr, and the program should terminate with exit status
1.

 The `support' command

As a general mechanism for Amanda to probe for features provided by a backup
program, a wrapper script must support at least the `support' command. Some
features must be supported, and Amanda won't ever ask about them. Others will
be considered as extensions, and Amanda will ask the wrapper whether they are
supported before issuing the corresponding commands.

 The `level-incrementals' subcommand

For example, before requesting for an incremental backup of a given level,
Amanda should ask the wrapper whether the backup program supports level-based
incrementals. We don't currently support backup programs that don't, but we may
in the future, so it would be nice if wrappers already implemented the command
`support level-incrementals', by returning a 0 exit status, printing, say, the
maximum incremental level it supports, i.e., 9. A sample session would be:

  % /usr/local/amanda/libexec/wrappers/dump support level-incrementals hda0 9
  	

Note that the result of this support command may depend on filesystem
information, so the disklist filesystem entry should be specified as a command
line argument. In the next examples, we are not going to use full pathnames to
wrapper scripts any more.
We could have defined a `support' command for full backups, but I can't think
of a backup program that does not support full backups...

 The `index' subcommand

The ability to produce index files is also subject to an invocation of
`support' command. When the support sub-command is `index', like in the
invocation below, the wrapper must print a list of valid indexing mechanisms,
one per line, most preferred first. If indexing is not supported, nothing
should be printed, and the exit status should be 1.
DUMP support index hda0
The currently known indexing mechanisms are:
output: implies that the command `index-from-output' generates an index file
from the output produced by the backup program (for example, from tar -cv).
image: implies that the command `index-from-image' generates an index file from
a backup image (for example, tar -t).
direct: implies that the `backup' command can produce an index file as it
generates the backup image.
parse: implies that the `backup-parse' command can produce an index file as it
generates the backup formatted output .
The indexing mechanisms will be explicitly requested with the additionnal
option `index-<mode>' in the `backup' and `backup-parse' command invocation.
`index-from-image' should be supported, if possible, even if other index
commands are not, since it can be used in the future to create index files from
previously backed up filesystems.

 The `parse-estimate' subcommand

The `parse-estimate' support subcommand print a list of valid mechanisms to
parse the estimate output and write the estimate size to its output, the two
mechanisms are:
direct: implies that the `estimate' command can produce the estimate output.
parse: implies that the `estimate-parse' command can produce the estimate
output when fed with the `estimate' output.
The estimate parsing mechanisms will be explicitly requested with the
additionnal option `estimate-<mode>' in the `estimate' and `estimate-parse'
command invocation.

 The `parse-backup' subcommand

The `parse-backup' support subcommand print a list of valid mechanisms to parse
the backup stderr, the two mechanisms are:
direct: implies that the `backup' command can produce the backup-formatted-
ouput.
parse: implies that the `backup-parse' command can produce the backup-
formatted-ouput when fed with the `backup' stderr.
The backup parsing mechanisms will be explicitly requested with the additional
option `backup-<mode>' in the `backup' and `backup-parse' command invocation.

 Other subcommands

Some other standard `support' sub-commands are `exclude' and `exclude-list'.
One may think (and several people did :-) ) that there should be only one
support command, that would print information about all supported commands. The
main arguments against this proposal have to do with extensibility:
The availability of commands might vary from filesystem to filesystem. No, I
don't have an example, I just want to keep it as open as possible :-) one
support subcommand may require command line arguments that others don't, and we
can't know in advance what these command line arguments are going to be
The output format and exit status conventions of a support command may vary
from command to command; the only pre-defined convention is that, if a wrapper
does not know about a support subcommand, it should return exit status 1,
implying that the inquired feature is not supported.

 The `selfcheck' command

We should support commands to perform self-checks, run estimates, backups and
restores (for future extensions of the Amanda protocol so as to support
restores)
A selfcheck request would go like this:
DUMP selfcheck hda0 option option=value ...
The options specified as command-line arguments are dumptype options enabled
for that disk, such as `index', `norecord', etc. Unknown options should be
ignored. For each successful check, a message such as:
OK [/dev/hda0 is readable] OK [/usr/sbin/dump is executable]
Errors should be printed as:
ERROR [/etc/dumpdates is not writable]
A wrapper script will certainly have to figure out either the disk device name
or its mount point, given a filesystem name such as `hda0', as specified in the
disklist. In order to help these scripts, Amanda provides a helper program that
can guess device names, mount points and filesystem types, when given disklist
entries.
The filesystem type can be useful on some operation systems, in which more than
one dump program is available; this information can help automatically
selecting the appropriate dump program.
The exit status of selfcheck and of this alternate script are probably going to
be disregarded. Anyway, for consistency, selfcheck should return exit status 0
for complete success, 1 if any failures have occurred.

 The `estimate' and `estimate-parse' commands

Estimate requests can be on several different forms. An estimate of a full
backup may be requested, or estimates for level- or timestamp-based
incrementals:
DUMP estimate full hda0 option ... DUMP estimate level 1 hda0 option ... DUMP
estimate diff 1998:09:24:01:02:03 hda0 option ...
If requested estimate type is not supported, exit status 3 should be returned.
If the option `estimate-direct' is set, then the `estimate' command should
write to stdout the estimated size, in bytes, a pair of numbers that,
multiplied by one another, yield the estimated size in bytes.
If the option `estimate-parse' is set, then the `estimate' command should write
to stdout the informations needed by the `estimate-parse' command, that should
extract from its input the estimated size.
The syntax of `estimate-parse' is identical to that of `estimate'.
Both `estimate' and `estimate-parse' can output the word `KILL', after printing
the estimate. In this case, Amanda will send a SIGTERM signal to the process
group of the `estimate' process. If it does not die within a few seconds, a
SIGKILL will be issued.
If `estimate' or `estimate-parse' succeed, they should exit 0, otherwise exit
1, except for the already listed cases of exit status 3.

 The `backup' and `backup-parse' commands

The syntax of `backup' is the same as that of `estimate'. The backup image
should be written to standard output, whereas stderr should be used for the
user-oriented output of the backup program and other messages.
If the option `backup-direct' is set, then the `backup' command should write to
stderr a formatted-output-backup.
If the option `backup-parse' is set, then the `backup' command should write to
stderr the informations needed by the `backup-parse' command, that should edit
its input so that it prints to standard output a formatted-output-backup.
If the option `no-record' is set, then the `backup' command should not modify
its state file (ex. dump should not modify /etc/dumpdates).
The syntax of `backup-parse' is identical to that of `backup'.
The syntax of the formatted-output-backup is as follow: All lines should start
with either `| ' for normal output, `? ' for strange output or `& ' for error
output. If the wrapper can determine the total backup size from the output of
the backup program, it should print a line starting with `# ', followed by the
total backup size in bytes or by a pair of numbers that, multiplied, yield the
total backup size; this number will be used for consistency check.
The option `index-direct' should cause commands `backup' to output the index
directly to file descriptor 3. The option `index-parse' should cause commands
`backup-parse' to output the index directly to file descriptor 3. The syntax of
the index file is described in the next section.

 The `index-from-output' and `index-from-image' commands

The syntax of the `index-from-output' and `index-from-image' commands is
identical to the one of `backup'. They are fed the backup output or image, and
they must produce a list of files and directories, one per line, to the
standard output. Directories must be identified by the `/' termination.
After the file name and a blank space, any additional information about the
file or directory, such as permission data, size, etc, can be added. For this
reason, blanks and backslashes within filenames should be quoted with
backslashes. Linefeeds should be represented as `\n', although it is not always
possible to distinguish linefeeds in the middle of filenames from ones that
separate one file from another, in the output of, say `restore -t'. It is not
clear whether we should also support quoting mechanisms such as `\xHH', `\OOO'
or `\uXXXX'.

 The `restore' command

Yet to be specified.

 The `print-command' command

This command must be followed by a valid backup or restore command, and it
should print a shell-command that would produce an equivalent result, i.e.,
that would perform the backup to standard output, or that would restore the
whole filesystem reading from standard input. This command is to be included in
the header of backup images, to ease crash-recovery.

 Conclusion

Well, that's all. Drop us a note at the amanda-hackers mailing list (mailto://
amanda-hackers@amanda.org) if you have suggestions to improve this document
and/or the API. Some help on its implementation would be welcome, too.

Note

Refer to http://www.amanda.org/docs/dumperapi.html for the current version of
this document.
-------------------------------------------------------------------------------

Prev                                     Up                           Next
Chapter 22. How Amanda uses UDP and TCP Home  Chapter 24. Amanda Internals
ports