1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
|
% bup-split(1) Bup %BUP_VERSION%
% Avery Pennarun <apenwarr@gmail.com>
% %BUP_DATE%
# NAME
bup-split - save individual files to bup backup sets
# SYNOPSIS
bup split \[-t\] \[-c\] \[-n *name*\] COMMON\_OPTIONS
bup split -b COMMON\_OPTIONS
bup split --copy COMMON\_OPTIONS
bup split --noop \[-t|-b\] COMMON\_OPTIONS
COMMON\_OPTIONS
~ \[-r *host*:*path*\] \[-v\] \[-q\] \[-d *seconds-since-epoch*\] \[\--bench\]
\[\--max-pack-size=*bytes*\] \[-#\] \[\--bwlimit=*bytes*\]
\[\--max-pack-objects=*n*\] \[\--fanout=*count*\]
\[\--keep-boundaries\] \[\--git-ids | filenames...\]
# DESCRIPTION
`bup split` concatenates the contents of the given files
(or if no filenames are given, reads from stdin), splits
the content into chunks of around 8k using a rolling
checksum algorithm, and saves the chunks into a bup
repository. Chunks which have previously been stored are
not stored again (ie. they are 'deduplicated').
Because of the way the rolling checksum works, chunks
tend to be very stable across changes to a given file,
including adding, deleting, and changing bytes.
For example, if you use `bup split` to back up an XML dump
of a database, and the XML file changes slightly from one
run to the next, nearly all the data will still be
deduplicated and the size of each backup after the first
will typically be quite small.
Another technique is to pipe the output of the `tar`(1) or
`cpio`(1) programs to `bup split`. When individual files
in the tarball change slightly or are added or removed, bup
still processes the remainder of the tarball efficiently.
(Note that `bup save` is usually a more efficient way to
accomplish this, however.)
To get the data back, use `bup-join`(1).
# MODES
These options select the primary behavior of the command, with -n
being the most likely choice.
-n, \--name=*name*
: after creating the dataset, create a git branch
named *name* so that it can be accessed using
that name. If *name* already exists, the new dataset
will be considered a descendant of the old *name*.
(Thus, you can continually create new datasets with
the same name, and later view the history of that
dataset to see how it has changed over time.) The original data
will also be available as a top-level file named "data" in the VFS,
accessible via `bup fuse`, `bup ftp`, etc.
-t, \--tree
: output the git tree id of the resulting dataset.
-c, \--commit
: output the git commit id of the resulting dataset.
-b, \--blobs
: output a series of git blob ids that correspond to the chunks in
the dataset. Incompatible with -n, -t, and -c.
\--noop
: read the data and split it into blocks based on the "bupsplit"
rolling checksum algorithm, but don't store anything in the repo.
Can be combined with -b or -t to compute (but not store) the git
blobs or tree ids for the dataset. This is mostly useful for
benchmarking and validating the bupsplit algorithm. Incompatible
with -n and -c.
\--copy
: like `--noop`, but also write the data to stdout. This can be
useful for benchmarking the speed of read+bupsplit+write for large
amounts of data. Incompatible with -n, -t, -c, and -b.
# OPTIONS
-r, \--remote=*host*:*path*
: save the backup set to the given remote server. If *path* is
omitted, uses the default path on the remote server (you still
need to include the ':'). The connection to the remote server is
made with SSH. If you'd like to specify which port, user or
private key to use for the SSH connection, we recommend you use
the `~/.ssh/config` file. Even though the destination is remote,
a local bup repository is still required.
-d, \--date=*seconds-since-epoch*
: specify the date inscribed in the commit (seconds since 1970-01-01).
-q, \--quiet
: disable progress messages.
-v, \--verbose
: increase verbosity (can be used more than once).
\--git-ids
: stdin is a list of git object ids instead of raw data.
`bup split` will read the contents of each named git
object (if it exists in the bup repository) and split
it. This might be useful for converting a git
repository with large binary files to use bup-style
hashsplitting instead. This option is probably most
useful when combined with `--keep-boundaries`.
\--keep-boundaries
: if multiple filenames are given on the command line,
they are normally concatenated together as if the
content all came from a single file. That is, the
set of blobs/trees produced is identical to what it
would have been if there had been a single input file.
However, if you use `--keep-boundaries`, each file is
split separately. You still only get a single tree or
commit or series of blobs, but each blob comes from
only one of the files; the end of one of the input
files always ends a blob.
\--bench
: print benchmark timings to stderr.
\--max-pack-size=*bytes*
: never create git packfiles larger than the given number
of bytes. Default is 1 billion bytes. Usually there
is no reason to change this.
\--max-pack-objects=*numobjs*
: never create git packfiles with more than the given
number of objects. Default is 200 thousand objects.
Usually there is no reason to change this.
\--fanout=*numobjs*
: when splitting very large files, try and keep the number
of elements in trees to an average of *numobjs*.
\--bwlimit=*bytes/sec*
: don't transmit more than *bytes/sec* bytes per second
to the server. This is good for making your backups
not suck up all your network bandwidth. Use a suffix
like k, M, or G to specify multiples of 1024,
1024\*1024, 1024\*1024\*1024 respectively.
-*#*, \--compress=*#*
: set the compression level to # (a value from 0-9, where
9 is the highest and 0 is no compression). The default
is 1 (fast, loose compression)
# EXAMPLES
$ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
tar: Removing leading /' from member names
Indexing objects: 100% (196/196), done.
$ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
1961
# SEE ALSO
`bup-join`(1), `bup-index`(1), `bup-save`(1), `bup-on`(1), `ssh_config`(5)
# BUP
Part of the `bup`(1) suite.
|