Copyright (c) 2002-2008 by Heinz-Josef Claes (see README)
Published under the GNU General Public License v3 or any later version
Before explaining some examples, here are some important aspects of
how storeBackup works. (The following explains the basic
mechanism; for performance reasons the implementation differs a little.
There are several waiting queues, parallel operations and a tiny
scheduler inside which are not described here.)
storeBackup uses two internal flat files in each generated backup:
.md5CheckSums.info - general information about the backup
.md5CheckSums[.bz2] - information about every file (dir, etc.) saved
When storeBackup.pl starts, it basically does the following (besides some other things):
1) read the contents of the previous .md5CheckSums[.bz2] file and store it
in two dbm databases: dbm(md5sum) and dbm(filename)
(dbm(md5sum) means that the md5 sum is the key)
2) read the contents of other .md5CheckSums[.bz2] files (otherBackupDirs)
and store them in dbm(md5sum). If two different files (e.g. from
different backup series) are identical, always store the one with the
lowest inode number in the dbm file. This ensures that multiple
versions of the same file in different backups are unified in the
future.
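If you are curious, you can have a look at these flat files directly (they
are only read here). The backup path below is just a placeholder; whether
the per-file list is compressed (.bz2) or not depends on your settings:
# cat /backup/jim/<someBackup>/.md5CheckSums.info
# bzcat /backup/jim/<someBackup>/.md5CheckSums.bz2 | head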
A) (describes backup without sharing of files, examples 1 and 2 below)
In a loop over all files to back up it will do:
1) look into dbm(filename) -- which contains all files from the previous
backup -- to see whether the exact same file exists and has not
changed. In this case, the needed information is taken from the values
of dbm(filename).
If it existed unchanged in the previous backup(s), make a hard link and go to 3)
2) calculate the md5 sum of the file to back up
look into dbm(md5sum) for that md5 sum
if it exists there, make a hard link
if it doesn't exist, copy or compress the file
3) write the information about the new file to the corresponding
.md5CheckSums[.bz2] file
B) (describes backup with sharing of files, examples 3 and 4 below)
In a loop over all files to back up it will do:
1) look into dbm(filename) -- which contains all files from the previous
backup -- to see whether the exact same file exists and has not
changed. In this case, the needed information is taken from the values
of dbm(filename).
(Now, because we have independent backups, it is possible that
a file with the same contents exists in another backup series. So we
have to look into dbm(md5sum) to ensure linking to the same file
from all the different backup series.)
2) calculate the md5 sum of the file to back up if it is not known from
step 1)
look into dbm(md5sum) for that md5 sum
if it exists there, make a hard link
if it doesn't exist, copy or compress the file
3) write the information about the new file to the corresponding
.md5CheckSums[.bz2] file
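To make the decision in step 2) more concrete, here is a conceptual sketch
in plain shell -- this is NOT storeBackup code, just an illustration of the
principle 'hard link if the content is already known, otherwise copy'. The
paths and the idea of an md5-indexed link directory are assumptions made
only for this sketch; storeBackup keeps this knowledge in its dbm databases
and .md5CheckSums files instead:
#! /bin/sh
# conceptual sketch only -- not part of storeBackup
SRC=/home/jim                     # files to back up (example path)
BACKUP=/backup/jim/current        # where this backup is written (example)
INDEX=/backup/jim/.md5index       # one hard link per known md5 sum (example)
mkdir -p "$BACKUP" "$INDEX"
cd "$SRC" || exit 1
find . -type f | while read -r f
do
    sum=$(md5sum "$f" | cut -d' ' -f1)
    mkdir -p "$BACKUP/$(dirname "$f")"
    if [ -e "$INDEX/$sum" ]
    then
        # content already known -> just set another hard link
        ln "$INDEX/$sum" "$BACKUP/$f"
    else
        # new content -> copy it (storeBackup would optionally compress)
        cp -p "$f" "$BACKUP/$f"
        ln "$BACKUP/$f" "$INDEX/$sum"
    fi
done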
C) (describes using the option --lateLinks, example 6 below)
If you save your backup via NFS to a server, most of the time will
be spent setting hard links. Setting a single hard link is very fast, but
if you have many thousands of them it takes some time.
You can avoid waiting for the hard linking if you use the option --lateLinks:
1) make a backup with storeBackup and set --lateLinks (or set
lateLinks = yes
in the configuration file). It will not create a single hard link; only
a file will be written with the information about what has to be linked.
2) as a separate step, call storeBackupUpdateBackup to set all the
required hard links to make full backups out of these incomplete
backups. Please see also "how does it work with 'latelinks'" in
the README file for a more detailed explanation.
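As a minimal sketch of this two-step procedure (the configuration file name
is only a placeholder and must contain lateLinks = yes; see EXAMPLE 6 below
for a complete setup):
# storeBackup.pl -f <configFile> -l /tmp/storeBackup.log
# storeBackupUpdateBackup.pl -f <configFile> -l /tmp/stbuUpdate.log
The first call writes the backup quickly without setting any hard links; the
second one, run later, sets the recorded links and turns the incomplete
backup into a full one (see EXAMPLE 6 for running it on the backup server).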
Conclusions:
1) Everything depends on the existence of valid .md5CheckSums files!
You have to keep this in mind when making backups with otherBackupDirs.
2) Do not delete a backup to which the hard links have not yet been generated.
Use storeBackupUpdateBackup to set the hard links and check consistency.
It's a good idea to only use storeBackup or storeBackupDel for the
deletion of old backups.
3) All sharing of data in the backups is done via hard links. This means:
- A backup series cannot be split over different partitions.
- If you want to share data between different backup series, all backups
must reside on the same partition.
4) All information about a backup in the .md5CheckSums file is stored with relative
paths. It does not matter if you change the absolute path to the backup
or make the backup from a different machine (server makes backup of the client via NFS
--- client makes backup to the server via NFS).
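Because everything is shared via hard links (conclusion 3), you can verify
the sharing yourself: if two paths in different backups show the same inode
number in the first column, the data is stored only once. The paths below
are placeholders:
# ls -i /backup/daily/<someBackup>/some/file /backup/weekly/<someBackup>/some/file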
If you have additional ideas or any questions, feel free to contact me
(hjclaes@web.de).
The examples are explained with command line parameters, but it is a
good idea to use a configuration file!
Simply call:
# storeBackup.pl --generate <configFile>
Edit the configuration file and call storeBackup in the following way:
# storeBackup.pl -f <configFile>
You can override settings in the configuration file via command line
(see EXAMPLE 6).
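For example, you can keep all settings from the configuration file and only
write this run's log to another place (the log path here is arbitrary):
# storeBackup.pl -f <configFile> -l /tmp/storeBackup-test.log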
EXAMPLE 1:
==========
simple backup without any special requirement
---------------------------------------------
backup source tree '/home/jim' to '/backup/jim':
# storeBackup.pl -s /home/jim --backupDir /backup/jim \
-l /tmp/storeBackup.log
will do the job and write the log to '/tmp/storeBackup.log'
EXAMPLE 2:
==========
backup of more than one directory at the same time
--------------------------------------------------
Unfortunately, for historical reasons, storeBackup can only handle one
directory to back up, but there is another mechanism to overcome this
limitation:
say you want to back up '/home/jim', '/etc' and '/home/greg/important'
to '/backup/stbu'
1) make a special directory, e.g. mkdir '/opt/stbu'
2) cd /opt/stbu
3) ln -s . stbu
4) ln -s /home/jim home_jim
5) ln -s /etc etc
6) ln -s /home/greg/important home_greg_important
7) write a short script 'backup.sh':
#! /bin/sh
<PATH>/storeBackup.pl -s /opt/stbu --backupDir /backup/stbu \
-l /tmp/storeBackup.log --followLinks 1
8) chmod 755 backup.sh
Whenever you start this script, you will back up the wanted directories
and your short script. You need to be root to have the required
permissions to read the directories in this example.
(Step 3, the 'ln -s . stbu', results in a directory identical to /opt/stbu,
and therefore also backup.sh, being included in your backup.)
EXAMPLE 3:
==========
make a backup of the whole machine once a week and small backups every day
-------------------------------------------------------------------------
1) your machine mounts files from other servers at '/net' (you don't
want to back this up)
2) you don't want to save '/tmp' and '/var/tmp'
3) you want to save the whole machine once a week to
'/net/server/backup/weekly' (which takes some time)
4) you want to save '/home/jim' and '/home/tom/texts' to
'/net/server/backup/daily' more quickly after you finish your work
5) naturally, you want to share the data between the two backup series
6) You should not start both backup scripts at the same time! This could
result in less than 100% sharing of files between the two backups. But
this is automatically corrected over time and does not cause any
problems.
To perform the steps described above, you need to do the following:
1) for the daily backup, you make a special directory:
mkdir /opt/small-backup
cd /opt/small-backup
ln -s . small-backup
ln -s /home/jim home_jim
ln -s /home/tom/texts home_tom_texts
and write a backup script 'myBackup.sh':
#! /bin/sh
<PATH>/storeBackup.pl -s /opt/small-backup --backupDir /net/server/backup \
-S daily -l /tmp/storeBackup.log --followLinks 1 0:weekly
2) script for weekly backup:
#! /bin/sh
<PATH>/storeBackup.pl -s / --backupDir /net/server/backup -S weekly \
-l /tmp/storeBackup.log --exceptDirs /net -e /tmp -e /var/tmp \
-e /proc -e /sys -e /dev 0:daily
The '0' before the path (as in '0:weekly') means that the last backup of
the other backup series is used to check for identical
files.
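One possible way to schedule the two scripts is cron. The times and script
locations below are only examples, chosen so that the two backups do not
run at the same time (see 6) above); /etc/crontab format with a user field:
0 20 * * 1-6  root  /opt/small-backup/myBackup.sh
0 3  * * 0    root  /opt/small-backup/weeklyBackup.sh
(daily backup Monday to Saturday at 20:00, full backup every Sunday at 03:00)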
EXAMPLE 4:
==========
make backups from different machines (not coordinated) and share the data
-------------------------------------------------------------------------
1) you have a server called 'server' with a separate disk which is mounted
at '/disk1' to '/disk1/server'
2) you want to backup machine 'client1' which mounts disk1 of the server at
'/net/server/disk1' to '/net/server/disk1/client1'
3) you want to backup machine 'client2' which mounts disk1 of the server at
'/net/server/disk1' to '/net/server/disk1/client2'
4) the backup of the server runs nightly, independent of the other backups
5) the backups of the clients run uncoordinated, that means perhaps at the
same time
6) you want to share all the data in the backup
7) you can also make small backups of parts of the source (with data sharing),
but that's the same mechanism and not detailed in this example
1) script for the server:
#! /bin/sh
<PATH>/storeBackup.pl -s / --backupDir /disk1 -S server -l /tmp/storeBackup.log \
-e /tmp -e /var/tmp -e /disk1 -e /sys -e /dev -e /proc 0:client1 0:client2
2) scripts for client1:
#! /bin/sh
<PATH>/storeBackup.pl -s / --backupDir /net/server/disk1 -S client1 \
-l /tmp/storeBackup.log -e /tmp -e /var/tmp -e /disk1 -e /sys -e /dev \
-e /proc 0:server 0:client2
3) scripts for client2:
#! /bin/sh
<PATH>/storeBackup.pl -s / --backupDir /net/server/disk1 -S client2 \
-l /tmp/storeBackup.log -e /tmp -e /var/tmp -e /disk1 -e /sys -e /dev \
-e /proc 0:server 0:client1
EXAMPLE 5:
==========
make a backup with different keepTimes for some directories
-----------------------------------------------------------
You can do this very easily with a trick already known from the
previous examples. Let's say you want to keep your backups
for 60 days and all files in the directory 'notimportant' for only 7
days.
Simply make two backups: one with --keepAll 60d that excludes the directory
'notimportant', and a second one with --keepAll 7d for the
missing directory. As described in EXAMPLE 3, create a relationship
between the backups. Then, if you move or copy a file between
'notimportant' and the rest of your saved directories, it will not
use additional space.
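A sketch of the two backup calls could look like this. The source and backup
paths, the series names 'longterm' and 'shortterm' and the exact form of the
excluded path are assumptions for this illustration (check whether your
storeBackup version expects exceptDirs relative to the source directory or
as absolute paths):
#! /bin/sh
# backup 1: everything except 'notimportant', kept for 60 days
<PATH>/storeBackup.pl -s /home/jim --backupDir /backup -S longterm \
  --keepAll 60d --exceptDirs notimportant -l /tmp/storeBackup.log \
  0:shortterm
# backup 2: only 'notimportant', kept for 7 days, sharing its data with
# the first series
<PATH>/storeBackup.pl -s /home/jim/notimportant --backupDir /backup \
  -S shortterm --keepAll 7d -l /tmp/storeBackup.log 0:longterm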
EXAMPLE 6:
==========
make a backup via NFS as fast as possible (with lateLinks)
----------------------------------------------------------
1) Configure storeBackup to make a backup to your backup directory via
NFS. You configure all options in the configuration file <cf1>
and, among others, you set:
lateLinks = yes
lateCompress = yes
doNotDelete = yes
If you have low bandwidth, there is no need to set lateCompress to 'yes'.
Because of 'doNotDelete = yes' you will not have to wait for the
deletion of old backups.
2) Make your backup(s). (As always, the very first backup will be slow.)
You do not have to do anything more on your client (the NFS client).
3) Start (via cron) on the server (NFS server = backup server):
storeBackupUpdateBackup.pl -f <cf1> --topLevel <topLevelDir> \
-l /tmp/stbuUpdate.log
This will override the topLevel path from the configuration file, which
will probably be different on the server.
4) Start (via cron) on the server:
storeBackupDel.pl -f <cf1> --topLevel <topLevelDir> \
--unset doNotDelete
This will also override (unset) the doNotDelete flag from the configuration
file.
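Putting steps 2) to 4) together, one complete cycle consists of these three
calls; <cf1> and <topLevelDir> are the placeholders used above, and each call
is typically started via cron at a suitable time:
# storeBackup.pl -f <cf1> -l /tmp/storeBackup.log
(on the client, step 2)
# storeBackupUpdateBackup.pl -f <cf1> --topLevel <topLevelDir> -l /tmp/stbuUpdate.log
(later on the server, step 3: sets the hard links and does the late compression)
# storeBackupDel.pl -f <cf1> --topLevel <topLevelDir> --unset doNotDelete
(afterwards on the server, step 4: deletes old backups)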