.. _imap-developer-thoughts-backup:
.. Note: This document was converted from the original by Nic Bernstein
   (Onlight). Any formatting mistakes are my fault and not the
   original author's.

Notes for backup implementation
===============================
Backup index database (one per user):

chunk::

    int id
    timestamp ts
    int offset
    int length
    text file_sha1 -> sha1 of (compressed) data prior to this chunk
    text data_sha1 -> sha1 of (uncompressed) data contained in this chunk

mailbox::

    int id
    int last_chunk_id -> chunk that knows the current state
    char uniqueid -> unique
    char mboxname -> altered by a rename
    char mboxtype
    int last_uid
    int highestmodseq
    int recentuid
    timestamp recenttime
    timestamp last_appenddate
    timestamp pop3_last_login
    timestamp pop3_show_after
    timestamp uidvalidity
    char partition
    char acl
    char options
    int sync_crc
    int sync_crc_annot
    char quotaroot
    int xconvmodseq
    char annotations
    timestamp deleted -> time that it was unmailbox'd, or NULL if still alive

message::

    int id
    char guid
    char partition -> this is used to set the spool directory for the temp file - we might not need it
    int chunk_id
    int offset -> offset within chunk of dlist containing this message
    int size -> size of this message (n.b. not length of dlist)

mailbox_message::

    int mailbox_id
    int message_id
    int last_chunk_id -> chunk that has a RECORD in a MAILBOX for this
    int uid
    int modseq
    timestamp last_updated
    char flags
    timestamp internaldate
    int size
    char annotations
    timestamp expunged -> time that it was expunged, or NULL if still alive

subscription::

    int last_chunk_id -> chunk that knows the current state
    char mboxname -> no linkage to mailbox table, users can be sub'd to nonexistent
    timestamp unsubscribed -> time that it was unsubscribed, or NULL if still alive

seen::

    int last_chunk_id -> chunk that knows the current state
    char uniqueid -> mailbox (not necessarily ours) this applies to
    timestamp lastread
    int lastuid
    timestamp lastchange
    char seenuids -> a uid sequence encoded as a string

sieve::

    int chunk_id
    timestamp last_update
    char filename
    char guid
    int offset -> offset within chunk of the dlist containing this script
    timestamp deleted -> time that it was deleted, or NULL if still alive

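as a rough sketch (not from the original notes), here's how the chunk table
above might land as DDL via the sqlite3 C API; the other tables would follow
the same pattern. column names come from the listing, types are assumptions::

    /* Sketch only: the chunk table from the listing above as sqlite DDL.
     * INTEGER/TEXT affinities are a guess at the int/timestamp/text fields. */
    #include <sqlite3.h>

    static const char *chunk_ddl =
        "CREATE TABLE chunk ("
        "    id INTEGER PRIMARY KEY,"
        "    ts INTEGER,"         /* timestamp */
        "    offset INTEGER,"
        "    length INTEGER,"
        "    file_sha1 TEXT,"     /* sha1 of (compressed) data prior to this chunk */
        "    data_sha1 TEXT"      /* sha1 of (uncompressed) data in this chunk */
        ");";

    int create_chunk_table(sqlite3 *db)
    {
        return sqlite3_exec(db, chunk_ddl, NULL, NULL, NULL);
    }
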
sieve scripts and messages are both identified by a GUID,
but APPLY SIEVE doesn't take a GUID; it seems to be generated locally?
the GUID in the response to APPLY SIEVE is generated in the process of
reading the script from disk (sync_sieve_list_generate)

can't activate scripts, because only bytecode files are activated, but
we neither receive bytecode files over the sync protocol nor do we compile
them ourselves.

possibly reduce index size by breaking deleted/expunged values into their
own tables, such that we only store a deleted value for things that are
actually deleted. use left join + is null to find undeleted content
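a minimal sketch of that query, assuming a hypothetical
mailbox_message_expunged side table (not part of the schema above)::

    /* Hypothetical: expunge times split out into their own table, so
     * mailbox_message no longer carries an expunged column.  Live records
     * are then the rows with no matching expunge row. */
    static const char *live_records_sql =
        "SELECT mm.*"
        "  FROM mailbox_message AS mm"
        "  LEFT JOIN mailbox_message_expunged AS e"
        "         ON e.mailbox_id = mm.mailbox_id"
        "        AND e.message_id = mm.message_id"
        " WHERE e.message_id IS NULL;";
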
messages
--------
APPLY MESSAGE is a list of messages, not necessarily only one message.
Actually, it's a list of messages for potentially multiple users, but we avoid
this by rejecting GET MESSAGES requests that span multiple users (so that
sync_client retries at USER level, and so we only see APPLY MESSAGE requests
for a single user).
Cheap first implementation is to index the start/end of the entire APPLY
MESSAGE command identically for each message within it, and at restore time
we grab that chunk and loop over it looking for the correct guid.
Ideal implementation would be to index the offset and length of each message
exactly (even excluding the dlist wrapper), but this is rather complicated
by the dlist API.
For now, we just index the offset of the dlist entry for the message,
and we can parse the pure message data back out later from that, when
we need to. Slightly less efficient on reads, but works->good->fast. We
need to loop over the entries in the MESSAGE dlist to find the one with the
desired GUID.
The indexed length needs to be the length of the message, not the length of the
dlist wrapper, because we need to know this cheaply to supply RECORDs in
MAILBOX responses.
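as a sketch, the restore-time lookup this scheme implies (the query matches
the index schema above; the function shape and error convention are
illustrative)::

    /* Sketch: find where a message lives -- the chunk's offset in the data
     * file, the dlist offset within the chunk, and the message size.  The
     * caller then inflates the chunk, seeks to the dlist, and walks its
     * entries looking for the wanted GUID. */
    #include <sqlite3.h>

    static const char *locate_sql =
        "SELECT c.offset, m.offset, m.size"
        "  FROM message AS m JOIN chunk AS c ON m.chunk_id = c.id"
        " WHERE m.guid = ?;";

    int locate_message(sqlite3 *db, const char *guid,
                       long *chunk_offset, long *dlist_offset, long *size)
    {
        sqlite3_stmt *stmt = NULL;
        int r = sqlite3_prepare_v2(db, locate_sql, -1, &stmt, NULL);
        if (r != SQLITE_OK) return r;

        sqlite3_bind_text(stmt, 1, guid, -1, SQLITE_STATIC);
        if (sqlite3_step(stmt) == SQLITE_ROW) {
            *chunk_offset = (long) sqlite3_column_int64(stmt, 0);
            *dlist_offset = (long) sqlite3_column_int64(stmt, 1);
            *size = (long) sqlite3_column_int64(stmt, 2);
            r = SQLITE_OK;
        }
        else {
            r = SQLITE_NOTFOUND;  /* no such guid in this backup */
        }
        sqlite3_finalize(stmt);
        return r;
    }
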
renames
-------
::

    APPLY RENAME %(OLDMBOXNAME old NEWMBOXNAME new PARTITION p UIDVALIDITY 123)

We identify mboxes by uniqueid, so when we start seeing sync data for the same
uniqueid with a new mboxname we just transparently update it anyway, without
needing to handle the APPLY RENAME. Not sure if this is a problem... Do we
need to record an mbox's previous names somehow?
I think it's possible to use this to rename a USER though, something like::

    APPLY RENAME %(OLDMBOXNAME example.com!user.smithj NEWMBOXNAME example.com!user.jsmith ...)

-- in which case, without special handling of the RENAME command itself, there
will be a backup for the old user that ends with the RENAME, and a backup of
the new user that (probably) duplicates everything again (except for stuff
that's been expunged).
And if someone else gets given the original name, like::

    APPLY RENAME %(OLDMBOXNAME example.com!user.samantha-mithj NEWMBOXNAME example.com!user.smithj ...)

Then anything that was expunged from the original user but still available in
backup disappears? Or the two backups get conflated, and samantha can
"restore" the original smithj's old mail?
Uggh.
if there's a mailboxes database pointing to the backup files, then the backup
file names don't need to be based on the userid, they could e.g. be based on
the user's inbox's uniqueid. this would make it easier to deal with user
renames because the backup filename wouldn't need to change. but this depends
on the uniqueid(s) in question being present on most areas of the sync
protocol, otherwise when starting a backup of a brand new user we won't be
able to tell where to store it. workaround in the meantime could be to make
some kind of backup id from the mailboxes database, and base the filename on
this.
actually, using "some kind of backup id from the mailboxes database" is probably
the best solution. otherwise the lock complexity of renaming a user while making
sure their new backup filename doesn't already exist is frightful.
maybe do something with mkstemp()?
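for example, a minimal sketch (the spool argument and the <userid>_XXXXXX
shape are assumptions, matching the file naming suggested later in these
notes)::

    /* Sketch: create a uniquely-named backup data file with mkstemp().
     * mkstemp() fills in the XXXXXX and creates the file exclusively, so
     * two concurrent creates can never collide; whatever name we end up
     * with gets recorded against the userid in the backups database. */
    #include <stdio.h>
    #include <stdlib.h>

    int create_backup_file(const char *spool, const char *userid,
                           char *fname, size_t fname_len)
    {
        int fd;

        snprintf(fname, fname_len, "%s/%s_XXXXXX", spool, userid);
        fd = mkstemp(fname);
        if (fd < 0) {
            perror("mkstemp");
            return -1;
        }
        return fd;  /* caller maps userid -> fname in backups.db */
    }
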
furthermore: what if a mailbox is moved from one user to another? like::

    APPLY RENAME %(OLD... example.com!user.foo.something NEW... example.com!user.bar.something ...)

when a different-user rename IS a rename of a user (and not just a folder
being moved to a different user), what does it look like?
* does it do a single APPLY RENAME for the user, and expect their folders to shake out of that?
* does it do an APPLY RENAME for each of their folders?
in the latter case, we need to append each of those RENAMEs to the old backup
so they can take effect correctly, and THEN rename the backup file itself. but
how to tell when the appends are finished?
how can we tell the difference between folder(s) moved to a different user vs
user has been renamed?
there is a setting, 'allowusermoves', which, when enabled, allows users to
be renamed via IMAP RENAME/XFER commands. but the default is that this is
disabled. we could initially require this to be disabled while using backups...
not sure what the workflow looks like for renaming a user if this is not enabled.
not sure what the sync flow looks like in either case.
looking at sync_apply_rename and mboxlist_renamemailbox, it seems like we'll
see an APPLY RENAME for each affected mbox when a recursive rename is occurring.
there doesn't seem to be anything preventing user/a/foo -> user/b/foo in the
general (non-INBOX) case.
renames might be a little easier to handle if the index replicated the mailbox
hierarchy rather than just being a flat structure. though this adds complexity
wrt hiersep handling. something like::

    mailbox:
        mboxname        # just the name of this mbox
        parent_id       # fk to parent mailbox
        full_mboxname   # cached value, parent.full_mboxname + mboxname

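as a sketch of why this helps with renames (assuming the hierarchical schema
above and sqlite's recursive CTE support): renaming one component is a
single-row update, and the cached full names can then be recomputed in one
statement. :sep stands in for the hiersep::

    /* Sketch: rename one mailbox component, then rebuild every cached
     * full_mboxname by walking the parent links.  Hypothetical schema. */
    static const char *rename_one_sql =
        "UPDATE mailbox SET mboxname = :newname WHERE id = :id;";

    static const char *rebuild_full_names_sql =
        "WITH RECURSIVE path(id, full) AS ("
        "    SELECT id, mboxname FROM mailbox WHERE parent_id IS NULL"
        "    UNION ALL"
        "    SELECT m.id, p.full || :sep || m.mboxname"
        "      FROM mailbox AS m JOIN path AS p ON m.parent_id = p.id)"
        "UPDATE mailbox SET full_mboxname ="
        "    (SELECT full FROM path WHERE path.id = mailbox.id);";
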
locking
-------
just use a normal flock/fcntl lock on the data file, and only open the index
if that lock succeeded (see the sketch after the list below)
* backup: needs to append foo and update foo.index
* reindex: only needs to read foo, but needs a write lock to prevent
  writes while it does so. needs to write to (replace) foo.index
* compact: needs to re-write foo and foo.index
* restore: needs to read
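a minimal sketch of that scheme, using flock() (fcntl locking would look
much the same; names and flags are illustrative)::

    /* Sketch: take the lock on the data file first; only once it is held
     * is it safe to open (or replace) foo.index. */
    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    int open_backup_locked(const char *fname, int nonblock)
    {
        int flags = LOCK_EX | (nonblock ? LOCK_NB : 0);
        int fd = open(fname, O_RDWR | O_APPEND);

        if (fd < 0) return -1;
        if (flock(fd, flags) != 0) {  /* EWOULDBLOCK if already held */
            close(fd);
            return -1;
        }
        return fd;  /* now safe to open fname.index */
    }
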
verifying index
---------------
how to tell whether the .index file is the correct one for the backup data it
ostensibly represents?

one way to do this would be to have backup_index_end() store a checksum of
the corresponding data contents in the index. when opening a backup, verify
this checksum against the data, and refuse to load the index if it doesn't
match.

- sha1sum of (compressed) contents of file prior to each chunk

how to tell whether the chunk data is any good? store a checksum of the chunk
contents along with the rest of the chunk index

- sha1sum of (uncompressed) contents of each chunk
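a sketch of the verify-on-open check for file_sha1, using OpenSSL's SHA1
(function name and error convention are illustrative)::

    /* Sketch: recompute the sha1 of the (compressed) data file contents
     * prior to a chunk's offset, and compare against the indexed value. */
    #include <openssl/sha.h>
    #include <stdio.h>
    #include <string.h>

    int verify_file_sha1(FILE *data, long chunk_offset, const char *want_hex)
    {
        unsigned char buf[8192], digest[SHA_DIGEST_LENGTH];
        char hex[2 * SHA_DIGEST_LENGTH + 1];
        long remaining = chunk_offset;
        SHA_CTX ctx;
        int i;

        SHA1_Init(&ctx);
        rewind(data);
        while (remaining > 0) {
            size_t want = (remaining < (long) sizeof(buf))
                          ? (size_t) remaining : sizeof(buf);
            size_t got = fread(buf, 1, want, data);
            if (got == 0) return -1;  /* short read: index can't be right */
            SHA1_Update(&ctx, buf, got);
            remaining -= (long) got;
        }
        SHA1_Final(digest, &ctx);

        for (i = 0; i < SHA_DIGEST_LENGTH; i++)
            sprintf(hex + 2 * i, "%02x", digest[i]);
        return strcmp(hex, want_hex) == 0 ? 0 : -1;
    }
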
mailboxes database
------------------
bron reckons use twoskip for this
userid -> backup_filename
lib/cyrusdb module implements this, look into that
look at conversations db code to see how to use it
need a tool:
* given a user, show their backup filename
* dump/undump
* rebuild based on files discovered in backup directory
where does this fit into the locking scheme?
reindex
-------
* convert user mailbox name to backup name
* complain if there's no backup data file?
* lock, rename .index to .index.old, init new .index
* foreach file chunk:

  * timestamp is from first line in chunk
  * complain if timestamp has gone backwards?
  * index records from chunk

* unlock
* clean up .index.old

on error:

* discard partial new index
* restore .index.old
* bail out
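a sketch of that rename dance with rollback, where build_new_index() is a
hypothetical stand-in for the chunk walk above::

    /* Sketch: reindex with rollback on failure. */
    #include <stdio.h>

    extern int build_new_index(const char *idx_path);  /* hypothetical */

    int reindex_backup(const char *idx, const char *idx_old)
    {
        if (rename(idx, idx_old) != 0) return -1;  /* keep old index safe */

        if (build_new_index(idx) != 0) {
            remove(idx);           /* discard partial new index */
            rename(idx_old, idx);  /* restore .index.old */
            return -1;             /* bail out */
        }

        remove(idx_old);           /* success: clean up .index.old */
        return 0;
    }
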
backupd
-------
cmdloop:

* (periodic cleanup)
* read command, determine backup name
* already holding lock ? bump timestamp : obtain lock
* write data to gzname, flush immediately
* index data

periodic cleanup:

* check timestamp of each held lock
* if stale (define: stale?), release
* FIXME if we've appended more than the chunk size we would compact to, release

sync restart:

* release each held lock

exit:

* release each held lock
need a "backup_index_abort" to complete the backup_index_start/end set.
_start should create a transaction, _end should commit it, and _abort should
roll it back. then, if backupd fails to write to the gzip file for some
reason, the (now invalid) index info we added can be discarded too.
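if the index is sqlite, the trio maps directly onto a transaction -- a
minimal sketch::

    /* Sketch: _start/_end/_abort as a sqlite transaction, so index rows
     * for a failed gzip write can be rolled back, leaving data and index
     * consistent. */
    #include <sqlite3.h>

    int backup_index_start(sqlite3 *db)
    {
        return sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
    }

    int backup_index_end(sqlite3 *db)
    {
        return sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
    }

    int backup_index_abort(sqlite3 *db)
    {
        return sqlite3_exec(db, "ROLLBACK;", NULL, NULL, NULL);
    }
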
flushing immediately on write results in poor gzip compression, but for
incremental backups that's not a problem. when the compact process hits the
file it will recompress the data more efficiently.
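with zlib's gzio interface the flush looks something like this sketch;
Z_FULL_FLUSH makes every completed write recoverable at the cost of ratio::

    /* Sketch: append one line of sync protocol to the data file and force
     * it out immediately.  compact recompresses later, so the poor ratio
     * from flushing here doesn't persist. */
    #include <zlib.h>

    int backup_append_line(gzFile gz, const char *line, unsigned len)
    {
        if (gzwrite(gz, line, len) != (int) len) return -1;
        if (gzflush(gz, Z_FULL_FLUSH) != Z_OK) return -1;
        return 0;
    }
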
questions
---------
* what does it look like when uidvalidity changes?
restore
-------
restoration is effectively a reverse-direction replication (replicating TO master),
which means we can't necessarily supply things like uid, modseq, etc without racing
against normal message arrivals. so instead we add an extra command to the protocol
to restore a message to a folder but let the destination determine the tasty bits.
protocol flow looks something like::

    c: APPLY RESERVE ... # as usual
    s: * MISSING (foo bar)
    s: OK
    c: APPLY MESSAGE ... # as usual
    s: OK
    c: RESTORE MAILBOX ... # new sync proto command
    s: OK

we introduce a new command, RESTORE MAILBOX, which is similar to the existing
APPLY MAILBOX. it specifies, for a mailbox, the mailbox state plus the message
records relevant to the restore.
the imapd/sync_server receiving the RESTORE command creates the mailbox if necessary,
and then adds the message records to it as new records (i.e. generating new uid etc).
this will end up generating new events in the backup channel's sync log, and then the
messages will be backed up again with their new uids, etc. additional wire transfer
of message data should be avoided by keeping the same guid.
if the mailbox already exists but its uniqueid does not match the one from the backup,
then what? this probably means the user has deleted a folder and its contents, then
made a new folder with the same name. so it's probably very common for the mailbox
uniqueid to not match like this. so we don't care about special handling for this
case. just add any messages that aren't already there.
if the mailbox doesn't already exist on the destination (e.g. if rebuilding a server
from backups) then it's safe and good to reuse uidvalidity, uniqueid, uid, modseq etc,
such that connecting clients can preserve their state. so the imapd/sync_server
receiving the restore request accepts these fields as optional, but only preserves
them if it's safe to do so.
* restore: sbin program for selecting and restoring messages
restore command needs options:
+ whether or not to trim deletedprefix off mailbox names to be restored
+ whether or not to restore uniqueid, highestmodseq, uid and so on
+ whether or not to limit to/exclude expunged messages
+ whether or not to restore sub-mailboxes
+ sync_client-like options (servername, local_only, partition, ...)
+ user/mailbox/backup file(s) to restore from
+ mailbox to restore to (override location in backup)
+ override acl?
can we heuristically determine whether an argument is an mboxname, uniqueid or guid?
=> libuuid uniqueid is 36 bytes of hex digits and hyphens (at fixed positions)
=> non-libuuid uniqueid is 24 bytes of hex digits
=> mboxname usually contains at least one . somewhere
=> guid is 40 bytes of hex digits
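a sketch of that heuristic (purely illustrative)::

    /* Sketch: classify a restore argument by shape, per the notes above. */
    #include <ctype.h>
    #include <string.h>

    enum arg_kind { ARG_MBOXNAME, ARG_UNIQUEID, ARG_GUID };

    static int all_hex(const char *s, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            if (!isxdigit((unsigned char) s[i])) return 0;
        return 1;
    }

    enum arg_kind classify_arg(const char *arg)
    {
        size_t len = strlen(arg);

        if (len == 40 && all_hex(arg, 40)) return ARG_GUID;
        if (len == 24 && all_hex(arg, 24)) return ARG_UNIQUEID;
        if (len == 36 && arg[8] == '-' && arg[13] == '-'
            && arg[18] == '-' && arg[23] == '-')
            return ARG_UNIQUEID;  /* libuuid form */
        return ARG_MBOXNAME;      /* usually has a . somewhere */
    }
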
usage::

    restore [options] server [mode] backup [mboxname | uniqueid | guid]...

options::

    -A acl          # apply specified acl to restored mailboxes
    -C alt_config   # alternate config file
    -D              # don't trim deletedprefix before restoring
    -F input-file   # read mailboxes/messages from file rather than argv
    -L              # local mailbox operations only (no mupdate)
    -M mboxname     # restore messages to specified mailbox
    -P partition    # restore mailboxes to specified partition
    -U              # try to preserve uniqueid, uid, modseq, etc
    -X              # don't restore expunged messages
    -a              # try to restore all mailboxes in backup
    -n              # calculate work required but don't perform restoration
    -r              # recurse into submailboxes
    -v              # verbose
    -w seconds      # wait before starting (useful for attaching a debugger)
    -x              # only restore expunged messages (not sure if useful?)
    -z              # require compression (abort if compression unavailable)

mode::

    -f              # specified backup interpreted as filename
    -m              # specified backup interpreted as mboxname
    -u              # specified backup interpreted as userid (default)

compact
--------
::

    # finding messages that are to be kept (either exist as unexpunged somewhere,
    # or exist as expunged but more recently than threshold)
    # (to get unique rows, add "distinct" and remove mm.expunged from fields)
    sqlite> select m.*, mm.expunged from message as m join mailbox_message as mm on m.id = mm.message_id and (mm.expunged is null or mm.expunged > 1437709300);
    id|guid|partition|chunk_id|offset|length|expunged
    1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|
    1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|1446179047
    1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|1446179047

    # finding chunks that are still needed (due to containing last state
    # of mailbox or mailbox_message, or containing a message)
    sqlite> select * from chunk where id in (select last_chunk_id from mailbox where deleted is null or deleted > 1437709300 union select last_chunk_id from mailbox_message where expunged is null or expunged > 1437709300 union select chunk_id from message as m join mailbox_message as mm on m.id = mm.message_id and (mm.expunged is null or mm.expunged > 1437709300));
    id|timestamp|offset|length|file_sha1|data_sha1
    1|1437709276|0|3397|da39a3ee5e6b4b0d3255bfef95601890afd80709|6836d0110252d08a0656c14c2d2d314124755491
    3|1437709355|1977|2129|fee183c329c011ead7757f59182116500776eaaf|a5677cfa1f5f7b627763652f4bb9b99f5970748c
    4|1437709425|2746|1719|3d9f02135bf964ff0b6a917921b862c3420e48f0|7b64ec321457715ee61fe238f178f5d72adaef64
    5|1437709508|3589|2890|0cee599b1573110fee428f8323690cbcb9589661|90d104346ef3cba9e419461dd26045035f4cba02

remember: a single APPLY MESSAGE line can contain many messages!
thoughts:
* need a heuristic for quickly determining whether a backup needs to be compacted
* sum(chunks to discard, chunks to combine, chunks to split) > threshold
* can we detect chunks that are going to significantly reduce in size as result of discarding individual lines?
* "quick" vs "full" compaction
settings:

* backup retention period
* chunk combination size (byte length or elapsed time)

combining chunks:

* size threshold below which adjacent chunks can be joined
* size threshold above which chunks should be split
* duration threshold below which adjacent chunks can be joined
* duration threshold above which chunks should be split

::

    backup_min_chunk_size: 0 for no minimum
    backup_max_chunk_size: 0 for no maximum
    backup_min_chunk_duration: 0 for no minimum
    backup_max_chunk_duration: 0 for no maximum

priority: size or duration??
data we absolutely need to keep:
* the most recent APPLY MAILBOX for each mailbox we're keeping (mailbox state)
* the APPLY MAILBOX containing the most recent RECORD for each message we're keeping (record state)
* the APPLY MESSAGE for each message we're keeping (message data)
data that we should practically keep:
* all APPLY MAILBOXes for a given mailbox from the chunk identified as its last
* all APPLY MAILBOXes containing a RECORD for a given message from the chunk identified as its last
* the APPLY MESSAGE for each message we're keeping
four kinds of compaction (probably at least two simultaneously):
* removing unused chunks
* combining adjacent chunks into a single chunk (for better gz compression)
* removing unused message lines from within a chunk (important after combining)
* removing unused messages from within a message line
"unused messages"
messages for which all records have been expunged for longer
than the retention period
"unused chunks"
chunks which contain only unused messages
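the "unused messages" definition translates into a query like this sketch
(the inverse of the "kept" query shown earlier; :threshold would be now
minus the retention period)::

    /* Sketch: messages with no record that is either live or expunged
     * more recently than the retention threshold. */
    static const char *unused_messages_sql =
        "SELECT m.id, m.guid FROM message AS m"
        " WHERE NOT EXISTS ("
        "    SELECT 1 FROM mailbox_message AS mm"
        "     WHERE mm.message_id = m.id"
        "       AND (mm.expunged IS NULL OR mm.expunged > :threshold));";
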
algorithm:

* open (and lock) backup and backup.new (or bail out)
* use backup index to identify chunks we still need
* create a chunk in backup.new
* foreach chunk we still need:

  * foreach line in the chunk:

    * next line if we don't need to keep it
    * create new line
    * foreach message in line:

      * if we still need the message, or if we're not doing message
        granularity, add the message to the new line

    * write and index tmp line to backup.new

  * if the new chunk is big enough, or if we're not combining, end the
    chunk and start a new one

* end the new chunk
* rename backup -> backup.old, backup.new -> backup
* close (and unlock) backup.old and backup
command line locking utility
----------------------------
command line utility to lock a backup (for e.g. safely poking around in the
.index on a live system).
example failure::

    $ ctl_backups lock -f /path/to/backup
    * Trying to obtain lock on /path/to/backup...
    NO some error
    <EOF>

example success::

    $ ctl_backups lock -f /path/to/backup
    * Trying to obtain lock on /path/to/backup...
    [potentially a delay here if we need to wait for another process to release the lock]
    OK locked
    [waits for its stdin to close, then unlocks and exits]

if you need to rummage around in backup.index, run this program in another
shell, do your work, then ^D it when you're finished.
you could also call this from e.g. perl over a bidirectional pipe - wait to
read "OK locked", then you've got your lock. close the pipe to unlock when
you're finished working. if you don't read "OK locked" before the pipe closes
then something went wrong and you didn't get the lock.
specify backups by -f filename, -m mailbox, -u userid
default run mode as above
-s to fork an sqlite of the index (and unlock when it exits)
-x to fork a command of your choosing (and unlock when it exits)
reconstruct
-----------
rebuilding backups.db from on disk files
scan each backup partition for backup files:
* skip timestamped files (i.e. backups from compact/reindex)
* skip .old files (old backups from reindex)
* .index files => skip???
* skip unreadable files
* skip empty files
* skip directories etc
what's the correct procedure for repopulating a cyrus database?
keep copy of the previous (presumably broken) one?
trim off mkstemp suffix (if any) to find userid
can we use a recognisable character to delimit the mkstemp suffix?
what if there are multiple backup files for a given userid? precedence?
verify found backups before recording. reindex?
locking? what if something has a filename and does stuff with it while
reconstruct runs?
backupd always uses db for opens, so as long as reconstruct keeps the db
locked while it works, the db won't clash. but backupd might have backups
still open from before reconstruct started, which it will write to quite
happily, even though reconstruct might decide that some other file is the
correct one for that user...
a backup server would generally be used only for backups, and sync_client
is quite resilient when the destination isn't there, so it's actually
no problem to just shut down cyrus while reconstruct runs. no outage to
user-facing services, just maybe some sync backlog to catch up on once
cyrus is restarted.
ctl_backups
-------------
sbin tool for mass backup/index/database operations
needs:
* rebuild backups.db from disk contents
* list backups/info
* rename a backup
* delete a backup
* verify a backup (check all sha1's, not just most recent)
not sure if these should be included, or separate tools:
* reindex a backup (or more)
* compact a backup (or more)
* lock a backup
* some sort of rolling compaction?
usage::

    ctl_backups [options] reconstruct                         # reconstruct backups.db from disk files
    ctl_backups [options] list [list_opts] [[mode] backup...] # list backup info for given/all users
    ctl_backups [options] move new_fname [mode] backup        # rename a backup (think about this more)
    ctl_backups [options] delete [mode] backup                # delete a backup
    ctl_backups [options] verify [mode] backup...             # verify specified backups
    ctl_backups [options] reindex [mode] backup...            # reindex specified backups
    ctl_backups [options] compact [mode] backup...            # compact specified backups
    ctl_backups [options] lock [lock_opts] [mode] backup      # lock specified backup

options::

    -C alt_config   # alternate config file
    -F              # force (run command even if not needed)
    -S              # stop on error
    -v              # verbose
    -w              # wait for locks (i.e. don't skip locked backups)

mode::

    -A              # all known backups (not valid for single backup commands)
    -D              # specified backups interpreted as domains (nvfsbc)
    -P              # specified backups interpreted as userid prefixes (nvfsbc)
    -f              # specified backups interpreted as filenames
    -m              # specified backups interpreted as mboxnames
    -u              # specified backups interpreted as userids (default)

lock_opts::

    -c              # exclusively create backup
    -s              # lock backup and open index in sqlite
    -x cmd          # lock backup and execute cmd
    -p              # lock backup and wait for eof on stdin (default)

list_opts::

    -t [hours]      # "stale" (no update in hours) backups only (default: 24)

cyr_backup
----------
sbin tool for inspecting backups
needs:
* better name?
* list stuff
* show stuff
* dump stuff
* restore?
* should lock/move/delete (single backup commands) from ctl_backups be moved here?
usage::

    cyr_backup [options] [mode] backup list [all | chunks | mailboxes | messages]...
    cyr_backup [options] [mode] backup show chunks [id...]
    cyr_backup [options] [mode] backup show messages [guid...]
    cyr_backup [options] [mode] backup show mailboxes [mboxname | uniqueid]...
    cyr_backup [options] [mode] backup dump [dump_opts] chunk id
    cyr_backup [options] [mode] backup dump [dump_opts] message guid
    cyr_backup [options] [mode] backup json [chunks | mailboxes | messages]...

options::

    -C alt_config   # alternate config file
    -v              # verbose

mode::

    -f              # backup interpreted as filename
    -m              # backup interpreted as mboxname
    -u              # backup interpreted as userid (default)

commands:

* list: table of contents, one per line
* show: indexed details of listed items, one per paragraph, detail per line
* dump: relevant contents from backup stream
* json: indexed details of listed items in json format

dump options::

    -o filename     # dump to named file instead of stdout

partitions
----------
not enough information in sync protocol to handle partitions easily?
we know what the partition is when we do an APPLY operation (mailbox, message,
etc), but the initial GET operations don't include it. so we need to already
know where the appropriate backup is partitioned in order to find the backup
file in order to look inside it to respond to the GET request
if we have a mailboxes database (indexed by mboxname, uniqueid and userid) then
maybe that would make it feasible? if it's not in the mailboxes database then
we don't have a backup for it yet, so we respond accordingly, and get sent
enough information to create it.
does that mean the backup api needs to take an mbname on open, and it handles
the job of looking it up in the mailboxes database to find the appropriate
thing to open?
can we use sqlite for such a database, or is the load on it going to be too
heavy? locking? we have lots of database formats up our sleeves here, so
even though we use sqlite for the backup index there isn't any particular
reason we're beholden to it for the mailboxes db too
if we have a mailboxes db then we need a reconstruct tool for that, too
what if we support multiple backup partitions, but don't expect these
to necessarily correspond with mailbox partitions. they're just for spreading
disk usage around.
* when creating a backup for a previously-unseen user we'd pick a random
partition to put them on
* ctl_backups would need a command to move an existing backup to a
given partition
* ctl_backups would need a command to pre-create a user backup on a
given partition for initial distribution
* instead of "backup_data_path" setting, have one-or-more
"backuppartition-<name>" settings, ala partition- and friends
see imap/partlist.[ch] for partition list management stuff. it's complicated
and doesn't have a test suite, so maybe save this implementation until needed.
but... maybe rename backup_data_path to backuppartition-default in the meantime,
so that when we do add this it's not a complicated reconfig to update?
partlist_local_select (and lazy-loaded partlist_local_init) are where the
mailbox partitions come from (see also mboxlist_create_partition), do something
similar for backup partitions
data corruption
---------------
backups.db:
* can be reconstructed from on disk files at any time
* how to detect corruption? does cyrus_db detect/repair on its own?
backup indexes:
* can be reindexed at any time from backup data
* how to detect corruption? assume sqlite will notice, complain?
backup data:

* what's zlib's failure mode? do we lose the entire chunk or just the corrupt bit?
* verify will notice sha1sum mismatches
* dlist format will reject some kinds of corruption (but not all)

  * reindex: should skip unparseable dlist lines

* message data has its own checksums (guid)

  * reindex: should skip messages that don't match their own checksums

* compact: "full" compact will only keep useful data according to index
* backupd: will sync anything that's in user mailbox but not in backup index
i think this means that if a message or mailbox state becomes corrupted in
the backup data file, and it still exists in the user's real mailbox, you
recover from the corruption by reindexing and then letting the sync process
copy the missing data back in again. and you can tidy up the data file by
running a compact over it.
you detect data corruption in the most recent chunk reactively, as soon as the
backup system needs to open it again (quick verify on open)

you detect data corruption in older chunks reactively, by trying to restore
from them. this may be too late: if a message needs restoring, it's because the
user's mailbox no longer has it

you detect data corruption preemptively by running the verify tool over the
backup. recommend scheduling this in EVENTS/cron?

if data corruption occurs in a message that's no longer in the user's mailbox,
that message is lost. it was going to be deleted from the backup after the
retention period anyway (by compact), but if it needs restoring in the
meantime, sorry
installation instructions
-------------------------
(obviously, most of this won't work at this point, because the code doesn't
exist. but this is, approximately, where things are heading.)
on your backup server:

* compile with --enable-backup configure option and install
* imapd.conf::

      backuppartition-default: /var/spool/backup  # FIXME better example
      backup_db: twoskip
      backup_db_path: /var/imap/backups.db
      backup_staging_path: /var/spool/backup
      backup_retention_days: 7

* cyrus.conf SERVICES::

      backupd cmd="backupd" listen="csync" prefork=0

  (remove other services, most likely)

  (should i create a master/conf/backup.conf example file?)

* cyrus.conf EVENTS::

      compact cmd="ctl_backups compact -A" at=0400

* start server as usual
* do i want a special port for backupd?

on your imap server:

* imapd.conf::

      sync_log_channels: backup
      sync_log: 1
      backup_sync_host: backup-server.example.com
      backup_sync_port: csync
      backup_sync_authname: ...
      backup_sync_password: ...
      backup_sync_repeat_interval: ... # seconds, smaller value = livelier backups but more i/o
      backup_sync_shutdown_file: ....

* cyrus.conf STARTUP::

      backup_sync cmd="sync_client -r -n backup"

* cyrus.conf SERVICES::

      restored cmd="restored" [...]

* start/restart master

files and such::

    {configdirectory}/backups.db - database mapping userids to backup locations
    {backuppartition-name}/<hash>/<userid>_XXXXXX - backup data stream for userid
    {backuppartition-name}/<hash>/<userid>_XXXXXX.index - index into userid's backup data stream

do i want rhost in the path?
* protects from issue if multiple servers are trying to back up their own version of same user
(though this is its own problem that the backup system shouldn't have to compensate for)
* but makes location of undifferentiated user unpredictable
* so probably not, actually
chatting about implementation 20/10
-----------------------------------
::

    09:54 @elliefm
    here's a fun sync question
    APPLY MESSAGE provides a list of messages
    can a single APPLY MESSAGE contain messages for multiple mailboxes and/or users?
    my first hunch is that it doesn't cross users, since the broadest granularity for a single sync run is USER
    10:06 kmurchison
    We'd have to check with Bron, but I *think* messages can cross mailboxes for a single user
    10:06 @brong
    yes
    APPLY MESSAGE just adds it to the reserve list
    10:07 @elliefm
    nah apply message uploads the message, APPLY RESERVE adds it to the reserve list :P
    10:07 @brong
    same same
    APPLY RESERVE copies it from a local mailbox
    APPLY MESSAGE uploads it
    10:07 @elliefm
    yep
    10:07 @brong
    they both wind up in the reserve list
    10:07 @elliefm
    ahh i see what you mean, gotcha
    10:07 @brong
    until you send a RESTART
    ideally you want it reserve in the same partition, but it will copy the message over if it's not on the same partition
    there's no restriction on which mailbox it came from/went to
    good for user renames, and good for an append to a bunch of mailboxes in different users / shared space all at once
    (which LMTP can do)
    10:10 @elliefm
    i can handle the case where a single APPLY MESSAGE contains messages for multiple mailboxes belonging to the same user
    but i'm in trouble if a single APPLY MESSAGE can contain messages belonging to different users
    10:14 @brong
    @elliefm: why?
    10:14 @brong
    you don't have to keep them if they aren't used
    10:15 @elliefm
    for backups - when i see the apply, i need to know which user's backup to add it to. that's easy enough if it doesn't cross users but gets mega fiddly if it does
    i'm poking around in sync client to see if it's likely to be an issue or not
    11:00 @brong_
    @elliefm: I would stage it, and add it to users as it gets refcounted in by an index file
    11:07 @elliefm
    that's pretty much what we do for ordinary sync and delivery stuff yeah?
    11:08 @brong_
    yep
    and it's what the backup thing does
    11:09 @elliefm
    i'm pretty sure that APPLY RESERVE and APPLY MESSAGE don't give a damn about users, they're just "here's every message you might not have already had since last time we spoke" and it lets the APPLY MAILBOX work out where to attach them later
    11:09 @brong_
    yep
    11:09 @elliefm
    so yeah, i'll need to do something here
    i've been working so far on the idea that a single user's backup consists of 1) an append-only gzip stream of the sync protocol chat that built it, and 2) an index that tracks current state of mailboxes, and offsets within (1) of message data
    that gets us good compression (file per user, not file per message), and if the index gets corrupted or lost, it's rebuildable purely from (1), it doesn't need a live copy of the original mailbox
    11:12 @brong
    yep, that all works
    11:12 @elliefm
    (so if you lose your imap server, you're not unable to rebuild a broken index on the backup)
    11:13 @brong
    it's easy enough to require the sync protocol stream to only contain messages per user
    though "apply reserve" is messy
    because you need to return "yes, I have that message"
    11:13 @elliefm
    with that implementation i can't (easily) keep user.a's messages from not existing in user.b's data stream (though they won't be indexed)
    11:14 @brong
    I'm not too adverse to the idea of just unpacking each message as it comes off the wire into a temporary directory
    11:14 @elliefm
    (because at the time i'm receiving the sync data i don't know which it needs to go in, so if they come in in the same reserve i'd need to append them to both data streams)
    which isn't a huge problem, just… irks me a bit
    11:14 @brong
    and then reading the indexes as they come in, checking against the state DB to see if we already have them, and streaming them into the gzip if they aren't there yet
    what we can do is something like the current format, where files go into a tar
    11:16 @elliefm
    i guess the fiddly bit there is that there's one more moving part to keep synchronised across failure states
    a backup for a single user becomes 1) data stream + 2) any messages that were uploaded but not yet added to a mailbox + 3) index (which doesn't know what to do with (2))
    which in the general case is fine, the next sync will update the mailboxes, which will push (2) into (1) and index it nicely, and on we go
    but it's just a little bit more mess if there's a failure that you need to recover from between those states -- it's no longer a simple case of "it's in the backup and we know everything about it" or "it doesn't exist", there's a third case of "well we might have the data but don't really know what to do with it"
    the other fiddly bit is that the process of appending to the data stream is suddenly in the business of crafting output rather than simply dumping what it gets, which isn't really burdensome, but it is one more little crack for bugs to crawl into
    i guess in terms of sync protocol, one thing i could do on my end is identify apply operations that seem to contain multiple users' data, and just return an error on those. the sync client on the other end will promote them until they're eventually user syncs, which i think are always user granularity
    11:50 @elliefm
    i think for now, first stage implementation will be to stream the reserve/message commands in full to every user backup they might apply to. and optimising that down so that each stream only contains messages belonging to that user can be a future optimisation

todo list
---------
* clean up error handling
* perl tool to anonymise sync proto talk
* verification step to check entire data stream for errors (even chunks that aren't indexed)
* prot_fill_cb: extra argument to pass back an error string to prot_fill
* ctl_backups verify: set level
* backupd: don't block on locked backups, return mailbox locked -- but sync_client doesn't handle this
* test multiple backup partitions
* configure: error if backups requested and we don't have zlib
* valgrind
* finish reconstruct
* compact: split before append?
compact implementation steps:

1. remove unused chunks, keep everything else as is
2. join adjacent chunks if small enough, split large chunks
3. parse/rebuild message lines
4. discard unused mailbox lines