% muchsync(1)
% David Mazieres
% 

# NAME

muchsync - synchronize maildirs and notmuch databases

# SYNOPSIS

muchsync _options_ \
muchsync _options_ _server-name_ _server-options_ \
muchsync _options_ --init _maildir_ _server-name_ _server-options_

# DESCRIPTION

muchsync synchronizes the contents of maildirs and notmuch tags across
machines.  Any given execution runs pairwise between two replicas, but
the system scales to an arbitrary number of replicas synchronizing in
arbitrary pairs.  For efficiency, version vectors and logical
timestamps are used to limit synchronization to items a peer may not
yet know about.

To use muchsync, both muchsync and notmuch should be installed
someplace in your PATH on two machines, and you must be able to access
the remote machine via ssh.

In its simplest usage, you have a single notmuch database on some
server `SERVER` and wish to start replicating that database on a
client, where the client currently does not have any mailboxes.  You
can initialize a new replica in `$HOME/inbox` by running the following
command:

    muchsync --init $HOME/inbox SERVER

This command may take some time, as it transfers the entire contents
of your maildir from the server to the client and creates a new
notmuch index on the client.  Depending on your setup, you may be
either bandwidth limited or CPU limited.  (Sadly, the notmuch library
on which muchsync is built is non-reentrant and forces all indexing to
happen on a single core at a rate of about 10,000 messages per
minute.)

From then on, to synchronize the client with the server, just run:

    muchsync SERVER

Since muchsync replicates the tags in the notmuch database itself, you
should consider disabling maildir flag synchronization by executing:

    notmuch config set maildir.synchronize_flags false

The reason is that the synchronize\_flags feature only works on a
small subset of pre-defined flags and so is not all that useful.
Moreover, it marks flags by renaming files, which is not particularly
efficient.  muchsync was largely motivated by the need for better flag
synchronization.  If you are satisfied with the synchronize\_flags
feature, you might consider a tool such as offlineimap as an
alternative to muchsync.


## Synchronization algorithm

muchsync separately synchronizes two classes of information:  the
message-to-directory mapping (henceforth link counts) and the
message-id-to-tag mapping (henceforth tags).  Using logical
timestamps, it can detect update conflicts for each type of
information.  We describe link count and tag synchronization in turn.

Link count synchronization consists of ensuring that any given message
(identified by its collision-resistant content hash) appears the same
number of times in the same subdirectories on each replica.  Generally
a message will appear only once in a single subdirectory.  If the
message is moved or deleted on one replica, the change propagates to
the other replicas.

If two replicas move or copy the same file between synchronization
events (or one moves the file and the other deletes it), this
constitutes an update conflict.  Update conflicts are resolved by
storing in each subdirectory a number of copies equal to the maximum
of the number of copies in that subdirectory on the two replicas.
This is conservative, in the sense that a file will never be deleted
after a conflict, though you may get extra copies of files.  (muchsync
uses hard links, so at least these copies will not use too much disk
space.)

For example, if one replica moves a message to subdirectory .box1/cur
and another moves the same message to subdirectory .box2/cur, the
conflict will be resolved by placing two links to the message on each
replica, one in .box1/cur and one in .box2/cur.  To respect the
structure of maildirs, subdirectories ending in `new` and `cur` are
special-cased; conflicts between sibling `new` and `cur`
subdirectories are resolved in favor of `cur` without creating
additional copies of messages.
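
The conflict rule above can be sketched in a few lines of shell.  This
is an illustration only, not muchsync's implementation: the function
name `merge_link_counts` and its input format (one "subdirectory
count" pair per line, for a single message) are assumptions made for
this example, and the `new`/`cur` special case is left out.

```sh
# Hypothetical sketch of conflict resolution for one message.
# Usage: merge_link_counts COUNTS_A COUNTS_B
# Each input file holds lines of the form "<subdirectory> <count>".
# Subdirectory names containing spaces are not handled.
merge_link_counts() {
    awk '
        # Remember the largest count seen for each subdirectory.
        { if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0 }
        END { for (dir in max) print dir, max[dir] }
    ' "$1" "$2" | sort
}
```

Fed the `.box1/cur` and `.box2/cur` example above (each replica
reporting one link in its own folder and zero in the other), the
merged result lists one link in each folder.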

Message tags are synchronized based on notmuch's message-ID (usually
the Message-ID header of a message), rather than message contents.  On
conflict, tags are combined as follows.  Any tag in the notmuch
configuration parameter `muchsync.and_tags` is removed from the
message unless it appears on both replicas.  Any other tag is added if
it appears on any replica.  In other words, tags in
`muchsync.and_tags` are logically anded, while all other tags are
logically ored.  (This approach will give the most predictable results
if `muchsync.and_tags` has the same value in all your replicas.  The
`--init` option ensures uniform configurations initially, but
subsequent changes to `muchsync.and_tags` must be manually
propagated.)
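
The combining rule can be illustrated with a small shell sketch.  It
is not how muchsync is implemented; `merge_tags` and its
one-tag-per-line input files are assumptions made for the example.  It
models only the conflict case, in which both replicas changed the tags
of the same message.

```sh
# Hypothetical sketch of tag merging on conflict.
# Usage: merge_tags AND_TAGS_FILE TAGS_A_FILE TAGS_B_FILE
# Every file holds one tag per line; the three paths must differ.
merge_tags() {
    awk '
        FILENAME == ARGV[1] { and_tag[$0] = 1; next }
        FILENAME == ARGV[2] { in_a[$0] = 1; seen[$0] = 1; next }
        { in_b[$0] = 1; seen[$0] = 1 }
        END {
            for (t in seen) {
                if (t in and_tag) {
                    # "anded" tags survive only if both replicas have them.
                    if ((t in in_a) && (t in in_b)) print t
                } else {
                    # Every other tag survives if either replica has it.
                    print t
                }
            }
        }
    ' "$1" "$2" "$3" | sort
}
```

For instance, with `unread` and `inbox` as the anded tags, a message
tagged `inbox work` on one replica and `unread inbox todo` on the
other ends up with `inbox`, `todo` and `work`: `unread` is dropped
because only one replica still carries it.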

If your configuration file does not specify a value for
`muchsync.and_tags`, the default is to use the set of tags specified
in the `new.tags` configuration option.  This should give intuitive
results unless you use a two-pass tagging system such as the afew
tool, in which case `new.tags` is used to flag input to the second
pass while you likely want `muchsync.and_tags` to reflect the output
of the second pass.
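
As a sketch of setting the parameter explicitly, assuming a second
tagging pass that produces the tags `unread` and `inbox` (the tag
names are examples only, not a recommendation), something like the
following could be run on every replica:

```sh
# Example tag names are assumptions; substitute the tags your own
# second pass assigns.  Repeat on each replica so the values agree.
notmuch config set muchsync.and_tags unread inbox

# Show the value muchsync will see.
notmuch config get muchsync.and_tags
```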

## File deletion

Because publishing software that actually deletes people's email is a
scary prospect, muchsync currently never deletes mail files.  Though
this may change in the future, for now muchsync moves any deleted
messages to the directory `.notmuch/muchsync/trash` under your mail
directory (naming deleted messages by their content hash).  If you
really want to delete mail to reclaim disk space or for privacy
reasons, you will need to run the following on each replica:

    cd "$(notmuch config get database.path)"
    rm -rf .notmuch/muchsync/trash
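
Before emptying the trash, it may be worth checking how much it
holds.  The helper below is a hypothetical convenience, not part of
muchsync; it assumes only the trash location given above.

```sh
# Hypothetical helper: report how many files sit in muchsync's trash.
# Usage: trash_summary MAILDIR
trash_summary() {
    trash="$1/.notmuch/muchsync/trash"
    if [ ! -d "$trash" ]; then
        echo "0 files"
        return 0
    fi
    # Count regular files at any depth; $(( )) strips wc's padding.
    count=$(find "$trash" -type f | wc -l)
    echo "$((count)) files"
}
```

It could be invoked as `trash_summary "$(notmuch config get
database.path)"`.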

# OPTIONS

\-C _file_, \--config _file_
:   Specify the path of the notmuch configuration file to use.  If
    none is specified, the default is to use the contents of the
    environment variable \$NOTMUCH_CONFIG, or if that variable is
    unset, the value \$HOME/.notmuch-config.  (These are the same
    defaults as the notmuch command itself.)

\-F
:   Check for modified files.  Without this option, muchsync assumes
    that files in a maildir are never edited.  -F disables certain
    optimizations so as to make muchsync at least check the timestamp
    on every file, which will detect modified files at the cost of a
    longer startup time.  If muchsync dies with the error "message
    received does not match hash," you likely need to run it with the
    -F option.

    Note that if your software regularly modifies the contents of mail
    files (e.g., because you are running offlineimap with "synclabels
    = yes"), then you will need to use -F each time you run muchsync.
    Specify it as a server option (after the server name) if the
    editing happens server-side.

\-r /path/to/muchsync
:   Specifies the path to muchsync on the server.  Ordinarily, muchsync
    should be in the default PATH on the server so this option is not
    required.  However, this option is useful if you have to install
    muchsync in a non-standard place or wish to test development
    versions of the code.

\-s ssh-cmd
:   Specifies a command line to pass to /bin/sh to execute a command on
    another machine.  The default value is "ssh -CTaxq".  Note that
    because this string is passed to the shell, special characters
    including spaces may need to be escaped.

\-v
:   The -v option increases verbosity.  The more times it is specified,
    the more verbose muchsync will become.

\--help
:   Print a brief summary of muchsync's command-line options.

\--init _maildir_
:   This option clones an existing mailbox on a remote server into
    _maildir_ on the local machine.  Neither _maildir_ nor your
    notmuch configuration file (see ```--config``` above) should exist
    when you run this command, as both will be created.  The
    configuration file is copied from the server (adjusted to reflect
    the local maildir), while _maildir_ is created as a replica of the
    maildir you have on the server.

\--nonew
:   Ordinarily, muchsync begins by running "notmuch new".  This option
    says not to run "notmuch new" before starting the muchsync
    operation.  It can be passed as either a client or a server
    option.  For example, the command "```muchsync myserver
    --nonew```" will run "```notmuch new```" locally but not on
    myserver.

\--noup, \--noupload
:   Transfer files from the server to the client, but not vice versa.

\--upbg
:   Transfer files from the server to the client in the foreground.
    Then fork into the background to upload any new files from the
    client to the server.  This option is useful when checking new
    mail, if you want to begin reading your mail as soon as it has
    been downloaded while the upload continues.

\--self
:   Print the 64-bit replica ID of the local maildir replica and exit.
    Potentially useful in higher-level scripts, such as the emacs
    notmuch-poll-script variable for identifying on which replica one
    is running, particularly if network file systems allow a replica
    to be accessed from multiple machines.

\--newid
:   Muchsync requires every replica to have a unique 64-bit identifier.
    If you ever copy a notmuch database to another machine, including
    the muchsync state, bad things will happen if both copies use
    muchsync, as they will both have the same identifier.  Hence,
    after making such a copy and before running muchsync to synchronize
    mail, run `muchsync --newid` to change the identifier of one of
    the copies.

\--version
:   Report the muchsync version number.
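
If mail files are edited only occasionally, one possible pattern is to
run muchsync normally and retry with `-F` only after the hash error
described under `-F` above.  The wrapper below is a sketch under
stated assumptions: `sync_with_fallback` and the `MUCHSYNC` override
variable are invented for this example, the error text is matched as
quoted on this page, and `-F` is passed on both sides because the
error alone does not say where the edit happened.

```sh
# MUCHSYNC is a hypothetical override for the binary to run.
: "${MUCHSYNC:=muchsync}"

# Usage: sync_with_fallback SERVER
sync_with_fallback() {
    server=$1
    errlog=$(mktemp) || return 1
    if "$MUCHSYNC" "$server" 2>"$errlog"; then
        rm -f "$errlog"
        return 0
    fi
    cat "$errlog" >&2
    if grep -q 'does not match hash' "$errlog"; then
        rm -f "$errlog"
        # Retry, checking for modified files on client and server.
        "$MUCHSYNC" -F "$server" -F
        return
    fi
    rm -f "$errlog"
    return 1
}
```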

# EXAMPLES

To initialize the muchsync database, you can run:

    muchsync -vv

This first executes "`notmuch new`", then builds the initial muchsync
database from the contents of your maildir (the directory specified as
`database.path` in your notmuch configuration file).  This command may
take several minutes the first time it is run, as it must compute a
content hash of every message in the database.  Note that you do not
need to run this command, as muchsync will initialize the database the
first time a client tries to synchronize anyway.

    muchsync --init ~/maildir myserver

This first runs "notmuch new" on myserver, then creates a directory
`~/maildir` containing a replica of your mailbox on myserver.  Note
that neither your configuration file (by default `~/.notmuch-config`)
nor `~/maildir` should exist before running this command, as both will
be created.
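
The `-s` option can be combined with any of these commands.  As a
sketch, assuming (hypothetically) that the ssh daemon on `myserver`
listens on port 2222:

```sh
# Port 2222 is an assumed example value; the remaining flags repeat
# the default ssh command line documented under -s.
muchsync -s "ssh -CTaxq -p 2222" myserver
```

The double quotes keep the whole ssh command together as the single
argument to `-s`; muchsync then hands that string to /bin/sh.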

To create a `notmuch-poll` script that fetches mail from a remote
server `myserver`, but on that server just runs `notmuch new`, do the
following:  First, run `muchsync --self` on the server to get the
replica ID.  Then take the ID returned (e.g., `1968464194667562615`)
and embed it in a shell script as follows:

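    #!/bin/sh
    # Ask the local muchsync which replica this is; give up if that fails.
    self=$("$HOME/muchsync" --self) || exit 1
    if [ "$self" = 1968464194667562615 ]; then
        # On the server itself: just index newly delivered mail.
        exec notmuch new
    else
        # On any other replica: sync with the server, uploading in the background.
        exec "$HOME/muchsync" -r ./muchsync --upbg myserver
    fi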

The path of such a script is a good candidate for the emacs
`notmuch-poll-script` variable.

Alternatively, to have the command ``notmuch new`` on a client
automatically fetch new mail from server `myserver`, you can place the
following in the file ``.notmuch/hooks/post-new`` under your mail
directory:

    #!/bin/sh
    muchsync --nonew --upbg myserver
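
notmuch only runs hooks that are executable, so the file also needs
its execute bit set.  A minimal sketch of installing the hook follows;
the function name `install_post_new_hook` is hypothetical, and server
names containing spaces or shell metacharacters are not handled.

```sh
# Hypothetical helper: write the post-new hook and make it executable.
# Usage: install_post_new_hook MAILDIR SERVER
install_post_new_hook() {
    maildir=$1
    server=$2
    hook="$maildir/.notmuch/hooks/post-new"
    mkdir -p "$maildir/.notmuch/hooks" || return 1
    # The unquoted EOF lets $server expand into the generated script.
    cat >"$hook" <<EOF
#!/bin/sh
muchsync --nonew --upbg $server
EOF
    chmod +x "$hook"
}
```

It could be called as `install_post_new_hook "$(notmuch config get
database.path)" myserver`.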

# FILES

The default notmuch configuration file is `$HOME/.notmuch-config`.

muchsync keeps all of its state in a subdirectory of your top maildir
called ```.notmuch/muchsync```.

# SEE ALSO

notmuch(1).

# BUGS

muchsync expects initially to create replicas from scratch.  If you
have created a replica using another tool such as offlineimap and you
try to use muchsync to synchronize them, muchsync will assume every
file has an update conflict.  This is okay if the two replicas are
identical; if they are not, it will result in artifacts such as files
deleted in only one replica reappearing.  Ideally muchsync needs an
option like `--clobber` that makes a local replica identical to the
remote one without touching the remote one, so that an old version of
a mail directory can be used as a disposable cache to bootstrap
initialization.

muchsync never deletes directories.  If you want to remove a
subdirectory completely, you must manually execute rmdir on all
replicas.  Even if you manually delete a subdirectory, it will live on
in the notmuch database.

To synchronize deletions and re-creations properly, muchsync never
deletes content hashes and their message IDs from its database, even
after the last copy of a message has disappeared.  Such stale hashes
should not consume an inordinate amount of disk space, but could
conceivably pose a privacy risk if users believe deleting a message
removes all traces of it.

Message tags are synchronized based on notmuch's message-ID (usually
the Message-ID header of a message), rather than based on message
contents.  This is slightly strange because very different messages
can have the same Message-ID header, meaning the user will likely only
read one of many messages bearing the same Message-ID header.  It is
conceivable that an attacker could suppress a message from a mailing
list by sending another message with the same Message-ID.  This bug is
in the design of notmuch, and hence not something that muchsync can
work around.  muchsync itself does not assume Message-ID equivalence,
relying instead on content hashes to synchronize link counts.  Hence,
any tools used to work around the problem should work on all replicas.

Because notmuch and Xapian do not keep any kind of modification time
on database entries, every invocation of muchsync requires a complete
scan of all tags in the Xapian database to detect any changed tags.
Fortunately muchsync heavily optimizes the scan so that it should take
well under a second for 100,000 mail messages.  However, this means
that interfaces such as those used by notmuch-dump are not efficient
enough (see the next paragraph).

muchsync makes certain assumptions about the structure of notmuch's
private types `notmuch_message_t` and `notmuch_directory_t`.  In
particular, it assumes that the Xapian document ID is the second field
of these data structures.  Sadly, there is no efficient and clean way
to extract this information from the notmuch library interface.
muchsync also makes other assumptions about how tokens are named in
the Xapian database.  These assumptions are necessary because the
notmuch library interface and the notmuch dump utility are too slow to
support synchronization every time you check mail.