DESIGN
======

What follows is my best effort at giving the big ASCII picture
of what happens when `smd-pull` is run. `smd-push` simply
swaps `smd-server` and `smd-client`. Note that the sync direction is always
from `smd-server` to `smd-client`, so running them on the opposite hosts
inverts the sync direction.

    Your mail server           Your laptop
    ----------------           -----------
    
    
          --- sync direction ---> smd-pull
                                    |
                                    |
    smd-server ------- ssh ----- smd-client
        |                           |
        |                           |
      mddiff                     (mddiff)

`smd-client` uses `mddiff` only to compute sha1 sums, not to compute
a diff as `smd-server` does.

Both endpoints hold a file (the `db-file` described below) that describes the
status of the mailbox on which they previously agreed. The server computes
the difference between the current mailbox status and the previous one on which
the client agreed. This diff is sent to the client, which tries to apply it,
possibly requesting some data from the server.  If the client succeeds, both the
server and the client now agree on the new mailbox status.

END USER tools 
==============

smd-pull and smd-push
---------------------

The idea is quite simple. If `===` is a double pipe (a pair of pipes, one
for `stdin` and one for `stdout`), `smd-pull` simply performs the following

    smd-client $CLIENTNAME $MAILBOX === tee log === \
      ssh $SERVERNAME smd-server $CLIENTNAME $MAILBOX

The `tee` command is used only for logging, and if `$DEBUG` is `false` it is
replaced by `cat`. Conversely, `smd-push` performs the following

    smd-server $CLIENTNAME $MAILBOX === tee log === \
      ssh $SERVERNAME smd-client $CLIENTNAME $MAILBOX

They are both implemented in `bash`, since their main activity is to
redirect standard file descriptors, call other tools, check their exit
status and, if needed, notify the user with an extract of their logs.
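
The double pipe `===` used above can be pictured as two OS pipes crossing
two processes. The following Python sketch only illustrates that plumbing; it
is not how `smd-pull` is written (that is a `bash` script). The helper
`run_double_pipe` and the two `PING_CMD`/`PONG_CMD` commands are hypothetical
stand-ins for the two endpoints, and the logging `tee` stage is left out.

```python
import os
import subprocess
import sys


def run_double_pipe(left_cmd, right_cmd):
    """Run two commands, each one's stdout feeding the other one's stdin.

    Returns the pair of exit statuses (left, right).
    """
    # One pipe per direction: left -> right and right -> left.
    l2r_read, l2r_write = os.pipe()
    r2l_read, r2l_write = os.pipe()
    left = subprocess.Popen(left_cmd, stdin=r2l_read, stdout=l2r_write)
    right = subprocess.Popen(right_cmd, stdin=l2r_read, stdout=r2l_write)
    # The parent closes its own copies, otherwise no child would ever see EOF.
    for fd in (l2r_read, l2r_write, r2l_read, r2l_write):
        os.close(fd)
    return left.wait(), right.wait()


# Hypothetical stand-ins for the endpoints: one says "ping" and expects "pong".
PING_CMD = [sys.executable, "-c",
            "import sys\n"
            "print('ping', flush=True)\n"
            "sys.exit(0 if sys.stdin.readline().strip() == 'pong' else 1)\n"]
PONG_CMD = [sys.executable, "-c",
            "import sys\n"
            "line = sys.stdin.readline().strip()\n"
            "print('pong' if line == 'ping' else 'error', flush=True)\n"]

if __name__ == "__main__":
    print(run_double_pipe(PING_CMD, PONG_CMD))
```

Both children must flush their output, since each one waits for the other
before going on.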

smd-loop
--------

The idea is to mimic cron, but to retry a failed sync attempt if the
error is transient. `smd-client` and `smd-server` output TAGS
that specify whether the error that occurred needs human intervention, and
also suggest some actions, like retry. `smd-loop` understands these tags,
and gives a second chance to a command that fails with an error that does not
require human intervention and for which the suggested action is retry.

It is implemented in `bash`, since it is mostly a `while true` loop. Arrays
(not POSIX shell compliant) are used to record failures, so that every
`smd-push` or `smd-pull` command is given only one second chance.
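
The second-chance policy can be sketched as follows. `smd-loop` itself is a
`bash` script, so this Python fragment only illustrates the bookkeeping. The
function names, the `failures` set, the exact tag spellings
`human-intervention(avoidable)` and `suggested-actions(retry)`, and the choice
of resetting the record after a success are all assumptions made for the
example.

```python
def is_transient(tags_line):
    """True if the error needs no human intervention and suggests a retry.

    The two tag spellings tested here are assumptions.
    """
    return ("human-intervention(avoidable)" in tags_line
            and "suggested-actions(retry)" in tags_line)


def handle_result(command, ok, tags_line, failures):
    """Decide what to do after one sync attempt.

    `failures` is the set of commands that already used their second chance.
    Returns "ok", "retry" or "give-up".
    """
    if ok:
        failures.discard(command)  # assumption: a success resets the record
        return "ok"
    if is_transient(tags_line) and command not in failures:
        failures.add(command)  # remember that the second chance was granted
        return "retry"
    return "give-up"
```

A caller would keep one `failures` set for the whole loop and stop, asking for
the user's attention, as soon as `handle_result` returns `"give-up"`.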

smd-applet
----------

To write a hopefully eye-candy applet for GNOME, the language Vala was an
intriguing choice, since it is based on smart and sound ideas (that is,
avoiding the non-standardized C++ calling conventions) to provide a modern
object oriented programming language built around gobject and glib. Bindings
for GTK+, GConf, libnotify, etc. are available, and require no compiled
glue code, just bare text `.vapi` files.

If you are used to languages where writing bindings is not a trivial task,
you'd better look at Vala, where bindings are simple by design.

SERVER/CLIENT interaction
=========================

A server software (`smd-server`) and a client software (`smd-client`) are
used, respectively, to transmit the diff generated by `mddiff` and possibly
mail headers or bodies, and to apply a diff, possibly requesting the
necessary data from the other endpoint.

Since they mostly implement policies, like deciding if a diff can be
applied or not, they are implemented in a high level scripting language called
[Lua](http://www.lua.org).  The language choice is almost arbitrary: there
are no strong reasons for adopting Lua instead of Python or others, but its
installation is pretty small and it executes quite fast. Moreover, its
syntax is particularly simple, making it understandable to non Lua experts
too. Finally, I find it elegant.

They send and receive data on their standard input and output channels,
delegating to external tools the transmission of data across a network, and
optimizations like compressing the data, or encrypting it.
[OpenSSH](http://www.openssh.com/) can do both, and is adopted by
`smd-pull` and `smd-push` to connect `smd-client` to `smd-server`.

A simple protocol defines how `smd-client` requests data from `smd-server`
and how `smd-client` notifies `smd-server` that all changes have been
applied correctly.

The protocol
------------

The protocol is line oriented for commands, chunk oriented for data
transmission.

1. Both client and server send the following two messages, and check that
   they are equal to the ones sent by the other endpoint

        protocol NUMBER
        dbfile SHA1

   This part of the protocol is called the handshake.

2. The server sends the output of `mddiff` (which is line oriented),
   followed by the message below, which concludes the first phase of the
   protocol. After it, the client is expected to reply.

        END

3. The client, from now on, can at any time send the following (alternative)
   messages

        ABORT
        COMMIT

   The former informs the server that the client was unable to apply the
   diff generated by `mddiff`, while the latter informs the server that all
   changes were applied successfully.

4. In response to a `COMMIT` message, the server will transmit an `xdelta`
   patch the client has to apply to its db-file.

5. The client replies with `DONE` to complete the synchronization.

6. After point 2. and before point 3. the client can send the following
   commands to the server, which can reply by transmitting data or with
   `ABORT`. NAME is not URL encoded.

        GET NAME
        GETHEADER NAME
        GETBODY NAME
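
As an illustration of the handshake in point 1, here is a minimal Python
sketch (the real endpoints are written in Lua). It assumes that `SHA1` is the
hexadecimal sha1 sum of the db-file contents; the protocol number `1.0` and
the function names are hypothetical.

```python
import hashlib

PROTOCOL_VERSION = "1.0"  # hypothetical value, not taken from the sources


def handshake_lines(db_file_bytes, protocol=PROTOCOL_VERSION):
    """Return the two handshake lines an endpoint would send.

    Assumption: SHA1 is the hex digest of the whole db-file.
    """
    sha1 = hashlib.sha1(db_file_bytes).hexdigest()
    return ["protocol " + protocol, "dbfile " + sha1]


def handshake_ok(mine, theirs):
    """The handshake succeeds only if both endpoints sent the same lines."""
    return mine == theirs
```

With this check, two endpoints whose db-files differ by even one byte refuse
to go past the handshake.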

### Transmission

The server can transmit data or refuse. In the latter case it just sends
`ABORT`. In the former case it sends
 
    chunk NUMBER
    ...DATA...

First it declares with `chunk` the number of bytes to be sent, then
it sends the data.
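
The framing can be sketched in Python as follows. The sketch assumes that
`NUMBER` is written in decimal and that the `chunk` line ends with a single
`\n` right before the data; `send_chunk` and `receive` are hypothetical names.

```python
import io


def send_chunk(stream, data):
    """Write `chunk NUMBER` on its own line, followed by NUMBER raw bytes."""
    stream.write(b"chunk %d\n" % len(data))
    stream.write(data)


def receive(stream):
    """Read one reply: the data bytes, or None if the server sent ABORT."""
    line = stream.readline().rstrip(b"\n")
    if line == b"ABORT":
        return None
    words = line.split()
    if len(words) != 2 or words[0] != b"chunk":
        raise ValueError("unexpected reply: %r" % line)
    size = int(words[1])
    # The data is read by size, so it may itself contain newlines.
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated chunk")
    return data


if __name__ == "__main__":
    buf = io.BytesIO()
    send_chunk(buf, b"hello\nworld")
    buf.seek(0)
    print(receive(buf))
```

Declaring the size first is what lets the commands stay line oriented while
the data is passed through untouched.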

MAILDIR DIFF
============

Maildir diff (`mddiff`) computes the delta from an old status of a maildir
(previously recorded in the db-file) and the current status, generating a
set of commands (a diff) that a third party software can apply to
synchronize a (remote) copy of the maildir.

How it works
------------

This software uses sha1 to compute snapshots of a maildir, and computes a
set of actions a client should perform to sync with the mailbox status.
This software alone is unable to synchronize two maildirs. It has to be
supported by a higher level tool implementing the application of actions
and data transfer over the network if the twin maildir is remote.
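
To give an idea of how a set of actions can be derived from two snapshots,
here is a deliberately simplified Python sketch. It is not the algorithm
implemented by `mddiff`: copy and move detection is omitted, so a copied or
moved message shows up as a plain `ADD` (plus a `DELETE`). Snapshots are
assumed to be dictionaries mapping a (URL encoded) file name to its pair of
sha1 sums; `diff_snapshots` is a hypothetical name.

```python
def diff_snapshots(old, new):
    """Compare two snapshots mapping name -> (hsha, bsha).

    Returns a list of command strings. Simplification: no COPY, MOVE or
    COPYBODY is ever generated.
    """
    commands = []
    for name, (hsha, bsha) in sorted(new.items()):
        if name not in old:
            commands.append("ADD %s %s %s" % (name, hsha, bsha))
            continue
        old_hsha, old_bsha = old[name]
        if bsha != old_bsha:
            commands.append("REPLACE %s %s %s WITH %s %s"
                            % (name, old_hsha, old_bsha, hsha, bsha))
        elif hsha != old_hsha:
            commands.append("REPLACEHEADER %s %s %s WITH %s"
                            % (name, old_hsha, old_bsha, hsha))
    # Deletions are emitted last.
    for name, (hsha, bsha) in sorted(old.items()):
        if name not in new:
            commands.append("DELETE %s %s %s" % (name, hsha, bsha))
    return commands
```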

To cache the expensive sha1 calculation, a cache file is used.  On every run
the program generates a new status file (appending .new) that must
substitute the old one if the generated actions are committed to the other
maildir. Cache files are specific to the twin maildir: if you have more
than one, you must use a different cache file for each of them.

The db-file (say db.txt) is paired with a timestamp (db.txt.mtime) that
is used to store the timestamp of the last run and files whose mtime
does not exceed this timestamp will not be (re)processed next time
mddiff is run.

The .mtime companion file is updated only server side, since the mtime
concept is local to the host running mddiff.
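
The mtime test described above amounts to the following check, sketched here
in Python (`mddiff` itself is written in C). The function names are
hypothetical, and treating a missing timestamp file as "process everything"
is an assumption.

```python
import os


def read_last_run(mtime_file):
    """Read the number stored in the timestamp file (as from `date +%s`)."""
    try:
        with open(mtime_file) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0  # assumption: no timestamp yet, so every file is processed


def needs_processing(path, last_run):
    """True only if the file's mtime exceeds the timestamp of the last run."""
    return os.stat(path).st_mtime > last_run
```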

The db-file format
------------------

The db-file is composed of two files, a real database file (extension .txt)
and a timestamp (extension .txt.mtime). The latter contains just a number
(date +%s). The former is line oriented, every line has 3 space separated
fields:
- the sha1 sum of the header
- the sha1 sum of the body
- the name of the file, not URL encoded
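
A line of the database file could be read as in the following Python sketch.
Since the file name is not URL encoded it may contain spaces, so only the
first two spaces are treated as separators. `Entry` and `parse_db_line` are
hypothetical names.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "hsha bsha name")


def parse_db_line(line):
    """Split one db-file line into header sha1, body sha1 and file name."""
    # maxsplit=2 keeps any space inside the file name untouched.
    hsha, bsha, name = line.rstrip("\n").split(" ", 2)
    return Entry(hsha, bsha, name)
```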

The commands
------------

From now on, name refers to a file name, hsha to the sha1 sum of its header
and bsha to the sha1 sum of its body.

- `ADD name hsha bsha` is generated whenever a new mail message is found,
  and there is no mail message with a different name but the same body.
- `COPY name hsha bsha TO newname` is generated if a new message is found,
  and the mailbox contains a copy of it.
- `MOVE name hsha bsha TO newname` is generated if a new message is found,
  and the mailbox does not contain a copy of it but it used to.
- `COPYBODY name bsha TO newname newhsha` is generated when a new file is
  created, and that file has the same body as an already existing file.
  If the mail has been moved, this message is followed by a `DELETE` command.
  This happens when a new message is moved to another directory and marked
  in some way that changes its header (for example when a new message is
  moved to the trash bin).
- `DELETE name hsha bsha` is emitted when a message is no longer present.
- `REPLACEHEADER name hsha bsha WITH newhsha` is emitted whenever 
  a message that was already present has a different header but the same body.
- `REPLACE name hsha bsha WITH newhsha newbsha` is emitted whenever the body
  (and possibly the header) of a mail message changes. This never happens
  in practice, since MUAs should make a copy of the edited message, not
  replace it.
- `ERROR message` is emitted whenever an error is encountered; message is
  intended to be human readable.
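
A consumer of these commands could parse them as in the following Python
sketch (the real `smd-client` is written in Lua). It relies on file names
being URL encoded in the diff, hence free of spaces; `parse_command` and the
dictionary layout are hypothetical.

```python
def parse_command(line):
    """Parse one line of mddiff output into a dict of named fields."""
    line = line.strip()
    words = line.split(" ")
    op = words[0]
    if op == "ERROR":
        # The message is free text meant for a human, keep it whole.
        return {"op": op, "message": line[len("ERROR "):]}
    layouts = {
        "ADD": ("name", "hsha", "bsha"),
        "DELETE": ("name", "hsha", "bsha"),
        "COPY": ("name", "hsha", "bsha", "TO", "newname"),
        "MOVE": ("name", "hsha", "bsha", "TO", "newname"),
        "COPYBODY": ("name", "bsha", "TO", "newname", "newhsha"),
        "REPLACEHEADER": ("name", "hsha", "bsha", "WITH", "newhsha"),
        "REPLACE": ("name", "hsha", "bsha", "WITH", "newhsha", "newbsha"),
    }
    if op not in layouts or len(words) != len(layouts[op]) + 1:
        raise ValueError("malformed command: " + line)
    fields = {"op": op}
    for key, word in zip(layouts[op], words[1:]):
        if key in ("TO", "WITH"):
            if word != key:  # a literal keyword is expected at this position
                raise ValueError("malformed command: " + line)
        else:
            fields[key] = word
    return fields
```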

Messages should be processed in order, with the exception of `ADD`, which can be
safely postponed. In particular `DELETE` messages are always sent last, and
`COPY` or `COPYBODY` messages preceding them may refer to the same file
`name`. Performing deletions in advance is still sound (since the client
can always ask the server for the message) but clearly suboptimal, since
a local copy does not involve any network traffic.

File names are URL encoded escaping only `' '` (`%20`) and `'%'` (`%25`).
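
This minimal escaping can be written in a few lines; the Python sketch below
is only an illustration, and `encode_name` and `decode_name` are hypothetical
names.

```python
import re


def encode_name(name):
    """Escape '%' first, then ' ', so the '%' of '%20' is not escaped twice."""
    return name.replace("%", "%25").replace(" ", "%20")


def decode_name(encoded):
    """Undo encode_name in a single pass over the two known escapes."""
    return re.sub(r"%(20|25)",
                  lambda m: " " if m.group(1) == "20" else "%",
                  encoded)
```

Decoding in a single pass matters: a name that literally contains `%20` is
encoded as `%2520`, and must decode back to `%20`, not to a space.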

`mddiff` as a hashing server
----------------------------

`mddiff` is also used by the client to compute the sha1 sums of header
and body of local mails, for example to check that the source of a copy
command holds the intended content. Since this operation may be really
frequent, `mddiff` can operate in server mode. If the argument is a single
file name and that file is a fifo, then `mddiff` reads file names not URL
encoded, separated by `\n` from that fifo and outputs the sha1 sums of
their header and body.
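
The following Python sketch mimics the behaviour of such a hashing server. It
makes two assumptions that are not stated in this document: that the header
ends at the first empty line (which is excluded from both sums), and that
the answer is one line holding the two sums separated by a space. The sums it
computes may therefore differ from the ones computed by `mddiff`.

```python
import hashlib


def hash_mail(raw):
    """Return (hsha, bsha) for a mail given as bytes.

    Assumption: the header ends at the first empty line, which is hashed
    neither with the header nor with the body.
    """
    header, _separator, body = raw.partition(b"\n\n")
    return hashlib.sha1(header).hexdigest(), hashlib.sha1(body).hexdigest()


def serve(names_in, sums_out):
    """Read one file name per line and answer with the two sums."""
    for line in names_in:
        name = line.rstrip("\n")  # names are not URL encoded
        with open(name, "rb") as f:
            hsha, bsha = hash_mail(f.read())
        # Assumption: the reply is "hsha bsha" on a single line.
        sums_out.write("%s %s\n" % (hsha, bsha))
        sums_out.flush()
```

Keeping one long-lived process that answers over a fifo avoids starting a new
program for every single mail to be checked.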

`mddiff` as an `mkdir -p; ln` server
------------------------------------

`mddiff` is also used by the client to create the indirection layer
needed to rename mailbox folders. If the argument is a single
file name and that file is a fifo and the `-s` flag is passed, then `mddiff`
reads directory names not URL encoded, separated by `\n`, 2 at a time,
from that fifo. The first one is the source path, the second the target.
Then it behaves like `mkdir -p $(dirname $target); ln -s $source $target`.
For example if the source is `~/Mail/foo/cur` and the target is
`Maildir/.foo/cur` then `mddiff` will create the directories `Maildir` and
`Maildir/.foo` and place in the latter a link named `cur` to `~/Mail/foo/cur`.
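
The same behaviour, expressed as a Python sketch (`link_dir` and
`serve_links` are hypothetical names, and unlike a shell Python does not
expand `~` in the paths):

```python
import os


def link_dir(source, target):
    """Behave like: mkdir -p $(dirname $target); ln -s $source $target."""
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(source, target)


def serve_links(lines):
    """Consume names two at a time: first the source, then the target."""
    names = [line.rstrip("\n") for line in lines]
    for source, target in zip(names[0::2], names[1::2]):
        link_dir(source, target)
```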

Easy to parse output messages
=============================

`smd-pull` and `smd-push` prefix all error messages with `ERROR:`, but
what follows is meant to be read by a human being. To make other tools able to
parse and react to error messages, a more formal output is given.
A single line, prefixed with `TAGS:`, is output if requested (`-v` option).
It can be followed by `error::` or `stats::`, which denote an error message or
a statistical one respectively. Then comes a list of items that are, somewhat
improperly, called tags. Their meaning should be easy to guess.

    <M>    ::= "error::" <ET> | "stats::" <ST> | "stats::" <DR>
    <ET>   ::= "context(" <STR> ")" 
               "probable-cause(" <STR> ")"
               "human-intervention(" <HI> ")"
               <SA>
    <SA>   ::= | "suggested-actions(" <ACTS> ")"
    <STR>  ::= `[^)]+`
    <HI>   ::= "necessary" | "avoidable"
    <ACTS> ::= <A> | <A> <ACTS>
    <A>    ::= "run(" <STR> ")" 
            |  "display-mail(" <STR> ")" 
            |  "display-permissions(" <STR> ")"
    <ST>   ::= "new-mails(" <NUM> ")" <SPC>
               "del-mails(" <NUM> ")" <SPC>
               "bytes-received(" <NUM> ")" <SPC>
               "xdelta-received(" <NUM> ")" <SPC>
               "xdelta-received(" <NUM> ")"
    <DR>   ::= "mail-transferred(" <ML> ")"
    <ML>   ::= <STR> | <STR> " , " <ML>
    <NUM>  ::= `[0-9]+`
    <SPC>  ::= ` *,? *`
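
A tool reacting to these lines could parse them along the lines of the
following Python sketch. It is a loose reading of the grammar above rather
than a strict validator; `parse_tags` is a hypothetical name, and the sample
line in the usage example is made up for illustration.

```python
import re

# A tag is a lowercase name followed by a parenthesized string without ")".
TAG_RE = re.compile(r"([a-z-]+)\(([^)]+)\)")


def parse_tags(line):
    """Parse a `TAGS:` line into (kind, fields, actions).

    `fields` and `actions` are lists of (name, value) pairs; lines that do
    not start with `TAGS:` yield None.
    """
    match = re.match(r"TAGS:\s*(error|stats)::\s*(.*)", line)
    if match is None:
        return None
    kind, rest = match.groups()
    actions = []
    # suggested-actions(...) nests other tags, so it is peeled off first.
    nested = re.search(r"suggested-actions\((.*)\)\s*$", rest)
    if nested:
        actions = TAG_RE.findall(nested.group(1))
        rest = rest[:nested.start()]
    return kind, TAG_RE.findall(rest), actions


if __name__ == "__main__":
    sample = ("TAGS: error::context(ssh) probable-cause(network) "
              "human-intervention(avoidable) suggested-actions(run(smd-pull))")
    print(parse_tags(sample))
```

Pairs are returned as lists rather than dictionaries so that a tag appearing
twice on a line is not silently dropped.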