File: mairix.texi

package info (click to toggle)
mairix 0.23%2Bgit20131125-0.3
  • links: PTS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 1,296 kB
  • ctags: 1,299
  • sloc: ansic: 11,345; sh: 1,000; yacc: 185; makefile: 135; lex: 87; perl: 34
file content (885 lines) | stat: -rwxr-xr-x 36,801 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
\input texinfo
@c {{{ Main header stuff
@afourwide
@paragraphindent 0
@setfilename mairix.info
@settitle User guide for the mairix program
@c @setchapternewpage off

@ifinfo
@dircategory Utilities
@direntry
* mairix: (mairix).			Indexing/searching utility for maildir folders
@end direntry
@end ifinfo

@titlepage
@sp 10
@title The mairix program
@subtitle This manual describes how to use
@subtitle the mairix program for indexing and
@subtitle searching email messages stored in maildir folders.
@author Richard P. Curnow
@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2002,2003,2004,2005 Richard P. Curnow
@end titlepage

@contents
@c }}}

@ifnottex
@node Top
@top
@menu
* Introduction::    
* Installation::    Compiling and installing the software
* Use::             Quickstart guide and examples of use
@end menu
@end ifnottex

@node Introduction
@chapter Introduction
@menu
* Background::      How mairix came to be written.
@end menu

@node Background
@section Background
The @emph{mairix} program arose from a need to index and search 100's or 1000's
of email messages in an efficient way.  It began life supporting just Maildir
format folder, but now MH and mbox formats are also supported.

I use the @emph{mutt} email client.  @emph{mutt} has a feature called
@emph{limit}, where the display of messages in the current folder can be
filtered based on matching regular expressions in particular parts of the
messages.  I find this really useful.  But there is a snag - it only works on
the current folder.  If you have messages spread across many folders, you're
out of luck with limit.  OK - so why not keep all messages in a single folder?
The problem is that the performance drops badly.  This is true regardless of
folder format - mbox, maildir etc, though probably worse for some formats than
others depending on the sizes of messages in the folders.

So on the one hand, we want small folders to keep the performance high.  But on
the other hand, we want useful searching.

I use the maildir format for my incoming folders.  This scheme has one file per
message.  On my inboxes@footnote{of which I have many, because I (naturally)
use @emph{procmail} to split my incoming mail}, I like this for 2 reasons :

@itemize @bullet
@item Fast deletion of messages I don't want to keep (spam, circulars, mailing
list threads I'm not interested in etc).  (Compare mbox, where the whole file
would need to be rewritten.)
@item No locking issues whatever.  Maybe I'm over cautious, but I don't really
trust all that locking stuff to protect a single mbox file in all cases, and a
single file seems just too vulnerable to corruption.)  Also, I sometimes read
the mail over NFS mounted filesystems, where locking tends to be a real
disaster area.
@end itemize

Since I'm using maildir for inboxes, I've traditionally used it for all my
folders, for uniformity.

So, I hear you ask, if you use a one-file-per-message format, why not just use
find + egrep to search for messages?  I saw the following problems with this:

@itemize @bullet
@item What if I want to find all messages to/cc me, from Homer Simpson, dated
between 1 and 2 months ago, with the word "wubble" in the body?  This would
involve a pretty nasty set of regexps in a pipeline of separate egreps (and
bear in mind, headers could be split over line boundaries...)
@item What if the message body has quoted-printable (or worse, base64) transfer
encoding?  The egrep for "wubble" could come very unstuck.
@item How would the matching messages be conveniently arranged into a new
folder to allow browsing with mutt?
@item What if I wanted to see all messages in the same threads as those
matching the above condition?
@item If I had 1000's of messages, this wasn't going to be quick, especially if
I wanted to keep tuning the search condition.@footnote{This may be a non-issue
for people with the lastest technology under their desk, but at the time I
started writing mairix, I had a 1996 model 486 at home}.
@end itemize

So find + egrep was a non-starter.  I looked around for other technology.  I
found @emph{grepmail}, but this only works for mbox format folders, and
involved scanning each message every time (so lost on the speed issue).

I decided that this was going to be my next project, and mairix was born.  By
the way, the name originally came from abbreviating @emph{MAildIR IndeX}, but
this is now an anachronism since MH and mbox are supported too.

@node Installation
@chapter Installation

There is not much to this.  In the simplest case you can just do

@example
./configure
make
make install
@end example

You need to be root to run the final step unless you're installing under your
own home directory somewhere.

However, you might want to tune the options further.  The @file{configure}
script shares its common options with the usual autoconf-generated scripts,
even though it's not autoconf-generated itself.  For example, a fuller build could use

@example
CC=gcc CFLAGS="-O2 -Wall" ./configure \
    --prefix=/opt/mairix \
    --infodir=/usr/share/info
make
make install
make docs
make install_docs
@end example

The final step is to create a @file{~/.mairixrc} file.  An example is included
in the file @file{dotmairixrc.eg}.  Just copy that to @file{~/.mairixrc} and edit
it.

@node Use
@chapter Use

@menu
* use_intro::       Overview of use
* capabilities::    Indexing strategy and search capabilities
* mairixrc::        The @file{~/.mairixrc} file 
* mfolder_setup::   Setting up the match folder
* command_line::    Command line options
* date_syntax::     Syntax used for date searches
@end menu

@node use_intro
@section Overview of use

@emph{mairix} has two modes of use : index building and searching.  The
searching mode runs whenever the command line contains any expressions to
search for.  Otherwise, the indexing mode is run.

To begin with, an indexing run must be performed before searching will work at
all.  Otherwise your search will be operating on an empty database and won't
produce any output.

The output of the search mode is usually placed in a @emph{match folder}. You
can select the type of folder that is used.  For Maildir, it is just a normal
maildir directory (i.e. containing @file{new}, @file{tmp} and @file{cur})
subdirectories.  If you select MH it is a directory containing entries with
numerical filenames, so you can open it as a normal MH folder in your mail
program.  If you select mbox, it is a single file in mbox format.

You configure the path for the match folder in your @file{~/.mairixrc} file.
When writing to a mfolder in maildir or MH format, mairix will populate it with
symbolic links pointing to the paths of the real messages that were matched by
the search expression.@footnote{Although symlinks use up more inodes than hard
links, I decided they were more useful because it makes it possible to see the
filenames of the original messages via @command{ls -l}.}  If a message in a
mbox folder matches, mairix will copy the message contents to a single file in
the mfolder directory.

If the mfolder is in mbox format, mairix will copy the message contents of each
matching message into the mfolder file.  (There is no way of exploiting
symlinks to avoid the copying in this case.)

If desired, mairix can produce just a list of files that match the search
expression and omit the building of the match folder (the so-called 'raw'
output mode).  This mode of operation may be useful in communicating the
results of the search to other programs.

@node capabilities
@section Indexing strategy and search capabilities

@emph{mairix} works exclusively in terms of @emph{words}.  The index that's
built in non-search mode contains a table of which words occur in which
messages.  Hence, the search capability is based on finding messages that
contain particular words.  @emph{mairix} defines a word as any string of
alphanumeric characters + underscore.  Any whitespace, punctuation, hyphens etc
are treated as word boundaries.

@emph{mairix} has special handling for the @t{To:}, @t{Cc:} and @t{From:}
headers.  Besides the normal word scan, these headers are scanned a second
time, where the characters @samp{@@}, @samp{-} and @samp{.} are also treated as
word characters.  This allows most (if not all) email addresses to appear in
the database as single words.  So if you have a mail from
@t{wibble@@foobar.zzz}, it will match on both these searches

@example
mairix f:foobar
mairix f:wibble@@foobar.zzz
@end example

It should be clear by now that the searching cannot be used to find messages
matching general regular expressions.  Personally, I don't find that much use
anyway for locating old messages - I'm far more likely to remember particular
keywords that were in the messages, or details of the recipients, or the
approximate date.

It's also worth pointing out that there is no 'locality' information stored, so
you can't search for messages that have one words 'close' to some other word.
For every message and every word, there is a simple yes/no condition stored -
whether the message contains the word in a particular header or in the body.
So far this has proved to be adequate.  mairix has a similar feel to using an
Internet search engine.

There are three further searching criteria that are supported (besides word
searching):

@itemize @bullet
@item Searching for messages whose @t{Date:} header is in a particular range
@item Searching for messages whose size is in a particular range.  (I see this
being used mainly for finding 'huge' messages, as you're most likely to want to
cull these to recover disc space.)
@item Searching for messages with a particular substring in their paths.  You
can use this feature to limit the search to particular folders in your mail
hierarchy, for example.
@end itemize

@node mairixrc
@section The @file{~/.mairixrc} file

@subsection Overview

This file contains information about where you keep your mail folders, where
you want the index file to be stored and where you want the match folder to
be, into which the search mode places the symlinks.

mairix searches for this file at @file{~/.mairixrc} unless you specify the
@samp{-f} command line option.

If a # character appears in the file, the rest of that line is ignored.  This
allows you to specify comments.

There are 3 entries (@samp{base}, @samp{mfolder} and @samp{database}) that must
appear in the file.  Also at least one of @samp{maildir}, @samp{mh} and
@samp{mbox} must appear.  Optionally, the @samp{mformat} entry may
appear.  An example illustrates:

@example
base=/home/richard/mail
maildir=new-mail:new-chrony
maildir=recent...:ancient...
mh=an_mh_folder
mbox=archive1:archive2
mfolder=mfolder
mformat=maildir
database=/home/richard/.mairix_database
@end example

@subsection mairixrc file keys
The keys are as follows:

@table @asis
@item base
This is the path to the common parent directory of all your maildir folders.
@item maildir
This is a colon-separated list of the Maildir folders (relative to @samp{base})
that you want indexed.  Any entry that ends @samp{...} is recursively scanned
to find any Maildir folders underneath it.

More than one line starting with @samp{maildir} can be included.  In this case,
mairix joins the lines together with colons as though a single list of folders had
been given on a single very long line.

Each colon-separated entry may be a wildcard.  See the discussion under mbox (below) for the
wildcard syntax.  For example

@example
maildir=zzz/foo*...
@end example

will match maildir folders like these (relative to the folder_base)

@example
zzz/foobar/xyz
zzz/fooquux
zzz/foo
zzz/fooabc/u/v/w
@end example

and 

@example
maildir=zzz/foo[abc]*
@end example

will match maildir folders like these (relative to the folder_base)

@example
zzz/fooa
zzz/fooaaaxyz
zzz/foobcd
zzz/fooccccccc
@end example

If a folder name contains a colon, you can write this by using the sequence
@samp{\:} to escape the colon.  Otherwise, the backslash character is treated
normally.  (If the folder name actually contains the sequence @samp{\:}, you're
out of luck.)

@item mh
This is a colon-separated list of the MH folders (relative to @samp{base}) that
you want indexed.  Any entry that ends @samp{...} is recursively scanned to
find any MH folders underneath it.

More than one line starting with @samp{mh} can be included.  In this case,
mairix joins the lines together with colons as though a single list of folders had
been given on a single very long line.

Each colon-separated entry may be a wildcard, see the discussion under maildir
(above) and mbox (below) for the syntax and semantics of specifying wildcards.

@item mbox
This is a colon-separated list of the mbox folders (relative to @samp{base}) that
you want indexed.

Each colon-separated item in the list can be suffixed by @samp{...}.  If the
item matches a regular file, that file is treated as a mbox folder and the
@samp{...} suffix is ignored.  If the item matches a directory, a recursive
scan of everything inside that directory is made, and all regular files are
initially considered as mbox folders.  (Any directories found in this scan are
themselves scanned, since the scan is recursive.)

Each colon-separated item may contain wildcard operators, but only in its final
path component.  The wildcard operators currently supported are

@table @asis
@item *
Match zero or more characters (each character matched is arbitrary)
@item ?
Match exactly one arbitrary character
@item [abcs-z]
Character class : match a single character from the set a, b, c, s, t, u, v, w,
x, y and z.

To include a literal @samp{]} in the class, place it immediately after the opening @samp{[}.
To include a literal @samp{-} in the class, place it immediately before the closing @samp{]}.

@end table

If these metacharacters are included in non-final path components, they have no
special meaning.

Here are some examples

@table @asis
@item mbox=foo/bar*
matches @file{foo/bar}, @file{foo/bar1}, @file{foo/barrrr} etc
@item mbox=foo*/bar*
matches @file{foo*/bar}, @file{foo*/bar1}, @file{foo*/barrrr} etc
@item mbox=foo/*
matches @file{foo/bar}, @file{foo/bar1}, @file{foo/barrrr}, @file{foo/foo}, @file{foo/x} etc
@item mbox=foo...
matches any regular file in the tree rooted at @file{foo}
@item mbox=foo/*...
same as before
@item mbox=foo/[a-z]*...
matches @file{foo/a}, @file{foo/aardvark/xxx}, @file{foo/zzz/foobar},
@file{foo/w/x/y/zzz}, but @b{not} @file{foo/A/foobar}
@end table

Regular files that are mbox folder candidates are examined internally.  Only
files containing standard mbox @samp{From } separator lines will be scanned for
messages.

If a regular file has a name ending in @file{.gz}, and gzip support is compiled
into the mairix binary, the file will be treated as a gzipped mbox.  

If a regular file has a name ending in @file{.bz2}, and bzip support is compiled
into the mairix binary, the file will be treated as a bzip2'd mbox.  

More than one line starting with @samp{mbox} can be included.  In this case,
mairix joins the lines together with colons as though a single list of folders had
been given on a single very long line.

mairix performs @b{no} locking of mbox folders when it is accessing them.  If a
mail delivery program is modifying the mbox at the same time, it is likely that
one or messages in the mbox will never get indexed by mairix (until the
database is removed and recreated from scratch, anyway.)  The assumption is
that mairix will be used to index archive folders rather than incoming ones, so
this is unlikely to be much of a problem in reality.

@emph{mairix} can support a maximum of 65536 separate mboxes, and a maximum of
65536 messages within any one mbox.

@item omit 
This is a colon-separated list of glob patterns for folders to be
omitted from the indexing.  This allows wide wildcards to be used in the
@emph{maildir}, @emph{mh} and @emph{mbox} arguments, with the @emph{omit}
option used to selectively remove unwanted folders from the folder lists.
Within the glob patterns, a single @samp{*} matches any sequence of characters
other than @samp{/}.  However @samp{**} matches any sequence of characters
including @samp{/}.  This allows glob patterns to be constructed which have a
wildcard for just one directory component, or for any number of directory
components.

The @emph{omit} option can be specified as many times as required so that the
list of patterns doesn't all have to fit on one line.

As an example,

@example
mbox=bulk...
omit=bulk/spam*
@end example

will index all mbox folders at any level under the @file{bulk} subdirectory of
the base folder, except for those folders whose names start @file{bulk/spam},
e.g. @file{bulk/spam}, @file{bulk/spam2005} etc.  In constrast, 

@example
mbox=bulk...
omit=bulk/spam**
@end example

will index all mbox folders at any level under the @file{bulk} subdirectory of
the base folder, except for those folders whose names start @file{bulk/spam},
e.g. @file{bulk/spam}, @file{bulk/spam2005}, @file{bulk/spam/2005},
@file{bulk/spam/2005/jan} etc.

@item nochecks
This takes no arguments.  If a line starting with @samp{nochecks} is present,
it is the equivalent of specifying the @samp{-Q} flag to every indexing run.

@item mfolder
This defines the name of the @emph{match} folder (within the directory
specified by @samp{base}) into which the search mode writes its output.
(If the mformat used is @samp{raw}, then this setting is not
used and may be excluded.)

If the first character of the @b{mfolder} value is @samp{/} or @samp{.}, it is
taken as a pathname in its own right.  This allows you to specify absolute
paths and paths relative to the current directory where the mfolder should be
written.  Otherwise, the value of @b{mfolder} is appended to the value of
@b{base}, in the same way as for the source folders.

@item mformat
This defines the type of folder used for the @emph{match folder} where the
search results go.  There are four valid settings for this @samp{mh},
@samp{maildir}, @samp{mbox} or @samp{raw}.  If the @samp{raw} setting is used then
mairix will just print out the path names of the files that match and
no match folder will be created.  @samp{maildir} is the default if this
option is not defined.  The setting is case-insensitive.

@item database
This defines the path where mairix's index database is kept.  You can keep this
file anywhere you like.
@end table

It is illegal to have a folder listed twice.  Once mairix has built a list of
all the messages currently in your folders, it will search for duplicates
before proceeding.  If any duplicates are found (arising from the same folder
being specified twice), it will give an error message and exit.  This is to
prevent corrupting the index database file.

@subsection mairixrc expansions

The part of each line in @file{.mairixrc} following the equals sign can contain
the following types of expansion:

@table @asis
@item Home directory expansion
If the sequence @samp{~/} appears at the start of the text after the equals
sign, it is expanded to the user's home directory.  Example:

@example
database=~/Mail/mairix_database
@end example

@item Environment expansion
If a @samp{$} is followed by a sequence of alpha-numeric characters (or
@samp{_}), the whole string is replaced by looking up the corresponding
environment variable.  Similarly, if @samp{$} is followed by an open brace
(@samp{@{}), everything up to the next close brace is looked up as an
environment variable and the result replaces the entire sequence.

Suppose in the shell we do
@example
export FOO=bar
@end example

and the @file{.mairixrc} file contains
@example
maildir=xxx/$FOO
mbox=yyy/a$@{FOO@}b
@end example

this is equivalent to
@example
maildir=xxx/bar
mbox=yyy/abarb
@end example

If the specified environment variable is not set, the replacement is the empty
string.

@end table

@node mfolder_setup
@section Setting up the match folder
If the match folder does not exist when running in search mode, it is
automatically created.  For @samp{mformat=maildir} (the default), this
should be all you need to do.  If you use @samp{mformat=mh}, you may
have to run some commands before your mailer will recognize the folder.  e.g.
for mutt, you could do

@example
mkdir -p /home/richard/Mail/mfolder
touch /home/richard/Mail/mfolder/.mh_sequences
@end example

which seems to work.  Alternatively, within mutt, you could set @var{mbox_type}
to @samp{mh} and save a message to @samp{+mfolder} to have mutt set up the
structure for you in advance.

If you use Sylpheed, the best way seems to be to create the new folder from
within Sylpheed before letting mairix write into it.  This seems to be all you
need to do.

@node command_line
@section Command line options

The command line syntax is

For indexing mode:
@example
mairix [-f path] [-p] [-v] [-Q]
@end example
For search mode
@example
mairix [-f path] [-t] [-v] [-a] [-r] [-o mfolder] expr1 [expr2] ... [exprn]
@end example
For database dump mode
@example
mairix [-f path] -d
@end example

The @samp{-f} or @samp{--rcfile} flag allows a different path to the
@file{mairixrc} file to be given, replacing the default of @file{~/.mairixrc}.

The @samp{-p} or @samp{--purge} flag is used in indexing mode.  Indexing works
incrementally.  When new messages are found, they are scanned and information
about the words they contain is appended onto the existing information.  When
messages are deleted, holes are normally left in the message sequence.  These
holes take up space in the database file.  This flag will compress the deleted
paths out of the database to save space.  Additionally, where @samp{mbox}
folders are in use, information in the database about folders that no longer
exist, or which are no longer referenced in the rc-file, will be purged also.

The @samp{-v} or @samp{--verbose} flag is used in indexing mode.  It causes
more information to be shown during the indexing process.  In search mode, it
causes debug information to be shown if there are problems creating the
symlinks.  (Normally this would be an annoyance.  If a message matches multiple
queries when using @samp{-a}, mairix will try to create the same symlink
multiple times.  This prevents the same message being shown multiple times in
the match folder.)

The @samp{-Q} or @samp{--no-integrity-checks} flag is used in indexing mode.
Normally, mairix will do various integrity checks on the database after loading
it in, and before writing the modified database out again.  The checking helps
to detect mairix bugs much earlier, but it has a performance penalty.  This
flag skips the checks, at the cost of some loss in robustness.  See also the
@samp{nochecks} directive in @ref{mairixrc}.

The @samp{--unlock} flag is used in any mode.  mairix dot-locks the database
file to prevent corruption due to concurrent accesses.  If the process holding
the lock exits prematurely for any reason, the lockfile will be left behind.
By using the @samp{--unlock} option, an unwanted lockfile can be conveniently
removed.

The @samp{-t} or @samp{--threads} option applies to search mode.  Normally,
only the messages matching all the specified expressions are included in the
@emph{match folder} that is built.  With the @samp{-t} flag, any message in
the same thread as one of the matched messages will be included too.  Note, the
threading is based on processing the @t{Message-ID}, @t{In-Reply-To} and
@t{References} headers in the messages.  Some mailers don't generate these
headers in a co-operative way and will cause problems with this threading
support.  (Outlook seems to be one culprit.)  If you are plagued by this
problem, the 'edit threads' patch to mutt may be useful to you.

The @samp{-d} or @samp{--dump} option causes mairix to dump the database
contents in human-readable form to stdout.  It is mainly for use in debugging.
If this option is specified, neither indexing nor searching are performed.

The @samp{-a} or @samp{--augment} option also applies to search mode.
Normally, the first action of the search mode is to clear any existing message
links from the match folder.  With the @samp{-a} flag, this step is
suppressed.  It allows the folder contents to be built up by matching with 2 or
more diverse sets of match expressions.  If this mode is used, and a message
matches multiple queries, only a single symlink will be created for it.

The @samp{-r} or @samp{--raw-output} option is used to force the raw output
mode for a particular search, in preference to the output format defined by the
@samp{mformat} line in the @file{mairixrc} file.  This may be useful for
identifying which mbox contains a particular match, since there is way to see
this when the matching messages are placed in the mfolder in this case.  (Note
for matches in maildir and MH folders when @samp{mformat} is maildir or MH, the
symbolic links in the mfolder will show the path to the matching message.)

The @samp{-o} or @samp{--mfolder} option is used in search mode to specify a
match folder different to the one specified in the @file{mairixrc} to be
used.  The path given by the @samp{mfolder} argument after this flag is
relative to the folder base directory given in the @file{mairixrc} file, in the
same way as the directory in the mfolder specification in that file is.  So if
your @file{mairixrc} file contains

@example
base=/home/foobar/Mail
@end example

and you run mairix like this

@example
mairix -o mfolder2 make,money,fast
@end example

mairix will find all of your saved junk emails containing these three words and
put the results into @file{/home/foobar/Mail/mfolder2}.

The @samp{-o} argument obeys the same conventions regarding initial @samp{/}
and @samp{.} characters as the @b{mfolder} line in the @file{.mairixrc} file
does.

@emph{Mairix} will refuse to output search results (whether specified
by the @samp{-o} or in the @file{.mairixrc} file) into one of the
folders that are indexed; it figures out that list by looking in the
@file{.mairixrc} file, or in the file you specify using the @samp{-f}
option.  This sanity check prevents you inadvertantly destroying one
of your important folders (but won't catch all such cases, sadly).

The search mode runs when there is at least one search expression.  Search
expressions can take forms such as (in increasing order of complexity):

@itemize @bullet
@item A date expression.  The format for specifying the date is described in section @ref{date_syntax}.

@item A size expression.  This matches all messages whose size in bytes is in a
particular range.  For example, to match all messages bigger than 1 Megabyte
the following command can be used

@example
mairix z:1m-
@end example

To match all messages between 10kbytes and 20kbytes in size, the following
command can be used:

@example
mairix z:10k-20k
@end example

@item A word, e.g. @samp{pointer}.  This matches any message with the word
@samp{pointer} in the @t{To}, @t{Cc}, @t{From} or @t{Subject} headers, or in
the message body.@footnote{Message body is taken to mean any body part of type
text/plain or text/html.  For text/html, text within meta tags is ignored.  In
particular, the URLs inside <A HREF="..."> tags are not currently indexed.
Non-text attachments are ignored.  If there's an attachment of type
message/rfc822, this is parsed and the match is performed on this sub-message
too.  If a hit occurs, the enclosing message is treated as having a hit.}

@item A word in a particular part of the message, e.g. @samp{s:pointer}.  This
matches any message with the word @samp{pointer} in the subject.  The
qualifiers for this are :

@table @asis
@item @t{t:pointer}
to match @samp{pointer} in the @t{To:} header, 
@item @t{c:pointer}
to match @samp{pointer} in the @t{Cc:} header, 
@item @t{a:pointer}
to match @samp{pointer} in the @t{To:}, @t{Cc:} or @t{From:} headers (@samp{a} meaning @samp{address}), 
@item @t{f:pointer}
to match @samp{pointer} in the @t{From:} header, 
@item @t{s:pointer}
to match @samp{pointer} in the @t{Subject:} header, 
@item @t{b:pointer}
to match @samp{pointer} in the message body.
@item @t{m:pointer}
to match messages having a Message-ID header of @samp{pointer}. 
@end table

Multiple fields may be specified, e.g. @t{sb:pointer} to match in the
@t{Subject:} header or the body.

@item A negated word, e.g. @samp{s:~pointer}.  This matches all messages that
don't have the word @samp{pointer} in the subject line.

@item A substring match, e.g. @samp{s:point=}.  This matches all messages
containing a word in their subject line where the word has @samp{point} as a
substring, e.g. @samp{pointer}, @samp{disappoint}.

@item An approximate match, e.g. @samp{s:point=1}.  This matches all messages
containing a word in their subject line where the word has @samp{point} as a
substring with at most one error, e.g. @samp{jointed} contains @samp{joint}
which can be got from @samp{point} with one letter changed.  An error can be a
single letter changed, inserted or deleted.

@item A left-anchored substring match, e.g. @samp{s:^point=}.  This matches all
messages containing a word in their subject line where the word begins with the
string @samp{point}.  (This feature is intended to be useful for inflected
languages where the substring search is used to avoid the grammatical ending on
the word.)  This left-anchored facility can be combined with the approximate
match facility, e.g. @samp{s:^point=1}.  

Note, if the @samp{^} prefix is used without the @samp{=} suffix, it is ignored.
For example, @samp{s:^point} means the same thing as @samp{s:point}.

@item A disjunction, e.g. @samp{s:pointer/dereference}.  This matches all
messages with one or both of the words @samp{pointer} and @samp{dereference} in
their subject lines.

@item Each disjunction may be a conjunction, e.g.
@samp{s:null,pointer/dereference=2} matches all messages whose subject lines
either contain both the words @samp{null} and @samp{pointer}, or contain the
word @samp{dereference} with up to 2 errors (or both).

@item A path expression.  This matches all messages with a particular substring
in their path.  The syntax is very similar to that for words within the message
(above), and all the rules for @samp{+}, @samp{,}, approximate matching etc are
the same.  The word prefix used for a path expression is @samp{p:}.  Examples:

@example
mairix p:/archive/
@end example

matches all messages with @samp{/archive/} in their path, and

@example
mairix p:wibble=1 s:wibble=1
@end example

matches all messages with @samp{wibble} in their path and in their subject
line, allowing up to 1 error in each case (the errors may be different for a
particular message.)

Path expressions always use substring matches and never exact matches (it's
very unlikely you want to type in the whole of a message path as a search
expression!)  The matches are always @b{case-sensitive}.  (All matches on words
within messages are case-insensitive.)  There is a limit of 32 characters on
the match expression.

@end itemize

The binding order of the constructions is:

@enumerate
@item Individual command line arguments define separate conditions which are
AND-ed together

@item Within a single argument, the letters before the colon define which
message parts the expression applies to.  If there is no colon, the expression
applies to all the headers listed earlier and the  body.

@item After the colon, commas delineate separate disjuncts, which are OR-ed together.

@item Each disjunct may contain separate conjuncts, which are separated by plus
signs.  These conditions are AND-ed together.

@item Each conjunct may start with a tilde to negate it, and may be followed by
a slash to indicate a substring match, optionally followed by an integer to
define the maximum number of errors allowed.

@end enumerate

Now some examples.  Suppose my email address is @email{richard@@doesnt.exist}.

The following will match all messages newer than 3 months from me with the word
@samp{chrony} in the subject line:

@example
mairix d:3m- f:richard+doesnt+exist s:chrony
@end example

Suppose I don't mind a few spurious matches on the address, I want a wider date
range, and I suspect that some messages I replied to might have had the subject
keyword spelt wrongly (let's allow up to 2 errors):

@example
mairix d:6m- f:richard s:chrony=2
@end example

@node date_syntax
@section Syntax used for specifying dates
This section describes the syntax used for specifying dates when searching
using the @samp{d:} option.

Dates are specified as a range.  The start and end of the range can both be
specified.  Alternatively, if the start is omitted, it is treated as being the
beginning of time.  If the end is omitted, it is treated as the current time.

There are 4 basic formats:
@table @samp
@item d:start-end
Specify both start and end explicitly
@item d:start-
Specify start, end is the current time
@item d:-end
Specify end, start is 'a long time ago' (i.e. early enough to include any message).
@item d:period
Specify start and end implicitly, as the start and end of the period given.
@end table

The start and end can be specified either absolute or relative.  A relative
endpoint is given as a number followed by a single letter defining the scaling:

@multitable @columnfractions 0.15 0.2 0.2 0.45
@item @b{letter} @tab @b{meaning} @tab @b{example} @tab @b{meaning}
@item d @tab days  @tab 3d @tab 3 days
@item w @tab weeks @tab 2w @tab 2 weeks (14 days)
@item m @tab months @tab 5m @tab 5 months (150 days)
@item y @tab years @tab 4y @tab 4 years (4*365 days)
@end multitable

Months are always treated as 30 days, and years as 365 days, for this purpose.

Absolute times can be specified in a lot of forms.  Some forms have different
meanings when they define a start date from that when they define an end date.
Where a single expression specifies both the start and end (i.e. where the
argument to d: doesn't contain a @samp{-}), it will usually have different
interpretations in the two cases.

In the examples below, suppose the current date is Sunday May 18th, 2003 (when
I started to write this material.)

@multitable @columnfractions 0.24 0.24 0.24 0.28
@item @b{Example} @tab @b{Start date} @tab @b{End date} @tab @b{Notes}
@item d:20030301@minus{}20030425 @tab March 1st, 2003 @tab 25th April, 2003
@item d:030301@minus{}030425 @tab March 1st, 2003 @tab April 25th, 2003 @tab century assumed
@item d:mar1@minus{}apr25    @tab March 1st, 2003 @tab April 25th, 2003
@item d:Mar1@minus{}Apr25    @tab March 1st, 2003 @tab April 25th, 2003 @tab case insensitive
@item d:MAR1@minus{}APR25    @tab March 1st, 2003 @tab April 25th, 2003 @tab case insensitive
@item d:1mar@minus{}25apr    @tab March 1st, 2003 @tab April 25th, 2003 @tab date and month in either order
@item d:2002          @tab January 1st, 2002 @tab December 31st, 2002 @tab whole year
@item d:mar           @tab March 1st, 2003 @tab March 31st, 2003 @tab most recent March
@item d:oct           @tab October 1st, 2002 @tab October 31st, 2002 @tab most recent October
@item d:21oct@minus{}mar     @tab October 21st, 2002 @tab March 31st, 2003 @tab start before end
@item d:21apr@minus{}mar     @tab April 21st, 2002 @tab March 31st, 2003 @tab start before end
@item d:21apr@minus{}        @tab April 21st, 2003 @tab May 18th, 2003 @tab end omitted
@item d:@minus{}21apr        @tab January 1st, 1900 @tab April 21st, 2003 @tab start omitted
@item d:6w@minus{}2w         @tab April 6th, 2003 @tab May 4th, 2003 @tab both dates relative
@item d:21apr@minus{}1w      @tab April 21st, 2003 @tab May 11th, 2003 @tab one date relative
@item d:21apr@minus{}2y      @tab April 21st, 2001 @tab May 11th, 2001 @tab start before end
@item d:99@minus{}11         @tab January 1st, 1999 @tab May 11th, 2003 @tab 2 digits are a day of the month if possible, otherwise a year
@item d:99oct@minus{}1oct    @tab October 1st, 1999 @tab October 1st, 2002 @tab end before now, single digit is a day of the month
@item d:99oct@minus{}01oct   @tab October 1st, 1999 @tab October 31st, 2001 @tab 2 digits starting with zero treated as a year
@item d:oct99@minus{}oct1    @tab October 1st, 1999 @tab October 1st, 2002 @tab day and month in either order
@item d:oct99@minus{}oct01   @tab October 1st, 1999 @tab October 31st, 2001 @tab year and month in either order
@end multitable

The principles in the table work as follows.
@itemize @bullet
@item 
When the expression defines a period of more than a day (i.e. if a month or
year is specified), the earliest day in the period is taken when the start date
is defined, and the last day in the period if the end of the range is being
defined.
@item
The end date is always taken to be on or before the current date.
@item 
The start date is always taken to be on or before the end date.
@end itemize

@bye
@c vim:cms=@c\ %s:fdm=marker:fdc=5:syntax=off