File: mimestring.mli

netstring 0.10.1-3
(* $Id: mimestring.mli,v 1.8 2000/08/13 00:04:36 gerd Exp $
 * ----------------------------------------------------------------------
 *
 *)

(**********************************************************************)
(* Collection of auxiliary functions to parse MIME headers            *)
(**********************************************************************)


val scan_header : 
       ?unfold:bool ->
       string -> start_pos:int -> end_pos:int -> 
         ((string * string) list * int)
    (* let params, i2 = scan_header s i0 i1:
     *
     * DESCRIPTION
     *
     * Scans the MIME header that begins at position i0 in the string s
     * and that must end somewhere before position i1. The value passed
     * in i1 is intended to be the position of the character following
     * the end of the body of the MIME message.
     * Returns the parameters of the header as (name,value) pairs (in
     * params), and in i2 the position of the character following
     * directly after the header (i.e. after the blank line separating
     * the header from the body).
     * The following normalizations have already been applied:
     * - The names are all in lowercase
     * - Newline characters (CR and LF) have been removed (unless
     *   ?unfold:false has been passed)
     * - Whitespace at the beginning and at the end of values has been
     *   removed (unless ?unfold:false is specified)
     * The rules of RFC 2047 have NOT been applied.
     * The function fails if the header severely violates the header
     * format. (Some minor deviations are tolerated, e.g. it is sufficient
     * to separate lines by only LF instead of CRLF.)
     *
     * OPTIONS:
     *
     * unfold: If true (the default), folded lines are concatenated and
     *   returned as one line. This means that CR and LF characters are
     *   deleted and that whitespace at the beginning and the end of the
     *   string is removed.
     *   You may set ?unfold:false to locate individual characters in the
     *   parameter value exactly.
     *
     * ABOUT MIME MESSAGE FORMAT:
     *
     * This is the modern name for messages in "E-Mail format". Messages
     * consist of a header and a body; the first empty line separates the
     * two parts. The header contains lines "param-name: param-value" where
     * the param-name must begin in column 0 of the line, and the ":"
     * separates the name and the value. So the format is roughly:
     *
     * param1-name: param1-value
     * ...
     * paramN-name: paramN-value
     *
     * body
     *
     * This function expects in i0 the position of the first character of
     * param1-name in the string, and in i1 the position of the character
     * following the body. It returns as i2 the position where the body
     * begins. Furthermore, all parameters that exist in the header are
     * returned in 'params'.
     *
     * DETAILS
     *
     * Note that parameter values are restricted; you cannot represent
     * arbitrary strings. The following problems can arise:
     * - Values cannot begin with whitespace characters, because there
     *   may be an arbitrary number of whitespaces between the ':' and the
     *   value.
     * - Values (and names of parameters, too) must only be formed of
     *   7 bit ASCII characters. (If this is not enough, the MIME standard
     *   provides the extension RFC 2047, which allows header values to
     *   be composed of arbitrary characters of arbitrary character sets.)
     * - Header values may be broken into several lines; the continuation
     *   lines must begin with whitespace characters. This means that values
     *   must not contain line breaks as a semantic part of the value.
     *   And it may mean that ONE whitespace character is not distinguishable
     *   from SEVERAL whitespace characters.
     * - Header lines must not be longer than 76 characters. Values that
     *   would result in longer lines must be broken into several lines.
     *   This means that you cannot represent strings that contain too few
     *   whitespace characters.
     * - Some gateways pad the lines with spaces at the end of the lines.
     *
     * This implementation of a MIME scanner tolerates a number of
     * deviations from the standard: long lines are not rejected; 8 bit
     * values are accepted; lines may be ended only with LF instead of
     * CRLF.
     * Furthermore, header values are transformed:
     * - leading and trailing spaces are always removed
     * - CRs and LFs are deleted; it is guaranteed that there is at least
     *   one space or tab where CR/LFs are deleted.
     * Last but not least, the names of the header values are converted
     * to lowercase; MIME specifies that they are case-independent.
     *
     * COMPATIBILITY WITH THE STANDARD
     *
     * This function can parse all MIME headers that conform to RFC 822.
     * But there may still be problems, as RFC 822 allows some crazy
     * representations that are actually not used in practice.
     * In particular, RFC 822 allows backslashes to be used to "indicate"
     * that a CRLF sequence is semantically meant as a line break. As this
     * function normally deletes CRLFs, it is not possible to recognize such
     * indicators in the result of the function.
     *)
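
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * The normalizations described above (names in lowercase, CR and LF
     * deleted, surrounding whitespace removed) can be pictured with the
     * stand-alone code below. The names unfold_value and normalize_field
     * are hypothetical helpers written for this example only; they use
     * nothing but the standard library, skip all error checking, and are
     * not meant to reflect how scan_header is implemented.
     *

```ocaml
(* Delete CR and LF characters, then strip leading and trailing
 * whitespace. The whitespace that introduces a continuation line is
 * kept, so at least one space or tab remains where a line break was. *)
let unfold_value (raw : string) : string =
  let buf = Buffer.create (String.length raw) in
  String.iter
    (fun c -> if c <> '\r' && c <> '\n' then Buffer.add_char buf c)
    raw;
  String.trim (Buffer.contents buf)

(* Normalize one (name, value) pair: the name is converted to lowercase
 * and the value is unfolded. *)
let normalize_field ((name, raw_value) : string * string) : string * string =
  (String.lowercase_ascii name, unfold_value raw_value)
```

     * For instance, normalize_field applied to the name "Content-Type"
     * and a value folded over two lines gives a pair whose name is
     * "content-type" and whose value is a single line.
     *)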

(**********************************************************************)

(* The following types and functions make it possible to build scanners for
 * structured MIME values in a highly configurable way.
 *
 * WHAT ARE STRUCTURED VALUES?
 *
 * RFC 822 (together with some other RFCs) defines lexical rules for
 * how formal MIME header values should be divided up into tokens. Formal
 * MIME headers are those headers that are formed according to some
 * grammar, e.g. mail addresses or MIME types.
 *    Some of the characters separate phrases of the value; these are
 * the "special" characters. For example, '@' is normally a special
 * character for mail addresses, because it separates the user name
 * from the domain name. RFC 822 defines a fixed set of special
 * characters, but other RFCs use different sets. Because of this,
 * the following functions let you configure the set of special characters.
 *    Every sequence of characters may be enclosed in double quotes,
 * which means that the sequence is meant as a literal data item;
 * special characters are not recognized inside a quoted string. You may
 * use the backslash to insert any character (including double quotes)
 * verbatim into the quoted string (e.g. "He said: \"Give it to me!\"").
 * The sequence of a backslash character and another character is called
 * a quoted pair.
 *    Structured values may contain comments. The beginning of a comment
 * is indicated by '(', and the end by ')'. Comments may be nested.
 * Comments may contain quoted pairs. A
 * comment counts as if a space character were written instead of it.
 *    Control characters are the ASCII characters 0 to 31, and 127.
 * RFC 822 demands that MIME headers are 7 bit ASCII strings. Because
 * of this, this function also counts the characters 128 to 255 as
 * control characters.
 *    Domain literals are strings enclosed in '[' and ']'; such literals
 * may contain quoted pairs. Today, domain literals are used to specify
 * IP addresses.
 *    Every character sequence not falling in one of the above categories
 * is an atom (a sequence of non-special and non-control characters).
 * When recognized, atoms may be encoded in a character set other than
 * US-ASCII; such atoms are called encoded words (see RFC 2047).
 *
 * EXTENDED INTERFACE:
 *
 * In order to scan a string containing a MIME value, you must first
 * create a mime_scanner using the function create_mime_scanner.
 * The scanner contains the reference to the scanned string, and a
 * specification of how the string is to be scanned. The specification
 * consists of the lists 'specials' and 'scan_options'.
 *
 * The character list 'specials' specifies the set of special characters.
 * These characters are returned as Special c tokens; the following additional
 * rules apply:
 *
 * - Spaces:
 *   If ' ' in specials: A space character is returned as Special ' '.
 *       Note that there may also be an effect on how comments are returned
 *       (see below).
 *   If ' ' not in specials: Spaces are ignored.
 *
 * - Tabs, CRs, LFs:
 *   If '\t' in specials: A tab character is returned as Special '\t'.
 *   If '\t' not in specials: Tabs are ignored.
 *
 *   If '\r' in specials: A CR character is returned as Special '\r'.
 *   If '\r' not in specials: CRs are ignored.
 *
 *   If '\n' in specials: A LF character is returned as Special '\n'.
 *   If '\n' not in specials: LFs are ignored.
 *
 * - Comments:
 *   If '(' in specials: Comments are not recognized. The character '('
 *       is returned as Special '('.
 *   If '(' not in specials: Comments are recognized. How comments are
 *       returned depends on the following:
 *       If Return_comments in scan_options: Outer comments are returned as
 *           Comment (note that inner comments count but
 *           are not returned as tokens)
 *       If otherwise ' ' in specials: Outer comments are returned as
 *           Special ' '
 *       Otherwise: Comments are recognized but ignored.
 *
 * - Quoted strings:
 *   If '"' in specials: Quoted strings are not recognized, and double quotes
 *       are returned as Special '"'.
 *   If '"' not in specials: Quoted strings are returned as QString tokens.
 *
 * - Domain literals:
 *   If '[' in specials: Domain literals are not recognized, and left brackets
 *       are returned as Special '['.
 *   If '[' not in specials: Domain literals are returned as DomainLiteral
 *       tokens.
 *
 * Note that the rule for domain literals is completely new in netstring-0.9.
 * It may cause incompatibilities with previous versions if '[' is not
 * special.
 *
 * The general rule for special characters: Every special character c is
 * returned as Special c, and any additional scanning functionality 
 * for this character is turned off.
 *
 * If recognized, quoted strings are returned as QString s, where
 * s is the string without the enclosing quotes, and with quoted pairs
 * already decoded.
 *
 * Control characters c are returned as Control c.
 *
 * If recognized, comments may either be returned as spaces (in the case
 * you are not interested in the contents of comments), or as Comment tokens.
 * The contents of comments are not further scanned; you must start a
 * subscanner to analyze comments as structured values.
 *
 * If recognized, domain literals are returned as DomainLiteral s, where
 * s is the literal without brackets, and with decoded quoted pairs.
 *
 * Atoms are returned as Atom s where s is a longest sequence of
 * atomic characters (all characters which are neither special nor control
 * characters nor delimiters for substructures). If the option
 * Recognize_encoded_words is on, atoms which look like encoded words
 * are returned as EncodedWord tokens. (Important note: Neither '?' nor
 * '=' must be special in order to enable this functionality.)
 *
 * After the mime_scanner has been created, you can scan the tokens by
 * invoking scan_token which returns one token at a time, or by invoking
 * scan_token_list which returns all following tokens.
 *
 * There are two token types: s_token is the base type and is intended to
 * be used for pattern matching. s_extended_token is a wrapper that
 * additionally contains information about where the token occurs.
 *
 * SIMPLE INTERFACE
 *
 * Instead of creating a mime_scanner and calling the scan functions,
 * you may also invoke scan_structured_value. This function returns the
 * list of tokens directly; however, it is restricted to s_token.
 *
 * EXAMPLES
 *
 * scan_structured_value "user@domain.com" [ '@'; '.' ] []
 *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 *
 * scan_structured_value "user @ domain . com" [ '@'; '.' ] []
 *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 *
 * scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] []
 *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 *
 * scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] 
 *     [ Return_comments ]
 *   = [ Atom "user"; Comment; Special '@'; Atom "domain"; Special '.'; 
 *       Atom "com" ]
 *
 * scan_structured_value "user (Do you know him?) @ domain . com" 
 *     [ '@'; '.'; ' ' ] []
 *   = [ Atom "user"; Special ' '; Special ' '; Special ' '; Special '@'; 
 *       Special ' '; Atom "domain";
 *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 *
 * scan_structured_value "user (Do you know him?) @ domain . com" 
 *     [ '@'; '.'; ' ' ] [ Return_comments ]
 *   = [ Atom "user"; Special ' '; Comment; Special ' '; Special '@'; 
 *       Special ' '; Atom "domain";
 *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 *
 * scan_structured_value "user @ domain . com" [ '@'; '.'; ' ' ] []
 *   = [ Atom "user"; Special ' '; Special '@'; Special ' '; Atom "domain";
 *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 *
 * scan_structured_value "user(Do you know him?)@domain.com" ['@'; '.'; '(']
 *     []
 *   = [ Atom "user"; Special '('; Atom "Do"; Atom "you"; Atom "know";
 *       Atom "him?)"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 *
 * scan_structured_value "\"My.name\"@domain.com" [ '@'; '.' ] []
 *   = [ QString "My.name"; Special '@'; Atom "domain"; Special '.';
 *       Atom "com" ]
 *
 * scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" 
 *     [ ] [ ] 
 *   = [ Atom "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" ]
 *
 * scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" 
 *     [ ] [ Recognize_encoded_words ] 
 *   = [ EncodedWord("ISO-8859-1", "Q", "Keld_J=F8rn_Simonsen") ]
 *
 *)



type s_token =
    Atom of string
  | EncodedWord of (string * string * string)
  | QString of string
  | Control of char
  | Special of char
  | DomainLiteral of string
  | Comment
  | End

(* - Words are: Atom, EncodedWord, QString.
 * - Atom s: The character sequence forming the atom is contained in s
 * - EncodedWord(charset, encoding, encoded_string) means:
 *   * charset is the (uppercase) character set
 *   * encoding is either "Q" or "B"
 *   * encoded_string: contains the text of the word; the text is represented
 *     as octet string following the conventions for character set charset and 
 *     then encoded either as "Q" or "B" string.
 * - QString s: Here, s are the characters inside the double quotes after
 *   decoding any quoted pairs (backslash + character pairs)
 * - Control c: The control character c
 * - Special c: The special character c
 * - DomainLiteral s: s contains the characters inside the brackets after
 *   decoding any quoted pairs
 * - Comment: if the option Return_comments is specified, this token
 *   represents the whole comment.
 * - End: Is returned after the last token
 *)
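
(* EXAMPLE (illustrative sketch, not part of this interface)
 *
 * Since s_token is intended for pattern matching, a consumer typically
 * looks like the code below. The type is repeated only so that the
 * snippet compiles on its own. string_of_token and string_of_tokens are
 * hypothetical helpers that print a token list back as a compact string;
 * they do not re-escape quoted pairs, which is a simplification.
 *

```ocaml
(* Copy of the s_token type, so that this snippet is self-contained. *)
type s_token =
    Atom of string
  | EncodedWord of (string * string * string)
  | QString of string
  | Control of char
  | Special of char
  | DomainLiteral of string
  | Comment
  | End

(* Render a single token. Comment and End carry no text. *)
let string_of_token = function
  | Atom s -> s
  | EncodedWord (charset, encoding, text) ->
      "=?" ^ charset ^ "?" ^ encoding ^ "?" ^ text ^ "?="
  | QString s -> "\"" ^ s ^ "\""
  | Control c | Special c -> String.make 1 c
  | DomainLiteral s -> "[" ^ s ^ "]"
  | Comment | End -> ""

(* Render a whole token list without any separators. *)
let string_of_tokens (tokens : s_token list) : string =
  String.concat "" (List.map string_of_token tokens)
```

 * Applied to the token list of the first example above (user, '@',
 * domain, '.', com), string_of_tokens gives back the original address.
 *)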


type s_option =
    No_backslash_escaping
      (* Do not handle backslashes in quoted strings and comments as escape
       * characters; backslashes are handled as normal characters.
       * For example: "C:\dir\file" will be returned as
       * QString "C:\dir\file", and not as QString "C:dirfile".
       * - This is a common error in many MIME implementations.
       *)
  | Return_comments
      (* Comments are returned as token Comment (unless '(' is included
       * in the list of special characters, in which case comments are
       * not recognized at all).
       * You may get the exact location of the comment by applying
       * get_pos and get_length to the extended token.
       *)
  | Recognize_encoded_words
      (* Enables recognition of encoded words, which are then returned as
       * EncodedWord(charset,encoding,content) instead of Atom.
       *)

type s_extended_token
  (* An opaque type containing s_token plus:
   * - where the token occurs
   * - RFC-2047 access functions
   *)

val get_token : s_extended_token -> s_token
    (* Return the s_token within the s_extended_token *)

val get_decoded_word : s_extended_token -> string
val get_charset : s_extended_token -> string
    (* Return the decoded word (the contents of the word after decoding the
     * "Q" or "B" representation), and the character set of the decoded word
     * (uppercase).
     * These functions work not only for EncodedWord:
     * - Atom: Returns the atom without decoding it
     * - QString: Returns the characters inside the double quotes, and
     *   decodes any quoted pairs (backslash + character)
     * - Control: Returns the one-character string
     * - Special: Returns the one-character string
     * - DomainLiteral: Returns the characters inside the brackets, and
     *   decodes any quoted pairs
     * - Comment: Returns ""
     * The character set is "US-ASCII" for these tokens.
     *)
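
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * To give an idea of what decoding the "Q" representation means,
     * here is a stand-alone decoder following the rules of RFC 2047:
     * an underscore stands for a space, and an equals sign followed by
     * two hexadecimal digits stands for the byte with that code.
     * hex_value and decode_q are hypothetical names; this is not the
     * code behind get_decoded_word, and the "B" (base64) representation
     * is not covered.
     *

```ocaml
(* Value of a hexadecimal digit, or None for any other character. *)
let hex_value (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

(* Decode a "Q" encoded text. Malformed escapes are copied unchanged. *)
let decode_q (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec loop i =
    if i < n then begin
      match s.[i] with
      | '_' ->
          Buffer.add_char buf ' ';
          loop (i + 1)
      | '=' when i + 2 < n ->
          (match hex_value s.[i + 1], hex_value s.[i + 2] with
           | Some hi, Some lo ->
               Buffer.add_char buf (Char.chr ((hi * 16) + lo));
               loop (i + 3)
           | _ ->
               Buffer.add_char buf '=';
               loop (i + 1))
      | c ->
          Buffer.add_char buf c;
          loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents buf
```

     * The result is a byte string in the character set named by the
     * encoded word; converting it to another character set is a
     * separate step.
     *)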

val get_pos : s_extended_token -> int
    (* Return the byte position where the token starts in the string 
     * (the first byte has position 0)
     *)

val get_line : s_extended_token -> int
    (* Return the line number where the token starts (numbering usually
     * begins with 1)
     *)

val get_column : s_extended_token -> int
    (* Return the column of the line where the token starts (first column
     * is number 0)
     *)

val get_length : s_extended_token -> int
    (* Return the length of the token in bytes *)

val separates_adjacent_encoded_words : s_extended_token -> bool
    (* True iff the current token is white space (Special ' ', Special '\t',
     * Special '\r' or Special '\n') and the last non-white space token
     * was EncodedWord and the next non-white space token will be
     * EncodedWord.
     * Such spaces do not count and must be ignored by any application.
     *)


type mime_scanner

val create_mime_scanner : 
      specials:char list -> 
      scan_options:s_option list -> 
      ?pos:int ->
      ?line:int ->
      ?column:int ->
      string -> 
        mime_scanner
    (* Creates a new mime_scanner scanning the passed string.
     * specials: The list of characters recognized as special characters.
     * scan_options: The list of global options modifying the behaviour
     *   of the scanner
     * pos: The position of the byte where the scanner starts in the
     *   passed string. Defaults to 0.
     * line: The line number of this byte. Defaults to 1.
     * column: The column number of this byte. Defaults to 0.
     *
     * The optional parameters pos, line, column are intentionally after
     * scan_options and before the string argument, so you can specify
     * scanners by partially applying arguments to create_mime_scanner
     * which are not yet connected with a particular string:
     * let my_scanner_spec =
     *   create_mime_scanner ~specials:my_specials ~scan_options:my_options in
     * ...
     * let my_scanner = my_scanner_spec my_string in 
     * ...
     *)
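
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * The effect of placing the optional parameters before the string
     * can be tried out with a stand-in function that has the same
     * argument order. scanner_sketch and create_sketch are hypothetical
     * and only record their arguments; scan_options is left out to keep
     * the example short.
     *

```ocaml
(* Stand-in for mime_scanner; the real type is abstract. *)
type scanner_sketch = {
  specials : char list;
  pos : int;
  line : int;
  column : int;
  text : string;
}

(* Labeled argument first, then the optional ones, then the string. *)
let create_sketch ~specials ?(pos = 0) ?(line = 1) ?(column = 0) text =
  { specials; pos; line; column; text }

(* Fix the configuration once. No string has been supplied yet, so the
 * optional arguments are still open. *)
let address_spec = create_sketch ~specials:[ '@'; '.' ]

(* Apply the spec to strings, with or without a start position. *)
let s1 = address_spec "user@domain.com"
let s2 = address_spec ~pos:5 ~column:5 "user@domain.com"
```

     * Had the optional parameters come after the string, applying the
     * spec to a string would already have fixed them to their defaults.
     *)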

val get_pos_of_scanner : mime_scanner -> int
val get_line_of_scanner : mime_scanner -> int
val get_column_of_scanner : mime_scanner -> int
    (* Return the current position, line, and column of a mime_scanner.
     * The primary purpose of these functions is to simplify switching
     * from one mime_scanner to another within a string:
     *
     * let scanner1 = create_mime_scanner ... s in
     * ... now scanning some tokens from s using scanner1 ...
     * let scanner2 = create_mime_scanner ...
     *                  ~pos:(get_pos_of_scanner scanner1)
     *                  ~line:(get_line_of_scanner scanner1)
     *                  ~column:(get_column_of_scanner scanner1)
     *                  s in
     * ... scanning more tokens from s using scanner2 ...
     *
     * RESTRICTION: These functions are not available if the option
     * Recognize_encoded_words is on. The reason is that this option
     * enables look-ahead scanning; please use the location of the last
     * scanned token instead.
     * It is currently not clear whether a better implementation is needed
     * (costs a bit more time).
     *
     * Note: To improve the performance of switching, it is recommended to
     * create scanner specs in advance (see the example my_scanner_spec
     * above).
     *)

val scan_token : mime_scanner -> (s_extended_token * s_token)
    (* Returns the next token, or End if there are no more tokens. *)

val scan_token_list : mime_scanner -> (s_extended_token * s_token) list
    (* Returns all following tokens as a list (excluding End) *)

val scan_structured_value : string -> char list -> s_option list -> s_token list
    (* This function is included for backwards compatibility, and for all
     * cases not requiring extended tokens.
     *
     * It scans the passed string according to the list of special characters
     * and the list of options, and returns the list of all tokens.
     *)

val specials_rfc822 : char list
val specials_rfc2045 : char list
    (* The sets of special characters defined by the RFCs 822 and 2045.
     *
     * CHANGE in netstring-0.9: '[' and ']' are no longer special because
     * there is now support for domain literals.
     * '?' and '=' are not special in the rfc2045 version because there is
     * already support for encoded words.
     *)


(**********************************************************************)

(* Widely used scanners: *)


val scan_encoded_text_value : string -> s_extended_token list
    (* Scans a "text" value. The returned token list contains only
     * Special, Atom and EncodedWord tokens. 
     * Spaces, TABs, CRs, LFs are returned unless
     * they occur between adjacent encoded words in which case
     * they are ignored.
     *)


val scan_value_with_parameters : string -> s_option list ->
                                   (string * (string * string) list)
    (* let name, params = scan_value_with_parameters s options:
     * Scans phrases like
     *    name ; p1=v1 ; p2=v2 ; ...
     * The scan is done with the set of special characters [';'; '='].
     *)

val scan_mime_type : string -> s_option list ->
                       (string * (string * string) list)
    (* let name, params = scan_mime_type s options:
     * Scans MIME types like
     *    text/plain; charset=iso-8859-1
     * The name of the type and the names of the parameters are converted
     * to lower case.
     *)
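
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * The shape of the result can be illustrated with a much simplified
     * stand-alone splitter. split_mime_type is a hypothetical helper: it
     * cuts the string at ';' and '=' with plain string functions and
     * knows nothing about quoted strings, comments or encoded words,
     * all of which the real scanners take into account.
     *

```ocaml
(* Split "name; p1=v1; p2=v2" into the name and the parameter list.
 * The type name and the parameter names are converted to lowercase;
 * parameter values are kept as written. Pieces without '=' are dropped. *)
let split_mime_type (s : string) : string * (string * string) list =
  let parse_param piece =
    match String.index_opt piece '=' with
    | None -> None
    | Some i ->
        let name = String.trim (String.sub piece 0 i) in
        let value =
          String.trim (String.sub piece (i + 1) (String.length piece - i - 1))
        in
        Some (String.lowercase_ascii name, value)
  in
  match String.split_on_char ';' s with
  | [] -> ("", [])
  | name :: params ->
      ( String.lowercase_ascii (String.trim name),
        List.filter_map parse_param params )
```

     * For the MIME type shown above, the name is "text/plain" and the
     * parameter list has the single entry charset with value
     * "iso-8859-1".
     *)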


(**********************************************************************)

(* Scanners for MIME bodies *)

val scan_multipart_body : string -> start_pos:int -> end_pos:int -> 
                            boundary:string ->
                            ((string * string) list * string) list
    (* let [params1, value1; params2, value2; ...]
     *   = scan_multipart_body s i0 i1 b
     *
     * Scans the string s that is the body of a multipart message.
     * The multipart message begins at position i0 in s, and i1 is the
     * position of the character following the message. In b the boundary
     * string must be passed (this is the "boundary" parameter of the
     * multipart MIME type, e.g. multipart/mixed;boundary="some string" ).
     *     The return value is the list of the parts, where each part
     * is returned as pair (params, value). The left component params
     * is the list of name/value pairs of the header of the part. The
     * right component is the RAW content of the part, i.e. if the part
     * is encoded ("content-transfer-encoding"), the content is returned
     * in the encoded representation. The caller must decode the content
     * itself.
     *     The material before the first boundary and after the last
     * boundary is not returned.
     *
     * MULTIPART MESSAGES
     *
     * The MIME standard defines a way to group several message parts into
     * a larger message (for E-Mails this technique is known as "attaching"
     * files to messages); these are the so-called multipart messages.
     * Such messages are recognized by the major type string "multipart",
     * e.g. multipart/mixed or multipart/form-data. Multipart types MUST
     * have a boundary parameter because boundaries are essential for the
     * representation.
     *    Multipart messages have a format like
     *
     * ...Header...
     * Content-type: multipart/xyz; boundary="abc"
     * ...Header...
     *
     * Body begins here ("prologue")
     * --abc
     * ...Header part 1...
     *
     * ...Body part 1...
     * --abc
     * ...Header part 2...
     *
     *
     * ...Body part 2
     * --abc
     * ...
     * --abc--
     * Epilogue
     *
     * The parts are separated by boundary lines which begin with "--" and
     * the string passed as boundary parameter. (Note that there may follow
     * arbitrary text on boundary lines after "--abc".) The boundary is
     * chosen such that it does not occur as prefix of any line of the
     * inner parts of the message.
     *     The parts are again MIME messages, with header and body. Note
     * that it is explicitly allowed for the parts to be multipart
     * messages themselves.
     *     The texts before the first boundary and after the last boundary
     * are ignored.
     *     Note that multipart messages as a whole MUST NOT be encoded.
     * Only the PARTS of the messages may be encoded (if they are not
     * multipart messages themselves).
     *
     * Please read RFC 2046 if you want to know the gory details of this
     * brain-dead format.
     *)
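
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * The role of the boundary lines can be seen in the following
     * stand-alone splitter. has_prefix and split_multipart are
     * hypothetical helpers; the sketch assumes LF line ends, returns
     * every part as one raw string (header, blank line and body
     * together), and drops a last part that is not closed by a
     * boundary line. It is not how scan_multipart_body is implemented.
     *

```ocaml
(* True if s begins with prefix. *)
let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Return the raw text of every part of a multipart body. *)
let split_multipart ~boundary (body : string) : string list =
  let delimiter = "--" ^ boundary in
  let terminator = delimiter ^ "--" in
  (* Close the part being collected, if any, and add it to the result. *)
  let flush current parts =
    match current with
    | None -> parts
    | Some lines -> String.concat "\n" (List.rev lines) :: parts
  in
  (* current is None in the prologue, Some reversed_lines inside a part. *)
  let rec go current parts = function
    | [] -> List.rev parts
    | line :: rest ->
        if has_prefix ~prefix:terminator line then
          (* Final boundary: everything after it is the epilogue. *)
          List.rev (flush current parts)
        else if has_prefix ~prefix:delimiter line then
          go (Some []) (flush current parts) rest
        else
          match current with
          | None -> go None parts rest
          | Some lines -> go (Some (line :: lines)) parts rest
  in
  go None [] (String.split_on_char '\n' body)
```

     * Each returned string could then be handed to a header scanner to
     * separate the header of the part from its content.
     *)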

val scan_multipart_body_and_decode : string -> start_pos:int -> end_pos:int -> 
                                        boundary:string ->
                                        ((string * string) list * string) list
    (* Same as scan_multipart_body, but decodes the bodies of the parts
     * if they are encoded using the methods "base64" or "quoted printable".
     * Fails if an unknown encoding is used.
     *)

val scan_multipart_body_from_netstream
    : Netstream.t ->
      boundary:string ->
      create:((string * string) list -> 'a) ->
      add:('a -> Netstream.t -> int -> int -> unit) ->
      stop:('a -> unit) ->
      unit
    (* scan_multipart_body_from_netstream s b create add stop:
     *
     * Reads the MIME message from the netstream s block by block. The
     * parts are delimited by the boundary b.
     *
     * Once a new part is detected and begins, the function 'create' is
     * called with the MIME header as argument. The result p of this function
     * may be of any type.
     *
     * For every chunk of the part that is being read, the function 'add'
     * is invoked: add p s k n.
     * Here, p is the value returned by the 'create' invocation for the
     * current part. s is the netstream. The current window of s contains
     * the read chunk completely; the chunk begins at position k of the
     * window (relative to the beginning of the window) and has a length
     * of n bytes.
     *
     * When the part has been fully read, the function 'stop' is
     * called with p as argument.
     *
     * That means, for every part the following is executed:
     * - let p = create h
     * - add p s k1 n1
     * - add p s k2 n2
     * - ...
     * - add p s kN nN
     * - stop p
     *
     * IMPORTANT PRECONDITION:
     * - The block size of the netstream s must be at least
     *   String.length b + 3
     *
     * EXCEPTIONS:
     * - Exceptions can happen because of ill-formed input, and within
     *   the callbacks of the functions 'create', 'add', 'stop'.
     * - If the exception happens while part p is being read, and the
     *   'create' function has already been called (successfully), the
     *   'stop' function is also called (you have the chance to close files).
     *)
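
    (* EXAMPLE (illustrative sketch, not part of this interface)
     *
     * The calling sequence create / add / stop can be tried out without
     * a netstream. In the code below a plain string plays the role of
     * the current window, and run_part is a hypothetical driver that
     * performs the calls for ONE part in the order described above,
     * including the call of 'stop' when an exception occurs. The three
     * callbacks simply collect every part into a Buffer.
     *

```ocaml
(* Hypothetical driver for one part. chunks is a list of (k, n) pairs,
 * each denoting n bytes of window starting at position k. *)
let run_part ~create ~add ~stop header (window : string) chunks =
  let p = create header in
  match List.iter (fun (k, n) -> add p window k n) chunks with
  | () -> stop p
  | exception e ->
      (* create has succeeded, so stop is called before re-raising. *)
      stop p;
      raise e

(* Finished parts, most recent first. *)
let collected : ((string * string) list * string) list ref = ref []

(* Callbacks that accumulate the content of a part in a Buffer. *)
let create header = (header, Buffer.create 64)
let add (_, buf) window k n = Buffer.add_substring buf window k n
let stop (header, buf) =
  collected := (header, Buffer.contents buf) :: !collected

let () =
  run_part ~create ~add ~stop
    [ ("content-type", "text/plain") ]
    "hello world"
    [ (0, 5); (5, 6) ]
```

     * In a real application 'create' would typically open a file and
     * 'stop' would close it.
     *)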


(* THREAD-SAFETY:
 * The functions are thread-safe as long as the threads do not share
 * values.
 *)

(* ======================================================================
 * History:
 *
 * $Log: mimestring.mli,v $
 * Revision 1.8  2000/08/13 00:04:36  gerd
 * 	Encoded_word -> EncodedWord
 * 	Bugfixes.
 *
 * Revision 1.7  2000/08/07 00:25:00  gerd
 * 	Major update of the interface for structured field lexing.
 *
 * Revision 1.6  2000/06/25 22:34:43  gerd
 * 	Added labels to arguments.
 *
 * Revision 1.5  2000/06/25 21:15:48  gerd
 * 	Checked thread-safety.
 *
 * Revision 1.4  2000/05/16 22:29:12  gerd
 * 	New "option" arguments specifying the level of MIME
 * compatibility.
 *
 * Revision 1.3  2000/04/15 13:09:01  gerd
 * 	Implemented uploads to temporary files.
 *
 * Revision 1.2  2000/03/02 01:15:30  gerd
 * 	Updated.
 *
 * Revision 1.1  2000/02/25 15:21:12  gerd
 * 	Initial revision.
 *
 *
 *)