1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
|
Input Format
============
`headerparser` accepts a syntax that is intended to be a simplified superset of
the Internet Message (e-mail) Format specified in :rfc:`822`, :rfc:`2822`, and
:rfc:`5322`. Specifically:
- Everything in the input up to (but not including) the first blank line (i.e.,
a line containing only a line ending) constitutes a :dfn:`stanza` or
:dfn:`header section`. Everything after the first blank line is a free-form
:dfn:`message body`. If there are no blank lines, the entire input is used
as the header section, and there is no body.
.. note::
By default, blank lines at the beginning of a document are interpreted as
the ending of a zero-length stanza. Such blank lines can instead be
ignored by setting the ``skip_leading_newlines`` `Scanner` option to true.
- A stanza or header section is composed of zero or more :dfn:`header fields`.
A header field is composed of one or more lines, with all lines after the
first beginning with a space or tab. Additionally, the first line must
contain a colon (optionally surrounded by whitespace); everything before the
colon is the :dfn:`header field name`, while everything after (including
subsequent lines) is the :dfn:`header field value`.
.. note::
Name-value separators other than a colon can be used by setting the
``separator_regex`` `Scanner` option appropriately.
.. note::
`headerparser` only recognizes CR, LF, and CR LF sequences as line endings.
An example::
Key: Value
Foo: Bar
Bar:Whitespace around the colon is optional
Baz : Very optional
Long-Field: This field has a very long value, so I'm going to split it
across multiple lines.
The above line is all whitespace. This counts as line folding, and so
we're still in the "Long Field" value, but the RFCs consider such lines
obsolete, so you should avoid using them.
.
One alternative to an all-whitespace line is a line with just indentation
and a period. Debian package description fields use this.
Foo: Wait, I already defined a value for this key. What happens now?
What happens now: It depends on whether the `multiple` option for the "Foo"
field was set in the HeaderParser.
If multiple=True: The "Foo" key in the dictionary returned by
HeaderParser.parse() would map to a list of all of Foo's values
If multiple=False: A ParserError is raised
If multiple=False but there's only one "Foo" anyway:
The "Foo" key in the result dictionary would map to just a single string.
Compare this to: the standard library's `email` package, which accepts
multi-occurrence fields, but *which* occurrence Message.__getitem__
returns is unspecified!
Are we still in the header: no
There was a blank line above, so we're now in the body, which isn't
processed for headers.
Good thing, too, because this isn't a valid header line.
On the other hand, this is not a valid RFC 822-style document::
An indented first line — without a "Name:" line before it!
A header line without a colon isn't good, either.
Does this make up for the above: no
|