Input Format
============
`headerparser` accepts a syntax that is intended to be a simplified superset of
the Internet Message (e-mail) Format specified in :rfc:`822`, :rfc:`2822`, and
:rfc:`5322`.  Specifically:

- Everything in the input up to (but not including) the first blank line (i.e.,
  a line containing only a line ending) constitutes a :dfn:`stanza` or
  :dfn:`header section`.  Everything after the first blank line is a free-form
  :dfn:`message body`.  If there are no blank lines, the entire input is used
  as the header section, and there is no body.

  .. note::

    By default, blank lines at the beginning of a document are interpreted as
    the ending of a zero-length stanza.  Such blank lines can instead be
    ignored by setting the ``skip_leading_newlines`` `Scanner` option to true.

- A stanza or header section is composed of zero or more :dfn:`header fields`.
  A header field is composed of one or more lines, with all lines after the
  first beginning with a space or tab.  Additionally, the first line must
  contain a colon (optionally surrounded by whitespace); everything before the
  colon is the :dfn:`header field name`, while everything after (including
  subsequent lines) is the :dfn:`header field value`.

  .. note::

    Name-value separators other than a colon can be used by setting the
    ``separator_regex`` `Scanner` option appropriately.

  .. note::

    `headerparser` only recognizes CR, LF, and CR LF sequences as line endings.
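
The rules above can be sketched in plain Python.  This is only an illustration
of the format, not `headerparser`'s actual implementation: the helper names
``split_lines``, ``split_stanza``, and ``scan_fields`` are hypothetical, and
keeping folded lines joined with a newline is an assumption made for the
sketch.

```python
import re


def split_lines(text):
    """Split on CR LF, CR, or LF only, keeping each line's ending."""
    return re.findall(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+", text)


def split_stanza(text):
    """Return (header_lines, body); body is None if there is no blank line."""
    lines = split_lines(text)
    for i, line in enumerate(lines):
        # A blank line contains only a line ending; a whitespace-only
        # line does not count and is treated as folding instead.
        if line in ("\r\n", "\r", "\n"):
            return lines[:i], "".join(lines[i + 1:])
    return lines, None


def scan_fields(header_lines):
    """Turn header lines into a list of (name, value) pairs."""
    fields = []
    for raw in header_lines:
        line = raw.rstrip("\r\n")
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any header field")
            name, value = fields[-1]
            fields[-1] = (name, value + "\n" + line)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"no colon in header line: {line!r}")
            # Whitespace around the colon is optional, so strip it.
            fields.append((name.strip(), value.strip()))
    return fields


sample = (
    "Key: Value\r\n"
    "Baz  :  Very optional\n"
    "Long: first\n"
    "  second\n"
    "\n"
    "Body: not a header\n"
)
headers, body = split_stanza(sample)
fields = scan_fields(headers)
```

In this sketch, an input that starts with a blank line yields an empty list of
header lines, which mirrors the zero-length stanza described in the first note
above.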

An example::

    Key: Value
    Foo: Bar
    Bar:Whitespace around the colon is optional
    Baz  :  Very optional
    Long-Field: This field has a very long value, so I'm going to split it
      across multiple lines.
      
      The above line is all whitespace.  This counts as line folding, and so
      we're still in the "Long-Field" value, but the RFCs consider such lines
      obsolete, so you should avoid using them.
      .
      One alternative to an all-whitespace line is a line with just indentation
      and a period.  Debian package description fields use this.
    Foo: Wait, I already defined a value for this key.  What happens now?
    What happens now: It depends on whether the `multiple` option for the "Foo"
      field was set in the HeaderParser.
    If multiple=True: The "Foo" key in the dictionary returned by
      HeaderParser.parse() would map to a list of all of Foo's values
    If multiple=False: A ParserError is raised
    If multiple=False but there's only one "Foo" anyway:
      The "Foo" key in the result dictionary would map to just a single string.
    Compare this to: the standard library's `email` package, which accepts
      multi-occurrence fields, but *which* occurrence Message.__getitem__
      returns is unspecified!

    Are we still in the header: no
      There was a blank line above, so we're now in the body, which isn't
      processed for headers.
    Good thing, too, because this isn't a valid header line.
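
The comparison with the standard library's ``email`` package made in the
example can be tried directly.  The following sketch uses only the standard
library; the input string is made up for illustration.

```python
from email import message_from_string

# "Foo" occurs twice in the header section; the blank line starts the body.
text = "Foo: first\nFoo: second\nBar: only\n\nbody text\n"
msg = message_from_string(text)

# get_all() returns every occurrence of the field, in order.
all_foo = msg.get_all("Foo")

# __getitem__ returns a single value; the email documentation leaves
# unspecified which occurrence that is, so do not rely on it.
one_foo = msg["Foo"]

# Everything after the first blank line is the body.
body = msg.get_payload()
```

Code that needs every occurrence should therefore call ``get_all()`` rather
than index the message.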

On the other hand, this is not a valid RFC 822-style document::

        An indented first line — without a "Name:" line before it!
    A header line without a colon isn't good, either.
    Does this make up for the above: no
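
To see why the document above is rejected, the two offending lines can be
flagged with a small checker.  This is a hypothetical sketch
(``find_problems`` is not part of `headerparser`), and it collects every
problem instead of stopping at the first one, as a real parser would.

```python
def find_problems(lines):
    """Return (line_number, reason) pairs for lines that break the rules."""
    problems = []
    seen_field = False
    for number, line in enumerate(lines, start=1):
        if line[:1] in (" ", "\t"):
            # Indented lines are continuations and need a field before them.
            if not seen_field:
                problems.append((number, "continuation with no field before it"))
        elif ":" in line:
            seen_field = True
        else:
            problems.append((number, "header line without a colon"))
    return problems


document = [
    '    An indented first line, without a "Name:" line before it!',
    "A header line without a colon isn't good, either.",
    "Does this make up for the above: no",
]
problems = find_problems(document)
```

Only the third line is well formed on its own, which is not enough to make
the document as a whole valid.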