1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
|
- Should string `default` values be passed through `type` etc. like in
argparse?
- Rethink how the original exception data is attached to `FieldTypeError`s
- Include everything from `sys.exc_info()`?
- Rename `NormalizedDict.normalized_dict()` to something that doesn't imply it
returns a `NormalizedDict`?
- Add docstrings to private classes and attributes
- Write more tests
- different header name normalizers (identity, hyphens=underscores,
titlecase?, etc.)
- `add_additional`
- calling `add_additional` multiple times (some times with
`allow=False`)
- `add_additional(False, extra arguments ...)`
- `add_additional` when a header has a `dest` that's just a normalized
form of one of its names
- calling `add_field`/`add_additional` on a `HeaderParser` after a previous
call raised an error
- scanning & parsing Unicode
- normalizer that returns a non-string
- non-string keys in `NormalizedDict` with the default normalizer
- equality of `HeaderParser` objects
- Test that `HeaderParser.parse_stream()` won't choke on non-string inputs
- passing scanner options to `HeaderParser`
- scanning files not opened in universal newlines mode
- Improve documentation & examples
- Contrast handling of multi-occurrence fields with that of the standard
library
- Draw attention to the case-insensitivity of field names when parsing and
when retrieving from the dict
- Give examples of custom normalization (or at least explain what it is and
why it's worth having)
- Add `action` examples
- Add example recipes to the documentation of `HeaderParser`s for common
mail-like formats
- Write more user-friendly documentation that goes through `HeaderParser`
feature by feature like `attrs`' documentation
Features
========
- Add some sort of handling for "From " lines
- Give `NormalizedDict` a `from_line` attribute
- Give the scanner a `from_line_regex` parameter; if the first line of a
stanza matches the regex, it is assumed to be a "From" line
- Create a "`SpecialHeader`" enum with `FromLine` and `Body` values for use
as the first element of `(header, value)` pairs yielded by the scanner
representing "From " lines and bodies
- Use the enum values as keys in `NormalizedDict`s instead of having
dedicated `from_line` and `body` attributes?
- Give the parser an option for requiring a "From " line
- Export premade regexes for matching Unix mail "From " lines, HTTP
request lines, and HTTP response status lines
- Write an entry point for converting RFC822-style files/headers to JSON
- name: `mail2json`? `headers2json`?
- include options for:
- parsing multiple stanzas into an array of JSON objects
- setting the key name for the "message body"
- handling of multiple occurrences of the same header in a single
stanza; choices:
- raise an error
- combine multi-occurrence headers into an array of values
- use an array of values for all headers regardless of multiplicity
(default?)
- output an array of `{"header": ..., "value": ...}` objects
- handling of non-ASCII characters and the various ways in which they
can be escaped
- handling of "From " lines (and/or other non-header headers like the
first line of an HTTP request or response?)
- handling of header lettercases?
Scanning
--------
- Give the scanner options for:
- definition of "whitespace" for purposes of folding (standard: 0x20 and
TAB)
- line separator/terminator (default: CR, LF, and CRLF; standard: only
CRLF, with lone CR and LF being obsolete)
- using Unicode definitions of line endings and horizontal whitespace
- stripping leading whitespace from folded lines? (standard: no)
- handling "From " lines and the like
- ignoring all blank lines?
- comments? (cf. robots.txt)
- internationalization of header names
- treating `---` as a blank line?
- Error handling:
- header lines without a colon or indentation (options: error, header
with empty value, or start of body)
- empty header name (options: error, header with empty name, look for
next colon, or start of body)
- all-whitespace line (considered obsolete by RFC 5322)
Parsing
-------
- Include utility callables for header types:
- RFC822 dates, addresses, etc.
- Content-Type-style "parameterized" headers
- Include an `object_pairs_hook` for the parameters?
- cf. `cgi.parse_header()`
- internationalized strings
- converting lines with just '.' to blank lines
- Somehow support the types in `email.headerregistry`
- Provide a `Normalizer` class with options for casing, trimming
whitespace, squashing whitespace, converting hyphens and underscores to
the same character, squashing hyphens & underscores, etc.
- unfolding if & only if the first line of the value contains any
non-whitespace? (cf. most multiline fields in Debian control files)
- DKIM headers?
- removing RFC 822 comments?
- comma-and-space-separated lists?
- cf. `urllib.request.parse_http_list()`?
- New `add_field` and `add_additional` options to add:
- `default_action=callable` for defining what to do when a header is absent
- `multiple_type` and `multiple_action` — like `type` and `action`, but
called on a list of all values encountered for a `multiple` field
- `i18n=bool` — turns on decoding of internationalized mail headers before
passing to `type` (Do this via a custom type instead?)
- Give `add_additional` an option for controlling whether to normalize
additional header names before adding them to the dict?
- Requiring/forbidding nonempty/non-whitespace bodies
- Add public methods for removing, inspecting, & modifying header definitions
- Make the `body`, `scanner_opts`, etc. attributes public
- Support constructing a complete `HeaderParser` in a single expression from a
`dict` rather than having to make multiple calls to `add_field`
- Support converting a `HeaderParser` instance to such a `dict`
- Support modifying a `HeaderParser`'s field definitions after they're defined?
- Allow two different named fields to have the same `dest` if they both have
`multiple=True`? (or both `multiple=False`?)
- Give `add_additional` an argument for putting all additional fields in a
given subdict (or a presupplied arbitrary mapping object?) so that named
fields can still use custom dests?
- Give parsers a way to store parsed fields in a presupplied arbitrary mapping
object (or one created from a `dict_factory`/`dict_cls` callable?) instead of
creating a new NormalizedDict?
- Give `HeaderParser` an option (`body_key`?) for storing the body in a given
`dict` key
- Create a `BODY` token to use as a `dict` key for storing bodies instead of
storing them as an attribute?
- Add an option/method for ignoring & discarding any unknown/"additional"
fields
- Add handling for fields that can either occur in the header or be the body
(e.g., "Description" in Python packaging METADATA)
- Require scanner options to be passed to `HeaderParser`'s constructor in a
`scanner_opts={}` `dict` instead of as `**kwargs`
|