.. highlight:: python

.. module:: aiohttp.multipart

.. _aiohttp-multipart:

Working with Multipart
======================

`aiohttp` supports a full-featured multipart reader and writer. Both
are designed with streaming processing in mind to avoid an unwanted
memory footprint, which may be significant if you're dealing with large
payloads. This also means that most I/O operations can only be
executed once.

Reading Multipart Responses
---------------------------

Assume you made a request, as usual, and want to process the response multipart
data::

    >>> resp = yield from aiohttp.request(...)

First, you need to wrap the response with
:meth:`MultipartReader.from_response`. This keeps the implementation of
:class:`MultipartReader` separated from the response and the connection
routines, which makes it more portable::

    >>> reader = aiohttp.MultipartReader.from_response(resp)

Let's assume that with this response you received a JSON document and
multiple files for it, but you don't need all of them, just a specific one.

So first you need to enter a loop where the multipart body will
be processed::

    >>> metadata = None
    >>> filedata = None
    >>> while True:
    ...     part = yield from reader.next()

The returned type depends on what the next part is: if it's a simple body part
then you'll get a :class:`BodyPartReader` instance here, otherwise it will
be another :class:`MultipartReader` instance for the nested multipart. Remember
that the multipart format is recursive and supports multiple levels of nested
body parts. When there are no more parts left to fetch, ``None`` is
returned - that's the signal to break the loop::

    ...     if part is None:
    ...         break

Both :class:`BodyPartReader` and :class:`MultipartReader` provide access to
body part headers: this allows you to filter parts by their attributes::

    ...     if part.headers[aiohttp.hdrs.CONTENT_TYPE] == 'application/json':
    ...         metadata = yield from part.json()
    ...         continue

Neither :class:`BodyPartReader` nor :class:`MultipartReader` instances
read the whole body part data unless explicitly asked to.
:class:`BodyPartReader` provides a set of helper methods
to fetch popular content types in a friendly way:

- :meth:`BodyPartReader.text` for plain text data;
- :meth:`BodyPartReader.json` for JSON;
- :meth:`BodyPartReader.form` for :mimetype:`application/x-www-form-urlencoded` data.

Each of these methods automatically recognizes whether the content is
compressed with the `gzip` or `deflate` encoding (while respecting the
`identity` one), or whether the transfer encoding is `base64` or
`quoted-printable`; in each case the result is decoded automatically. If you
need access to the raw binary data as it is, there are also the
:meth:`BodyPartReader.read` and :meth:`BodyPartReader.read_chunk` coroutine
methods to read the raw data in a single shot or chunk by chunk respectively.
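
For example, a large file part can be streamed to disk chunk by chunk instead
of being held in memory all at once. This is a minimal sketch assuming ``part``
is a :class:`BodyPartReader` for the file part; the ``snapshot.bin`` name is
just a placeholder::

    with open('snapshot.bin', 'wb') as fobj:
        while True:
            # read_chunk() returns b'' once the part is exhausted
            chunk = yield from part.read_chunk()
            if not chunk:
                break
            fobj.write(chunk)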

When you have to deal with multipart files, the :attr:`BodyPartReader.filename`
property comes to the rescue. It's a smart helper which handles the
`Content-Disposition` header correctly and extracts the filename attribute
from it::

    ...     if part.filename != 'secret.txt':
    ...         continue

If the current body part doesn't match your expectations and you want to skip
it, just continue the loop to start its next iteration. Here is where the
magic happens: before fetching the next body part, ``yield from reader.next()``
ensures that the previous one was read completely. If it wasn't, all of its
content is sent into the void in order to fetch the next part. So you don't
have to care about cleanup routines while you're inside the loop.

Once you've found the part for the file you were searching for, just read it.
Let's handle it as it is, without applying any decoding magic::

    ...     filedata = yield from part.read(decode=False)

Later you may decide to decode the data. It's still simple to do::

    ...     filedata = part.decode(filedata)

Once you are done with the multipart processing, just break the loop::

    ...     break

And release the connection so that the response isn't left hanging in the
middle of the data::

    ...     yield from resp.release()  # or yield from reader.release()
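
Putting the pieces together, here is the whole reading loop as a sketch
assembled from the fragments above::

    >>> reader = aiohttp.MultipartReader.from_response(resp)
    >>> metadata = None
    >>> filedata = None
    >>> while True:
    ...     part = yield from reader.next()
    ...     if part is None:
    ...         break
    ...     if part.headers[aiohttp.hdrs.CONTENT_TYPE] == 'application/json':
    ...         metadata = yield from part.json()
    ...         continue
    ...     if part.filename != 'secret.txt':
    ...         continue
    ...     filedata = yield from part.read(decode=False)
    ...     break
    >>> yield from resp.release()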


Sending Multipart Requests
--------------------------

:class:`MultipartWriter` provides an interface to build a multipart payload
from Python data and serialize it into a chunked binary stream. Since the
multipart format is recursive and supports deep nesting, you can use the
``with`` statement to design your multipart data closer to how it will be
sent::

    >>> with aiohttp.MultipartWriter('mixed') as mpwriter:
    ...     ...
    ...     with aiohttp.MultipartWriter('related') as subwriter:
    ...         ...
    ...     mpwriter.append(subwriter)
    ...
    ...     with aiohttp.MultipartWriter('related') as subwriter:
    ...         ...
    ...         with aiohttp.MultipartWriter('related') as subsubwriter:
    ...             ...
    ...         subwriter.append(subsubwriter)
    ...     mpwriter.append(subwriter)
    ...
    ...     with aiohttp.MultipartWriter('related') as subwriter:
    ...         ...
    ...     mpwriter.append(subwriter)

:meth:`MultipartWriter.append` is used to join new body parts into a
single stream. It accepts various inputs and determines what default headers
should be used for each of them.

For text data the default `Content-Type` is :mimetype:`text/plain; charset=utf-8`::

    ...     mpwriter.append('hello')

For binary data :mimetype:`application/octet-stream` is used::

    ...     mpwriter.append(b'aiohttp')

You can always override these defaults by passing your own headers as the
second argument::

    ...     mpwriter.append(io.BytesIO(b'GIF89a...'),
                            {'CONTENT-TYPE': 'image/gif'})

For file objects the `Content-Type` will be determined by using Python's
`mimetypes`_ module, and additionally the `Content-Disposition` header will
include the file's basename::

    ...     part = mpwriter.append(open(__file__, 'rb'))

If you want to send a file with a different name, just take the
:class:`BodyPartWriter` instance which :meth:`MultipartWriter.append` always
returns and set `Content-Disposition` explicitly by using
the :meth:`BodyPartWriter.set_content_disposition` helper::

    ...     part.set_content_disposition('attachment', filename='secret.txt')

Additionally, you may want to set other headers here::

    ...     part.headers[aiohttp.hdrs.CONTENT_ID] = 'X-12345'

If you set `Content-Encoding`, it will be automatically applied to the
data during serialization (see below)::

    ...     part.headers[aiohttp.hdrs.CONTENT_ENCODING] = 'gzip'

There are also the :meth:`MultipartWriter.append_json` and
:meth:`MultipartWriter.append_form` helpers which are useful for working with
JSON and form urlencoded data, so you don't have to encode them manually every
time::

    ...     mpwriter.append_json({'test': 'passed'})
    ...     mpwriter.append_form([('key', 'value')])

When it's done, to make a request just pass the root :class:`MultipartWriter`
instance as the `data` argument of :func:`aiohttp.client.request`::

    >>> yield from aiohttp.request('POST', 'http://example.com', data=mpwriter)

Behind the scenes :meth:`MultipartWriter.serialize` will yield chunks of every
part, and if a body part has `Content-Encoding` or `Content-Transfer-Encoding`
set, they will be applied to the streamed content.

Please note that during :meth:`MultipartWriter.serialize` all the file objects
will be read to the end, and there is no way to repeat the request without
rewinding their pointers to the start.
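
For instance, a complete request carrying a JSON document and an attached file
might look like this. This is a sketch built only from the calls shown above;
the file name and URL are placeholders::

    >>> with aiohttp.MultipartWriter('mixed') as mpwriter:
    ...     mpwriter.append_json({'test': 'passed'})
    ...     part = mpwriter.append(open('secret.txt', 'rb'))
    ...     part.set_content_disposition('attachment', filename='secret.txt')
    >>> resp = yield from aiohttp.request('POST', 'http://example.com',
    ...                                   data=mpwriter)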

Hacking Multipart
-----------------

The Internet is full of terror and sometimes you may find a server which
implements multipart support in strange ways where an obvious solution
doesn't work.

For instance, if the server uses `cgi.FieldStorage`_ then you have to ensure
that no body part contains a `Content-Length` header::

    for part in mpwriter:
        part.headers.pop(aiohttp.hdrs.CONTENT_LENGTH, None)

On the other hand, some servers may require `Content-Length` to be specified
for the whole multipart request. `aiohttp` doesn't do that since it sends
multipart using chunked transfer encoding by default. To overcome this issue,
you have to serialize the :class:`MultipartWriter` on your own so that its
size can be calculated::

    body = b''.join(mpwriter.serialize())
    yield from aiohttp.request('POST', 'http://example.com',
                               data=body, headers=mpwriter.headers)

Sometimes the server response may not be well formed: it may or may not
contain nested parts. For instance, suppose we request a resource which
returns JSON documents with files attached to them. If a document has any
attachments, they are returned as a nested multipart.
If it has none, it is returned as a plain body part::

    CONTENT-TYPE: multipart/mixed; boundary=--:

    --:
    CONTENT-TYPE: application/json

    {"_id": "foo"}
    --:
    CONTENT-TYPE: multipart/related; boundary=----:

    ----:
    CONTENT-TYPE: application/json

    {"_id": "bar"}
    ----:
    CONTENT-TYPE: text/plain
    CONTENT-DISPOSITION: attachment; filename=bar.txt

    bar! bar! bar!
    ----:--
    --:
    CONTENT-TYPE: application/json

    {"_id": "boo"}
    --:
    CONTENT-TYPE: multipart/related; boundary=----:

    ----:
    CONTENT-TYPE: application/json

    {"_id": "baz"}
    ----:
    CONTENT-TYPE: text/plain
    CONTENT-DISPOSITION: attachment; filename=baz.txt

    baz! baz! baz!
    ----:--
    --:--

Reading this kind of data as a single stream is possible, but not clean at
all::

    result = []
    while True:
        part = yield from reader.next()

        if part is None:
            break

        if isinstance(part, aiohttp.MultipartReader):
            # Fetching files
            while True:
                filepart = yield from part.next()
                if filepart is None:
                    break
                result[-1].append((yield from filepart.read()))

        else:
            # Fetching document
            result.append([(yield from part.json())])

Let's hack the reader so that it returns pairs of a document and a reader of
the related files on each iteration::

    class PairsMultipartReader(aiohttp.MultipartReader):

        # keep a reference to the original reader class
        multipart_reader_cls = aiohttp.MultipartReader

        @asyncio.coroutine
        def next(self):
            """Emits a tuple of document object (:class:`dict`) and multipart
            reader of the followed attachments (if any).

            :rtype: tuple
            """
            reader = yield from super().next()

            if self._at_eof:
                return None, None

            if isinstance(reader, self.multipart_reader_cls):
                part = yield from reader.next()
                doc = yield from part.json()
            else:
                doc = yield from reader.json()

            return doc, reader

And this gives us a much cleaner solution::

    reader = PairsMultipartReader.from_response(resp)
    result = []
    while True:
        doc, files_reader = yield from reader.next()

        if doc is None:
            break

        files = []
        while True:
            filepart = yield from files_reader.next()
            if filepart is None:
                break
            files.append((yield from filepart.read()))

        result.append((doc, files))

.. seealso:: Multipart API in :ref:`aiohttp-api` section.


.. _cgi.FieldStorage: https://docs.python.org/3.4/library/cgi.html
.. _mimetypes: https://docs.python.org/3.4/library/mimetypes.html


.. disqus::