======================
File Wrapper Extension
======================

The WSGI specification supports an optional feature that can be implemented
by WSGI adapters for platform specific file handling.

  * http://www.python.org/dev/peps/pep-0333/#optional-platform-specific-file-handling

What this allows is for a WSGI application to return a special object type
which wraps a Python file like object. If that file like object satisfies
certain conditions as dictated by a specific platform, then the WSGI
adapter is allowed to return the content of that file in an optimised
manner.

The intent of this is to provide better performance for serving up static
file content than a pure Python WSGI application may itself be able to
achieve.

Note, however, that for the best performance static files should always
be served by a web server. In the case of mod_wsgi this means by Apache
itself rather than mod_wsgi or the WSGI application. Using the web server
may not always be possible though, such as for files generated on demand.

Example Of Wrapper Usage
------------------------

A WSGI adapter implementing this extension needs to supply a special
callable object under the key 'wsgi.file_wrapper' in the 'environ'
dictionary passed to the WSGI application.

What this callable does will be specific to a WSGI adapter, but it must be
a callable that accepts one required positional parameter, and one optional
positional parameter. The first parameter is the file like object to be
sent, and the second parameter is an optional block size. If the block size
is not supplied then the WSGI adapter would choose a value which would be
most appropriate for the specific hosting mechanism.

Whatever the WSGI adapter does, the result of the callable must be an
iterable object which can be used directly as the response from the WSGI
application or for passing into any WSGI middleware. Provided the response
content isn't consumed by any WSGI middleware and the iterable object gets
passed through the WSGI adapter, the WSGI adapter should recognise the
special iterable object and trigger any special handling to return the
response in a more efficient way.
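
As an illustration of the contract just described, a pure Python WSGI
adapter could supply something like the following. This is only a minimal
sketch modelled on the example class in the WSGI specification and is not
how mod_wsgi implements the extension. The class name ``FileWrapper`` and
the default block size of 8192 are assumptions made for the example:

```python
import io


class FileWrapper:
    """Hypothetical pure Python stand-in for 'wsgi.file_wrapper'."""

    def __init__(self, filelike, block_size=8192):
        self.filelike = filelike
        self.block_size = block_size
        # Expose close() only if the wrapped object has one, so that the
        # WSGI adapter closes the file once the response is complete.
        if hasattr(filelike, 'close'):
            self.close = filelike.close

    def __iter__(self):
        return self

    def __next__(self):
        # Yield the file one block at a time until read() returns nothing.
        data = self.filelike.read(self.block_size)
        if data:
            return data
        raise StopIteration


# Usage: iterate over an in-memory file in blocks of four bytes.
blocks = list(FileWrapper(io.BytesIO(b'0123456789'), 4))
```

A real adapter would additionally check whether the response it receives is
an instance of its own wrapper type, and only then switch to a faster
platform specific way of sending the file.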

Because the support of this platform specific file handling is optional for
any specific WSGI adapter, any user code should be coded so as to be able
to cope with it not existing.

Using the snippet described in the WSGI specification as a guide, the
WSGI application would be written as follows::

    def application(environ, start_response):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)

        # Always open the file in binary mode.
        filelike = open('/usr/share/dict/words', 'rb')
        block_size = 4096

        if 'wsgi.file_wrapper' in environ:
            return environ['wsgi.file_wrapper'](filelike, block_size)
        else:
            # Fall back to a plain iterator. The sentinel must be bytes
            # to match what read() returns for a binary file at EOF.
            return iter(lambda: filelike.read(block_size), b'')

Note that the file must always be opened in binary mode. Otherwise, on
platforms which perform CR/LF translation automatically, the translated
form of the content will be returned rather than the original. As well as
not being the original content, this can cause problems with content
lengths if the 'Content-Length' response header returned by the WSGI
application was calculated from the actual file size rather than the size
of the translated content.

Addition Of Content Length
--------------------------

The WSGI specification does not say anything specific about whether a WSGI
adapter should generate a 'Content-Length' response header when the
'wsgi.file_wrapper' extension is used and the WSGI application does not
return one itself.

For mod_wsgi at least, if the WSGI application doesn't provide a
'Content-Length' response header it will calculate the response content
length automatically as being from the current file position to the end of
the file. A 'Content-Length' header will then be added to the response
for that value.

As far as is known, only mod_wsgi automatically supplies a 'Content-Length'
response header in this way. If consistent behaviour is required on all
platforms, the WSGI application should always calculate the length and add
the header itself.
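
One way for the WSGI application to calculate the length itself is to take
the file size and subtract the current file position, which mirrors the
"current position to end of file" rule described above. The following is a
sketch only. The helper names ``remaining_length`` and
``response_headers_for`` are hypothetical, and it assumes the file like
object is a real file providing working ``fileno()`` and ``tell()``
methods:

```python
import os
import tempfile


def remaining_length(filelike):
    """Return the number of bytes from the current position to the end."""
    size = os.fstat(filelike.fileno()).st_size
    return size - filelike.tell()


def response_headers_for(filelike, content_type='application/octet-stream'):
    """Build response headers which include an explicit Content-Length."""
    return [
        ('Content-Type', content_type),
        ('Content-Length', str(remaining_length(filelike))),
    ]


# Usage: an 11 byte file positioned at offset 6 has 5 bytes remaining.
with tempfile.TemporaryFile() as f:
    f.write(b'hello world')
    f.flush()
    f.seek(6)
    headers = response_headers_for(f)
```

The headers would be passed to ``start_response()`` before the file like
object is handed to 'wsgi.file_wrapper' or the fallback iterator.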

Existing Content Length
-----------------------

Where a 'Content-Length' is specified by the WSGI application, mod_wsgi
will honour that content length. That is, mod_wsgi will only return as many
bytes of the file as specified by the 'Content-Length' header.

This is not a requirement of the WSGI specification, but then this is one
area of the WSGI specification which is arguably broken. This manifests in
the WSGI specification where it says:

  """transmission should begin at the current position within the "file"
  at the time that transmission begins, and continue until the end is
  reached"""

If this interpretation is used, where a WSGI application supplies a
'Content-Length' header and the number of bytes listed is less than the
number of bytes remaining in the file from the current position, then more
bytes than specified by the 'Content-Length' header would be returned.

To do this would technically be in violation of HTTP specifications which
should dictate that the number of bytes returned be the same as that
specified by the 'Content-Length' response header if supplied.

Not only is this statement in the WSGI specification arguably wrong, the
example snippet of code which shows how to implement a fallback where the
'wsgi.file_wrapper' is not present, ie.::

    if 'wsgi.file_wrapper' in environ:
        return environ['wsgi.file_wrapper'](filelike, block_size)
    else:
        return iter(lambda: filelike.read(block_size), '')

is also wrong. This is because it doesn't restrict the number of bytes
returned to that specified by 'Content-Length'.
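
A fallback which does respect a declared length could look something like
the following. This is a sketch under assumptions rather than anything
defined by the WSGI specification or mod_wsgi. The generator name
``limited_file_iter`` is hypothetical, and the caller is assumed to pass in
the same number of bytes as was given in the 'Content-Length' header:

```python
import io


def limited_file_iter(filelike, length, block_size=8192):
    """Yield at most 'length' bytes from 'filelike' in blocks."""
    remaining = length
    while remaining > 0:
        # Never ask for more than is still owed to the client.
        data = filelike.read(min(block_size, remaining))
        if not data:
            break  # The file was shorter than the declared length.
        remaining -= len(data)
        yield data


# Usage: only the first 7 bytes of a 10 byte file are returned.
chunks = list(limited_file_iter(io.BytesIO(b'0123456789'), 7, 4))
```

Note that a plain generator like this does not close the file, so the
application would need to arrange for that separately.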

Although mod_wsgi would also discard any bytes in excess of the specified
'Content-Length' for normal iterable content, many other WSGI adapters are
not known to do this and would just pass back all content regardless. The
result of returning content beyond the specified 'Content-Length' would be
the failure of subsequent requests where the connection was using keep
alive and pipelining requests.

This problem is also compounded by the WSGI specification not placing any
requirement on WSGI middleware to respect the 'Content-Length' response
header when processing response content. Thus WSGI middleware could also
in general generate incorrect response content by virtue of not honouring
the 'Content-Length' response header.

Overall, although mod_wsgi does what is the logical and right thing to do,
if you need to write code which is portable to other WSGI hosting mechanisms,
you should never produce a 'Content-Length' response header which lists a
number of bytes different to that which would be yielded from an iterable
object such as a file like object. Thus it would be impossible to use any
platform specific file handling features to return a range of bytes from a
file.

Restrictions On Optimisations
-----------------------------

Although mod_wsgi always supplies the 'wsgi.file_wrapper' callable object as
part of the WSGI 'environ' dictionary, optimised methods of returning the
file contents as the response are not always used.

A general restriction is that the file like object must supply both a
'fileno()' and 'tell()' method. This is necessary in order to get access to
the underlying file descriptor and to determine the current position within
the file.

The file descriptor is needed so as to be able to use the 'sendfile()'
function to return file contents in a more optimal manner. The 'tell()'
method is needed to be able to calculate the response 'Content-Length' and
to validate that, where the WSGI application supplies its own
'Content-Length' header, there are sufficient bytes in the file.
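
To see which file like objects would meet this general restriction, a rough
check along the following lines can be used. It is illustrative only and is
not the test mod_wsgi itself performs. The function name
``supports_optimised_send`` is hypothetical:

```python
import io
import tempfile


def supports_optimised_send(filelike):
    """Rough check that fileno() and tell() both exist and work."""
    try:
        filelike.fileno()
        filelike.tell()
    except (AttributeError, OSError):
        # io.UnsupportedOperation, raised by in-memory streams for
        # fileno(), is a subclass of OSError.
        return False
    return True


# A real file on disk provides a file descriptor.
with tempfile.TemporaryFile() as real_file:
    real_ok = supports_optimised_send(real_file)

# An in-memory stream has tell() but no underlying file descriptor.
memory_ok = supports_optimised_send(io.BytesIO(b'data'))
```

Here ``real_ok`` is true and ``memory_ok`` is false, so content held in
something like ``io.BytesIO`` could only ever be returned block by block.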

Because the 'sendfile()' function is used by Apache to return file contents
in a more optimal manner and because on Windows a Python file object only
provides a Windows file handle and not a file descriptor, no optimisations
are available on the Windows platform.

The optimisations are also not able to be used if using Apache 1.3. This is
because Apache doesn't provide access to a mechanism for optimised sending
of file contents to a content handler under Apache 1.3.

Finally, optimisations are not used where the WSGI application is running in
daemon mode. This is currently disabled because some UNIX platforms do not
appear to support use of the 'sendfile()' function over UNIX sockets and only
support INET sockets. This situation may possibly have changed with recent
versions of Linux at least but this has yet to be investigated properly.

Whether or not optimisations are supported, the mod_wsgi 'wsgi.file_wrapper'
extension generally still performs better than if a pure Python iterable
object was used to yield the file contents.

Note that this all presumes that the iterable object returned by
'wsgi.file_wrapper' is actually passed back to mod_wsgi and is not consumed
by a WSGI middleware. For example, a WSGI middleware which compresses the
response content would consume the response content and modify it with a
different iterable object being returned. In this case there is no chance
for optimisations to be used for returning the file contents.

This problem isn't restricted though to just where the response content is
modified in some way and also extends to any WSGI middleware that wants to
replace the 'close()' method to perform some cleanup actions at the end of
a request.

This is because in order to interject the cleanup actions triggered on the
'close()' method of the iterable object it has to replace the existing
iterable object with another which wraps the first, with the outer
providing its own 'close()' method. An example of a middleware which
replaces the 'close()' method in this way can be found in
:doc:`../user-guides/registering-cleanup-code`.
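
The effect can be seen with a small sketch of such a middleware. The class
names ``CleanupIterable``, ``CleanupMiddleware`` and ``FakeFileWrapper``
are hypothetical and exist only for this illustration, with
``FakeFileWrapper`` standing in for whatever object a WSGI adapter returns
from 'wsgi.file_wrapper':

```python
class CleanupIterable:
    """Wraps a response iterable so that close() also runs a callback."""

    def __init__(self, iterable, callback):
        self.iterable = iterable
        self.callback = callback

    def __iter__(self):
        return iter(self.iterable)

    def close(self):
        try:
            if hasattr(self.iterable, 'close'):
                self.iterable.close()
        finally:
            self.callback()


class CleanupMiddleware:
    def __init__(self, application, callback):
        self.application = application
        self.callback = callback

    def __call__(self, environ, start_response):
        result = self.application(environ, start_response)
        # The adapter now receives a CleanupIterable rather than the
        # object created by 'wsgi.file_wrapper'.
        return CleanupIterable(result, self.callback)


class FakeFileWrapper(list):
    """Stand-in for the adapter's own file wrapper type."""


def app(environ, start_response):
    start_response('200 OK', [])
    return FakeFileWrapper([b'data'])


events = []
wrapped = CleanupMiddleware(app, lambda: events.append('cleanup'))
result = wrapped({}, lambda status, headers: None)
still_wrapper = isinstance(result, FakeFileWrapper)
body = b''.join(result)
result.close()
```

The response content is unchanged, but ``still_wrapper`` is false, so an
adapter which recognises its wrapper by type would fall back to ordinary
iteration.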

It is thus quite easy for a WSGI application stack to inadvertently defeat
completely any attempt to return file contents in an optimised way using
the 'wsgi.file_wrapper' extension of WSGI. As such, an attempt should always
be made instead to use a real web server, whether that be a separate
web server, or in the case of mod_wsgi the underlying Apache web server.

Where necessary, features of web servers or proxies such as
'X-Accel-Redirect', 'X-Sendfile' or other special purpose headers could be
used. If using mod_wsgi daemon mode and using mod_wsgi version 3.0 or later,
the 'Location' response header can also be used.
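
As a rough sketch of the header based approach, a WSGI application can
return an empty body and name the file in a response header, leaving the
front end server to send the content. The example assumes a server or
module which recognises 'X-Sendfile' has been installed and configured. The
file path is hypothetical. With 'X-Accel-Redirect' the header value would
instead be an internal URI defined in the proxy configuration rather than
a file system path:

```python
def application(environ, start_response):
    # Hypothetical path. The front end server must be configured to act
    # on the header and must itself be able to read this file.
    path = '/var/www/protected/report.pdf'

    start_response('200 OK', [
        ('Content-Type', 'application/pdf'),
        ('X-Sendfile', path),
    ])

    # The body is left empty as the server substitutes the file content.
    return [b'']
```

Because the file never passes through the WSGI application, any WSGI
middleware which wraps or consumes the response no longer affects how the
file content is delivered.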