File: raw-url-path.rst

package info (click to toggle)
python-falcon 3.1.1-5
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 5,204 kB
  • sloc: python: 28,455; makefile: 184; sh: 139; javascript: 66
file content (153 lines) | stat: -rw-r--r-- 4,854 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
.. _raw_url_path_recipe:

Decoding Raw URL Path
=====================

This recipe demonstrates how to access the "raw" request path using
non-standard (WSGI) or optional (ASGI) application server extensions.
This is useful when, for instance, a URI field has been percent-encoded in
order to distinguish between forward slashes inside the field's value, and
slashes used to separate fields. See also: :ref:`routing_encoded_slashes`

WSGI
----

In the WSGI flavor of the framework, :attr:`req.path <falcon.Request.path>` is
based on the ``PATH_INFO`` CGI variable, which is already presented
percent-decoded. Some application servers expose the raw URL under another,
non-standard, CGI variable name. Let us implement a middleware component that
understands two such extensions, ``RAW_URI`` (Gunicorn, Werkzeug's dev server)
and ``REQUEST_URI`` (uWSGI, Waitress, Werkzeug's dev server), and replaces
``req.path`` with a value extracted from the raw URL:

.. code:: python

    import falcon
    import falcon.uri


    class RawPathComponent:
        def process_request(self, req, resp):
            raw_uri = req.env.get('RAW_URI') or req.env.get('REQUEST_URI')

            # NOTE: Reconstruct the percent-encoded path from the raw URI.
            if raw_uri:
                req.path, _, _ = raw_uri.partition('?')


    class URLResource:
        def on_get(self, req, resp, url):
            # NOTE: url here is potentially percent-encoded.
            url = falcon.uri.decode(url)

            resp.media = {'url': url}

        def on_get_status(self, req, resp, url):
            # NOTE: url here is potentially percent-encoded.
            url = falcon.uri.decode(url)

            resp.media = {'cached': True}


    app = falcon.App(middleware=[RawPathComponent()])
    app.add_route('/cache/{url}', URLResource())
    app.add_route('/cache/{url}/status', URLResource(), suffix='status')

Running the above app with a supported server such as Gunicorn or uWSGI, the
following response is rendered to
a ``GET /cache/http%3A%2F%2Ffalconframework.org`` request:

.. code:: json

    {
        "url": "http://falconframework.org"
    }

We can also check the status of this URI in our imaginary web caching system by
accessing ``/cache/http%3A%2F%2Ffalconframework.org/status``:

.. code:: json

    {
        "cached": true
    }

If we removed ``RawPathComponent()`` from the app's middleware list, the
request would be routed as ``/cache/http://falconframework.org``, and no
matching resource would be found:

.. code:: json

    {
        "title": "404 Not Found"
    }

What is more, even if we could implement a flexible router that was capable of
matching these complex URI patterns, the app would still not be able to
distinguish between ``/cache/http%3A%2F%2Ffalconframework.org%2Fstatus`` and
``/cache/http%3A%2F%2Ffalconframework.org/status`` if both were presented only
in the percent-decoded form.

ASGI
----

The ASGI version of :attr:`req.path <falcon.asgi.Request.path>` uses the
``path`` key from the ASGI scope, where percent-encoded sequences are already
decoded into characters just like in WSGI's ``PATH_INFO``.
Similar to the WSGI snippet from the previous chapter, let us create a
middleware component that replaces ``req.path`` with the value of ``raw_path``
(provided the latter is present in the ASGI HTTP scope):

.. code:: python

    import falcon.asgi
    import falcon.uri


    class RawPathComponent:
        async def process_request(self, req, resp):
            raw_path = req.scope.get('raw_path')

            # NOTE: Decode the raw path from the raw_path bytestring, disallowing
            #   non-ASCII characters, assuming they are correctly percent-coded.
            if raw_path:
                req.path = raw_path.decode('ascii')


    class URLResource:
        async def on_get(self, req, resp, url):
            # NOTE: url here is potentially percent-encoded.
            url = falcon.uri.decode(url)

            resp.media = {'url': url}

        async def on_get_status(self, req, resp, url):
            # NOTE: url here is potentially percent-encoded.
            url = falcon.uri.decode(url)

            resp.media = {'cached': True}


    app = falcon.asgi.App(middleware=[RawPathComponent()])
    app.add_route('/cache/{url}', URLResource())
    app.add_route('/cache/{url}/status', URLResource(), suffix='status')

Running the above snippet with ``uvicorn`` (that supports ``raw_path``), the
percent-encoded ``url`` field is now correctly handled for a
``GET /cache/http%3A%2F%2Ffalconframework.org%2Fstatus``
request:

.. code:: json

    {
        "url": "http://falconframework.org/status"
    }

Again, as in the WSGI version, removing ``RawPathComponent()`` no longer lets
the app route the above request as intended:

.. code:: json

    {
        "title": "404 Not Found"
    }