1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
|
.. _raw_url_path_recipe:
Decoding Raw URL Path
=====================
This recipe demonstrates how to access the "raw" request path using
non-standard (WSGI) or optional (ASGI) application server extensions.
This is useful when, for instance, a URI field has been percent-encoded in
order to distinguish between forward slashes inside the field's value, and
slashes used to separate fields. See also: :ref:`routing_encoded_slashes`
WSGI
----
In the WSGI flavor of the framework, :attr:`req.path <falcon.Request.path>` is
based on the ``PATH_INFO`` CGI variable, which is already presented
percent-decoded. Some application servers expose the raw URL under another,
non-standard, CGI variable name. Let us implement a middleware component that
understands two such extensions, ``RAW_URI`` (Gunicorn, Werkzeug's dev server)
and ``REQUEST_URI`` (uWSGI, Waitress, Werkzeug's dev server), and replaces
``req.path`` with a value extracted from the raw URL:
.. code:: python
import falcon
import falcon.uri
class RawPathComponent:
def process_request(self, req, resp):
raw_uri = req.env.get('RAW_URI') or req.env.get('REQUEST_URI')
# NOTE: Reconstruct the percent-encoded path from the raw URI.
if raw_uri:
req.path, _, _ = raw_uri.partition('?')
class URLResource:
def on_get(self, req, resp, url):
# NOTE: url here is potentially percent-encoded.
url = falcon.uri.decode(url)
resp.media = {'url': url}
def on_get_status(self, req, resp, url):
# NOTE: url here is potentially percent-encoded.
url = falcon.uri.decode(url)
resp.media = {'cached': True}
app = falcon.App(middleware=[RawPathComponent()])
app.add_route('/cache/{url}', URLResource())
app.add_route('/cache/{url}/status', URLResource(), suffix='status')
Running the above app with a supported server such as Gunicorn or uWSGI, the
following response is rendered to
a ``GET /cache/http%3A%2F%2Ffalconframework.org`` request:
.. code:: json
{
"url": "http://falconframework.org"
}
We can also check the status of this URI in our imaginary web caching system by
accessing ``/cache/http%3A%2F%2Ffalconframework.org/status``:
.. code:: json
{
"cached": true
}
If we removed ``RawPathComponent()`` from the app's middleware list, the
request would be routed as ``/cache/http://falconframework.org``, and no
matching resource would be found:
.. code:: json
{
"title": "404 Not Found"
}
What is more, even if we could implement a flexible router that was capable of
matching these complex URI patterns, the app would still not be able to
distinguish between ``/cache/http%3A%2F%2Ffalconframework.org%2Fstatus`` and
``/cache/http%3A%2F%2Ffalconframework.org/status`` if both were presented only
in the percent-decoded form.
ASGI
----
The ASGI version of :attr:`req.path <falcon.asgi.Request.path>` uses the
``path`` key from the ASGI scope, where percent-encoded sequences are already
decoded into characters just like in WSGI's ``PATH_INFO``.
Similar to the WSGI snippet from the previous chapter, let us create a
middleware component that replaces ``req.path`` with the value of ``raw_path``
(provided the latter is present in the ASGI HTTP scope):
.. code:: python
import falcon.asgi
import falcon.uri
class RawPathComponent:
async def process_request(self, req, resp):
raw_path = req.scope.get('raw_path')
# NOTE: Decode the raw path from the raw_path bytestring, disallowing
# non-ASCII characters, assuming they are correctly percent-coded.
if raw_path:
req.path = raw_path.decode('ascii')
class URLResource:
async def on_get(self, req, resp, url):
# NOTE: url here is potentially percent-encoded.
url = falcon.uri.decode(url)
resp.media = {'url': url}
async def on_get_status(self, req, resp, url):
# NOTE: url here is potentially percent-encoded.
url = falcon.uri.decode(url)
resp.media = {'cached': True}
app = falcon.asgi.App(middleware=[RawPathComponent()])
app.add_route('/cache/{url}', URLResource())
app.add_route('/cache/{url}/status', URLResource(), suffix='status')
Running the above snippet with ``uvicorn`` (that supports ``raw_path``), the
percent-encoded ``url`` field is now correctly handled for a
``GET /cache/http%3A%2F%2Ffalconframework.org%2Fstatus``
request:
.. code:: json
{
"url": "http://falconframework.org/status"
}
Again, as in the WSGI version, removing ``RawPathComponent()`` no longer lets
the app route the above request as intended:
.. code:: json
{
"title": "404 Not Found"
}
|