.. _validating:

=================
 Validating URIs
=================

While not as difficult as `validating an email address`_, validating URIs is
tricky. Different parts of the URI allow different characters, and those sets
sometimes overlap and sometimes don't, which makes hand-written checks
error-prone. Luckily, |rfc3986| makes validating URIs far simpler.


Example Usage
=============

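First, we need to create an instance of
:class:`~rfc3986.validators.Validator`, which takes no parameters. After that,
we call methods on the instance to describe what we want to validate. Each of
these configuration methods returns the validator itself, so the calls can be
chained, as the examples below do.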
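
Chaining is only a convenience. The sketch below, assuming |rfc3986| 1.1.0 or
later, builds a validator one call at a time and checks that a configuration
method hands back the instance it was called on. The names ``same`` and
``result`` exist only for illustration:

.. code-block:: python

    from rfc3986 import uri_reference, validators

    # Start from a Validator with no restrictions configured.
    validator = validators.Validator()

    # Configure it one call at a time instead of chaining.
    same = validator.allow_schemes('https')
    validator.allow_hosts('github.com')

    # The configuration method returned the very same object.
    assert same is validator

    # validate() returns None when the URI passes and raises otherwise.
    result = validator.validate(uri_reference('https://github.com/'))
    print(result)

Both styles configure the same object, so pick whichever reads better where
the validator is defined.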

Allowing Only Trusted Domains and Schemes
-----------------------------------------

Let's assume that we're building something that takes a URL as user input and
we want to ensure that the URL only ever uses a specific domain over
``https``. In that case, our code would look like this:

.. doctest::

    >>> from rfc3986 import validators, uri_reference
    >>> user_url = 'https://github.com/sigmavirus24/rfc3986'
    >>> validator = validators.Validator().allow_schemes(
    ...     'https',
    ... ).allow_hosts(
    ...     'github.com',
    ... )
    >>> validator.validate(uri_reference(user_url))
    >>> validator.validate(uri_reference(
    ...     'https://github.com/'
    ... ))
    >>> validator.validate(uri_reference(
    ...     'http://example.com'
    ... ))
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.UnpermittedComponentError

First, notice that we can reuse the same validator object for each URL, so
there is no need to construct a new Validator for every piece of user input.
Next, we have three different URLs that we validate:

#. ``https://github.com/sigmavirus24/rfc3986``
#. ``https://github.com/``
#. ``http://example.com``

As it stands, our validator will accept the first two URLs but reject the
third by raising :class:`~rfc3986.exceptions.UnpermittedComponentError`. This
is because we only allow ``https`` as the scheme and ``github.com`` as the
host, and ``http://example.com`` matches neither.
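
In an application you will often want a yes/no answer rather than an
exception. The sketch below wraps a single, reused validator in a small
helper. ``is_allowed_url`` and ``VALIDATOR`` are hypothetical names, and the
sketch assumes that the exceptions raised by ``validate`` share the base class
``rfc3986.exceptions.ValidationError`` (true for |rfc3986| 1.x, but worth
checking against your installed version):

.. code-block:: python

    from rfc3986 import exceptions, uri_reference, validators

    # Built once and reused for every piece of user input.
    VALIDATOR = validators.Validator().allow_schemes(
        'https',
    ).allow_hosts(
        'github.com',
    )


    def is_allowed_url(url):
        """Hypothetical helper: return True if ``url`` passes VALIDATOR."""
        try:
            VALIDATOR.validate(uri_reference(url))
        except exceptions.ValidationError:
            # Assumed common base class of UnpermittedComponentError and the
            # other validation errors shown on this page.
            return False
        return True


    for candidate in (
        'https://github.com/sigmavirus24/rfc3986',
        'https://github.com/',
        'http://example.com',
    ):
        print(candidate, is_allowed_url(candidate))

With this configuration the loop should report ``True`` for the two GitHub
URLs and ``False`` for ``http://example.com``. Catching the shared base class
keeps the helper short; catch the individual exceptions instead if you need to
tell the user *why* a URL was rejected.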

Preventing Leaks of User Credentials
------------------------------------

Next, let's imagine that we want to prevent leaking user credentials. To do
that, we must ensure that there is no password in the user information
portion of the authority. Our new validator would look like this:

.. doctest::

    >>> from rfc3986 import validators, uri_reference
    >>> user_url = 'https://github.com/sigmavirus24/rfc3986'
    >>> validator = validators.Validator().allow_schemes(
    ...     'https',
    ... ).allow_hosts(
    ...     'github.com',
    ... ).forbid_use_of_password()
    >>> validator.validate(uri_reference(user_url))
    >>> validator.validate(uri_reference(
    ...     'https://github.com/'
    ... ))
    >>> validator.validate(uri_reference(
    ...     'http://example.com'
    ... ))
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.UnpermittedComponentError
    >>> validator.validate(uri_reference(
    ...     'https://sigmavirus24@github.com'
    ... ))
    >>> validator.validate(uri_reference(
    ...     'https://sigmavirus24:not-my-real-password@github.com'
    ... ))
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.PasswordForbidden
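
Note that only the password triggers an error; a username on its own is
accepted. The following sketch, assuming |rfc3986| 1.x, configures a
validator with *only* ``forbid_use_of_password()`` to isolate that check, and
uses a hypothetical helper named ``describe`` to report the outcome for each
URL:

.. code-block:: python

    from rfc3986 import exceptions, uri_reference, validators

    # Only the password check is enabled; schemes and hosts are unrestricted.
    validator = validators.Validator().forbid_use_of_password()


    def describe(url):
        """Hypothetical helper: return a short verdict for ``url``."""
        try:
            validator.validate(uri_reference(url))
        except exceptions.PasswordForbidden:
            return 'rejected: contains a password'
        return 'accepted'


    for url in (
        'https://sigmavirus24@github.com',
        'https://sigmavirus24:not-my-real-password@github.com',
        'ftp://anonymous@example.com',
    ):
        print(describe(url))

Because no schemes or hosts are restricted here, the ``ftp`` URL is accepted
as long as it carries no password.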

Requiring the Presence of Components
------------------------------------

Up until now, we have assumed that we will get a URL that has the appropriate
components for validation. For example, we assume that the URL has a scheme
and a host name. However, our current validation doesn't require that those
components exist: the scheme and host checks only apply when the component is
present, so a URL that omits them passes.

.. doctest::

    >>> from rfc3986 import validators, uri_reference
    >>> user_url = 'https://github.com/sigmavirus24/rfc3986'
    >>> validator = validators.Validator().allow_schemes(
    ...     'https',
    ... ).allow_hosts(
    ...     'github.com',
    ... ).forbid_use_of_password()
    >>> validator.validate(uri_reference('//github.com'))
    >>> validator.validate(uri_reference('https:/'))

In the first case, we have a host name but no scheme, and in the second we
have a scheme and a path but no host. If we want to ensure that those
components are present and that they are *always* what we allow, then we must
add one last item to our validator:

.. doctest::

    >>> from rfc3986 import validators, uri_reference
    >>> user_url = 'https://github.com/sigmavirus24/rfc3986'
    >>> validator = validators.Validator().allow_schemes(
    ...     'https',
    ... ).allow_hosts(
    ...     'github.com',
    ... ).forbid_use_of_password(
    ... ).require_presence_of(
    ...     'scheme', 'host',
    ... )
    >>> validator.validate(uri_reference('//github.com'))
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.MissingComponentError
    >>> validator.validate(uri_reference('https:/'))
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.MissingComponentError
    >>> validator.validate(uri_reference('https://github.com'))
    >>> validator.validate(uri_reference(
    ...     'https://github.com/sigmavirus24/rfc3986'
    ... ))
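
When a required component is missing, the exception can tell you which one.
The sketch below assumes the ``components`` attribute that |rfc3986| 1.x sets
on :class:`~rfc3986.exceptions.MissingComponentError` (a sorted list of the
missing component names); ``missing_components`` is a hypothetical helper:

.. code-block:: python

    from rfc3986 import exceptions, uri_reference, validators

    validator = validators.Validator().require_presence_of('scheme', 'host')


    def missing_components(url):
        """Hypothetical helper: names of required components absent from url."""
        try:
            validator.validate(uri_reference(url))
        except exceptions.MissingComponentError as error:
            # Assumed attribute: the sorted names of the missing components.
            return list(error.components)
        return []


    print(missing_components('//github.com'))        # the scheme is missing
    print(missing_components('https:/'))             # the host is missing
    print(missing_components('https://github.com'))  # nothing is missing

Reporting the component names makes it easy to show users a specific message
such as "please include https://" instead of a generic failure.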


Checking the Validity of Components
-----------------------------------

As of version 1.1.0, |rfc3986| also lets users check the validity of a URI
reference using a :class:`~rfc3986.validators.Validator`. Along with the
checks in the examples above, we can verify that individual components are
valid per :rfc:`3986`. The validation rules for each component are built in,
so all we need to do is specify which components we want to validate:

.. doctest::

    >>> from rfc3986 import validators, uri_reference
    >>> valid_uri = uri_reference('https://github.com/')
    >>> validator = validators.Validator().allow_schemes(
    ...     'https',
    ... ).allow_hosts(
    ...     'github.com',
    ... ).forbid_use_of_password(
    ... ).require_presence_of(
    ...     'scheme', 'host',
    ... ).check_validity_of(
    ...     'scheme', 'host', 'path',
    ... )
    >>> validator.validate(valid_uri)
    >>> invalid_uri = valid_uri.copy_with(path='/#invalid/path')
    >>> validator.validate(invalid_uri)
    Traceback (most recent call last):
    ...
    rfc3986.exceptions.InvalidComponentsError

Paths are not allowed to contain a ``#`` character unless it is
percent-encoded (as ``%23``), because an unencoded ``#`` marks the start of
the fragment. This is why our ``invalid_uri`` raises an exception when we
attempt to validate it.
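
If the ``#`` really is meant to be part of the path, percent-encode it before
building the URI. This is a minimal sketch, assuming you control the raw path
text; it uses the standard library's :func:`urllib.parse.quote` (which leaves
``/`` unescaped by default) rather than any |rfc3986| helper, checks only the
path component, and relies on the ``components`` attribute that |rfc3986| 1.x
sets on :class:`~rfc3986.exceptions.InvalidComponentsError`:

.. code-block:: python

    from urllib.parse import quote

    from rfc3986 import exceptions, uri_reference, validators

    validator = validators.Validator().check_validity_of('path')
    base = uri_reference('https://github.com/')

    raw_path = '/#invalid/path'
    encoded_path = quote(raw_path)  # '#' becomes '%23'; '/' is kept as-is

    # The encoded path passes the RFC 3986 path check (no exception).
    validator.validate(base.copy_with(path=encoded_path))

    # The raw path still fails, and the error names the offending component.
    try:
        validator.validate(base.copy_with(path=raw_path))
    except exceptions.InvalidComponentsError as error:
        rejected = list(error.components)
    else:
        rejected = []
    print('invalid components:', rejected)

Encoding at the point where the path is assembled keeps the validator strict
while still letting legitimate ``#`` characters through as ``%23``.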


.. links
.. _validating an email address:
    http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/