***************************************************************
:mod:`uritools` --- URI parsing, classification and composition
***************************************************************

.. module:: uritools

This module provides RFC 3986 compliant functions for parsing,
classifying and composing URIs and URI references, largely replacing
the Python Standard Library's :mod:`urllib.parse` module.

.. doctest::

   >>> from uritools import uricompose, urijoin, urisplit, uriunsplit
   >>> uricompose(scheme='foo', host='example.com', port=8042,
   ...            path='/over/there', query={'name': 'ferret'},
   ...            fragment='nose')
   'foo://example.com:8042/over/there?name=ferret#nose'
   >>> parts = urisplit(_)
   >>> parts.scheme
   'foo'
   >>> parts.authority
   'example.com:8042'
   >>> parts.getport(default=80)
   8042
   >>> parts.getquerydict().get('name')
   ['ferret']
   >>> parts.isuri()
   True
   >>> parts.isabsuri()
   False
   >>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak')
   'foo://example.com:8042/right/here?name=swallow#beak'

For historical and backward-compatibility reasons, :mod:`urllib.parse`
and its Python 2 predecessor :mod:`urlparse` are not fully compliant
with current Internet standards. As stated in `Lib/urllib/parse.py
<https://github.com/python/cpython/blob/3.8/Lib/urllib/parse.py>`_:

   RFC 3986 is considered the current standard and any future changes
   to urlparse module should conform with it. The urlparse module is
   currently not entirely compliant with this RFC due to defacto
   scenarios for parsing, and for backward compatibility purposes,
   some parsing quirks from older RFCs are retained.

This module aims to provide fully RFC 3986 compliant replacements for
the most commonly used functions found in :mod:`urllib.parse`. It
also includes functions for distinguishing between the different forms
of URIs and URI references, and for conveniently creating URIs from
their individual components.

.. seealso::

   :rfc:`3986` - Uniform Resource Identifier (URI): Generic Syntax
      The current Internet standard (STD 66) defining URI syntax, to
      which any changes to :mod:`uritools` should conform. If
      deviations are observed, the module's implementation should be
      changed, even if this means breaking backward compatibility.

URI Classification
==================

According to RFC 3986, a URI reference is either a URI or a *relative
reference*. If the URI reference's prefix does not match the syntax
of a scheme followed by its colon separator, then the URI reference is
a relative reference.

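
As a rough illustration, the scheme test can be expressed as a regular
expression derived from the RFC 3986 grammar rule
``scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )``. The helper
``has_scheme`` below is hypothetical and is not meant to show how
:func:`isuri` is implemented.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# A URI reference whose prefix matches scheme ":" is a URI; anything else
# is a relative reference.
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(uriref: str) -> bool:
    """Hypothetical helper: True for a URI, False for a relative reference."""
    # re.match() anchors at the start of the string only: a prefix test.
    return _SCHEME_PREFIX.match(uriref) is not None


print(has_scheme("mailto:joe@example.com"))  # True
print(has_scheme("//example.com/over/there"))  # False
print(has_scheme("over/there?name=ferret"))  # False
```
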
A relative reference that begins with two slash characters is termed a
*network-path* reference. A relative reference that begins with a
single slash character is termed an *absolute-path* reference. A
relative reference that does not begin with a slash character is
termed a *relative-path* reference.

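
Because the three forms differ only in how the reference begins, the
distinction reduces to a prefix test, provided the two-slash case is
checked before the one-slash case. ``relative_kind`` is a hypothetical
sketch that assumes its argument is already known to be a relative
reference; :mod:`uritools` itself exposes this distinction through
:func:`isnetpath`, :func:`isabspath` and :func:`isrelpath`.

```python
def relative_kind(relref: str) -> str:
    """Hypothetical helper: classify a relative reference (RFC 3986, 4.2).

    Assumes `relref` has already been found not to start with a scheme.
    """
    if relref.startswith("//"):  # must be tested before the single "/"
        return "network-path"  # e.g. //example.com/path
    if relref.startswith("/"):
        return "absolute-path"  # e.g. /over/there
    return "relative-path"  # e.g. over/there, ?query, #fragment or ""


for ref in ("//example.com/path", "/over/there", "over/there"):
    print(ref, "->", relative_kind(ref))
```
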
When a URI reference refers to a URI that is, aside from its fragment
component, identical to the base URI, that reference is called a
*same-document* reference. Examples of same-document references are
relative references that are empty or include only the number sign
("#") separator followed by a fragment identifier.

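
One way to apply this definition directly is to resolve the reference
against the base URI and compare both with their fragments removed. The
sketch below borrows :func:`urllib.parse.urljoin` and
:func:`urllib.parse.urldefrag` purely for brevity; since
:mod:`urllib.parse` is not fully RFC 3986 compliant, this is only an
approximation, and ``is_same_document`` is a hypothetical name.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(uriref: str, base: str) -> bool:
    """Hypothetical helper: does `uriref` refer to `base`, ignoring fragments?

    urljoin() stands in for RFC 3986 reference resolution here; it is an
    approximation, not a compliant implementation.
    """
    target = urljoin(base, uriref)
    return urldefrag(target).url == urldefrag(base).url


base = "http://example.com/over/there?name=ferret"
print(is_same_document("", base))  # True: empty reference
print(is_same_document("#nose", base))  # True: fragment only
print(is_same_document("here", base))  # False: a different resource
```
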
A URI without a fragment identifier is termed an *absolute URI*. A
base URI, for example, must be an absolute URI. If the base URI is
obtained from a URI reference, then that reference must be stripped of
any fragment component prior to its use as a base URI.

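
Following these definitions, an absolute URI is a URI (it starts with a
scheme) that contains no ``#`` separator, and a URI can be made usable
as a base URI by cutting it at the first ``#``. Both helpers below are
hypothetical sketches rather than :mod:`uritools` API; they reuse the
scheme pattern from RFC 3986, section 3.1.

```python
import re

# scheme ":" prefix, per RFC 3986, section 3.1
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute_uri(uriref: str) -> bool:
    """Hypothetical helper: a URI with no fragment component.

    "#" may not appear unencoded anywhere else in a URI, so its presence
    (even with an empty fragment, as in "http://a/b#") means a fragment.
    """
    return _SCHEME_PREFIX.match(uriref) is not None and "#" not in uriref


def as_base_uri(uriref: str) -> str:
    """Hypothetical helper: strip any fragment so `uriref` can act as a base."""
    if _SCHEME_PREFIX.match(uriref) is None:
        raise ValueError(f"a base URI needs a scheme: {uriref!r}")
    return uriref.partition("#")[0]


print(is_absolute_uri("foo://example.com:8042/over/there?name=ferret"))  # True
print(is_absolute_uri("foo://example.com:8042/over/there#nose"))  # False
print(as_base_uri("foo://example.com:8042/over/there#nose"))
```
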
.. autofunction:: isuri

.. autofunction:: isabsuri

.. autofunction:: isnetpath

.. autofunction:: isabspath

.. autofunction:: isrelpath

.. autofunction:: issamedoc

URI Composition
===============

.. autofunction:: uricompose

   All components may be specified as either Unicode strings, which
   will be encoded according to `encoding`, or :class:`bytes` objects.

   `authority` may also be passed a three-item iterable specifying
   userinfo, host and port subcomponents. If both `authority` and any
   of the `userinfo`, `host` or `port` keyword arguments are given,
   the keyword argument will override the corresponding `authority`
   subcomponent.

   `query` may also be passed a mapping object or a sequence of
   two-element tuples, which will be converted to a string of
   ``name=value`` pairs separated by `querysep`.

   The returned URI reference is of type :class:`str`.

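
The authority override and query conversion rules above can be modelled
in a few lines. This is a simplified sketch, not the :func:`uricompose`
implementation: the helper names are hypothetical, `querysep` is assumed
to default to ``"&"``, query names and values are assumed to be strings,
and both are percent-encoded conservatively with
:func:`urllib.parse.quote` (``safe=""``). Percent-encoding of the
authority subcomponents and IPv6 host handling are omitted.

```python
from collections.abc import Mapping
from urllib.parse import quote


def compose_authority(authority=None, userinfo=None, host=None, port=None):
    """Hypothetical helper: apply keyword overrides to an authority triple."""
    # Start from the three-item (userinfo, host, port) iterable, if any.
    a_userinfo, a_host, a_port = (None, None, None) if authority is None else authority
    # A keyword argument that is given overrides the matching subcomponent.
    if userinfo is not None:
        a_userinfo = userinfo
    if host is not None:
        a_host = host
    if port is not None:
        a_port = port
    result = a_host or ""
    if a_userinfo is not None:
        result = f"{a_userinfo}@{result}"
    if a_port is not None:
        result = f"{result}:{a_port}"
    return result


def compose_query(query, querysep="&"):
    """Hypothetical helper: mapping or (name, value) pairs -> query string."""
    pairs = query.items() if isinstance(query, Mapping) else query
    return querysep.join(
        f"{quote(name, safe='')}={quote(value, safe='')}" for name, value in pairs
    )


print(compose_authority(("user", "example.com", 8042), port=80))  # user@example.com:80
print(compose_query({"name": "ferret"}))  # name=ferret
print(compose_query([("a", "1"), ("b", "x y")], querysep=";"))  # a=1;b=x%20y
```
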
.. autofunction:: urijoin

   If `strict` is :const:`False`, a scheme in the reference is
   ignored if it is identical to the base URI's scheme, as RFC 3986,
   section 5.2.2 permits for backward compatibility.

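
The effect of `strict` is easiest to see in the reference resolution
algorithm of RFC 3986, section 5.2, which the following sketch
transcribes: splitting uses the regular expression from Appendix B,
``remove_dot_segments`` follows section 5.2.4, and merging follows
section 5.2.3. It is an independent illustration, not the
:mod:`uritools` source, and all names in it are hypothetical. Like the
RFC pseudocode, it compares schemes for exact equality.

```python
import re

# RFC 3986, Appendix B: groups are scheme, authority, path, query, fragment
_URI_RE = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?")


def _split(uri):
    return _URI_RE.match(uri).groups()


def _unsplit(scheme, authority, path, query, fragment):
    # Component recomposition, RFC 3986 section 5.3
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


def remove_dot_segments(path):
    # RFC 3986 section 5.2.4; output holds segments with their leading "/"
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./") or path == "/.":
            path = "/" + path[3:]
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()  # drop the last output segment
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (and its leading "/", if any) to output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def _merge(base_authority, base_path, ref_path):
    # RFC 3986 section 5.2.3
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def join(base, ref, strict=True):
    """Hypothetical sketch of RFC 3986 section 5.2.2 (base must be absolute)."""
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = _split(ref)
    if not strict and r_scheme == b_scheme:
        r_scheme = None  # non-strict: ignore a scheme equal to the base's
    if r_scheme is not None:
        scheme, auth, path, query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_auth is not None:
        scheme, auth, path, query = b_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_path == "":
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        else:
            path = remove_dot_segments(_merge(b_auth, b_path, r_path))
        scheme, auth, query = b_scheme, b_auth, r_query
    return _unsplit(scheme, auth, path, query, r_frag)


base = "http://a/b/c/d;p?q"  # example base URI from RFC 3986 section 5.4
print(join(base, "http:g"))  # http:g
print(join(base, "http:g", strict=False))  # http://a/b/c/g
```
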
.. autofunction:: uriunsplit

URI Decomposition
=================

.. autofunction:: uridefrag

   The return value is an instance of a named tuple subclass (see
   :func:`collections.namedtuple`) with the following read-only
   attributes:

   +-------------------+-------+---------------------------------------------+
   | Attribute         | Index | Value                                       |
   +===================+=======+=============================================+
   | :attr:`uri`       | 0     | Absolute URI, or relative reference without |
   |                   |       | a fragment identifier                       |
   +-------------------+-------+---------------------------------------------+
   | :attr:`fragment`  | 1     | Fragment identifier, or :const:`None` if no |
   |                   |       | fragment was present                        |
   +-------------------+-------+---------------------------------------------+

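
The split itself is simple, since the fragment starts at the first
``#``. What the table captures is the distinction between an absent
fragment (:const:`None`) and an empty one (``""``), which the
hypothetical sketch below preserves using :class:`typing.NamedTuple`; it
is not the :class:`DefragResult` class itself.

```python
from typing import NamedTuple, Optional


class DefragSketch(NamedTuple):
    """Hypothetical stand-in for the result described above."""

    uri: str
    fragment: Optional[str]


def defrag(uriref: str) -> DefragSketch:
    # partition() splits at the first "#"; `sep` tells whether one was present.
    uri, sep, fragment = uriref.partition("#")
    return DefragSketch(uri, fragment if sep else None)


print(defrag("http://example.com/over/there#nose"))
print(repr(defrag("http://example.com/over/there#").fragment))  # ''
print(repr(defrag("http://example.com/over/there").fragment))  # None
```
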
.. autofunction:: urisplit

   The return value is an instance of a named tuple subclass (see
   :func:`collections.namedtuple`) with the following read-only
   attributes:

   +-------------------+-------+---------------------------------------------+
   | Attribute         | Index | Value                                       |
   +===================+=======+=============================================+
   | :attr:`scheme`    | 0     | URI scheme, or :const:`None` if not present |
   +-------------------+-------+---------------------------------------------+
   | :attr:`authority` | 1     | Authority component,                        |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+
   | :attr:`path`      | 2     | Path component, always present but may be   |
   |                   |       | empty                                       |
   +-------------------+-------+---------------------------------------------+
   | :attr:`query`     | 3     | Query component,                            |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+
   | :attr:`fragment`  | 4     | Fragment identifier,                        |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+
   | :attr:`userinfo`  |       | Userinfo subcomponent of `authority`,       |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+
   | :attr:`host`      |       | Host subcomponent of `authority`,           |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+
   | :attr:`port`      |       | Port subcomponent of `authority` as a       |
   |                   |       | (possibly empty) string,                    |
   |                   |       | or :const:`None` if not present             |
   +-------------------+-------+---------------------------------------------+

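
RFC 3986, Appendix B gives a regular expression that performs this
five-way split for any URI reference, and the authority can be broken
down further as ``[ userinfo "@" ] host [ ":" port ]``. The sketch below
combines both; ``split`` and ``SplitSketch`` are hypothetical names, and
the authority pattern is a simplification that assumes a well-formed
authority.

```python
import re
from typing import NamedTuple, Optional

# RFC 3986, Appendix B (non-capturing groups for the delimiters):
# groups are scheme, authority, path, query, fragment.
_URI_RE = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?")
# [ userinfo "@" ] host [ ":" port ]; port is *DIGIT, so it may be empty.
# A bracketed IPv6 host ends in "]", so its colons are never taken as a port.
_AUTHORITY_RE = re.compile(r"^(?:([^@]*)@)?(.*?)(?::([0-9]*))?$")


def _split_authority(authority):
    if authority is None:
        return (None, None, None)
    return _AUTHORITY_RE.match(authority).groups()


class SplitSketch(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]

    @property
    def userinfo(self):
        return _split_authority(self.authority)[0]

    @property
    def host(self):
        return _split_authority(self.authority)[1]

    @property
    def port(self):
        return _split_authority(self.authority)[2]


def split(uriref: str) -> SplitSketch:
    # Every part of the pattern is optional, so match() always succeeds.
    return SplitSketch(*_URI_RE.match(uriref).groups())


parts = split("foo://user@example.com:8042/over/there?name=ferret#nose")
print(parts.userinfo, parts.host, parts.port)  # user example.com 8042
```
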
URI Encoding
============

.. autofunction:: uridecode

   If `encoding` is set to :const:`None`, return the percent-decoded
   `uristring` as a :class:`bytes` object. Otherwise, replace any
   percent-encodings and decode `uristring` using the codec registered
   for `encoding`, returning a Unicode string.

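
The two return modes can be modelled with the standard library's
:func:`urllib.parse.unquote_to_bytes`, which replaces percent-encodings
and returns :class:`bytes`. This is a hypothetical sketch rather than
the :func:`uridecode` implementation; note that for :class:`str` input,
``unquote_to_bytes`` encodes any non-ASCII characters as UTF-8, which
may differ from :mod:`uritools`.

```python
from urllib.parse import unquote_to_bytes


def decode(uristring, encoding="utf-8", errors="strict"):
    """Hypothetical sketch of the two modes described above."""
    raw = unquote_to_bytes(uristring)  # "%C3%A9" -> b"\xc3\xa9"
    if encoding is None:
        return raw  # bytes, percent-decoded only
    return raw.decode(encoding, errors)  # Unicode string


print(decode("caf%C3%A9"))  # café
print(decode("caf%C3%A9", encoding=None))  # b'caf\xc3\xa9'
print(decode("caf%E9", encoding="latin-1"))  # café
```
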
.. autofunction:: uriencode

   If `uristring` is a :class:`bytes` object, replace any characters
   not in :const:`UNRESERVED` or `safe` with their corresponding
   percent-encodings and return the result as a :class:`bytes` object.
   Otherwise, first encode `uristring` using the codec registered for
   `encoding`, then percent-encode the resulting bytes in the same way.

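
A byte-by-byte sketch of this rule follows, assuming `safe` is given as
an ASCII :class:`str` and using uppercase hexadecimal digits as RFC
3986, section 2.1 recommends. The function name ``encode`` and the local
``_UNRESERVED`` string are illustrative and not taken from
:mod:`uritools`.

```python
import string

# RFC 3986, section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = string.ascii_letters + string.digits + "-._~"


def encode(uristring, safe="", encoding="utf-8", errors="strict"):
    """Hypothetical sketch: percent-encode everything not unreserved or safe."""
    if not isinstance(uristring, bytes):
        # Text is first encoded with the requested codec.
        uristring = uristring.encode(encoding, errors)
    keep = set((_UNRESERVED + safe).encode("ascii"))  # set of byte values
    # Iterating over bytes yields ints; b"%%%02X" % b formats one as "%XX".
    return b"".join(bytes([b]) if b in keep else b"%%%02X" % b for b in uristring)


print(encode("over there/?"))  # b'over%20there%2F%3F'
print(encode("over there/?", safe="/"))  # b'over%20there/%3F'
print(encode("café"))  # b'caf%C3%A9'
```
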
Structured Parse Results
========================

The result objects from the :func:`uridefrag` and :func:`urisplit`
functions are instances of named tuple subclasses (see
:func:`collections.namedtuple`). These objects contain the attributes
described in the function documentation, as well as some additional
convenience methods.

.. autoclass:: DefragResult
   :members:

.. autoclass:: SplitResult
   :members:

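
The pattern of a named tuple extended with convenience methods can be
sketched as follows. ``SplitSketch`` and its ``getquerydict`` method are
a hypothetical, simplified model: this version splits on `sep`,
percent-decodes names and values with :func:`urllib.parse.unquote`, and
treats a missing ``=`` as an empty value, which may not match
:meth:`SplitResult.getquerydict` in every detail.

```python
from collections import namedtuple
from urllib.parse import unquote

SplitFields = namedtuple("SplitFields", "scheme authority path query fragment")


class SplitSketch(SplitFields):
    """Hypothetical result class: a named tuple plus a convenience method."""

    __slots__ = ()  # no per-instance __dict__, just like the base tuple

    def getquerydict(self, sep="&"):
        """Map each decoded query name to the list of its decoded values."""
        result = {}
        if self.query:  # None and "" both give an empty dict
            for item in self.query.split(sep):
                name, _, value = item.partition("=")
                result.setdefault(unquote(name), []).append(unquote(value))
        return result


parts = SplitSketch("foo", "example.com:8042", "/over/there", "name=ferret", "nose")
print(parts.getquerydict().get("name"))  # ['ferret']
print(parts[3] == parts.query)  # True: index and attribute access agree
```
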
Character Constants
===================

.. data:: GEN_DELIMS

   A string containing all general delimiting characters specified in
   RFC 3986.

.. data:: RESERVED

   A string containing all reserved characters specified in RFC 3986.

.. data:: SUB_DELIMS

   A string containing all subcomponent delimiting characters
   specified in RFC 3986.

.. data:: UNRESERVED

   A string containing all unreserved characters specified in
   RFC 3986.

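
For reference, these sets follow directly from the ABNF in RFC 3986,
sections 2.2 and 2.3. The definitions below re-derive them from the
grammar rather than importing them, so the character order may differ
from the actual :mod:`uritools` constants; the final check confirms that
the reserved and unreserved sets are disjoint.

```python
import string

# RFC 3986, section 2.2 (re-derived here, not imported from uritools)
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS
# RFC 3986, section 2.3
UNRESERVED = string.ascii_letters + string.digits + "-._~"

# No character is both reserved and unreserved.
assert not set(RESERVED) & set(UNRESERVED)
print(len(GEN_DELIMS), len(SUB_DELIMS), len(UNRESERVED))  # 7 11 66
```
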