``urlutils`` - Structured URL
=============================

.. automodule:: boltons.urlutils

.. versionadded:: 17.2

The URL type
------------

.. autoclass:: boltons.urlutils.URL


   .. attribute:: URL.scheme

      The scheme is an ASCII string, normally lowercase, which
      specifies the semantics for the rest of the URL, as well as
      the network protocol in many cases. For example, "http" in
      "http://hatnote.com".

   .. attribute:: URL.username

      The username is a string used by some schemes for
      authentication. For example, "public" in
      "ftp://public@example.com".

   .. attribute:: URL.password

      The password is a string also used for authentication.
      Although passwords in URLs are deprecated by `RFC 3986 Section
      7.5`_, they are still used when the URL is private or the
      password is public. For example, "password" in
      "db://private:password@127.0.0.1".

      .. _RFC 3986 Section 7.5: https://tools.ietf.org/html/rfc3986#section-7.5

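      For comparison, Python's standard library exposes the same
      userinfo components through :func:`urllib.parse.urlsplit`. The
      sketch below uses only that stdlib function, not the boltons
      API, to pull the scheme, username, password, and host out of the
      example URL above::

         from urllib.parse import urlsplit

         # Stdlib illustration only; boltons' URL type does its own parsing.
         parts = urlsplit("db://private:password@127.0.0.1")

         # Components of the authority ("netloc") section
         print(parts.scheme)    # db
         print(parts.username)  # private
         print(parts.password)  # password
         print(parts.hostname)  # 127.0.0.1
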
   .. attribute:: URL.host

      The host is a string used to resolve the network location of the
      resource. It may be empty, a domain name, or an IP address (v4 or
      v6). "example.com", "127.0.0.1", and "::1" are all good examples
      of host strings.

      Per spec, fully-encoded output from :meth:`~URL.to_text` is
      `IDNA encoded`_ for compatibility with DNS.

      .. _IDNA encoded: https://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_IDNA_encoding

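      As a rough illustration of IDNA encoding, Python ships an
      ``idna`` text codec (implementing the older IDNA 2003 rules) that
      converts each non-ASCII label of a host name to its ASCII
      "xn--" form. This is a stdlib sketch with a hypothetical host
      name, not how :meth:`~URL.to_text` is implemented::

         # Encode a hypothetical internationalized host name for DNS.
         host = "bücher.example"
         ascii_host = host.encode("idna").decode("ascii")
         print(ascii_host)  # xn--bcher-kva.example

         # The codec also decodes the ASCII form back to Unicode.
         assert ascii_host.encode("ascii").decode("idna") == host
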
   .. attribute:: URL.port

      The port is an integer used, along with :attr:`host`, in
      connecting to network locations. ``8080`` is the port in
      "http://localhost:8080/index.html".

      .. note::

         Many schemes have default ports, such as 80 for HTTP and 22
         for SSH, and `Section 3.2.3 of RFC 3986`_ states that when a
         URL's port is the same as its scheme's default port, the port
         should be omitted::

            >>> URL(u'https://github.com:443/mahmoud/boltons').to_text()
            u'https://github.com/mahmoud/boltons'

         Custom schemes can register their port with
         :func:`~boltons.urlutils.register_scheme`. See
         :attr:`URL.default_port` for more info.

      .. _Section 3.2.3 of RFC 3986: https://tools.ietf.org/html/rfc3986#section-3.2.3

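      The default-port rule can be sketched with the standard library.
      The ``DEFAULT_PORTS`` table and ``drop_default_port()`` helper
      below are hypothetical stand-ins for boltons'
      :attr:`~boltons.urlutils.SCHEME_PORT_MAP` and its serialization
      logic, and they only cover a few schemes::

         from urllib.parse import urlsplit, urlunsplit

         # Hypothetical excerpt of a scheme -> default port table.
         DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}

         def drop_default_port(url):
             """Return *url* without its port if it matches the scheme default."""
             parts = urlsplit(url)  # urlsplit lowercases the scheme
             if parts.port is None or DEFAULT_PORTS.get(parts.scheme) != parts.port:
                 return url
             # Strip the trailing ":port" from the authority, keeping any
             # userinfo and IPv6 brackets intact.
             netloc = parts.netloc.rsplit(":", 1)[0]
             return urlunsplit(parts._replace(netloc=netloc))

         print(drop_default_port("https://github.com:443/mahmoud/boltons"))
         # https://github.com/mahmoud/boltons
         print(drop_default_port("http://localhost:8080/index.html"))
         # http://localhost:8080/index.html
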
   .. attribute:: URL.path

      The string starting with the first slash after the authority
      part of the URL and ending before the first question mark or
      ``#``. It is often percent-quoted for network use. "/a/b/c" is
      the path of "http://example.com/a/b/c?d=e".

   .. attribute:: URL.path_parts

      The :class:`tuple` form of :attr:`~URL.path`, split on
      slashes. Empty slash segments are preserved, including that of
      the leading slash::

         >>> url = URL(u'http://example.com/a/b/c')
         >>> url.path_parts
         (u'', u'a', u'b', u'c')

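      A rough stdlib analogue, shown only to illustrate the splitting
      rule and not the boltons implementation: take the path from
      :func:`urllib.parse.urlsplit`, which stops before the query
      string, and split it on slashes. The empty first element comes
      from the leading slash::

         from urllib.parse import urlsplit

         path = urlsplit("http://example.com/a/b/c?d=e").path
         print(path)                    # /a/b/c
         print(tuple(path.split("/")))  # ('', 'a', 'b', 'c')

         # Empty segments from doubled slashes survive the split, too.
         print(tuple("/a//b".split("/")))  # ('', 'a', '', 'b')
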
   .. attribute:: URL.query_params

      An instance of :class:`~boltons.urlutils.QueryParamDict`, an
      :class:`~boltons.dictutils.OrderedMultiDict` subtype, mapping
      textual keys and values which follow the first question mark
      after the :attr:`path`. Also available as the handy alias
      ``qp``::

         >>> url = URL('http://boltons.readthedocs.io/en/latest/?utm_source=docs&sphinx=ok')
         >>> url.qp.keys()
         [u'utm_source', u'sphinx']

      Like the path, the query string is percent-encoded for network
      use.

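      The ordered, multi-valued behavior can be approximated with the
      standard library's :func:`urllib.parse.parse_qsl`, which keeps
      pairs in order and preserves repeated keys. Here the example URL
      has a hypothetical repeated ``sphinx`` key added, and the
      ``multi`` dict of lists is a simplified stand-in for
      :class:`~boltons.dictutils.OrderedMultiDict`, not its actual API::

         from urllib.parse import parse_qsl, urlsplit

         url = "http://boltons.readthedocs.io/en/latest/?utm_source=docs&sphinx=ok&sphinx=yes"
         pairs = parse_qsl(urlsplit(url).query)
         print(pairs)
         # [('utm_source', 'docs'), ('sphinx', 'ok'), ('sphinx', 'yes')]

         # Group values by key; dicts keep insertion order (Python 3.7+).
         multi = {}
         for key, value in pairs:
             multi.setdefault(key, []).append(value)
         print(list(multi))      # ['utm_source', 'sphinx']
         print(multi["sphinx"])  # ['ok', 'yes']
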
   .. attribute:: URL.fragment

      The string following the first ``#`` after the
      :attr:`query_params` until the end of the URL. It has no
      inherent internal structure, and is percent-quoted.

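      To see where the fragment begins, here is a small check with the
      standard library's :func:`urllib.parse.urlsplit`, used for
      illustration rather than boltons: the query ends at the first
      ``#``, and a later ``?`` belongs to the fragment, which is kept
      as an opaque string::

         from urllib.parse import urlsplit

         parts = urlsplit("http://example.com/a?b=c#section?x=1")
         print(parts.query)     # b=c
         print(parts.fragment)  # section?x=1
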
   .. automethod:: URL.from_parts

   .. automethod:: URL.to_text

   .. autoattribute:: URL.default_port

   .. autoattribute:: URL.uses_netloc

   .. automethod:: URL.get_authority

   .. automethod:: URL.normalize

   .. automethod:: URL.navigate

Related functions
~~~~~~~~~~~~~~~~~

.. autofunction:: boltons.urlutils.find_all_links

.. autofunction:: boltons.urlutils.register_scheme

Low-level functions
-------------------

A slew of functions used internally by :class:`~boltons.urlutils.URL`.

.. autofunction:: boltons.urlutils.parse_url

.. autofunction:: boltons.urlutils.parse_host

.. autofunction:: boltons.urlutils.parse_qsl

.. autofunction:: boltons.urlutils.resolve_path_parts

.. autoclass:: boltons.urlutils.QueryParamDict
   :members:

Quoting
~~~~~~~

URLs have many parts, and almost as many individual "quoting"
(encoding) strategies.

.. autofunction:: boltons.urlutils.quote_userinfo_part

.. autofunction:: boltons.urlutils.quote_path_part

.. autofunction:: boltons.urlutils.quote_query_part

.. autofunction:: boltons.urlutils.quote_fragment_part

There is, however, only one unquoting strategy:

.. autofunction:: boltons.urlutils.unquote

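Why different parts need different strategies can be seen with the
standard library's :func:`urllib.parse.quote`, whose ``safe``
argument controls which characters are left unescaped. The safe sets
below are simplified assumptions for illustration and are not the ones
used by the boltons ``quote_*_part`` functions::

   from urllib.parse import quote, unquote

   text = "a b/c&d"

   # In a path, "/" separates segments, so it may stay literal.
   path_quoted = quote(text, safe="/")
   print(path_quoted)   # a%20b/c%26d

   # For a query value, escape every reserved character, including "&",
   # which would otherwise split the key=value pair.
   value_quoted = quote(text, safe="")
   print(value_quoted)  # a%20b%2Fc%26d

   # A single unquoting function reverses both encodings.
   assert unquote(path_quoted) == unquote(value_quoted) == text
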
Useful constants
----------------

.. attribute:: boltons.urlutils.SCHEME_PORT_MAP

   A mapping of URL schemes to their protocols' default
   ports. Painstakingly assembled from the `IANA scheme registry`_,
   `port registry`_, and independent research.

   Keys are lowercase strings and values are integers or None, with
   None indicating that the scheme does not have a default port (or
   may not support ports at all)::

      >>> boltons.urlutils.SCHEME_PORT_MAP['http']
      80
      >>> print(boltons.urlutils.SCHEME_PORT_MAP['file'])
      None

   See :attr:`URL.port` for more info on how it is used. See
   :attr:`~boltons.urlutils.NO_NETLOC_SCHEMES` for more scheme info.

   Also `available in JSON`_.

   .. _IANA scheme registry: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
   .. _port registry: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
   .. _available in JSON: https://gist.github.com/mahmoud/2fe281a8daaff26cfe9c15d2c5bf5c8b

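   Because ``None`` is a meaningful value, a lookup should distinguish
   "known scheme without a default port" from "unknown scheme". A
   minimal sketch follows; ``PORT_MAP_EXCERPT`` and
   ``describe_default_port()`` are hypothetical, and the excerpt holds
   only a few entries rather than the real mapping::

      # Hypothetical excerpt; the http and file values match the doctest above.
      PORT_MAP_EXCERPT = {"http": 80, "https": 443, "file": None}
      _MISSING = object()  # sentinel distinct from None

      def describe_default_port(scheme):
          port = PORT_MAP_EXCERPT.get(scheme.lower(), _MISSING)
          if port is _MISSING:
              return "unknown scheme"
          if port is None:
              return "no default port"
          return "default port %d" % port

      print(describe_default_port("HTTP"))    # default port 80
      print(describe_default_port("file"))    # no default port
      print(describe_default_port("gopher"))  # unknown scheme
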
.. attribute:: boltons.urlutils.NO_NETLOC_SCHEMES

   This is a :class:`set` of schemes that explicitly do not support
   network resolution, such as "mailto" and "urn".

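   One practical consequence is serialization: schemes with a network
   location are followed by ``://``, while schemes in this set take a
   bare ``:``. The ``NO_NETLOC_EXCERPT`` set and ``join_scheme()``
   helper below are hypothetical and only sketch that rule::

      # Hypothetical excerpt of schemes that do not use a netloc.
      NO_NETLOC_EXCERPT = {"mailto", "urn"}

      def join_scheme(scheme, rest):
          """Prefix *rest* with *scheme* and the appropriate separator."""
          sep = ":" if scheme in NO_NETLOC_EXCERPT else "://"
          return scheme + sep + rest

      print(join_scheme("mailto", "user@example.com"))  # mailto:user@example.com
      print(join_scheme("http", "example.com/"))        # http://example.com/
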