rfc3986
=======
A Python implementation of `RFC 3986`_ including validation and authority
parsing. Coming soon: `Reference Resolution <http://tools.ietf.org/html/rfc3986#section-5>`_.
Installation
------------
Use pip to install ``rfc3986``::

    pip install rfc3986

License
-------
`Apache License Version 2.0`_
Example Usage
-------------
To parse a URI into a convenient named tuple, call ``uri_reference``::

    from rfc3986 import uri_reference

    example = uri_reference('http://example.com')
    email = uri_reference('mailto:user@domain.com')
    ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git')

With a parsed URI you can access data about the components::

    print(example.scheme)  # => http
    print(email.path)      # => user@domain.com
    print(ssh.userinfo)    # => user
    print(ssh.host)        # => git.openstack.org
    print(ssh.port)        # => 29418

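To illustrate how a URI breaks down into its generic components, here is a minimal sketch built on the regular expression given in RFC 3986, Appendix B, rewritten with named groups. It uses only the standard library; ``split_uri`` is a hypothetical helper for illustration, not part of ``rfc3986``, and it does not split the authority into userinfo, host, and port.

```python
import re

# Pattern adapted from RFC 3986, Appendix B (named groups added here).
URI_PATTERN = re.compile(
    r'^(?:(?P<scheme>[^:/?#]+):)?'
    r'(?://(?P<authority>[^/?#]*))?'
    r'(?P<path>[^?#]*)'
    r'(?:\?(?P<query>[^#]*))?'
    r'(?:#(?P<fragment>.*))?'
)


def split_uri(uri):
    """Return the five generic components as a dict (hypothetical helper)."""
    return URI_PATTERN.match(uri).groupdict()


parts = split_uri('ssh://user@git.openstack.org:29418/openstack/keystone.git')
print(parts['scheme'])     # => ssh
print(parts['authority'])  # => user@git.openstack.org:29418
print(parts['path'])       # => /openstack/keystone.git
```

Components that are absent from the input come back as ``None``, which is why ``mailto:user@domain.com`` has a path but no authority.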
It can also parse URIs with unicode present::

    uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83')
    print(uni.query)  # => utf8=%E2%98%83

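The percent-encoded form shown above is just the UTF-8 bytes of the character written as ``%XX`` triplets. As a small sketch of that encoding step, using the standard library's ``urllib.parse.quote`` rather than anything from ``rfc3986``:

```python
from urllib.parse import quote

snowman = '\u2603'  # the character whose UTF-8 encoding is \xe2\x98\x83
encoded = quote(snowman.encode('utf-8'))
print(encoded)  # => %E2%98%83
```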
With a parsed URI you can also validate it::

    import subprocess

    if ssh.is_valid():
        subprocess.call(['git', 'clone', ssh.unsplit()])

You can also take a parsed URI and normalize it::

    mangled = uri_reference('hTTp://exAMPLe.COM')
    print(mangled.scheme)     # => hTTp
    print(mangled.authority)  # => exAMPLe.COM

    normal = mangled.normalize()
    print(normal.scheme)      # => http
    print(normal.authority)   # => example.com

But these two URIs are (functionally) equivalent::

    import webbrowser

    if normal == mangled:
        webbrowser.open(normal.unsplit())

Your paths, queries, and fragments are safe with us though::

    mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')
    normal = mangled.normalize()
    assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'
    assert normal == 'http://example.com/Some/reallY/biZZare/pAth'
    assert normal != 'http://example.com/some/really/bizzare/path'

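The reason the path survives is that RFC 3986 treats the scheme and host as case-insensitive, while the other components are case-sensitive. The following is a rough sketch of that case rule alone, written against the standard library; ``normalize_case`` is a hypothetical helper, not the library's implementation, and it skips the other normalization steps the RFC describes (percent-encoding case and dot-segment removal).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri):
    """Lowercase the scheme and host only (hypothetical helper)."""
    parts = urlsplit(uri)
    # Userinfo is case-sensitive, so lowercase only what follows the last '@'.
    userinfo, at, hostport = parts.netloc.rpartition('@')
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


print(normalize_case('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'))
# => http://example.com/Some/reallY/biZZare/pAth
```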
If you do not actually need a real reference object and just want to normalize
your URI::

    from rfc3986 import normalize_uri

    assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') ==
            'http://example.com/Some/reallY/biZZare/pAth')

You can also validate a URI string directly::

    from rfc3986 import is_valid_uri

    assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')

Requiring Components
~~~~~~~~~~~~~~~~~~~~
You can validate that a particular string is a valid URI and require that
individual components be present::

    from rfc3986 import is_valid_uri

    assert is_valid_uri('http://localhost:8774/v2/resource',
                        require_scheme=True,
                        require_authority=True,
                        require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    assert is_valid_uri('mailto:user@example.com', require_authority=True) is False

If you have an instance of a ``URIReference``, you can pass the same arguments
to ``URIReference#is_valid``, e.g.,

.. code::

    from rfc3986 import uri_reference

    http = uri_reference('http://localhost:8774/v2/resource')
    assert http.is_valid(require_scheme=True,
                         require_authority=True,
                         require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    mailto = uri_reference('mailto:user@example.com')
    assert mailto.is_valid(require_authority=True) is False

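To make the idea of a "required component" concrete, here is a minimal standard-library sketch that checks only whether a component is present. ``has_required`` is a hypothetical helper whose keyword names mirror the ones above; unlike ``is_valid_uri`` it performs no validation of the component contents.

```python
from urllib.parse import urlsplit


def has_required(uri, require_scheme=False, require_authority=False,
                 require_path=False):
    """Check presence (not validity) of components (hypothetical helper)."""
    parts = urlsplit(uri)
    checks = [
        (require_scheme, parts.scheme),
        (require_authority, parts.netloc),
        (require_path, parts.path),
    ]
    # Every component that was asked for must be a non-empty string.
    return all(value for required, value in checks if required)


print(has_required('http://localhost:8774/v2/resource',
                   require_scheme=True,
                   require_authority=True,
                   require_path=True))  # => True
print(has_required('mailto:user@example.com',
                   require_authority=True))  # => False
```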
Alternatives
------------
- `rfc3987 <https://pypi.python.org/pypi/rfc3987/1.3.4>`_
This is a direct competitor to this library, with extra features,
licensed under the GPL.
- `uritools <https://pypi.python.org/pypi/uritools/0.5.1>`_
This can parse URIs in the manner of RFC 3986 but provides no validation and
only recently added Python 3 support.
- Standard library's `urlparse`/`urllib.parse`
The functions in these libraries can only split a URI (valid or not) and
provide no validation.
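As a quick illustration of the last point, the standard library will split a string that is not a valid URI without raising an error. This sketch uses Python 3's ``urllib.parse.urlsplit`` on an input with spaces in the host and path, which RFC 3986 does not allow unencoded:

```python
from urllib.parse import urlsplit

# Spaces are not legal in a URI, yet the split succeeds.
parts = urlsplit('http://exa mple.com/pa th')
print(parts.netloc)  # => exa mple.com
print(parts.path)    # => /pa th
```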
Contributing
------------
This project follows and enforces the Python Software Foundation's `Code of
Conduct <https://www.python.org/psf/codeofconduct/>`_.
If you would like to contribute but do not have a bug or feature in mind, feel
free to email Ian and find out how you can help.
.. _RFC 3986: http://tools.ietf.org/html/rfc3986
.. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0