# Extending `smart_open`
This document targets potential contributors to `smart_open`.
Currently, there are two main directions for extending existing `smart_open` functionality:
1. Add a new transport mechanism
2. Add a new compression format
The first is by far the more challenging, and also the more welcome.
## New transport mechanisms
Each transport mechanism lives in its own submodule.
For example, currently we have:
- [smart_open.local_file](smart_open/local_file.py)
- [smart_open.s3](smart_open/s3.py)
- [smart_open.ssh](smart_open/ssh.py)
- ... and others
So, to implement a new transport mechanism, you need to create a new module.
Your module must expose the following (see [smart_open.http](smart_open/http.py) for the full implementation):
```python
SCHEMA = ...
"""The name of the mechanism, e.g. s3, ssh, etc.

This is the part that goes before the `://` in a URL, e.g. `s3://`."""

URI_EXAMPLES = ('xxx://foo/bar', 'zzz://baz/boz')
"""This will appear in the documentation of the `parse_uri` function."""

MISSING_DEPS = False
"""Wrap transport-specific imports in a try/except and set this to True if
any imports are not found. Setting MISSING_DEPS to True will cause the library
to suggest installing its dependencies with an example pip command.

If your transport has no external dependencies, you can omit this variable.
"""


def parse_uri(uri_as_str):
    """Parse the specified URI into a dict.

    At a bare minimum, the dict must have a `schema` member.
    """
    # Add any transport-specific members alongside `schema`.
    return dict(schema=SCHEMA)


def open_uri(uri_as_str, mode, transport_params):
    """Return a file-like object pointing to the URI.

    Parameters:

    uri_as_str: str
        The URI to open
    mode: str
        Either "rb" or "wb".  You don't need to implement text modes,
        `smart_open` does that for you, outside of the transport layer.
    transport_params: dict
        Any additional parameters to pass to the `open` function (see below).

    """
    #
    # Parse the URI using parse_uri
    # Consolidate the parsed URI with transport_params, if needed
    # Pass everything to the open function (see below).
    #
    ...


# The leading positional parameters are transport-specific; `uri` is a placeholder.
def open(uri, mode, param1=None, param2=None, paramN=None):
    """This function does the hard work.

    The keyword parameters are the transport_params from the `open_uri`
    function.

    """
    ...
```
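The `MISSING_DEPS` convention described in the template can be sketched as follows. This is only an illustration: `hypothetical_transport_sdk` is a made-up package name standing in for whatever third-party library your transport needs.

```python
# Sketch of the MISSING_DEPS pattern for a transport module.
# ASSUMPTION: `hypothetical_transport_sdk` is an invented package name,
# used here only to stand in for a real optional dependency.
MISSING_DEPS = False

try:
    import hypothetical_transport_sdk  # noqa: F401
except ImportError:
    # The import failed, so flag the dependency as missing instead of
    # crashing when the module is imported.
    MISSING_DEPS = True
```

Keeping the import inside `try`/`except` means the module itself can always be imported, and the missing dependency is reported only through the flag.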
Have a look at the existing mechanisms to see how they work.
You may define other functions and classes as necessary for your implementation.
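To make the template more concrete, here is a minimal sketch of a complete transport module. It assumes a hypothetical `mem` scheme that keeps objects in a plain dictionary; the names `_STORE`, `_Writer` and the `store` parameter are invented for this illustration and are not part of `smart_open`.

```python
import io
import urllib.parse

SCHEMA = 'mem'  # hypothetical scheme, for illustration only
URI_EXAMPLES = ('mem://bucket/key.txt',)

_STORE = {}  # default in-memory "remote": maps object name -> bytes


class _Writer(io.BytesIO):
    """Buffer writes in memory and publish them to the store on close."""

    def __init__(self, name, store):
        super().__init__()
        self._name = name
        self._store = store

    def close(self):
        if not self.closed:
            self._store[self._name] = self.getvalue()
        super().close()


def parse_uri(uri_as_str):
    split = urllib.parse.urlsplit(uri_as_str)
    if split.scheme != SCHEMA:
        raise ValueError('unsupported scheme: %r' % split.scheme)
    return dict(schema=SCHEMA, name=split.netloc + split.path)


def open_uri(uri_as_str, mode, transport_params):
    parsed = parse_uri(uri_as_str)
    return open(parsed['name'], mode, **transport_params)


def open(name, mode, store=None):
    """Open an in-memory object.

    store: dict, optional
        The dictionary to read from or write to (hypothetical parameter).
    """
    store = _STORE if store is None else store
    if mode == 'rb':
        return io.BytesIO(store[name])
    if mode == 'wb':
        return _Writer(name, store)
    raise NotImplementedError('unsupported mode: %r' % mode)
```

Only the binary modes are handled, as the template requires; anything else raises `NotImplementedError`.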
Once your module is working, register it in the [smart_open.transport](smart_open/transport.py) submodule.
The `register_transport()` function updates a mapping from schemes to the modules that implement functionality for them.
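The idea of a scheme-to-module mapping can be sketched in a few lines. This is a simplified stand-in written for this guide, not the library's actual implementation: the `_REGISTRY` dictionary, the `get_transport` helper and the fake module built with `types.SimpleNamespace` are all assumptions made for illustration.

```python
import types

_REGISTRY = {}  # hypothetical mapping: scheme name -> transport module


def register_transport(submodule):
    """Record the module under the scheme it declares via SCHEMA."""
    schema = submodule.SCHEMA
    if schema in _REGISTRY:
        raise ValueError('scheme %r is already registered' % schema)
    _REGISTRY[schema] = submodule


def get_transport(schema):
    """Look up the module that handles the given scheme."""
    try:
        return _REGISTRY[schema]
    except KeyError:
        raise NotImplementedError('unsupported scheme: %r' % schema)


# A stand-in for a real transport module, exposing only SCHEMA.
fake_module = types.SimpleNamespace(SCHEMA='xxx')
register_transport(fake_module)
```

With a mapping like this, opening a URI reduces to extracting its scheme, looking up the module, and calling that module's `open_uri`.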
Once you've registered your new transport module, the following will happen automagically:
1. `smart_open` will be able to open any URI supported by your module
2. The docstring for the `smart_open.open` function will contain a section
detailing the parameters for your transport module.
3. The docstring for the `parse_uri` function will include the schemas and
examples supported by your module.
You can confirm the documentation changes by running:

    python -c 'help("smart_open")'

and verifying that the documentation for your new submodule shows up.
### What's the difference between the `open_uri` and `open` functions?
There are several key differences between the two.
First, the parameters to `open_uri` are the same for _all transports_.
On the other hand, the parameters to the `open` function can differ from transport to transport.
Second, the responsibilities of the two functions are also different.
The `open` function opens the remote object.
The `open_uri` function deals with parsing transport-specific details out of the URI, and then delegates to `open`.
The `open` function contains documentation for transport parameters.
This documentation gets parsed by the `doctools` module and appears in various docstrings.
Some of these differences are by design; others are a consequence of evolution.
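The split of responsibilities can be sketched as below. The transport is hypothetical: the `port` and `timeout` parameters are invented, and `open` returns a buffer describing the call rather than opening a real connection. Letting explicit `transport_params` override values parsed from the URI is also an assumption made here, not a documented rule.

```python
import io
import urllib.parse


def open(host, path, mode, port=9000, timeout=None):
    """Transport-specific signature; keyword parameters are documented here.

    port: int, optional
        The port to connect to (hypothetical parameter).
    timeout: float, optional
        Connection timeout in seconds (hypothetical parameter).
    """
    # A real transport would open the remote object at this point.
    summary = '%s:%d%s mode=%s timeout=%s' % (host, port, path, mode, timeout)
    return io.BytesIO(summary.encode('utf-8'))


def open_uri(uri_as_str, mode, transport_params):
    """Same signature for every transport: parse, consolidate, delegate."""
    split = urllib.parse.urlsplit(uri_as_str)
    kwargs = dict(transport_params)
    if split.port is not None:
        # Use the port from the URI unless the caller passed one explicitly.
        kwargs.setdefault('port', split.port)
    return open(split.hostname, split.path, mode, **kwargs)
```

Here `open_uri` never touches the remote object itself; it only turns a URI plus a dictionary into a call to `open`.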
## New compression mechanisms
The compression layer is self-contained in the `smart_open.compression` submodule.
To add support for a new compressor:
- Create a new function to handle your compression format (given an extension)
- Add your compressor to the registry
For example:
```python
def _handle_xz(file_obj, mode):
    import lzma
    return lzma.LZMAFile(filename=file_obj, mode=mode)


register_compressor('.xz', _handle_xz)
```
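To show how an extension-keyed registry can dispatch to such a handler, here is a self-contained sketch. The `_COMPRESSORS` dictionary, this local `register_compressor` and the `wrap_by_extension` helper are simplified stand-ins written for illustration; they are not the code in `smart_open.compression`.

```python
import io
import lzma
import os

_COMPRESSORS = {}  # hypothetical registry: extension -> handler callback


def register_compressor(ext, callback):
    """Associate a file extension such as '.xz' with a handler."""
    if not ext.startswith('.'):
        raise ValueError('extension must start with a dot, got %r' % ext)
    _COMPRESSORS[ext.lower()] = callback


def _handle_xz(file_obj, mode):
    return lzma.LZMAFile(filename=file_obj, mode=mode)


def wrap_by_extension(file_obj, mode, filename):
    """Wrap file_obj if the filename's extension has a registered handler."""
    _, ext = os.path.splitext(filename)
    handler = _COMPRESSORS.get(ext.lower())
    return file_obj if handler is None else handler(file_obj, mode)


register_compressor('.xz', _handle_xz)

# Round trip through an in-memory buffer standing in for a remote object.
raw = io.BytesIO()
with wrap_by_extension(raw, 'wb', 'data.txt.xz') as fout:
    fout.write(b'hello')

raw.seek(0)
with wrap_by_extension(raw, 'rb', 'data.txt.xz') as fin:
    roundtrip = fin.read()
```

Files with an unregistered extension pass through unchanged, which is what lets users handle less common formats in their own code.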
There are many compression formats out there, and supporting all of them is beyond the scope of `smart_open`.
We want our code's functionality to cover the bare minimum required to satisfy 80% of our users.
We leave the remaining 20% of users with the ability to deal with compression in their own code, using the trivial mechanism described above.
## Documentation
Once you've contributed your extension, please add it to the documentation so that it is discoverable for other users.
Some notable files:
- setup.py: See the `description` keyword. Not all contributions will affect this.
- README.rst
- howto.md (if your extension solves a specific problem that doesn't get covered by other documentation)