File: help.txt

package info (click to toggle)
smart-open 7.1.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 976 kB
  • sloc: python: 7,737; sh: 118; makefile: 15
file content (334 lines) | stat: -rw-r--r-- 12,842 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
Help on package smart_open:

NAME
    smart_open

DESCRIPTION
    Utilities for streaming to/from several file-like data storages: S3 / HDFS / local
    filesystem / compressed files, and many more, using a simple, Pythonic API.
    
    The streaming makes heavy use of generators and pipes, to avoid loading
    full file contents into memory, allowing work with arbitrarily large files.
    
    The main functions are:
    
    * `open()`, which opens the given file for reading/writing
    * `parse_uri()`
    * `s3_iter_bucket()`, which goes over all keys in an S3 bucket in parallel
    * `register_compressor()`, which registers callbacks for transparent compressor handling

PACKAGE CONTENTS
    azure
    bytebuffer
    compression
    concurrency
    constants
    doctools
    gcs
    hdfs
    http
    local_file
    s3
    smart_open_lib
    ssh
    tests (package)
    transport
    utils
    version
    webhdfs

FUNCTIONS
    open(uri, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None, ignore_ext=False, transport_params=None)
        Open the URI object, returning a file-like object.
        
        The URI is usually a string in a variety of formats.
        For a full list of examples, see the :func:`parse_uri` function.
        
        The URI may also be one of:
        
        - an instance of the pathlib.Path class
        - a stream (anything that implements io.IOBase-like functionality)
        
        Parameters
        ----------
        uri: str or object
            The object to open.
        mode: str, optional
            Mimicks built-in open parameter of the same name.
        buffering: int, optional
            Mimicks built-in open parameter of the same name.
        encoding: str, optional
            Mimicks built-in open parameter of the same name.
        errors: str, optional
            Mimicks built-in open parameter of the same name.
        newline: str, optional
            Mimicks built-in open parameter of the same name.
        closefd: boolean, optional
            Mimicks built-in open parameter of the same name.  Ignored.
        opener: object, optional
            Mimicks built-in open parameter of the same name.  Ignored.
        ignore_ext: boolean, optional
            Disable transparent compression/decompression based on the file extension.
        transport_params: dict, optional
            Additional parameters for the transport layer (see notes below).
        
        Returns
        -------
        A file-like object.
        
        Notes
        -----
        smart_open has several implementations for its transport layer (e.g. S3, HTTP).
        Each transport layer has a different set of keyword arguments for overriding
        default behavior.  If you specify a keyword argument that is *not* supported
        by the transport layer being used, smart_open will ignore that argument and
        log a warning message.
        
        smart_open supports the following transport mechanisms:
        
        azure (smart_open/azure.py)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Implements file-like objects for reading and writing to/from Azure Blob Storage.
        
        buffer_size: int, optional
            The buffer size to use when performing I/O. For reading only.
        min_part_size: int, optional
            The minimum part size for multipart uploads.  For writing only.
        
        file (smart_open/local_file.py)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Implements the transport for the file:// schema.
        
        gs (smart_open/gcs.py)
        ~~~~~~~~~~~~~~~~~~~~~~
        Implements file-like objects for reading and writing to/from GCS.
        
        buffer_size: int, optional
            The buffer size to use when performing I/O. For reading only.
        min_part_size: int, optional
            The minimum part size for multipart uploads.  For writing only.
        client: google.cloud.storage.Client, optional
            The GCS client to use when working with google-cloud-storage.
        
        hdfs (smart_open/hdfs.py)
        ~~~~~~~~~~~~~~~~~~~~~~~~~
        Implements reading and writing to/from HDFS.
        
        http (smart_open/http.py)
        ~~~~~~~~~~~~~~~~~~~~~~~~~
        Implements file-like objects for reading from http.
        
        kerberos: boolean, optional
            If True, will attempt to use the local Kerberos credentials
        user: str, optional
            The username for authenticating over HTTP
        password: str, optional
            The password for authenticating over HTTP
        cert: str/tuple, optional
            If String, path to ssl client cert file (.pem). If Tuple, (‘cert’, ‘key’)
        headers: dict, optional
            Any headers to send in the request. If ``None``, the default headers are sent:
            ``{'Accept-Encoding': 'identity'}``. To use no headers at all,
            set this variable to an empty dict, ``{}``.
        
        s3 (smart_open/s3.py)
        ~~~~~~~~~~~~~~~~~~~~~
        Implements file-like objects for reading and writing from/to AWS S3.
        
        buffer_size: int, optional
            The buffer size to use when performing I/O.
        min_part_size: int, optional
            The minimum part size for multipart uploads.  For writing only.
        multipart_upload: bool, optional
            Default: `True`
            If set to `True`, will use multipart upload for writing to S3. If set
            to `False`, S3 upload will use the S3 Single-Part Upload API, which
            is more ideal for small file sizes.
            For writing only.
        version_id: str, optional
            Version of the object, used when reading object.
            If None, will fetch the most recent version.
        defer_seek: boolean, optional
            Default: `False`
            If set to `True` on a file opened for reading, GetObject will not be
            called until the first seek() or read().
            Avoids redundant API queries when seeking before reading.
        client: object, optional
            The S3 client to use when working with boto3.
            If you don't specify this, then smart_open will create a new client for you.
        client_kwargs: dict, optional
            Additional parameters to pass to the relevant functions of the client.
            The keys are fully qualified method names, e.g. `S3.Client.create_multipart_upload`.
            The values are kwargs to pass to that method each time it is called.
        writebuffer: IO[bytes], optional
            By default, this module will buffer data in memory using io.BytesIO
            when writing. Pass another binary IO instance here to use it instead.
            For example, you may pass a file object to buffer to local disk instead
            of in RAM. Use this to keep RAM usage low at the expense of additional
            disk IO. If you pass in an open file, then you are responsible for
            cleaning it up after writing completes.
        
        scp (smart_open/ssh.py)
        ~~~~~~~~~~~~~~~~~~~~~~~
        Implements I/O streams over SSH.
        
        mode: str, optional
            The mode to use for opening the file.
        host: str, optional
            The hostname of the remote machine.  May not be None.
        user: str, optional
            The username to use to login to the remote machine.
            If None, defaults to the name of the current user.
        password: str, optional
            The password to use to login to the remote machine.
        port: int, optional
            The port to connect to.
        transport_params: dict, optional
            Any additional settings to be passed to paramiko.SSHClient.connect
        
        webhdfs (smart_open/webhdfs.py)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Implements reading and writing to/from WebHDFS.
        
        min_part_size: int, optional
            For writing only.
        
        Examples
        --------
        
        >>> from smart_open import open
        >>>
        >>> # stream lines from an S3 object
        >>> for line in open('s3://commoncrawl/robots.txt'):
        ...    print(repr(line))
        ...    break
        'User-Agent: *\n'
        
        >>> # stream from/to compressed files, with transparent (de)compression:
        >>> for line in open('smart_open/tests/test_data/1984.txt.gz', encoding='utf-8'):
        ...    print(repr(line))
        'It was a bright cold day in April, and the clocks were striking thirteen.\n'
        'Winston Smith, his chin nuzzled into his breast in an effort to escape the vile\n'
        'wind, slipped quickly through the glass doors of Victory Mansions, though not\n'
        'quickly enough to prevent a swirl of gritty dust from entering along with him.\n'
        
        >>> # can use context managers too:
        >>> with open('smart_open/tests/test_data/1984.txt.gz') as fin:
        ...    with open('smart_open/tests/test_data/1984.txt.bz2', 'w') as fout:
        ...        for line in fin:
        ...           fout.write(line)
        
        >>> # can use any IOBase operations, like seek
        >>> with open('s3://commoncrawl/robots.txt', 'rb') as fin:
        ...     for line in fin:
        ...         print(repr(line.decode('utf-8')))
        ...         break
        ...     offset = fin.seek(0)  # seek to the beginning
        ...     print(fin.read(4))
        'User-Agent: *\n'
        b'User'
        
        >>> # stream from HTTP
        >>> for line in open('http://example.com/index.html'):
        ...     print(repr(line))
        ...     break
        
        This function also supports transparent compression and decompression 
        using the following codecs:
        
        * .bz2
        * .gz
        
        The function depends on the file extension to determine the appropriate codec.
        
        
        See Also
        --------
        - `Standard library reference <https://docs.python.org/3.7/library/functions.html#open>`__
        - `smart_open README.rst
          <https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst>`__
    
    parse_uri(uri_as_string)
        Parse the given URI from a string.
        
        Parameters
        ----------
        uri_as_string: str
            The URI to parse.
        
        Returns
        -------
        collections.namedtuple
            The parsed URI.
        
        Notes
        -----
        Supported URI schemes are:
        
        * azure
        * file
        * gs
        * hdfs
        * http
        * s3
        * scp
        * webhdfs
        
        Valid URI examples::
        
        * ./local/path/file
        * ~/local/path/file
        * local/path/file
        * ./local/path/file.gz
        * file:///home/user/file
        * file:///home/user/file.bz2
        * hdfs:///path/file
        * hdfs://path/file
        * s3://my_bucket/my_key
        * s3://my_key:my_secret@my_bucket/my_key
        * s3://my_key:my_secret@my_server:my_port@my_bucket/my_key
        * ssh://username@host/path/file
        * ssh://username@host//path/file
        * scp://username@host/path/file
        * sftp://username@host/path/file
        * webhdfs://host:port/path/file
    
    register_compressor(ext, callback)
        Register a callback for transparently decompressing files with a specific extension.
        
        Parameters
        ----------
        ext: str
            The extension.  Must include the leading period, e.g. ``.gz``.
        callback: callable
            The callback.  It must accept two position arguments, file_obj and mode.
            This function will be called when ``smart_open`` is opening a file with
            the specified extension.
        
        Examples
        --------
        
        Instruct smart_open to use the `lzma` module whenever opening a file
        with a .xz extension (see README.rst for the complete example showing I/O):
        
        >>> def _handle_xz(file_obj, mode):
        ...     import lzma
        ...     return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)
        >>>
        >>> register_compressor('.xz', _handle_xz)
    
    s3_iter_bucket(bucket_name, prefix='', accept_key=None, key_limit=None, workers=16, retries=3, **session_kwargs)
        Deprecated.  Use smart_open.s3.iter_bucket instead.
    
    smart_open(uri, mode='rb', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None, ignore_extension=False, **kwargs)

DATA
    __all__ = ['open', 'parse_uri', 'register_compressor', 's3_iter_bucket...

VERSION
    4.1.2.dev0

FILE
    /Users/misha/git/smart_open/smart_open/__init__.py