File: README.rst

package info (click to toggle)
azure-data-lake-store-python 1.0.1-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 31,952 kB
  • sloc: python: 4,332; makefile: 192
file content (234 lines) | stat: -rw-r--r-- 8,348 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
Microsoft Azure Data Lake Store Filesystem Library for Python
=============================================================

.. image:: https://travis-ci.org/Azure/azure-data-lake-store-python.svg?branch=dev
    :target: https://travis-ci.org/Azure/azure-data-lake-store-python
.. image:: https://coveralls.io/repos/github/Azure/azure-data-lake-store-python/badge.svg?branch=master
    :target: https://coveralls.io/github/Azure/azure-data-lake-store-python?branch=master

This project is the Python filesystem library for Azure Data Lake Store.

INSTALLATION
============

To install from source instead of pip (for local testing and development):

.. code-block:: bash

    > pip install -r dev_requirements.txt
    > python setup.py develop

Usage: Sample Code
==================

To play with the code, here is a starting point:

.. code-block:: python

    from azure.datalake.store import core, lib, multithread
    token = lib.auth(tenant_id, username, password)
    adl = core.AzureDLFileSystem(token, store_name=store_name)

    # typical operations
    adl.ls('')
    adl.ls('tmp/', detail=True)
    adl.ls('tmp/', detail=True, invalidate_cache=True)
    adl.cat('littlefile')
    adl.head('gdelt20150827.csv')

    # file-like object
    with adl.open('gdelt20150827.csv', blocksize=2**20) as f:
        print(f.readline())
        print(f.readline())
        print(f.readline())
        # could have passed f to any function requiring a file object:
        # pandas.read_csv(f)

    with adl.open('anewfile', 'wb') as f:
        # data is written on flush/close, or when buffer is bigger than
        # blocksize
        f.write(b'important data')

    adl.du('anewfile')

    # recursively download the whole directory tree with 5 threads and
    # 16MB chunks
    multithread.ADLDownloader(adl, "", 'my_temp_dir', 5, 2**24)

Progress can be tracked using a callback function in the form `track(current, total)`
When passed, this will keep track of transferred bytes and be called on each complete chunk.

Here's an example using the Azure CLI progress controller as the `progress_callback`:

.. code-block:: python

    from cli.core.application import APPLICATION

    def _update_progress(current, total):
        hook = APPLICATION.get_progress_controller(det=True)
        hook.add(message='Alive', value=current, total_val=total)
        if total == current:
            hook.end()

    ...
    ADLUploader(client, destination_path, source_path, thread_count, overwrite=overwrite,
            chunksize=chunk_size,
            buffersize=buffer_size,
            blocksize=block_size,
            progress_callback=_update_progress)

This will output a progress bar to the stdout:

.. code-block:: bash

    Alive[#########################                                       ]  40.0881%
    
    Finished[#############################################################]  100.0000%

Usage: Command Line Sample
==========================

To interact with the API at a higher-level, you can use the provided
command-line interface in "samples/cli.py". You will need to set
the appropriate environment variables 

* :code:`azure_username`

* :code:`azure_password`

* :code:`azure_data_lake_store_name`

* :code:`azure_subscription_id`

* :code:`azure_resource_group_name`

* :code:`azure_service_principal`

* :code:`azure_service_principal_secret`

to connect to the Azure Data Lake Store. Optionally, you may need to define :code:`azure_tenant_id` or :code:`azure_data_lake_store_url_suffix`.

Below is a simple sample, with more details beyond.


.. code-block:: bash

    python samples\cli.py ls -l

Execute the program without arguments to access documentation.

To start the CLI in interactive mode, run "python samples/cli.py"
and then type "help" to see all available commands (similiar to Unix utilities):

.. code-block:: bash

    > python samples/cli.py
    azure> help

    Documented commands (type help <topic>):
    ========================================
    cat    chmod  close  du      get   help  ls     mv   quit  rmdir  touch
    chgrp  chown  df     exists  head  info  mkdir  put  rm    tail

    azure>


While still in interactive mode, you can run "ls -l" to list the entries in the
home directory ("help ls" will show the command's usage details). If you're not
familiar with the Unix/Linux "ls" command, the columns represent 1) permissions,
2) file owner, 3) file group, 4) file size, 5-7) file's modification time, and
8) file name.

.. code-block:: bash

    > python samples/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> ls -l --human-readable
    drwxrwx--- 0123abcd 0123abcd   0B Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1M Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd  36B Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd   0B Aug 03 13:46 tmp
    azure>


To download a remote file, run "get remote-file [local-file]". The second
argument, "local-file", is optional. If not provided, the local file will be
named after the remote file minus the directory path.

.. code-block:: bash

    > python samples/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    azure>


It is also possible to run in command-line mode, allowing any available command
to be executed separately without remaining in the interpreter.

For example, listing the entries in the home directory:

.. code-block:: bash

    > python samples/cli.py ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    >


Also, downloading a remote file:

.. code-block:: bash

    > python samples/cli.py get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    >

Tests
=====

For detailed documentation about our test framework, please visit the 
`tests folder <https://github.com/Azure/azure-data-lake-store-python/tree/master/tests>`__.

Need Help?
==========

Be sure to check out the Microsoft Azure `Developer Forums on Stack Overflow <http://go.microsoft.com/fwlink/?LinkId=234489>`__
if you have trouble with the provided code. Most questions are tagged `azure and python <https://stackoverflow.com/questions/tagged/azure+python>`__.


Contribute Code or Provide Feedback
===================================

If you would like to become an active contributor to this project please
follow the instructions provided in `Microsoft Azure Projects Contribution Guidelines <http://azure.github.io/guidelines/>`__. 
Furthermore, check out `GUIDANCE.md <https://github.com/Azure/azure-data-lake-store-python/blob/master/GUIDANCE.md>`__ 
for specific information related to this project.

If you encounter any bugs with the library please file an issue in the
`Issues <https://github.com/Azure/azure-data-lake-store-python/issues>`__
section of the project.


Code of Conduct
===============
This project has adopted the `Microsoft Open Source Code of Conduct <https://opensource.microsoft.com/codeofconduct/>`__. 
For more information see the `Code of Conduct FAQ <https://opensource.microsoft.com/codeofconduct/faq/>`__ or contact 
`opencode@microsoft.com <mailto:opencode@microsoft.com>`__ with any additional questions or comments.