API Reference
=============
Dask APIs generally follow from upstream APIs:
- :doc:`Arrays <array-api>` follow NumPy
- :doc:`DataFrames <dataframe-api>` follow pandas
- :doc:`Bag <bag-api>` follows the map/filter/groupby/reduce pattern common in Spark and Python iterators
- :doc:`Delayed <delayed-api>` wraps general Python code
- :doc:`Futures <futures>` follows `concurrent.futures <https://docs.python.org/3/library/concurrent.futures.html>`_ from the standard library for real-time computation.
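As a rough illustration of the interface that Futures follow, the sketch below uses only the standard library's ``concurrent.futures``. The ``square`` function is a hypothetical example task. A ``dask.distributed.Client`` offers similarly named ``submit`` and ``result`` calls, but this block does not use Dask itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    # Hypothetical example task; any picklable callable would do.
    return x * x


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns immediately with a Future for the pending result.
    future = executor.submit(square, 4)
    single = future.result()  # blocks until the task has finished

    # Submit several tasks and collect results as they complete.
    futures = [executor.submit(square, n) for n in range(5)]
    results = sorted(f.result() for f in as_completed(futures))

print(single)
print(results)
```

Because completion order is not guaranteed, the results are sorted before printing.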
.. toctree::
:maxdepth: 1
:hidden:
Array <array-api.rst>
DataFrame <dataframe-api.rst>
Bag <bag-api.rst>
Delayed <delayed-api.rst>
Futures <futures>
Additionally, Dask has its own functions to start computations, persist data in
memory, check progress, and so forth that complement the APIs above.
These more general Dask functions are described below:
.. currentmodule:: dask
.. autosummary::
compute
is_dask_collection
optimize
persist
visualize
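A minimal sketch of how these general functions might be used together, assuming a small graph built with ``dask.delayed``. The ``inc`` and ``add`` functions are hypothetical examples, and the synchronous scheduler is chosen here only to keep the run deterministic.

```python
import dask
from dask import delayed


@delayed
def inc(x):
    # Hypothetical example task.
    return x + 1


@delayed
def add(x, y):
    # Hypothetical example task.
    return x + y


a = inc(1)
b = inc(2)
total = add(a, b)  # nothing has run yet; this only builds a task graph

# Delayed objects are Dask collections; plain Python values are not.
is_collection = dask.is_dask_collection(total)
is_plain_collection = dask.is_dask_collection(5)

# compute() evaluates several collections at once and returns a tuple.
total_value, a_value = dask.compute(total, a, scheduler="synchronous")

# persist() also returns a tuple, but of collections whose results are
# already held in memory rather than of concrete values.
(persisted,) = dask.persist(total, scheduler="synchronous")
persisted_value = persisted.compute()
```

Passing several collections to a single ``compute`` call lets Dask share intermediate results such as ``a`` instead of recomputing them.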
These functions work with any scheduler. More advanced operations are
available when using the distributed scheduler and starting a
:obj:`dask.distributed.Client` (which, despite its name, runs nicely on a
single machine). This API provides the ability to submit, cancel, and track
work asynchronously, and includes many functions for complex inter-task
workflows. These are not necessary for normal operation, but can be useful for
real-time or advanced workloads.
This more advanced API is documented in the `Dask distributed documentation
<https://distributed.dask.org/en/latest/api.html>`_.
.. autofunction:: annotate
.. autofunction:: get_annotations
.. autofunction:: compute
.. autofunction:: is_dask_collection
.. autofunction:: optimize
.. autofunction:: persist
.. autofunction:: visualize
Datasets
--------
Dask has a few helpers for generating demo datasets:
.. currentmodule:: dask.datasets
.. autofunction:: make_people
.. autofunction:: timeseries
Datasets with defined specs
---------------------------
The following helpers are still experimental:
.. currentmodule:: dask.dataframe.io.demo
.. autofunction:: with_spec
The ``ColumnSpec`` class
************************
.. autoclass:: dask.dataframe.io.demo.ColumnSpec
:members:
:undoc-members:
:show-inheritance:
The ``RangeIndexSpec`` class
****************************
.. autoclass:: dask.dataframe.io.demo.RangeIndexSpec
:members:
:undoc-members:
:show-inheritance:
The ``DatetimeIndexSpec`` class
*******************************
.. autoclass:: dask.dataframe.io.demo.DatetimeIndexSpec
:members:
:undoc-members:
:show-inheritance:
The ``DatasetSpec`` class
*************************
.. autoclass:: dask.dataframe.io.demo.DatasetSpec
:members:
:undoc-members:
:show-inheritance:
.. _api.utilities:
Utilities
---------
Dask has some public utility functions. These are primarily used for parsing
configuration values.
.. currentmodule:: dask.utils
.. autofunction:: apply
.. autofunction:: format_bytes
.. autofunction:: format_time
.. autofunction:: parse_bytes
.. autofunction:: parse_timedelta
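The following sketch shows how these utilities might be used to turn human-readable configuration strings into numbers and back. The input strings are arbitrary examples, and the exact text produced by the formatting helpers can differ between Dask versions, so it is printed rather than assumed.

```python
from dask.utils import (
    apply,
    format_bytes,
    format_time,
    parse_bytes,
    parse_timedelta,
)

# Parse size strings into a number of bytes.
decimal_size = parse_bytes("5kB")   # decimal prefix: 5 * 1000
binary_size = parse_bytes("1 MiB")  # binary prefix: 2 ** 20

# Parse duration strings into a number of seconds.
short_wait = parse_timedelta("300ms")
long_wait = parse_timedelta("3.5 seconds")

# Format numbers back into human-readable strings.
size_text = format_bytes(binary_size)
time_text = format_time(long_wait)
print(size_text, time_text)

# apply() calls a function with positional and keyword arguments.
total = apply(lambda x, y=0: x + y, (1,), {"y": 2})
```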