API Reference
=============

Dask APIs generally follow from upstream APIs:

-  :doc:`Arrays <array-api>` follow NumPy
-  :doc:`DataFrames <dataframe-api>` follow pandas
-  :doc:`Bag <bag-api>` follows the map/filter/groupby/reduce style common in Spark and Python iterators
-  :doc:`Delayed <delayed-api>` wraps general Python code
-  :doc:`Futures <futures>` follow `concurrent.futures <https://docs.python.org/3/library/concurrent.futures.html>`_ from the standard library for real-time computation

.. toctree::
   :maxdepth: 1
   :hidden:

   Array <array-api.rst>
   DataFrame <dataframe-api.rst>
   Bag <bag-api.rst>
   Delayed <delayed-api.rst>
   Futures <futures>


Additionally, Dask has its own functions to start computations, persist data in
memory, check progress, and so forth that complement the APIs above.
These more general Dask functions are described below:

.. currentmodule:: dask

.. autosummary::
   compute
   is_dask_collection
   optimize
   persist
   visualize
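As a minimal sketch of how these general functions fit together, the example
below builds a small lazy graph with ``dask.delayed`` and then evaluates it.
The helper functions ``inc`` and ``add`` are hypothetical names chosen for
illustration, and the synchronous scheduler is requested only to keep the
example deterministic.

```python
import dask
from dask import delayed


@delayed
def inc(x):
    # Hypothetical helper: runs only when the graph is computed.
    return x + 1


@delayed
def add(x, y):
    # Hypothetical helper combining two upstream results.
    return x + y


# Building the graph is lazy: nothing has run yet.
total = add(inc(1), inc(2))
is_collection = dask.is_dask_collection(total)

# compute() accepts several collections and always returns a tuple.
(result,) = dask.compute(total, scheduler="synchronous")
left, right = dask.compute(inc(1), inc(2), scheduler="synchronous")

# persist() also returns a tuple, but of collections whose results
# have already been evaluated and are held in memory.
(persisted,) = dask.persist(total, scheduler="synchronous")
persisted_result = persisted.compute(scheduler="synchronous")

print(is_collection, result, left, right, persisted_result)
```

Passing several collections to a single ``compute`` call lets Dask share
intermediate results between them instead of recomputing each graph
separately.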

These functions work with any scheduler.  More advanced operations are
available when using the distributed scheduler and starting a
:obj:`dask.distributed.Client` (which, despite its name, runs nicely on a
single machine).  The client API can submit, cancel, and track work
asynchronously, and includes many functions for complex inter-task
workflows.  These are not necessary for normal operation, but can be useful
for real-time or advanced use cases.

This more advanced API is described in the `Dask distributed documentation
<https://distributed.dask.org/en/latest/api.html>`_.

.. autofunction:: annotate
.. autofunction:: get_annotations
.. autofunction:: compute
.. autofunction:: is_dask_collection
.. autofunction:: optimize
.. autofunction:: persist
.. autofunction:: visualize

Datasets
--------

Dask has a few helpers for generating demo datasets:

.. currentmodule:: dask.datasets

.. autofunction:: make_people
.. autofunction:: timeseries

Datasets with defined specs
---------------------------

The following helpers are still experimental:

.. currentmodule:: dask.dataframe.io.demo

.. autofunction:: with_spec

The ``ColumnSpec`` class
************************
.. autoclass:: dask.dataframe.io.demo.ColumnSpec
    :members:
    :undoc-members:
    :show-inheritance:

The ``RangeIndexSpec`` class
****************************
.. autoclass:: dask.dataframe.io.demo.RangeIndexSpec
    :members:
    :undoc-members:
    :show-inheritance:

The ``DatetimeIndexSpec`` class
*******************************
.. autoclass:: dask.dataframe.io.demo.DatetimeIndexSpec
    :members:
    :undoc-members:
    :show-inheritance:

The ``DatasetSpec`` class
*************************
.. autoclass:: dask.dataframe.io.demo.DatasetSpec
    :members:
    :undoc-members:
    :show-inheritance:

.. _api.utilities:

Utilities
---------

Dask has some public utility functions. These are primarily used for parsing
and formatting configuration values.

.. currentmodule:: dask.utils

.. autofunction:: apply
.. autofunction:: format_bytes
.. autofunction:: format_time
.. autofunction:: parse_bytes
.. autofunction:: parse_timedelta
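
A short sketch of these helpers in use is shown below. The input values are
arbitrary examples; the parsing functions turn human-readable strings into
numbers, and the formatting functions go the other way.

```python
from dask.utils import (
    apply,
    format_bytes,
    format_time,
    parse_bytes,
    parse_timedelta,
)

# Parse human-readable sizes into a number of bytes.
n_bytes = parse_bytes("100 MiB")

# Format a byte count back into a readable string.
size_label = format_bytes(n_bytes)

# Parse durations into seconds.
short_wait = parse_timedelta("3s")
long_wait = parse_timedelta("3.5 seconds")

# Format a duration given in seconds.
time_label = format_time(0.001234)

# apply() calls a function with positional and keyword arguments.
descending = apply(sorted, ([3, 1, 2],), {"reverse": True})

print(n_bytes, size_label, short_wait, long_wait, time_label, descending)
```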