Development Guidelines
======================
Dask is a community-maintained project. We welcome contributions in the form
of bug reports, documentation, code, design proposals, and more.
This page provides resources on how best to contribute.
.. note:: Dask strives to be a welcoming community of individuals with diverse
   backgrounds. For more information on our values, please see our
   `code of conduct
   <https://github.com/dask/governance/blob/master/code-of-conduct.md>`_
   and
   `diversity statement <https://github.com/dask/governance/blob/master/diversity.md>`_.
Where to ask for help
---------------------
Dask conversation happens in the following places:
1. `Stack Overflow #dask tag`_: for usage questions
2. `GitHub Issue Tracker`_: for discussions around new features or established bugs
3. `Gitter chat`_: for real-time discussion
For usage questions and bug reports we strongly prefer Stack Overflow
and GitHub issues over Gitter chat. GitHub and Stack Overflow are more easily
searchable by future users, which makes them a more efficient use of everyone's time.
Gitter chat is generally reserved for community discussion.
.. _`Stack Overflow #dask tag`: https://stackoverflow.com/questions/tagged/dask
.. _`GitHub Issue Tracker`: https://github.com/dask/dask/issues/
.. _`Gitter chat`: https://gitter.im/dask/dask
Separate Code Repositories
--------------------------
Dask maintains code and documentation in a few git repositories hosted on the
GitHub ``dask`` organization, https://github.com/dask. This includes the primary
repository and several other repositories for different components. A
non-exhaustive list follows:
* https://github.com/dask/dask: The main code repository holding parallel
algorithms, the single-machine scheduler, and most documentation
* https://github.com/dask/distributed: The distributed memory scheduler
* https://github.com/dask/dask-ml: Machine learning algorithms
* https://github.com/dask/s3fs: S3 Filesystem interface
* https://github.com/dask/gcsfs: GCS Filesystem interface
* https://github.com/dask/hdfs3: Hadoop Filesystem interface
* ...
Git and GitHub can be challenging at first. Fortunately, good materials exist
on the internet. Rather than repeat these materials here, we refer you to
Pandas' documentation and links on this subject at
https://pandas.pydata.org/pandas-docs/stable/contributing.html
Issues
------
The community discusses and tracks known bugs and potential features in the
`GitHub Issue Tracker`_. If you have a new idea or have identified a bug, then
you should raise it there to start public discussion.
If you are looking for an introductory issue to get started with development,
then check out the `"good first issue" label`_, which marks issues that are
suitable for new contributors. Generally, familiarity with Python, NumPy, Pandas, and
some parallel computing is assumed.
.. _`"good first issue" label`: https://github.com/dask/dask/labels/good%20first%20issue
Development Environment
-----------------------
Download code
~~~~~~~~~~~~~
Make a fork of the main `Dask repository <https://github.com/dask/dask>`_ and
clone the fork::
git clone https://github.com/<your-github-username>/dask
Contributions to Dask can then be made by submitting pull requests on GitHub.
Install
~~~~~~~
To build the library you can install the necessary requirements using
pip or conda_::
cd dask
.. _conda: https://conda.io/
``pip``::
python -m pip install -e ".[complete]"
``conda``::
conda env create -n dask-dev -f continuous_integration/environment-latest.yaml
conda activate dask-dev
python -m pip install --no-deps -e .
Run Tests
~~~~~~~~~
Dask uses py.test_ for testing. You can run tests from the main dask directory
as follows::
py.test dask --verbose --doctest-modules
.. _py.test: https://docs.pytest.org/en/latest/
Contributing to Code
--------------------
Dask maintains development standards that are similar to most PyData projects. These standards include
language support, testing, documentation, and style.
Python Versions
~~~~~~~~~~~~~~~
Dask supports Python versions 3.6, 3.7, and 3.8.
Name changes are handled by the :file:`dask/compatibility.py` file.
Test
~~~~
Dask employs extensive unit tests to ensure correctness of code both for today
and for the future. Test coverage is expected for all code contributions.
Tests are written in a py.test style with bare functions:
.. code-block:: python

   import pytest

   def test_fibonacci():
       assert fib(0) == 0
       assert fib(1) == 1
       assert fib(10) == 55
       assert fib(8) == fib(7) + fib(6)

       for x in [-3, 'cat', 1.5]:
           with pytest.raises(ValueError):
               fib(x)
These tests should strike a balance between covering all branches and failure
cases and running quickly (slow test suites get run less often).
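For reference, here is a minimal sketch of a ``fib`` function that a test of this kind could exercise. The implementation and its error message are illustrative assumptions, not code from the Dask repository:

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) == 0 and fib(1) == 1."""
    # Reject bools, floats, strings, and negative integers alike
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        raise ValueError(f"expected a non-negative integer, got {n!r}")

    # Iterate rather than recurse so that large inputs stay fast
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Raising ``ValueError`` for every kind of bad input keeps the failure cases easy to cover with a single ``pytest.raises`` loop.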
You can run tests locally by running ``py.test`` in the local dask directory::
py.test dask --verbose
You can also test certain modules or individual tests for faster response::
py.test dask/dataframe --verbose
py.test dask/dataframe/tests/test_dataframe.py::test_rename_index
Tests run automatically on the Travis CI and AppVeyor continuous integration
services on every push to every pull request on GitHub.
Tests are organized within the various modules' subdirectories::
dask/array/tests/test_*.py
dask/bag/tests/test_*.py
dask/bytes/tests/test_*.py
dask/dataframe/tests/test_*.py
dask/diagnostics/tests/test_*.py
For the Dask collections like Dask Array and Dask DataFrame, behavior is
typically tested directly against the NumPy or Pandas libraries using the
``assert_eq`` functions:
.. code-block:: python

   import numpy as np
   import dask.array as da
   from dask.array.utils import assert_eq

   def test_aggregations():
       nx = np.random.random(100)
       dx = da.from_array(nx, chunks=(10,))

       assert_eq(nx.sum(), dx.sum())
       assert_eq(nx.min(), dx.min())
       assert_eq(nx.max(), dx.max())
       ...
This technique helps to ensure compatibility with upstream libraries and tends
to be simpler than testing correctness directly. Additionally, by passing Dask
collections directly to the ``assert_eq`` function rather than calling ``compute``
manually, the test suite is able to run a number of checks on the lazy
collections themselves.
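The same pattern extends naturally to checking several chunkings of one array. The following is a minimal sketch, assuming NumPy and Dask are installed; the helper name ``check_against_numpy``, the array shape, and the chunk sizes are illustrative choices rather than part of the Dask test suite:

```python
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq


def check_against_numpy(chunks):
    """Compare a few Dask Array results with NumPy for one chunking."""
    nx = np.arange(600, dtype="float64").reshape(20, 30)
    dx = da.from_array(nx, chunks=chunks)

    # Pass the lazy Dask results directly, without calling .compute()
    assert_eq(nx.mean(axis=0), dx.mean(axis=0))
    assert_eq(nx.T + 1, dx.T + 1)
    assert_eq(nx[::2, 5:], dx[::2, 5:])


# Exercise evenly divided, single-block, and uneven chunkings
for chunks in [(5, 10), (20, 30), (7, 11)]:
    check_against_numpy(chunks)
```

Varying the chunking is a cheap way to catch bugs that only appear at block boundaries. In a real test module the loop would more likely be written with ``pytest.mark.parametrize``.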
Docstrings
~~~~~~~~~~
User facing functions should roughly follow the numpydoc_ standard, including
sections for ``Parameters``, ``Examples``, and general explanatory prose.
By default, examples will be doc-tested. Reproducible examples in documentation
are valuable both for testing and, more importantly, for communicating common
usage to the user. Documentation trumps testing in this case, and clear
examples should take precedence over using the docstring as testing space.
To skip a test in the examples add the comment ``# doctest: +SKIP`` directly
after the line.
.. code-block:: python

   def fib(i):
       """ A single line with a brief explanation

       A more thorough description of the function, consisting of multiple
       lines or paragraphs.

       Parameters
       ----------
       i : int
            A short description of the argument if not immediately clear

       Examples
       --------
       >>> fib(4)
       3
       >>> fib(5)
       5
       >>> fib(6)
       8
       >>> fib(-1)  # Robust to bad inputs
       ValueError(...)
       """
.. _numpydoc: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
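As a runnable illustration of doc-tested examples and the ``+SKIP`` directive, the sketch below defines a hypothetical ``fib`` and runs its docstring examples with the standard-library ``doctest`` module. The function body and the error message are assumptions made for illustration. Note that ``doctest`` only matches an exception when the expected output is written as a traceback header followed by the exception line:

```python
import doctest


def fib(i):
    """Return the i-th Fibonacci number.

    Examples
    --------
    >>> fib(6)
    8
    >>> fib(-1)
    Traceback (most recent call last):
        ...
    ValueError: i must be a non-negative integer
    >>> fib(10 ** 6)  # doctest: +SKIP
    """
    if not isinstance(i, int) or i < 0:
        raise ValueError("i must be a non-negative integer")
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a


# Parse the docstring into examples and run them; the +SKIP line is not executed
parser = doctest.DocTestParser()
test = parser.get_doctest(fib.__doc__, {"fib": fib}, "fib", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

Running the docstring through ``doctest`` directly like this is only a quick local check; the ``py.test`` invocation with ``--doctest-modules`` remains the way the project as a whole is tested.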
Docstrings are currently tested under Python 3.6 on Travis CI. You can test
docstrings with pytest as follows::
py.test dask --doctest-modules
Docstring testing requires ``graphviz`` to be installed. This can be done via::
conda install -y graphviz
Code Formatting
~~~~~~~~~~~~~~~
Dask uses `Black <https://black.readthedocs.io/en/stable/>`_ and
`Flake8 <http://flake8.pycqa.org/en/latest/>`_ to ensure a consistent code
format throughout the project. ``black`` and ``flake8`` can be installed with
``pip``::
python -m pip install black flake8
and then run from the root of the Dask repository::
black dask
flake8 dask
to auto-format your code (``black``) and check it for style errors
(``flake8``). Additionally, many editors have plugins that will
apply ``black`` as you edit files.
Optionally, you may wish to set up `pre-commit hooks <https://pre-commit.com/>`_
to automatically run ``black`` and ``flake8`` when you make a git commit. This
can be done by installing ``pre-commit``::
python -m pip install pre-commit
and then running::
pre-commit install
from the root of the Dask repository. Now ``black`` and ``flake8`` will be run
each time you commit changes. You can skip these checks with
``git commit --no-verify``.
Contributing to Documentation
-----------------------------
Dask uses Sphinx_ for documentation, hosted on https://readthedocs.org .
Documentation is maintained in the reStructuredText markup language (``.rst``
files) in ``dask/docs/source``. The documentation consists of both prose
and API documentation.
To build the documentation locally, clone this repository and install
the necessary requirements using ``pip`` or ``conda``::
git clone https://github.com/dask/dask.git
cd dask/docs
``pip``::
python -m pip install -r requirements-docs.txt
``conda``::
conda create -n daskdocs -c conda-forge --file requirements-docs.txt
conda activate daskdocs
Then build the documentation with ``make``::
make html
The resulting HTML files end up in the ``build/html`` directory.
You can now make edits to rst files and run ``make html`` again to update
the affected pages.
.. _Sphinx: https://www.sphinx-doc.org/