# Contributing to DataLad
[gh-datalad]: http://github.com/datalad/datalad
## Files organization
- [datalad/](./datalad) is the main Python module where major development is happening,
  with major submodules being:
    - `cmdline/` - helpers for accessing `interface/` functionality from
      the command line
    - `customremotes/` - custom special remotes for annex provided by datalad
    - `downloaders/` - support for accessing data from various sources (e.g.
      http, S3, XNAT) via a unified interface.
        - `configs/` - specifications for known data providers and associated
          credentials
    - `interface/` - high-level interface functions which get exposed via
      the command line (`cmdline/`) or Python (`datalad.api`).
    - `tests/` - some unit and regression tests (more can be found under
      `tests/` of the corresponding submodules; see [Tests](#tests))
        - [utils.py](./datalad/tests/utils.py) provides convenience helpers used by unit tests, such as
          `@with_tree`, `@serve_path_via_http` and other decorators
    - `ui/` - user-level interactions, such as messages about errors, warnings,
      progress reports, and, when supported by the available frontend,
      interactive dialogs
    - `support/` - various support modules, e.g. for git/git-annex interfaces,
      constraints for the `interface/`, etc.
- [benchmarks/](./benchmarks) - [asv] benchmarks suite (see [Benchmarking](#benchmarking))
- [docs/](./docs) - yet to be heavily populated documentation
- `bash-completions` - bash and zsh completion setup for datalad (just
`source` it)
- [fixtures/](./fixtures) - currently not under git; contains fixtures generated by vcr
- [sandbox/](./sandbox) - various scripts and prototypes which are not part of
  the main codebase distributed with releases
- [tools/](./tools) - helper utilities used during development, testing, and
  benchmarking of DataLad, implemented in whichever language is most appropriate
  (Python, bash, etc.)
Whenever a new top-level file or folder is added to the repository, it should
be listed in `MANIFEST.in` so that it will be either included in or excluded
from source distributions as appropriate. [See
here](https://packaging.python.org/guides/using-manifest-in/) for information
about writing a `MANIFEST.in`.
## How to contribute
The preferred way to contribute to the DataLad code base is
to fork the [main repository][gh-datalad] on GitHub. Here
we outline the workflow used by the developers:
0. Have a clone of our main [project repository][gh-datalad] as the `origin`
   remote in your git:

       git clone https://github.com/datalad/datalad

1. Fork the [project repository][gh-datalad]: click on the 'Fork'
button near the top of the page. This creates a copy of the code
base under your account on the GitHub server.
2. Add your forked clone as a remote to the local clone you already have on your
local disk:

       git remote add gh-YourLogin git@github.com:YourLogin/datalad.git
       git fetch gh-YourLogin

To ease addition of other github repositories as remotes, here is
a little bash function/script to add to your `~/.bashrc`:
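
       ghremote () {
           # e.g. git@github.com:YourLogin/datalad.git
           url="$1"
           # strip the trailing "/<project>" component
           url_=${url%/*}
           # keep what follows the last ":" or "/", i.e. the login
           login=${url_##*[:/]}
           git remote add "gh-$login" "$url"
           git fetch "gh-$login"
       }
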
thus you could simply run:

       ghremote git@github.com:YourLogin/datalad.git

to add the above `gh-YourLogin` remote. Additional handy aliases
such as `ghpr` (to fetch existing pr from someone's remote) and
`ghsendpr` could be found at [yarikoptic's bash config file](http://git.onerussian.com/?p=etc/bash.git;a=blob;f=.bash/bashrc/30_aliases_sh;hb=HEAD#l865)
3. Create a branch (generally off the `origin/master`) to hold your changes:

       git checkout -b nf-my-feature

and start making changes. Ideally, use a prefix signaling the purpose of the
branch
- `nf-` for new features
- `bf-` for bug fixes
- `rf-` for refactoring
- `doc-` for documentation contributions (including in the code docstrings).
- `bm-` for changes to benchmarks
We recommend not working in the ``master`` branch!
4. Work on this copy on your computer using Git to do the version control. When
you're done editing, do:

       git add modified_files
       git commit

to record your changes in Git. Ideally, prefix your commit messages with the
`NF`, `BF`, `RF`, `DOC`, `BM` similar to the branch name prefixes, but you could
also use `TST` for commits concerned solely with tests, and `BK` to signal
that the commit causes a breakage (e.g. of tests) at that point. Multiple
entries could be listed joined with a `+` (e.g. `rf+doc-`). See `git log` for
examples. If a commit closes an existing DataLad issue, then add to the end
of the message `(Closes #ISSUE_NUMBER)`
5. Push to GitHub with:

       git push -u gh-YourLogin nf-my-feature

Finally, go to the web page of your fork of the DataLad repo, and click
'Pull request' (PR) to send your changes to the maintainers for review. This
will send an email to the committers. You can commit new changes to this branch
and keep pushing to your remote -- github automagically adds them to your
previously opened PR.
(If any of the above seems like magic to you, then look up the
[Git documentation](http://git-scm.com/documentation) on the web.)
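
The commit-message prefixes described in step 4 are easy to check mechanically. The following is a minimal sketch, not a tool shipped with DataLad: the function name `commit_prefixes` and the matching rules (case-insensitive prefixes, joined with `+`, terminated by `:` or `-`) are assumptions made for illustration.

```python
import re

# Prefixes named in the workflow above; TST and BK apply to commits only.
COMMIT_PREFIXES = {"NF", "BF", "RF", "DOC", "BM", "TST", "BK"}

# Assumption: the prefix block ends with ":" or "-",
# e.g. "BF: ..." or "rf+doc- ...".
_SUBJECT_RE = re.compile(r"^([A-Za-z]+(?:\+[A-Za-z]+)*)[:-]\s*\S")


def commit_prefixes(subject):
    """Return the recognized prefixes of a commit subject, or [] if none."""
    match = _SUBJECT_RE.match(subject)
    if not match:
        return []
    parts = [p.upper() for p in match.group(1).split("+")]
    # Reject the whole block if any part is not a known prefix.
    return parts if all(p in COMMIT_PREFIXES for p in parts) else []


if __name__ == "__main__":
    for subject in ("BF: handle empty paths", "rf+doc- tidy helpers", "fix stuff"):
        print(subject, "->", commit_prefixes(subject))
```

A check like this could be wired into a local `commit-msg` hook, although the project itself treats the prefixes as a recommendation rather than a requirement.
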
Our [Design Docs](http://docs.datalad.org/en/stable/design/index.html) provide a
growing collection of insights on the command API principles and the design of
particular subsystems in DataLad to inform standard development practice.
## Development environment
We support Python 3 only (>= 3.9).
See [README.md:Dependencies](README.md#Dependencies) for basic information
about installation of datalad itself.
On Debian-based systems we recommend enabling [NeuroDebian](http://neuro.debian.net),
since we use it to provide backports of recent fixed versions of the external modules we depend upon:
```sh
apt-get install -y -q git git-annex-standalone
apt-get install -y -q patool python3-scrapy python3-{argcomplete,git,humanize,keyring,lxml,msgpack,progressbar,requests,setuptools}
```
and additionally, for development we suggest using tox and new
versions of dependencies from PyPI:
```sh
apt-get install -y -q python3-{dev,httpretty,pytest,pip,vcr,virtualenv} python3-tox
# Some libraries which might be needed for installing via pip
apt-get install -y -q lib{ffi,ssl,curl4-openssl,xml2,xslt1}-dev
```
some of which you could also install from PyPI using pip (prior installation of the libraries listed above
might be necessary):
```sh
pip install -r requirements-devel.txt
```
and you will need to install a recent git-annex using the means appropriate for your
OS (for Debian/Ubuntu, once again, just use NeuroDebian).
Contributor Files History
-------------------------
The original repository provided a [.zenodo.json](.zenodo.json)
file, and we generate a [.contributors file](.all-contributorsrc) from that via:
```bash
pip install tributors
tributors --version
0.0.18
```
It helps to have a GitHub token to increase API limits:
```bash
export GITHUB_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Instructions for these environment variables can be found [here](https://con.github.io/tributors/docs/getting-started#2-environment).
Then update zenodo:
```bash
tributors update zenodo
INFO: zenodo:Updating .zenodo.json
INFO: zenodo:Updating .tributors cache from .zenodo.json
WARNING:tributors:zenodo does not support updating from names.
```
In the case that there is more than one orcid found for a user, you will be given a list
to check. Others will be updated in the file. You can then curate the file as you see fit.
Next, we add the `.all-contributorsrc` file:
```bash
$ tributors init allcontrib
INFO:allcontrib:Generating .all-contributorsrc for datalad/datalad
$ tributors update allcontrib
INFO:allcontrib:Updating .all-contributorsrc
INFO:allcontrib:Updating .tributors cache from .all-contributorsrc
INFO:allcontrib:⭐️ Found new contributor glalteva in .all-contributorsrc
INFO:allcontrib:⭐️ Found new contributor adswa in .all-contributorsrc
INFO:allcontrib:⭐️ Found new contributor chrhaeusler in .all-contributorsrc
...
INFO:allcontrib:⭐️ Found new contributor bpoldrack in .all-contributorsrc
INFO:allcontrib:⭐️ Found new contributor yetanothertestuser in .all-contributorsrc
WARNING:tributors:allcontrib does not support updating from orcids.
WARNING:tributors:allcontrib does not support updating from email.
```
We can then populate the shared .tributors file:
```bash
$ tributors update-lookup allcontrib
```
And then we can rely on the [GitHub action](.github/workflows/update-contributors.yml) to update contributors. The action is set to run on merges to master, meaning when the contributions are finalized. This means that we add new contributors, and we
look for new orcids as we did above.
## Additional Hints
### Merge commits
For merge commits to have a more informative description, add the following
section to your `.git/config` or `~/.gitconfig`:

    [merge]
    log = true

and if conflicts occur, provide a short summary of how they were resolved
in the "Conflicts" listing within the merge commit
(see [example](https://github.com/datalad/datalad/commit/eb062a8009d160ae51929998771964738636dcc2)).
## Quality Assurance
It is recommended to check that your contribution complies with the following
rules before submitting a pull request:
- All public methods should have informative docstrings with sample usage
presented as doctests when appropriate.
- All other tests pass when everything is rebuilt from scratch.
- New code should be accompanied by tests.
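
To illustrate the first rule, here is a minimal sketch of a docstring whose sample usage doubles as a doctest. The function `branch_name` is hypothetical and not part of DataLad; it only reuses the branch prefixes listed under "How to contribute".

```python
import doctest

# Branch prefixes from the contribution workflow.
BRANCH_PREFIXES = ("nf", "bf", "rf", "doc", "bm")


def branch_name(prefix, topic):
    """Compose a branch name from a purpose prefix and a short topic.

    >>> branch_name("nf", "my feature")
    'nf-my-feature'
    >>> branch_name("xx", "oops")
    Traceback (most recent call last):
        ...
    ValueError: unknown prefix: 'xx'
    """
    if prefix not in BRANCH_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return f"{prefix}-{'-'.join(topic.split())}"


if __name__ == "__main__":
    # Runs the examples embedded in the docstrings of this module.
    print(doctest.testmod())
```

The examples in the docstring serve both as documentation and as a small regression test that fails when the behavior drifts.
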
The documentation contains a [Design Document specifically on running and writing tests](http://docs.datalad.org/en/stable/design/testing.html) that we encourage you to read beforehand.
Further hands-on advice is detailed below.
### Tests
`datalad/tests` contains tests for the core portion of the project, and
more tests are provided under corresponding submodules in `tests/`
subdirectories to simplify re-running the tests concerning that portion
of the codebase. To execute many tests, the codebase first needs to be
"installed" in order to generate scripts for the entry points. For
that, the recommended course of action is to use `virtualenv`, e.g.
```sh
virtualenv --system-site-packages venv-tests
source venv-tests/bin/activate
pip install -r requirements.txt
python setup.py develop
```
and then use that virtual environment to run the tests, via
```sh
pytest datalad
```
To deactivate the virtualenv later, simply enter
```sh
deactivate
```
Alternatively, or complementary to that, you can use `tox`: there is a `tox.ini`
file which sets up a few virtual environments for testing locally, which you can
later reuse like any other regular virtualenv for troubleshooting.
Additionally, the [tools/testing/test_README_in_docker](tools/testing/test_README_in_docker) script can
be used to establish a clean docker environment (based on any NeuroDebian-supported
release of Debian or Ubuntu) with all dependencies listed in README.md pre-installed.
### CI setup
We use several continuous integration services to run our battery of tests for every PR and on the default branch.
Please note that a new contributor's first PR needs workflow approval from a team member to start the CI runs, but we promise to promptly review and start the CI runs on your PR.
As the full CI suite takes a while to complete, we recommend running at least the tests directly related to your contributions locally beforehand.
Logs from all CI runs are collected periodically by [con/tinuous](https://github.com/con/tinuous/) and archived at `smaug:/mnt/btrfs/datasets/datalad/ci/logs/`.
For developing on Windows you can use free [Windows VMs](https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/).
If you would like to propose a patch against `git-annex` itself, submit it against the [datalad/git-annex](https://github.com/datalad/git-annex/#submitting-patches) repository, which builds and tests `git-annex`.
### Coverage
You can also check for common programming errors with the following tools:
- Aim for good unit test coverage (at least 80%); check with:

      pip install pytest pytest-cov
      pytest --cov=datalad path/to/tests_for_package

- We rely on https://codecov.io to provide convenient view of code coverage.
Installation of the codecov extension for Firefox/Iceweasel or Chromium
is strongly advised, since it provides coverage annotation of pull
requests.
### Linting
We are not (yet) fully PEP8 compliant, so please use these tools as
guidelines for your contributions, but do not reformat the entire code
base to PEP8.
[beyond-pep8]: https://www.youtube.com/watch?v=wf-BqAjZb8M
*Sidenote*: watch [Raymond Hettinger - Beyond PEP 8][beyond-pep8]
- No pyflakes warnings, check with:

      pip install pyflakes
      pyflakes path/to/module.py

- No PEP8 warnings, check with:

      pip install pep8
      pep8 path/to/module.py

- AutoPEP8 can help you fix some of the easy redundant errors:

      pip install autopep8
      autopep8 path/to/pep8.py

Also, some team developers use
[PyCharm community edition](https://www.jetbrains.com/pycharm) which
provides built-in PEP8 checker and handy tools such as smart
splits/joins making it easier to maintain code following the PEP8
recommendations. NeuroDebian provides `pycharm-community-sloppy`
package to ease pycharm installation even further.
### Benchmarking
We use [asv] to benchmark some core DataLad functionality.
The benchmarks suite is located under [benchmarks/](./benchmarks), and
periodically we publish results of running benchmarks on a dedicated host
to http://datalad.github.io/datalad/ . Those results are collected
and available under the `.asv/` submodule of this repository, so to get started
- `git submodule update --init .asv`
- `pip install .[devel]` or just `pip install asv`
- `asv machine` - to configure asv for your host if you want to run
benchmarks locally
And then you could use [asv] in multiple ways.
#### Quickly benchmark the working tree
- `asv run -E existing` - benchmark using the existing python environment
and just print out results (not stored anywhere). You can add `-q`
to run each benchmark just once (thus less reliable estimates)
- `asv run -b api.supers.time_createadd_to_dataset -E existing`
would run that specific benchmark using the existing python environment
Note: `--python=same` (`-E existing`) seems to have restricted
applicability, e.g. can't be used for a range of commits, so it can't
be used with `continuous`.
#### Compare results for two commits from recorded runs
Use [asv compare] to compare results from different runs, which should be
available under `.asv/results/<machine>`. (Note that the example
below passes ref names instead of commit IDs, which requires asv v0.3
or later.)
```shell
> asv compare -m hopa maint master
All benchmarks:
before after ratio
[b619eca4] [7635f467]
- 1.87s 1.54s 0.82 api.supers.time_createadd
- 1.85s 1.56s 0.84 api.supers.time_createadd_to_dataset
- 5.57s 4.40s 0.79 api.supers.time_installr
145±6ms 145±6ms 1.00 api.supers.time_ls
- 4.59s 2.17s 0.47 api.supers.time_remove
427±1ms 434±8ms 1.02 api.testds.time_create_test_dataset1
- 4.10s 3.37s 0.82 api.testds.time_create_test_dataset2x2
1.81±0.07ms 1.73±0.04ms 0.96 core.runner.time_echo
2.30±0.2ms 2.04±0.03ms ~0.89 core.runner.time_echo_gitrunner
+ 420±10ms 535±3ms 1.27 core.startup.time_help_np
111±6ms 107±3ms 0.96 core.startup.time_import
+ 334±6ms 466±4ms 1.39 core.startup.time_import_api
```
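
The `ratio` column in the output above is the `after` time divided by the `before` time, so values below 1 indicate a speed-up. A minimal sketch of that arithmetic follows. The helper names and the symmetric threshold factor of 1.1 used to pick the `+`/`-` marker are assumptions for illustration; asv additionally applies a statistical check (visible as the `~` marker), which is not modeled here.

```python
def ratio(before, after):
    """Ratio of the 'after' timing to the 'before' timing."""
    return after / before


def marker(r, factor=1.1):
    """'+' for a slowdown beyond factor, '-' for a speed-up beyond it, else ' '.

    Assumption: a symmetric threshold; statistical significance is ignored.
    """
    if r > factor:
        return "+"
    if r < 1 / factor:
        return "-"
    return " "


if __name__ == "__main__":
    # (before, after) timings in seconds, copied from the table above.
    rows = {
        "api.supers.time_createadd": (1.87, 1.54),
        "api.supers.time_remove": (4.59, 2.17),
        "core.startup.time_help_np": (0.420, 0.535),
    }
    for name, (before, after) in rows.items():
        r = ratio(before, after)
        print(f"{marker(r)} {r:.2f} {name}")
```
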
#### Run and compare results for two commits
[asv continuous] could be used to first run benchmarks for the to-be-tested
commits and then provide stats:
- `asv continuous maint master` - would run and compare `maint` and `master` branches
- `asv continuous HEAD` - would compare `HEAD` against `HEAD^`
- `asv continuous master HEAD` - would compare `HEAD` against state of master
- [TODO: continuous -E existing](https://github.com/airspeed-velocity/asv/issues/338#issuecomment-380520022)
Notes:
- only significant changes will be reported
- raw results from benchmarks are not stored (use `--record-samples` if
desired)
#### Run and record benchmarks results (for later comparison etc)
- `asv run` would run all configured branches (see
[asv.conf.json](./asv.conf.json))
#### Profile a benchmark and produce a nice graph visualization
Example (replace with the benchmark of interest):

    asv profile -v -o profile.gprof usecases.study_forrest.time_make_studyforrest_mockup
    gprof2dot -f pstats profile.gprof | dot -Tpng -o profile.png \
        && xdg-open profile.png

#### Common options
- `-E` to restrict to specific environment, e.g. `-E virtualenv:2.7`
- `-b` could be used to specify specific benchmark(s)
- `-q` to run benchmark just once for a quick assessment (results are
not stored since too unreliable)
[asv compare]: http://asv.readthedocs.io/en/latest/commands.html#asv-compare
[asv continuous]: http://asv.readthedocs.io/en/latest/commands.html#asv-continuous
[asv]: http://asv.readthedocs.io
## Easy Issues
A great way to start contributing to DataLad is to pick an item from the list of
[Easy issues](https://github.com/datalad/datalad/labels/easy) in the issue
tracker. Resolving these issues allows you to start contributing to the project
without much prior knowledge. Your assistance in this area will be greatly
appreciated by the more experienced developers as it helps free up their time to
concentrate on other issues.
## Maintenance teams coordination
We distinguish particular aspects of DataLad's functionality, each corresponding
to parts of the code base in this repository, and loosely maintain teams assigned
to these aspects.
While any contributor can tackle issues on any aspect, you may want to refer to
members of such teams (via GitHub tagging or review requests) or the team itself
(via GitHub issue label ``team-<area>``) when creating a PR, feature request, or bug report.
Members of a team are encouraged to respond to PRs or issues within the given area,
and pro-actively improve robustness, user experience, documentation, and
performance of the code.
New and existing contributors are invited to join teams:
- **core**: core API/commands (@datalad/team-core)
- **git**: Git interface (e.g. GitRepo, protocols, helpers, compatibility) (@datalad/team-git)
- **gitannex**: git-annex interface (e.g. AnnexRepo, protocols, helpers, compatibility) (@datalad/team-gitannex)
- **remotes**: (special) remote implementations (@datalad/team-remotes)
- **runner**: sub-process execution and IO (@datalad/team-runner)
- **services**: interaction with 3rd-party services (create-sibling*, downloaders, credentials, etc.) (@datalad/team-services)
## Recognizing contributions
We welcome and recognize all contributions from documentation to testing to code development.
You can see a list of current contributors in our [zenodo file][link_zenodo].
If you are new to the project, don't forget to add your name and affiliation there!
We also have an .all-contributorsrc that is updated automatically on merges. Once it's
merged, if you helped in a non standard way (e.g., a contribution other than code)
you can open a pull request to add any [All Contributors Emoji][contrib_emoji] that
match your contribution types.
## Thank you!
You're awesome. :wave::smiley:
# Various hints for developers
## Useful tools
- While performing IO/net heavy operations use [dstat](http://dag.wieers.com/home-made/dstat)
for quick logging of various health stats in a separate terminal window:

      dstat -c --top-cpu -d --top-bio --top-latency --net

- To monitor speed of any data pipelining [pv](http://www.ivarch.com/programs/pv.shtml) is really handy,
just plug it in the middle of your pipe.
- For remote debugging, epdb (available via pip) can be used by adding
  `import epdb; epdb.serve()` to the Python code and then connecting to it with
  `python -c "import epdb; epdb.connect()"`.
- We are using codecov, which has extensions for the popular browsers
  (Firefox, Chrome) that annotate pull requests on GitHub with coverage changes.
## Useful Environment Variables
Refer to `datalad/config.py` for information on how to add these environment variables to the config file and on their naming convention.
- *DATALAD_DATASETS_TOPURL*:
Used to point to an alternative location for the `///` dataset. When running
tests, it is preferably set to https://datasets-tests.datalad.org
- *DATALAD_LOG_LEVEL*:
Used to control the verbosity of logs printed to stderr while running datalad commands/debugging
- *DATALAD_LOG_NAME*:
Whether to include logger name (e.g. `datalad.support.sshconnector`) in the log
- *DATALAD_LOG_OUTPUTS*:
Used to control whether both stdout and stderr of external command executions are logged in detail (at DEBUG level)
- *DATALAD_LOG_PID*:
To instruct datalad to log the PID of the process
- *DATALAD_LOG_TARGET*:
Where to log: `stderr` (default), `stdout`, or another filename
- *DATALAD_LOG_TIMESTAMP*:
Used to add timestamp to datalad logs
- *DATALAD_LOG_TRACEBACK*:
Runs TraceBack function with collide set to True, if this flag is set to 'collide'.
This replaces any common prefix between current traceback log and previous invocation with "..."
- *DATALAD_LOG_VMEM*:
Reports memory utilization (resident/virtual) at every log line, needs `psutil` module
- *DATALAD_EXC_STR_TBLIMIT*:
Caps the number of traceback steps included in exception logging and result reporting at DATALAD_EXC_STR_TBLIMIT pre-processed entries from the traceback.
- *DATALAD_SEED*:
To seed Python's `random` RNG, which will also be used for generation of dataset UUIDs to make
those random values reproducible. You might want also to set all the relevant git config variables
like we do in one of the travis runs
- *DATALAD_TESTS_TEMP_KEEP*:
Function rmtemp will not remove temporary file/directory created for testing if this flag is set
- *DATALAD_TESTS_TEMP_DIR*:
Create a temporary directory at location specified by this flag.
It is used by tests to create a temporary git directory while testing git annex archives etc
- *DATALAD_TESTS_NONETWORK*:
Skips network tests completely if this flag is set
Examples include test for S3, git_repositories, OpenfMRI, etc
- *DATALAD_TESTS_SSH*:
Skips SSH tests if this flag is **not** set. If you enable this,
you need to set up a "datalad-test" and "datalad-test2" target in
your SSH configuration. The second target is used by only a couple
of tests, so depending on the tests you're interested in, you can
get by with only "datalad-test" configured.
A Docker image that is used for DataLad's tests is available at
<https://github.com/datalad-tester/docker-ssh-target>. Note that
the DataLad tests assume that target files exist in
`DATALAD_TESTS_TEMP_DIR`, which restricts the "datalad-test" target
to being either the localhost or a container that mounts
`DATALAD_TESTS_TEMP_DIR`.
- *DATALAD_TESTS_NOTEARDOWN*:
Does not execute teardown_package which cleans up temp files and directories created by tests if this flag is set
- *DATALAD_TESTS_USECASSETTE*:
Specifies the location of the file to record network transactions by the VCR module.
Currently used when testing custom special remotes
- *DATALAD_TESTS_OBSCURE_PREFIX*:
A string to prefix the most obscure (but supported by the filesystem) test filename
- *DATALAD_TESTS_PROTOCOLREMOTE*:
Binary flag to specify whether to test protocol interactions of custom remote with annex
- *DATALAD_TESTS_RUNCMDLINE*:
Binary flag to specify whether shell testing using shunit2 is to be carried out
- *DATALAD_TESTS_TEMP_FS*:
Specify the temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation
- *DATALAD_TESTS_TEMP_FSSIZE*:
Specify the size of temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation
- *DATALAD_TESTS_NONLO*:
Specifies network interfaces to bring down/up for testing. Currently used by travis.
- *DATALAD_TESTS_KNOWNFAILURES_PROBE*:
Binary flag to test whether "known failures" still actually are failures. That
is, it changes the behavior of tests decorated with any of the `known_failure`
decorators so that they are not skipped but executed, and *fail* if they would pass (indicating that the
decorator may be removed/reconsidered).
- *DATALAD_TESTS_GITCONFIG*:
Additional content to add to `~/.gitconfig` in the tests `HOME` environment. `\n` is replaced with `os.linesep`.
- *DATALAD_TESTS_CREDENTIALS*:
Set to `system` to allow for credentials possibly present in the user/system wide environment to be used.
- *DATALAD_CMD_PROTOCOL*:
Specifies the protocol number used by the Runner to note shell command or python function call times and allows for dry runs.
'externals-time' for ExecutionTimeExternalsProtocol, 'time' for ExecutionTimeProtocol and 'null' for NullProtocol.
Any new DATALAD_CMD_PROTOCOL has to implement datalad.support.protocol.ProtocolInterface
- *DATALAD_CMD_PROTOCOL_PREFIX*:
Sets a prefix to add before the command call times are noted by DATALAD_CMD_PROTOCOL.
- *DATALAD_USE_DEFAULT_GIT*:
Instructs to use `git` as available in current environment, and not the one which possibly comes with git-annex (default behavior).
- *DATALAD_ASSERT_NO_OPEN_FILES*:
Instructs test helpers to check for open files at the end of a test. If set, remaining open files are logged at ERROR level. Alternative modes are: "assert" (raise AssertionError if any open file is found), "pdb"/"epdb" (drop into debugger when open files are found, info on files is provided in a "files" dictionary, mapping filenames to psutil process objects).
- *DATALAD_ALLOW_FAIL*:
Instructs `@never_fail` decorator to allow to fail, e.g. to ease debugging.
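
As an illustration of how these variables are typically combined, the following minimal sketch prepares an environment for a test run and hands it to `pytest` in a child process. The helper `make_test_env`, the path `/tmp/datalad-tests`, the flag value `"1"`, and the log level `"debug"` are assumptions chosen for the example, not values mandated by DataLad.

```python
import os
import subprocess
import sys


def make_test_env(base=None, temp_dir=None, nonetwork=True, log_level=None):
    """Return a copy of the environment with selected DATALAD_* variables set."""
    env = dict(os.environ if base is None else base)
    if nonetwork:
        # Assumption: any non-empty value counts as "set"; "1" is used here.
        env["DATALAD_TESTS_NONETWORK"] = "1"
    if temp_dir is not None:
        env["DATALAD_TESTS_TEMP_DIR"] = str(temp_dir)
    if log_level is not None:
        env["DATALAD_LOG_LEVEL"] = str(log_level)
    return env


if __name__ == "__main__":
    # Run the test suite in a child process with the adjusted environment.
    # The temp dir path and log level below are hypothetical example values.
    env = make_test_env(temp_dir="/tmp/datalad-tests", log_level="debug")
    sys.exit(subprocess.call([sys.executable, "-m", "pytest", "datalad"], env=env))
```

Building the environment as a plain dictionary keeps the calling shell untouched, so a later run without these settings is not affected.
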
# Release(s) workflow
## Branches
- `master`: changes toward the next `MAJOR.MINOR.0` release.
Release candidates (tagged with an `rcX` suffix) are cut from this branch
- `maint`: bug fixes for the latest released `MAJOR.MINOR.PATCH`
- `maint-MAJOR.MINOR`: generally not used, unless some bug fix release with a critical bug fix is needed.
## Workflow
- upon release of `MAJOR.MINOR.0`, `maint` branch needs to be fast-forwarded to that release
- bug fixes to functionality released within the `maint` branch should be
submitted against `maint` branch
- cherry-picking fixes from `master` into `maint` is allowed where needed
- `master` branch accepts PRs with new functionality
- `master` branch merges `maint` as frequently as needed
## Helpers
[Makefile](./Makefile) provides a number of useful `make` targets:
- `linkissues-changelog`: converts `(#ISSUE)` placeholders into proper markdown within [CHANGELOG.md](./CHANGELOG.md)
- `update-changelog`: uses above `linkissues-changelog` and updates .rst changelog
- `release-pypi`: ensures no `dist/` exists yet, creates a wheel and a source distribution and uploads to pypi.
## Releasing with GitHub Actions, auto, and pull requests
New releases of DataLad are created via a GitHub Actions workflow using [datalad/release-action](https://github.com/datalad/release-action), which was inspired by [`auto`](https://github.com/intuit/auto).
Whenever a pull request is merged into `maint` that has the "`release`" label, that workflow updates the
changelog based on the pull requests since the last release, commits the
results, tags the new commit with the next version number, and creates a GitHub
release for the tag.
This in turn triggers a job for building an sdist & wheel for the project and uploading them to PyPI.
The release workflow alternatively could be triggered by visiting [release workflow page](https://github.com/datalad/datalad/actions/workflows/release.yml) and pressing "Run workflow" and choosing corresponding (`maint`) branch to release.
Note that release workflow would fail if there were no commits since the most recent tagged release.
### CHANGELOG entries and labelling pull requests
DataLad uses [scriv](https://github.com/nedbat/scriv/) to maintain [CHANGELOG.md](./CHANGELOG.md).
Adding the label `CHANGELOG-missing` to a PR triggers a workflow that adds a new `scriv` changelog fragment under `changelog.d/`, using the PR title as the content.
The resulting changelog snippet can subsequently be tuned to improve the prospective CHANGELOG entry.
The section that the workflow adds to the changelog depends on the `semver-` label added to the PR:
- `semver-minor` — for changes corresponding to an increase in the minor version
component
- `semver-patch` — for changes corresponding to an increase in the patch/micro version
component; this is the default label for unlabelled PRs
- `semver-internal` — for changes only affecting the internal API
- `semver-documentation` — for changes only affecting the documentation
- `semver-tests` — for changes to tests
- `semver-dependencies` — for updates to dependency versions
- `semver-performance` — for performance improvements
[link_zenodo]: https://github.com/datalad/datalad/blob/master/.zenodo.json
[contrib_emoji]: https://allcontributors.org/docs/en/emoji-key
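
The `semver-` labels described above tie pull requests to version components. As a rough sketch of how a next version number could be derived from them, consider the function below. It is not the logic of the actual release workflow: the function name `next_version` is hypothetical, and treating every label other than `semver-minor` as a patch-level bump is an assumption.

```python
def next_version(current, labels):
    """Bump MAJOR.MINOR.PATCH according to the semver- labels of merged PRs."""
    major, minor, patch = (int(part) for part in current.split("."))
    if "semver-minor" in labels:
        # A minor bump resets the patch component.
        return f"{major}.{minor + 1}.0"
    # Assumption: every other label, and unlabelled PRs, imply a patch bump.
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("1.2.3", {"semver-patch", "semver-tests"}))
    print(next_version("1.2.3", {"semver-minor", "semver-patch"}))
```
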
## git-annex
Even though git-annex is a separate project, DataLad's and git-annex's development is often intertwined.
## Filing issues
It is not uncommon to discover potential git-annex bugs or git-annex feature requests while working on DataLad.
In those cases, it is common for developers and contributors to file an issue in git-annex's public bug tracker at [git-annex.branchable.com](https://git-annex.branchable.com/).
Here are a few hints on how to go about it:
- You can report a new bug or browse through existing bug reports at [git-annex.branchable.com/bugs](https://git-annex.branchable.com/bugs/)
- In order to associate a bug report with DataLad, you can add the following markup to the description: ``[[!tag projects/datalad]]``
- You can add author metadata with the following markup: ``[[!meta author=yoh]]``. Some authors will be automatically associated with the DataLad project by git-annex's bug tracker.
## Testing and contributing
To provide downstream testing of development `git-annex` against DataLad, we maintain the [datalad/git-annex](https://github.com/datalad/git-annex) repository.
It provides daily builds of git-annex with CI setup to run git-annex built-in tests and tests of DataLad across all supported operating systems.
It also has a facility to test git-annex on *your* client systems following [the instructions](https://github.com/datalad/git-annex/tree/master/clients#testing-git-annex-builds-on-local-clients).
All the build logs and artifacts (installer packages etc) for daily builds and releases are collected using [con/tinuous](https://github.com/con/tinuous/) and archived on `smaug:/mnt/btrfs/datasets/datalad/ci/git-annex/`.
You can test your fixes for git-annex by submitting patches for it [following instructions](https://github.com/datalad/git-annex#submitting-patches).