Why joblib: project goals
===========================

What pipelines bring us
--------------------------

Pipeline processing systems can provide a set of useful features:

Data-flow programming for performance
......................................

* **On-demand computing:** in pipeline systems such as LabVIEW or VTK,
  calculations are performed as needed by the outputs and only when
  inputs change.

* **Transparent parallelization:** a pipeline topology can be inspected
  to deduce which operations can be run in parallel (it is equivalent to
  purely functional programming).
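
As a minimal sketch of the parallelization point, assume a tiny pipeline
whose steps are pure functions. The step names (``load``, ``total``,
``largest``) are hypothetical, and the standard-library thread pool is
only one possible executor, not the mechanism of any particular pipeline
system.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical pipeline steps: pure functions with no shared state.
    def load(n):
        return list(range(n))

    def total(values):
        return sum(values)

    def largest(values):
        return max(values)

    data = load(5)

    # total() and largest() both depend only on `data`, not on each other,
    # so the topology allows them to be submitted concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_total = pool.submit(total, data)
        future_largest = pool.submit(largest, data)
        results = (future_total.result(), future_largest.result())

Because neither step mutates its input, the order in which the two
futures complete does not affect ``results``.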

Provenance tracking for understanding the code
...............................................

* **Tracking of data and computations:** fully reproducing a
  computational experiment requires tracking the data and the
  operations applied to it.

* **Inspecting data flow:** inspecting intermediate results helps with
  debugging and understanding.
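
One way to picture provenance tracking is a decorator that records every
call together with its inputs and output. This is an illustrative sketch
only: ``tracked`` and ``provenance_log`` are hypothetical names, and the
log lives in memory rather than on disk.

.. code-block:: python

    import functools

    provenance_log = []  # hypothetical in-memory record of every call

    def tracked(func):
        """Record the name, arguments and result of each call to func."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            provenance_log.append({
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result
        return wrapper

    @tracked
    def scale(values, factor=2):
        return [v * factor for v in values]

    @tracked
    def total(values):
        return sum(values)

    answer = total(scale([1, 2, 3]))

After this runs, ``provenance_log`` holds one entry per step, in
execution order, so the intermediate result of ``scale`` can be
inspected alongside the final ``answer``.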

.. topic:: But pipeline frameworks can get in the way
    :class: warning

    We want our code to look like the underlying algorithm,
    not like a software framework.

Joblib's approach
--------------------

Functions are the simplest abstraction used by everyone. Our pipeline
jobs (or tasks) are made of decorated functions.
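
The idea of a job as a decorated function can be sketched with a small
standard-library example. The ``cached`` decorator below is hypothetical
and keeps results in memory; it is not joblib's implementation, only an
illustration of how the decorated code stays a plain function.

.. code-block:: python

    import functools

    def cached(func):
        """Hypothetical decorator: run func only for unseen arguments."""
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            if args not in store:
                store[args] = func(*args)
            return store[args]
        return wrapper

    calls = []  # records which inputs actually triggered a computation

    @cached
    def square(x):
        calls.append(x)
        return x * x

    first = square(3)
    second = square(3)  # same input: served from the store, body not rerun
    third = square(4)   # new input: computed

The body of ``square`` reads like the underlying algorithm, and the
caching behaviour is attached from the outside by the decorator.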

Tracking parameters in a meaningful way requires specifying a data
model. We give up on that and use hashing instead, for performance and
robustness.
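
As a rough illustration of hashing in place of a data model, the sketch
below reduces arbitrary call arguments to a fixed-size key. The helper
name ``argument_key`` and the choice of ``pickle`` with SHA-256 are
assumptions made for this example, not a description of joblib's own
hashing code. Note also that pickling is not guaranteed to give
identical bytes for every pair of equal objects.

.. code-block:: python

    import hashlib
    import pickle

    def argument_key(*args, **kwargs):
        """Hypothetical helper: digest call arguments into a hex key."""
        # Sort keyword arguments so their order does not change the key.
        payload = pickle.dumps((args, sorted(kwargs.items())))
        return hashlib.sha256(payload).hexdigest()

    key_a = argument_key([1, 2, 3], scale=2.0)
    key_b = argument_key([1, 2, 3], scale=2.0)  # same inputs as key_a
    key_c = argument_key([1, 2, 4], scale=2.0)  # one value differs

Here ``key_a`` and ``key_b`` are equal while ``key_c`` differs, so the
key can decide whether a result must be recomputed without the code
knowing what the arguments mean.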

Design choices
---------------

* No dependencies other than Python

* Robust, well-tested code, at the cost of functionality

* Fast and suitable for scientific computing on big datasets without
  changing the original code

* Only local imports: **embed joblib in your code by copying it**