File: index.md

# Jupyter Cache

Execute and cache multiple Jupyter Notebook-like files via an [API](using/api) and [CLI](using/cli).

🤓 Smart re-execution
: Notebooks are only re-executed when their **code cells** (or code-related metadata) have changed, not their Markdown/raw cells.

🧩 Pluggable execution modes
: Select how notebooks are executed, including serial and parallel execution.

📈 Execution reports
: Timing statistics and exception tracebacks are stored for analysis.

📖 [jupytext](https://jupytext.readthedocs.io) integration
: Read and execute notebooks written in multiple formats.

## Why use jupyter-cache?

Use jupyter-cache if you have a number of notebooks whose execution outputs you want to keep up to date, without having to re-execute them every time (particularly for long-running code, or for text-based formats that do not store outputs).

The notebooks must have deterministic execution outputs:

- You use the same environment to run them (e.g. the same installed packages)
- They run no non-deterministic code (e.g. unseeded random number generation)
- They do not depend on external resources (e.g. files or network connections) that change over time

For example, it is utilised by [jupyter-book](https://jupyterbook.org/content/execute.html#caching-the-notebook-execution) to allow fast document rebuilds.
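The "only re-execute on code changes" rule can be illustrated with a minimal, stdlib-only sketch. This is *not* jupyter-cache's actual hashing code (which also accounts for code-related metadata); it just shows why editing Markdown cells leaves the cache valid:

```python
import hashlib
import json

def code_hash(notebook: dict) -> str:
    """Hash only the code cells of a notebook-like structure.

    Markdown/raw cells are ignored, so editing prose does not
    change the hash and would not trigger re-execution.
    """
    code_sources = [
        cell["source"] for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    payload = json.dumps(code_sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

nb = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "print(1 + 1)"},
    ]
}
before = code_hash(nb)

# Editing a Markdown cell leaves the hash unchanged ...
nb["cells"][0]["source"] = "# A new title"
assert code_hash(nb) == before

# ... but editing a code cell changes it.
nb["cells"][1]["source"] = "print(2 + 2)"
assert code_hash(nb) != before
```

A cached notebook whose code hash matches the source is considered up to date and is skipped on the next execution run.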

## Installation

Install `jupyter-cache` via pip or conda:

```bash
pip install jupyter-cache
```

```bash
conda install -c conda-forge jupyter-cache
```

## Quick-start

```{jcache-clear}
```

Add one or more source notebook files to the "project" (a folder containing a database and a cache of executed notebooks):

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: add
:args: tests/notebooks/basic_unrun.ipynb tests/notebooks/basic_failing.ipynb
:input: y
```

These files are now ready for execution:

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: list
```

Now run the execution:

```{jcache-cli} jupyter_cache.cli.commands.cmd_project:cmnd_project
:command: execute
```

Successfully executed files will now be associated with a record in the cache:

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: list
```

The cache record includes execution statistics:

```{jcache-cli} jupyter_cache.cli.commands.cmd_cache:cmnd_cache
:command: info
:args: 1
```

Next time we execute, jupyter-cache will check which files require re-execution:

```{jcache-cli} jupyter_cache.cli.commands.cmd_project:cmnd_project
:command: execute
```

The source files themselves are not modified during or after execution.
To create a new "final" notebook, with the cached outputs merged into the source notebook, run:

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: merge
:args: 1 final_notebook.ipynb
```
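Conceptually, the merge copies the cached outputs back into the (output-free) source notebook. The sketch below is an illustrative, stdlib-only simplification (jupyter-cache's real merge works on `nbformat` objects and is more careful about metadata); code cells are matched by position:

```python
from copy import deepcopy

def merge_outputs(source_nb: dict, cached_nb: dict) -> dict:
    """Return a new notebook with the cached execution outputs
    copied into the source notebook's code cells.

    Markdown/raw cells are left untouched, and the source
    notebook itself is not modified.
    """
    merged = deepcopy(source_nb)
    cached_code = iter(
        c for c in cached_nb["cells"] if c["cell_type"] == "code"
    )
    for cell in merged["cells"]:
        if cell["cell_type"] == "code":
            cached = next(cached_code)
            cell["outputs"] = cached.get("outputs", [])
            cell["execution_count"] = cached.get("execution_count")
    return merged

source = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "print(1 + 1)"},
]}
cached = {"cells": [
    {"cell_type": "code", "source": "print(1 + 1)",
     "outputs": [{"output_type": "stream", "text": "2\n"}],
     "execution_count": 1},
]}

final = merge_outputs(source, cached)
assert final["cells"][1]["outputs"][0]["text"] == "2\n"
assert "outputs" not in source["cells"][1]  # source left untouched
```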

You can also add notebooks with custom formats, such as those read by [jupytext](https://jupytext.readthedocs.io):

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: add
:args: --reader jupytext tests/notebooks/basic.md
```

```{jcache-cli} jupyter_cache.cli.commands.cmd_notebook:cmnd_notebook
:command: list
```

## Design considerations

Although there are certainly other use cases, the principal use case this was written for is generating books / websites, created from multiple notebooks (and other text documents).
Notebooks should be *auto-executed* **only** if they have been modified in a way that may alter their code cell outputs.

Some desired requirements (not yet all implemented):

- A clear and robust API
- The cache is persistent on disk
- Notebook comparisons separate "edits to content" from "edits to code cells":
  cell rearrangement and code cell changes should trigger re-execution;
  text content changes should not.
- Allow parallel access to notebooks (for execution)
- Store execution statistics/reports
- Store external assets: notebooks being executed often require external assets (imported scripts, data files, etc.), which are prepared by the user
- Store execution artefacts: files created during execution
- Transparent and robust cache invalidation: imagine the user updating an external dependency or a Python module, or checking out a different git branch
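Two of these requirements, a persistent on-disk cache and stored execution statistics, can be sketched together with the standard-library `sqlite3` module. This is a hypothetical schema for illustration only, not jupyter-cache's actual database layout (which uses SQLAlchemy):

```python
import sqlite3

# In-memory for the example; a real project would point this at a
# database file inside the project folder so records persist.
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE IF NOT EXISTS records (
           hashkey TEXT PRIMARY KEY,  -- hash of the code cells
           duration REAL,             -- execution time in seconds
           traceback TEXT             -- NULL if execution succeeded
       )"""
)

def record_execution(hashkey, duration, traceback=None):
    """Store (or overwrite) the execution record for one notebook."""
    db.execute(
        "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
        (hashkey, duration, traceback),
    )

def needs_execution(hashkey):
    """A notebook needs (re-)execution unless a clean cached run exists."""
    row = db.execute(
        "SELECT 1 FROM records WHERE hashkey = ? AND traceback IS NULL",
        (hashkey,),
    ).fetchone()
    return row is None

record_execution("abc123", 1.5)
assert not needs_execution("abc123")          # clean cached run: skip
assert needs_execution("def456")              # never executed: run
record_execution("ghi789", 0.2, "ValueError")
assert needs_execution("ghi789")              # failed run: retry
```

Cache invalidation then falls out of the hash key: updating a dependency or switching git branches changes the notebook content (and hence the hash), so no clean record matches and the notebook is re-executed.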

## Contents

```{toctree}
:caption: Tutorials
using/cli
using/api
```

```{toctree}
:caption: Development
develop/contributing
```