Executing notebooks
===================
.. module:: nbclient.client.guide
Jupyter notebooks are often saved with output cells that have been cleared.
NBClient provides a convenient way to execute the input cells of an
.ipynb notebook file and save the results, both input and output cells,
as a .ipynb file.
In this section we show how to execute a ``.ipynb`` notebook
document and save the result in notebook format. If you need to export
notebooks to other formats, such as reStructuredText or Markdown (optionally
executing them), see `nbconvert <https://nbconvert.readthedocs.io/en/latest/>`_.

Executing notebooks can be very helpful, for example, to run all notebooks
in a Python library in one step, or as a way to automate the data analysis in
projects involving more than one notebook.
Using the Python API interface
------------------------------
This section illustrates the Python API.
Example
~~~~~~~
Let's start with a complete quick example, leaving detailed explanations
to the following sections.
**Import**: First we import nbformat and the :class:`NotebookClient`
class::

    import nbformat
    from nbclient import NotebookClient

**Load**: Assuming that ``notebook_filename`` contains the path to a notebook,
we can load it with::

    nb = nbformat.read(notebook_filename, as_version=4)

**Configure**: Next, we configure the notebook execution mode::

    client = NotebookClient(
        nb,
        timeout=600,
        kernel_name='python3',
        resources={'metadata': {'path': 'notebooks/'}},
    )

We specified two (optional) arguments, ``timeout`` and ``kernel_name``, which
define respectively the cell execution timeout and the execution kernel.
Usually you don't need to set these options, but these and other options are
available to control the execution context. Note that ``path``, passed through
``resources``, specifies the folder in which to execute the notebook.
**Execute/Run**: To actually run the notebook we call the method
``execute``::

    client.execute()

Hopefully, we will not get any errors during the notebook execution
(see the last section for error handling). This notebook will
now have its cell outputs populated with the result of running
each cell.

**Save**: Finally, save the resulting notebook with::

    nbformat.write(nb, 'executed_notebook.ipynb')

That's all. Your executed notebook will be saved in the current folder
in the file ``executed_notebook.ipynb``.
Execution arguments (traitlets)
-------------------------------
The arguments passed to :class:`NotebookClient` are configuration options
called `traitlets <https://traitlets.readthedocs.io/en/stable>`_.
Traitlets have several useful properties. For example,
they enforce the input type, and they can be read and modified as
attributes of the client.
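
As a minimal sketch of these two properties, the snippet below builds a client around an empty in-memory notebook (created with ``nbformat.v4.new_notebook()`` purely for illustration), changes ``timeout`` by attribute assignment, and shows that a value of the wrong type is rejected. No kernel is started, because ``execute`` is never called.

```python
import nbformat
from nbclient import NotebookClient
from traitlets import TraitError

# An empty notebook is enough here; nothing is executed in this sketch.
nb = nbformat.v4.new_notebook()
client = NotebookClient(nb, timeout=600, kernel_name="python3")

# Traitlets can be read and modified like ordinary attributes.
client.timeout = 120
print(client.timeout, client.kernel_name)

# Traitlets validate their input: a string is not a valid ``timeout``.
try:
    client.timeout = "two minutes"
except TraitError as exc:
    print("rejected:", exc)
```

After the rejected assignment, ``client.timeout`` keeps its previous valid value.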
Let's now discuss in more detail the two traitlets we used.
The ``timeout`` traitlet defines the maximum time (in seconds) each notebook
cell is allowed to run. If the execution takes longer, an exception is
raised. The default is ``None``, which applies no limit, so set a finite value
if you want to stop cells that hang. Setting ``timeout`` to ``None``
or ``-1`` removes any restriction on execution time.
The second traitlet, ``kernel_name``, allows specifying the name of the kernel
to be used for the execution. By default, the kernel name is obtained from the
notebook metadata. The traitlet ``kernel_name`` allows specifying a
user-defined kernel, overriding the value in the notebook metadata. A common
use case is that of a Python 2/3 library which includes documentation/testing
notebooks. These notebooks will specify either a python2 or python3 kernel in
their metadata (depending on the kernel used the last time the notebook was
saved). In reality, these notebooks will work on both Python 2 and Python 3,
and, for testing, it is important to be able to execute them programmatically
on both versions. Here the traitlet ``kernel_name`` helps simplify and
maintain consistency: we can just run a notebook twice, specifying first
"python2" and then "python3" as the kernel name.
Hooks before and after notebook or cell execution
-------------------------------------------------
There are several configurable hooks that allow the user to execute code before and
after a notebook or a cell is executed. Each one is configured with a function that will be called in its
respective place in the execution pipeline.
Each is described below:
**Notebook-level hooks**: These hooks are called with a single extra parameter:

- ``notebook=NotebookNode``: the current notebook being executed.

Here are the available hooks:

- ``on_notebook_start`` will run when the notebook client is initialized, before any execution has happened.
- ``on_notebook_complete`` will run when the notebook client has finished executing, after kernel cleanup.
- ``on_notebook_error`` will run when the notebook client has encountered an exception, before kernel cleanup.

**Cell-level hooks**: These hooks are called with at least two parameters:

- ``cell=NotebookNode``: a reference to the current cell.
- ``cell_index=int``: the index of the cell in the current notebook's list of cells.

Here are the available hooks:

- ``on_cell_start`` will run for all cell types before the cell is executed.
- ``on_cell_execute`` will run right before the code cell is executed.
- ``on_cell_complete`` will run after execution, if the cell is executed with no errors.
- ``on_cell_executed`` will run right after the code cell is executed.
- ``on_cell_error`` will run if there is an error during cell execution.

``on_cell_executed`` and ``on_cell_error`` are called with an extra parameter ``execute_reply=dict``.
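
The following is a minimal sketch of how such hooks might be wired up. It assumes the hooks are passed to :class:`NotebookClient` as keyword arguments, like the other traitlets. The ``events`` list and the bodies of the hook functions are illustrative only, and the ``**kwargs`` catch-all is a defensive choice rather than a requirement.

```python
import nbformat
from nbclient import NotebookClient

events = []  # (hook name, detail) tuples, collected for illustration only


def on_notebook_start(notebook, **kwargs):
    # Notebook-level hooks receive the notebook being executed.
    events.append(("notebook_start", len(notebook.cells)))


def on_cell_start(cell, cell_index, **kwargs):
    # Cell-level hooks receive the cell and its index in the notebook.
    events.append(("cell_start", cell_index))


def on_cell_error(cell, cell_index, execute_reply=None, **kwargs):
    # ``on_cell_error`` additionally receives the kernel's execute reply.
    events.append(("cell_error", cell_index))


# A one-cell notebook built in memory, so the sketch is self-contained.
nb = nbformat.v4.new_notebook(cells=[nbformat.v4.new_code_cell("print('hello')")])

client = NotebookClient(
    nb,
    on_notebook_start=on_notebook_start,
    on_cell_start=on_cell_start,
    on_cell_error=on_cell_error,
)
# Calling client.execute() would now invoke the hooks at the points
# described above; it requires an installed kernel.
```

Each hook is called with keyword arguments, so the parameter names in the function signatures need to match the names listed above.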
Handling errors and exceptions
------------------------------
In the previous sections we saw how to save an executed notebook, assuming
there are no execution errors. But, what if there are errors?
Execution until first error
~~~~~~~~~~~~~~~~~~~~~~~~~~~
An error during the notebook execution, by default, will stop the execution
and raise a ``CellExecutionError``. Conveniently, the source cell causing
the error and the original error name and message are also printed.
After an error, we can still save the notebook as before::

    nbformat.write(nb, 'executed_notebook.ipynb')

The saved notebook contains the output up until the failing cell,
and includes a full stack-trace and error (which can help debugging).
Handling errors
~~~~~~~~~~~~~~~
A useful pattern to execute notebooks while handling errors is the following::

    from nbclient.exceptions import CellExecutionError

    try:
        client.execute()
    except CellExecutionError:
        msg = 'Error executing the notebook "%s".\n\n' % notebook_filename
        msg += 'See notebook "%s" for the traceback.' % notebook_filename_out
        print(msg)
        raise
    finally:
        nbformat.write(nb, notebook_filename_out)

This will save the executed notebook regardless of execution errors.
In case of errors, however, an additional message is printed and the
``CellExecutionError`` is raised. The message directs the user to
the saved notebook for further inspection.
Execute and save all errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~
As a last scenario, it is sometimes useful to execute notebooks which raise
exceptions, for example to show an error condition. In this case, instead of
stopping the execution on the first error, we can keep executing the notebook
using the traitlet ``allow_errors`` (default is False). With
``allow_errors=True``, the notebook is executed until the end, regardless of
any error encountered during the execution. The output notebook will contain
the stack-traces and error messages for **all** the cells raising exceptions.
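
As a rough sketch of how the recorded errors could be inspected afterwards, the helper below walks the executed notebook and lists every output of type ``error``. The function names ``collect_errors`` and ``execute_allowing_errors`` are hypothetical. The ``ename`` and ``evalue`` fields are those of error outputs in the version 4 notebook format.

```python
import nbformat
from nbclient import NotebookClient


def collect_errors(nb):
    """Return (cell index, error name, error value) for each error output."""
    errors = []
    for index, cell in enumerate(nb.cells):
        if cell.cell_type != "code":
            continue  # only code cells carry outputs
        for output in cell.get("outputs", []):
            if output.output_type == "error":
                errors.append((index, output.ename, output.evalue))
    return errors


def execute_allowing_errors(nb):
    """Run every cell even if some fail, then report which cells raised."""
    NotebookClient(nb, allow_errors=True).execute()
    return collect_errors(nb)
```

For a notebook loaded with ``nbformat.read``, ``execute_allowing_errors(nb)`` would return an empty list when every cell ran cleanly.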
Widget state
------------
If your notebook contains any
`Jupyter Widgets <https://github.com/jupyter-widgets/ipywidgets/>`_,
the state of all the widgets can be stored in the notebook's metadata.
This allows rendering of the live widgets on, for instance, nbviewer, or when
converting to HTML.

We can tell nbclient to not store the state using the ``store_widget_state``
argument::

    client = NotebookClient(nb, store_widget_state=False)

This widget rendering is not performed against a browser during execution, so
only widget default states or states manipulated via user code will be
calculated during execution. ``%%javascript`` cells will execute upon notebook
rendering, enabling complex interactions to function as expected when viewed in
a UI.
If you can't view widget results after execution, you may need to select
:menuselection:`Trust Notebook` under the :menuselection:`File` menu.
Using a command-line interface
------------------------------
This section will illustrate how to run notebooks from your terminal. It supports the most basic use case. For more sophisticated execution options, consider the `papermill <https://pypi.org/project/papermill/>`_ library.
This library's command line tool is available by running ``jupyter execute``. It expects notebooks as input arguments and accepts optional flags to modify the default behavior.
Running a notebook is this easy::

    jupyter execute notebook.ipynb

You can pass more than one notebook as well::

    jupyter execute notebook.ipynb notebook2.ipynb

By default, notebook errors will be raised and printed into the terminal. You can suppress them by passing the ``--allow-errors`` flag::

    jupyter execute notebook.ipynb --allow-errors

Other options allow you to modify the timeout length and dictate the kernel in use. A full set of options is available via the help command::

    jupyter execute --help

    An application used to execute notebook files (*.ipynb)

    Options
    =======
    The options below are convenience aliases to configurable class-options,
    as listed in the "Equivalent to" description-line of the aliases.
    To see all configurable class-options for some <cmd>, use:
        <cmd> --help-all

    --allow-errors
        Errors are ignored and execution is continued until the end of the notebook.
        Equivalent to: [--NbClientApp.allow_errors=True]
    --timeout=<Int>
        The time to wait (in seconds) for output from executions. If a cell
        execution takes longer, a TimeoutError is raised. ``-1`` will disable the
        timeout.
        Default: None
        Equivalent to: [--NbClientApp.timeout]
    --startup_timeout=<Int>
        The time to wait (in seconds) for the kernel to start. If kernel startup
        takes longer, a RuntimeError is raised.
        Default: 60
        Equivalent to: [--NbClientApp.startup_timeout]
    --kernel_name=<Unicode>
        Name of kernel to use to execute the cells. If not set, use the kernel_spec
        embedded in the notebook.
        Default: ''
        Equivalent to: [--NbClientApp.kernel_name]

    To see all available configurables, use `--help-all`.