File: launchers.md

# Launchers

The `Launcher` is the basic abstraction in IPython Parallel
for starting and stopping processes.

A Launcher has two primary methods: {meth}`~.BaseLauncher.start` and {meth}`~.BaseLauncher.stop`,
which should be `async def` coroutines.

There are two basic kinds of Launcher: {class}`~.ControllerLauncher` and {class}`~.EngineLauncher`.
A ControllerLauncher should launch `ipcontroller` somewhere,
and an EngineLauncher should start `n` engines somewhere.
Shared configuration,
principally `profile_dir` and `cluster_id`,
is typically used to locate the connection files the two need in order to communicate,
though explicit paths can also be passed as arguments.

Launchers are used through the {class}`~.Cluster` API,
which manages one ControllerLauncher and zero to many EngineLaunchers,
each representing a set of engines.

Launchers are registered via entry points ([more below](entrypoints)),
and can be selected via a short lowercase string naming the kind of launcher, e.g. 'mpi' or 'local':

```python
import ipyparallel as ipp
c = ipp.Cluster(engines="mpi")
```

For the most part, Launchers are not interacted with directly,
but they can be _configured_.

If you generate a config file with:

```bash
ipython profile create --parallel
```

you can check out the resulting `ipcluster_config.py`,
which includes configuration options for all available Launcher classes.

You can also check `ipcluster start --help-all` to see them on the command-line.
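For example, `ipcluster_config.py` might select the MPI launcher and tune it. This is a sketch: the option names reflect ipyparallel 8.x (check your generated config file), and the hostfile path is only an illustration:

```python
# ipcluster_config.py
c = get_config()  # noqa -- provided by the config loader

# choose launchers by their short names
c.Cluster.engine_launcher_class = "mpi"
c.Cluster.controller_launcher_class = "local"

# launcher-specific options
c.MPIEngineSetLauncher.mpi_args = ["--hostfile", "hosts.txt"]
```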

## Debugging launchers

If a launcher isn't doing what you want,
the first thing to do is probably to start your Cluster with `log_level=logging.DEBUG`.

You can also access the Launcher(s) on the Cluster object and call {meth}`~.BaseLauncher.get_output` to retrieve the output from the process.

## Writing your own Launcher(s)

If you want to write your own launcher,
the best place to start is to look at the Launcher classes that ship with IPython Parallel.

There are three key methods to implement:

- [`start()`](writing-start)
- [`stop()`](writing-stop)
- [`from_dict()`](writing-from-dict)

(writing-start)=

### Writing start

A start method on a launcher should do the following:

1. request the process(es) to be started
2. start monitoring to notice when the process exits, such that {meth}`.notify_stop` will be called when the process exits.

The command to launch should be the `self.args` list, inherited from the base class.

The default implementation in `LocalProcessLauncher`:

```{literalinclude} ../../../ipyparallel/cluster/launcher.py
:pyobject: LocalProcessLauncher.start
```
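The two steps above can also be sketched with only the standard library. This is an illustration, not the real base class: the actual `start` implementations are asynchronous, and `notify_stop` here is a simplified stand-in for the real callback mechanism:

```python
import subprocess
import sys
import threading


class SketchLauncher:
    """Minimal illustration of start(): launch, then watch for exit."""

    def __init__(self, args):
        self.args = args  # the command list, as in the real base class
        self.stop_data = None

    def notify_stop(self, data):
        # The real method fires registered stop callbacks; here we only record.
        self.stop_data = data

    def start(self):
        # 1. request the process to be started
        self.process = subprocess.Popen(self.args)
        # 2. monitor in the background, so notify_stop fires when it exits
        self._watcher = threading.Thread(target=self._watch, daemon=True)
        self._watcher.start()

    def _watch(self):
        exit_code = self.process.wait()
        self.notify_stop({"exit_code": exit_code})


launcher = SketchLauncher([sys.executable, "-c", "pass"])
launcher.start()
launcher._watcher.join()
```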

`ControllerLauncher.start` is always called with no arguments,
whereas `EngineLauncher.start` is called with `n`,
which is an integer or None. If `n` is an integer,
that many engines should be started.
If `n` is None, a 'default' number should be used,
e.g. the number of CPUs on the host.
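That convention can be sketched as follows (`resolve_n` is a hypothetical helper for illustration, not part of the real API):

```python
import os


def resolve_n(n=None):
    """Resolve the engine count: explicit n, or one engine per CPU."""
    if n is not None:
        return n
    # default: one engine per CPU on this host
    return os.cpu_count() or 1
```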

(writing-stop)=

### Writing stop

A stop method should request that the process(es) stop,
and return only after everything is stopped and cleaned up.
Exactly how to release these resources depends greatly on how they were requested in `start`.
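For the local-process case, a stop can be sketched like this (illustration only, not the real implementation; the key point is that `stop` is a coroutine that returns only after the process has actually exited):

```python
import asyncio
import subprocess
import sys


class SketchLauncher:
    """Illustration only: a stop() that returns after full cleanup."""

    def __init__(self, args):
        self.process = subprocess.Popen(args)

    async def stop(self):
        # Ask the process to stop...
        self.process.terminate()
        # ...then wait (without blocking the event loop) until it has
        # actually exited, so resources are released before we return.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self.process.wait)


launcher = SketchLauncher([sys.executable, "-c", "import time; time.sleep(60)"])
asyncio.run(launcher.stop())
```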

### Serializing Launchers

Launchers are serialized to disk using JSON,
via the `.to_dict()` method.
The default `.to_dict()` method should rarely need to be overridden.

To declare a property of your launcher as one that should be included in serialization,
register it as a [traitlet][] with `to_dict=True`.
For example:

```python
from traitlets import Integer
from ipyparallel.cluster.launcher import EngineLauncher
class MyLauncher(EngineLauncher):
    pid = Integer(
        help="The pid of the process",
    ).tag(to_dict=True)
```

[traitlet]: https://traitlets.readthedocs.io

This `.tag(to_dict=True)` ensures that the `.pid` property will be persisted to disk,
and reloaded in the default `.from_dict` implementation.
Typically, these are populated in `.start()`:

```python
def start(self):
    process = start_process(self.args, ...)
    self.pid = process.pid
```

Mark whatever properties are required to reconstruct your object from disk with this metadata.

(writing-from-dict)=

#### Writing from_dict

{meth}`~.BaseLauncher.from_dict` should be a class method that returns an instance of your Launcher class, loaded from a dict.

Most `from_dict` methods will look similar to this:

```{literalinclude} ../../../ipyparallel/cluster/launcher.py
:pyobject: LocalProcessLauncher.from_dict
```

where serializable state is loaded first, then 'live' objects are reconstructed from it,
as in the default `LocalProcessLauncher`:

```{literalinclude} ../../../ipyparallel/cluster/launcher.py
:pyobject: LocalProcessLauncher._reconstruct_process
```

The local-process case is the simplest: the main thing that needs serialization is the PID of the process.

If reconstruction of the object fails because the resource is no longer running
(e.g. the PID no longer exists, or a VM or batch job is gone),
the {exc}`.NotRunning` exception should be raised.
This tells the Cluster that the resource is gone and should be removed
(handled the same as if it had stopped while we were watching).
Other unhandled errors are assumed to be bugs in the Launcher,
and do not result in removing the resource from cluster state.
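As a self-contained sketch of this pattern (the real `NotRunning` lives in `ipyparallel.cluster.launcher`; `PidLauncher` and its pid-only state are hypothetical, and a production check might also need to handle permission errors):

```python
import os


class NotRunning(Exception):
    """Stand-in for ipyparallel's NotRunning exception."""


class PidLauncher:
    """Hypothetical launcher whose only serialized state is a pid."""

    def __init__(self, pid=None):
        self.pid = pid

    def to_dict(self):
        return {"pid": self.pid}

    @classmethod
    def from_dict(cls, d):
        self = cls(pid=d["pid"])
        try:
            # signal 0 checks for existence without sending a signal
            os.kill(self.pid, 0)
        except ProcessLookupError:
            # the resource is gone: tell the Cluster to drop it from state
            raise NotRunning(f"No such process: {self.pid}")
        return self


# the current process certainly exists, so this round-trip succeeds
launcher = PidLauncher.from_dict({"pid": os.getpid()})
```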

### Additional methods

Some useful additional methods to implement, if the base class implementations do not work for you:

- {meth}`~.ControllerLauncher.get_connection_info`
- {meth}`~.BaseLauncher.get_output`

TODO: write more docs on these

(entrypoints)=

## Registering your Launcher via entrypoints

Once you have defined your launcher, you can 'register' it for discovery
via entry points. In your `setup.py`:

```python
setup(
    ...
    entry_points={
        'ipyparallel.controller_launchers': [
            'mine = mypackage:MyControllerLauncher',
        ],
        'ipyparallel.engine_launchers': [
            'mine = mypackage:MyEngineSetLauncher',
        ],
    },
)
```
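If your project uses `pyproject.toml` instead of `setup.py`, the equivalent registration (standard entry-point syntax, with the same hypothetical `mypackage` names as above) looks like:

```toml
[project.entry-points."ipyparallel.controller_launchers"]
mine = "mypackage:MyControllerLauncher"

[project.entry-points."ipyparallel.engine_launchers"]
mine = "mypackage:MyEngineSetLauncher"
```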

This allows clusters to be created with the shortcut:

```python
Cluster(engines="mine")
```

instead of the full import string:

```python
Cluster(engines="mypackage.MyEngineSetLauncher")
```

though the long form will always still work.

## Launcher API reference

```{eval-rst}
.. automodule:: ipyparallel.cluster.launcher
```