Workceptor
==========

.. contents::
   :local:

Workceptor is a component of receptor that handles units of work.

``work-commands`` defines a type of work that can run on the node.

foo.yml

.. code-block:: yaml

    ---
    version: 2
    node:
      id: foo

    log-level:
      level: Debug

    tcp-listeners:
      - port: 2222

    control-services:
      - service: control
        filename: /tmp/foo.sock

    work-commands:
      - workType: echoint
        command: bash
        params:  "-c \"for i in {1..5}; do echo $i; sleep 1; done\""

bar.yml

.. code-block:: yaml

    ---
    version: 2
    node:
      id: bar

    log-level:
      level: Debug

    tcp-peer:
      address: localhost:2222

    control-services:
      - service: control

    work-commands:
      - workType: echoint
        command: bash
        params:  "-c \"for i in {1..10}; do echo $i; sleep 1; done\""
      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""


Configuring work commands
--------------------------

``workType``: User-defined name for this work definition.

``command``: The executable that is invoked when running this work.

``params``: Command-line options passed to this executable.


Local work
-----------

Start the work by connecting to the control service defined under ``control-services`` and issuing a "work submit" command.

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --no-payload
    Result:  Job Started
    Unit ID: t1BlAB18

Receptor started an instance of this work type and labeled it with a unique "Unit ID".

Work results
-------------

Use the "Unit ID" to get work results.

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work results t1BlAB18
    1
    2
    3
    4
    5


Remote work
------------

Although connected to `foo`, by providing the "--node" option the work can be started on node `bar`.

The work type must be defined on the node it is intended to run on. In this case, `bar` must have a ``work-commands`` entry with the work type "echoint".

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --node bar --no-payload
    Result:  Job Started
    Unit ID: 87Vwqb6A

Remote work submission ultimately results in two work units running at the same time: a local work unit and the remote work unit. Each unit has its own Unit ID. The local work unit's goal is to monitor the running remote work unit and stream its results back.

Sequence of events for remote work submission:

- `foo` starts a local work unit of work type "remote". This is a special work type that is built into receptor.
- This work unit attempts to connect to `bar`'s control service and issue a "work submit echoint" command. From `bar`'s perspective, this is the exact same operation as if a user connected to `bar` directly and issued a work submit command. `bar` is not aware that `foo` is the one that issued the command.
- Once submitted, `foo` will stream work results back to itself and store them on disk. It also periodically gets the ``work status`` of the work running on `bar`. Status includes information about the work state and the stdout size.
- `foo` continues streaming stdout results until the size stored on disk matches the StdoutSize reported in `bar`'s status.
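
The last two steps can be sketched in Python as follows. This is a simplified, in-memory simulation and not receptor's implementation: ``FakeRemoteUnit`` and ``stream_results`` are hypothetical names, and the simulated remote unit simply emits one more line each time its status is polled.

.. code-block:: python

    class FakeRemoteUnit:
        """Stand-in for the work unit running on bar (hypothetical)."""

        def __init__(self, lines):
            self._pending = list(lines)
            self.stdout = b""
            self.state = "Running"

        def status(self):
            # Each poll lets the simulated command produce one more line.
            if self._pending:
                self.stdout += self._pending.pop(0)
            if not self._pending:
                self.state = "Succeeded"
            return {"StateName": self.state, "StdoutSize": len(self.stdout)}

        def read_stdout(self, start):
            # Return everything written since byte offset `start`.
            return self.stdout[start:]


    def stream_results(remote):
        """Copy remote stdout locally until the work is done and sizes match."""
        local = b""
        while True:
            status = remote.status()
            local += remote.read_stdout(len(local))
            finished = status["StateName"] in ("Succeeded", "Failed")
            if finished and len(local) == status["StdoutSize"]:
                return local


    # Simulate the echoint work type on bar, which prints 1 through 10.
    remote = FakeRemoteUnit([b"%d\n" % i for i in range(1, 11)])
    local_copy = stream_results(remote)
    print(len(local_copy))

The loop ends only when the remote unit has finished and the number of bytes stored locally equals the reported ``StdoutSize``, which mirrors the termination condition described in the last step.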

.. _work_payload:

Payload
--------

In `bar.yml`:

.. code-block:: yaml

      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""

Here the bash command expects to read a line from stdin, echo the line in all uppercase letters, and sleep for 3 seconds.

Payloads can be passed into receptor using the "--payload" option.

.. code-block:: bash

    $ echo -e "hi\ni am foo\nwhat is your name" | receptorctl --socket /tmp/foo.sock work submit echopayload --node bar --payload - -f
    HI
    I AM FOO
    WHAT IS YOUR NAME

"--payload -" means the payload should be whatever the stdin is, which is piped in from the "echo -e ..." command.

Note: "-f" instructs receptorctl to follow the work unit immediately, i.e. stream results to stdout. One could also use "work results" to stream the results.


Runtime Parameters
-------------------

Work commands can be configured to allow parameters to be passed to commands when work is submitted:

.. code-block:: yaml

  work-commands:
    - workType: listcontents
      command: ls
      allowruntimeparams: true

The ``allowruntimeparams`` option will allow parameters to be passed to the work command by the
client submitting the work. The contents of a specific directory can be listed by passing the paths
to the receptor command as positional arguments immediately after the ``workType``:

.. code-block:: bash

    receptorctl --socket /tmp/foo.sock work submit --node bar --no-payload -f listcontents /root/ /bin/
    /bin/:
    bash
    sh

    /root/:
    helloworld.sh

To pass options or flags to the work command, use the ``--param`` option to extend the ``params``
setting of the work command. For example, the ``--all`` flag can be passed to the work command this way:

.. code-block:: bash

    receptorctl --socket /tmp/foo.sock work submit --node bar --no-payload -f --param params='--all' listcontents /root/
    .
    ..
    .bash_logout
    .bash_profile
    .bashrc
    .cache
    helloworld.sh
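
As a rough illustration of how the final command line could be assembled, the Python sketch below combines the configured ``params``, the value given through ``--param params=...``, and the positional arguments. The ordering and the shell-like splitting are assumptions made for this example, and ``build_runtime_argv`` is a hypothetical helper that is not part of receptor or receptorctl.

.. code-block:: python

    import shlex


    def build_runtime_argv(command, configured_params="", runtime_params="", positional=()):
        """Return the argument vector for a work command (illustrative only)."""
        argv = [command]
        # Assumption: params from the config file come first ...
        argv += shlex.split(configured_params)
        # ... followed by the value passed with --param params=...
        argv += shlex.split(runtime_params)
        # ... and finally the positional arguments given after the work type.
        argv += list(positional)
        return argv


    # The listcontents example: no configured params, one flag, one path.
    print(build_runtime_argv("ls", runtime_params="--all", positional=["/root/"]))

Under these assumptions, the second example above would run ``ls --all /root/`` on `bar`.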


Work list
----------

"work list" returns information about all work units that have run on this receptor node. The following shows two work units, ``12L8s8h2`` and ``T0oN0CAp``:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work list
    {'12L8s8h2': {'Detail': 'exit status 0',
                  'ExtraData': None,
                  'State': 2,
                  'StateName': 'Succeeded',
                  'StdoutSize': 21,
                  'WorkType': 'echoint'},
     'T0oN0CAp': {'Detail': 'Running: PID 1700818',
                  'ExtraData': {'Expiration': '0001-01-01T00:00:00Z',
                                'LocalCancelled': False,
                                'LocalReleased': False,
                                'RemoteNode': 'bar',
                                'RemoteParams': {},
                                'RemoteStarted': True,
                                'RemoteUnitID': 'ATDzdViR',
                                'RemoteWorkType': 'echoint',
                                'TLSClient': ''},
                  'State': 1,
                  'StateName': 'Running',
                  'StdoutSize': 4,
                  'WorkType': 'remote'}}


Notice that ``T0oN0CAp`` was a remote work submission, therefore its work type is "remote". On `bar` there is a local unit ``ATDzdViR``, with the "echoint" work type.
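
The relationship between local "remote" units and the units they track can be extracted from this structure. The Python sketch below uses a trimmed copy of the output above; ``remote_units`` is a hypothetical helper written for this example.

.. code-block:: python

    # Trimmed copy of the "work list" output shown above.
    work_list = {
        "12L8s8h2": {"StateName": "Succeeded", "WorkType": "echoint", "ExtraData": None},
        "T0oN0CAp": {
            "StateName": "Running",
            "WorkType": "remote",
            "ExtraData": {"RemoteNode": "bar", "RemoteUnitID": "ATDzdViR"},
        },
    }


    def remote_units(units):
        """Map each local "remote" unit ID to (remote node, remote unit ID)."""
        return {
            unit_id: (info["ExtraData"]["RemoteNode"], info["ExtraData"]["RemoteUnitID"])
            for unit_id, info in units.items()
            if info["WorkType"] == "remote" and info["ExtraData"]
        }


    print(remote_units(work_list))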


Work cancel
------------

Cancel will stop any running work unit. Upon canceling a "remote" work unit, the local node will attempt to connect to the remote node's control service and issue a work cancel. If the remote node is down, receptor will periodically attempt to connect to the remote node to do the cancellation.

Work release
-------------

Release will cancel the work and then delete files on disk associated with that work unit. For remote work submission, release will attempt to delete files both locally and on the remote machine. Like work cancel, the release can be pending if the remote node is down. In that situation, the local files will remain on disk until the remote node can be contacted.

Work force-release
--------------------

It might be preferable to force a release, using the ``work force-release`` command. This will do a one-time attempt to connect to the remote node and issue a work release there. After this one attempt, it will then proceed to delete all local files associated with the work unit.
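
The difference between ``work release`` and ``work force-release`` for a remote work unit can be summarized in a small Python model. This is only a sketch of the behavior described above for a single attempt; the function ``release`` and its return keys are hypothetical and do not correspond to receptor's internals.

.. code-block:: python

    def release(remote_reachable, force=False):
        """Model the outcome of one release attempt on a remote work unit."""
        # The remote release succeeds only if the remote node can be contacted.
        remote_released = remote_reachable
        # Local files are deleted once the remote side is released, or
        # unconditionally after the single attempt made by force-release.
        local_deleted = remote_released or force
        return {
            "remote_released": remote_released,
            "local_deleted": local_deleted,
            # A plain release stays pending while the remote node is down.
            "pending": not local_deleted,
        }


    print(release(remote_reachable=False))
    print(release(remote_reachable=False, force=True))

With the remote node down, a plain release remains pending and keeps the local files, while a forced release removes the local files after its single attempt.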

States
---------

A unit of work can be in the Pending, Running, Succeeded, or Failed state.

For local work, transitioning from Pending to Running occurs the moment the ``command`` executable is started.

For remote work, transitioning from Pending to Running occurs when the status reported from the remote node has a Running state.
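
When processing ``work list`` or ``work status`` output in a script, the numeric ``State`` field can be mapped to these names. The sketch below is an assumption-laden example: the values 1 and 2 match the ``work list`` output shown earlier, while the values used for Pending and Failed are assumed and do not appear in this document. ``WorkState`` and ``is_finished`` are hypothetical names.

.. code-block:: python

    from enum import IntEnum


    class WorkState(IntEnum):
        PENDING = 0    # assumed value, not shown in this document
        RUNNING = 1    # matches 'State': 1 with 'StateName': 'Running'
        SUCCEEDED = 2  # matches 'State': 2 with 'StateName': 'Succeeded'
        FAILED = 3     # assumed value, not shown in this document


    def is_finished(state):
        """Return True once a unit has reached a terminal state."""
        return WorkState(state) in (WorkState.SUCCEEDED, WorkState.FAILED)


    print(WorkState(2).name, is_finished(2))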

Signed work
------------

Remote work submissions can be digitally signed by the sender. The target node will verify the signature of the work command before starting the work unit.

A *single* pair of RSA public and private keys is created offline and distributed to the nodes. Distribute the public key (PKIX format) to any node that should receive work. Distribute the private key (PKCS1 format) to any node that needs authority to submit work.

The following commands can be used to create keys for signing work:

.. code-block:: bash

    openssl genrsa -out signworkprivate.pem 2048
    openssl rsa -in signworkprivate.pem -pubout -out signworkpublic.pem

In `bar.yml`:

.. code-block:: yaml

    # PKIX
    work-verification:
      publickey: /full/path/signworkpublic.pem

    work-commands:
      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""
        verifysignature: true

in `foo.yml`

.. code-block:: yaml

    # PKCS1
    work-signing:
      privatekey: /full/path/signworkprivate.pem
      tokenexpiration: 30m

``tokenexpiration`` determines how long the signature is valid. This expiration directly corresponds to the "expiresAt" field in the generated JSON web token. Valid units include "h" and "m", e.g. ``1h30m`` for one hour and 30 minutes.
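
To make the duration format concrete, the following Python sketch converts such strings into a ``timedelta``. It only handles the "h" and "m" units mentioned above and is not the parser receptor uses; ``parse_expiration`` is a hypothetical helper.

.. code-block:: python

    import re
    from datetime import timedelta

    # Seconds per supported unit; other units are deliberately left out.
    _UNIT_SECONDS = {"h": 3600, "m": 60}


    def parse_expiration(value):
        """Parse strings such as "30m" or "1h30m" into a timedelta."""
        parts = re.findall(r"(\d+)([hm])", value)
        # Reject empty input and input with leftover characters, e.g. "30x".
        if not parts or "".join(n + u for n, u in parts) != value:
            raise ValueError(f"invalid duration: {value!r}")
        return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


    print(parse_expiration("1h30m"))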

Use the "--signwork" parameter to sign the work.

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --node bar --no-payload --signwork

Units on disk
--------------

Netceptor, the main component of receptor that handles mesh connectivity and traffic, operates entirely in memory. That is, it does not store any state information on disk. However, Workceptor functionality is designed to be persistent across receptor restarts. Work units might be running commands that could take hours to complete, and as such need to store some relevant information on disk in case the receptor process restarts.

By default, receptor stores data under ``/tmp/receptor``. This location can be changed by setting the ``datadir`` parameter under the ``node`` entry in the config file.

For a given work unit, receptor will store files in ``{datadir}/{nodeID}/{unitID}/``.
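
A script that inspects these files needs to build the same path. The Python sketch below shows one way to do so; ``unit_dir`` is a hypothetical helper, and the default ``datadir`` is the one stated above. The IDs are taken from the directory tree shown later in this section.

.. code-block:: python

    from pathlib import PurePosixPath


    def unit_dir(node_id, unit_id, datadir="/tmp/receptor"):
        """Return the directory holding a work unit's files."""
        return PurePosixPath(datadir) / node_id / unit_id


    # Path of the stdout file for unit BsAjS4wi on node foo.
    print(unit_dir("foo", "BsAjS4wi") / "stdout")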

Here is the receptor directory tree after running ``work submit echopayload`` described in :ref:`work_payload`.

.. code-block:: bash

    $ tree /tmp/receptor
    /tmp/receptor
    ├── bar
    │   └── NImim5WA
    │       ├── status
    │       ├── status.lock
    │       ├── stdin
    │       └── stdout
    └── foo
        └── BsAjS4wi
            ├── status
            ├── status.lock
            ├── stdin
            └── stdout

The main purpose of work unit ``BsAjS4wi`` on `foo` is to copy stdin, stdout, and status from ``NImim5WA`` on `bar` back to its own working directory.

``stdin`` is a copy of the submitted payload. The contents of this file are the same on both the local (`foo`) and remote (`bar`) machines.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/stdin
    hi
    i am foo
    what is your name

``stdout`` contains the work unit results; the stdout of the command execution. It will also be the same on both the local node and remote node.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/stdout
    HI
    I AM FOO
    WHAT IS YOUR NAME

``status`` contains additional information related to the work unit. The contents of status are different on `foo` and `bar`.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/status
    {
       "State":2,
       "Detail":"exit status 0",
       "StdoutSize":30,
       "WorkType":"echopayload",
       "ExtraData":null
    }

.. code-block:: text

    $ cat /tmp/receptor/foo/BsAjS4wi/status
    {
       "State":2,
       "Detail":"exit status 0",
       "StdoutSize":30,
       "WorkType":"remote",
       "ExtraData":{
          "RemoteNode":"bar",
          "RemoteWorkType":"echopayload",
          "RemoteParams":{},
          "RemoteUnitID":"NImim5WA",
          "RemoteStarted":true,
          "LocalCancelled":false,
          "LocalReleased":false,
          "TLSClient":"",
          "Expiration":"0001-01-01T00:00:00Z"
       }
    }
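
The ``StdoutSize`` field can be cross-checked against the ``stdout`` file. The Python sketch below does this with the contents shown above embedded as literals instead of being read from disk; it assumes that the ``stdout`` file ends with a newline after the last line.

.. code-block:: python

    import json

    # Contents of the status file on bar, as shown above.
    status_text = """
    {
       "State":2,
       "Detail":"exit status 0",
       "StdoutSize":30,
       "WorkType":"echopayload",
       "ExtraData":null
    }
    """
    # Contents of the stdout file, assuming a trailing newline.
    stdout_bytes = b"HI\nI AM FOO\nWHAT IS YOUR NAME\n"

    status = json.loads(status_text)
    # StdoutSize counts bytes, so it should equal the length of the stdout file.
    sizes_match = status["StdoutSize"] == len(stdout_bytes)
    print(sizes_match)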

.. image:: remote.png
   :alt: sequence of events during remote work submission

The sequence of events during a remote work submission. Blue lines indicate moments when receptor writes files to disk.