Workceptor
==========

.. contents::
   :local:

Workceptor is a component of receptor that handles units of work.

``work-commands`` defines a type of work that can run on the node.

foo.yml

.. code-block:: yaml

    ---
    version: 2
    node:
      id: foo

    log-level:
      level: Debug

    tcp-listeners:
      - port: 2222

    control-services:
      - service: control
        filename: /tmp/foo.sock

    work-commands:
      - workType: echoint
        command: bash
        params:  "-c \"for i in {1..10}; do echo $i; sleep 1; done\""

bar.yml

.. code-block:: yaml

    ---
    version: 2
    node:
      id: bar

    log-level:
      level: Debug

    tcp-peer:
      address: localhost:2222

    control-services:
      - service: control

    work-commands:
      - workType: echoint
        command: bash
        params:  "-c \"for i in {1..10}; do echo $i; sleep 1; done\""
      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""


Configuring work commands
--------------------------

``workType``: user-defined name to give this work definition.

``command``: the executable that is invoked when running this work.

``params``: command-line options passed to this executable.

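Before wiring a work type into receptor, it can help to run ``command`` and ``params`` by hand to see what the unit will write to stdout. The sketch below is a local stand-in for the ``echoint`` definition in ``bar.yml``, assuming receptor passes ``params`` to ``command`` as ordinary shell-style arguments. The ``sleep`` is dropped so the preview returns immediately, and ``preview_echoint`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Local stand-in for this work-commands entry (from bar.yml):
    #   command: bash
    #   params:  "-c \"for i in {1..10}; do echo $i; sleep 1; done\""
    # Assumption: receptor splits params like a shell, so bash receives
    # "-c" and the loop as two arguments. sleep is omitted to finish quickly.
    preview_echoint() {
      bash -c 'for i in {1..10}; do echo "$i"; done'
    }

    preview_echoint

This prints the numbers 1 through 10, 21 bytes in total, which is the ``StdoutSize`` reported for the ``echoint`` unit in the work list example further down.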

Local work
-----------

Start the work by connecting to the ``control-services`` socket and issuing a "work submit" command:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --no-payload
    Result:  Job Started
    Unit ID: t1BlAB18

Receptor started an instance of this work type and labeled it with a unique "Unit ID".

Work results
-------------

Use the "Unit ID" to get the work results:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work results t1BlAB18
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

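The two steps above (submit, then fetch results by Unit ID) can be chained in a small script. This is a sketch: ``submit_and_fetch`` and ``parse_unit_id`` are hypothetical helper names, the parsing assumes the ``Unit ID:`` line format shown in the ``work submit`` output above, and ``RECEPTORCTL`` and ``SOCKET`` are made-up variables that default to ``receptorctl`` and ``/tmp/foo.sock``.

.. code-block:: bash

    #!/usr/bin/env bash
    # RECEPTORCTL and SOCKET are hypothetical knobs for this sketch.
    RECEPTORCTL="${RECEPTORCTL:-receptorctl}"
    SOCKET="${SOCKET:-/tmp/foo.sock}"

    # Print the value that follows "Unit ID:" in "work submit" output.
    parse_unit_id() {
      awk '/^Unit ID:/ { print $3 }'
    }

    # Submit a local echoint unit, then stream its results by Unit ID.
    submit_and_fetch() {
      local unit_id
      unit_id="$("$RECEPTORCTL" --socket "$SOCKET" work submit echoint --no-payload | parse_unit_id)"
      if [ -z "$unit_id" ]; then
        echo "no Unit ID found in work submit output" >&2
        return 1
      fi
      "$RECEPTORCTL" --socket "$SOCKET" work results "$unit_id"
    }

With `foo` running, calling ``submit_and_fetch`` should print the same numbers as the ``work results`` example above. Pointing ``RECEPTORCTL`` at a stub function lets the parsing logic be exercised without a live node.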

Remote work
------------

Although ``receptorctl`` is connected to `foo`, the "--node" option starts the work on node `bar` instead.

The work type must be defined on the node that runs it; in this case, `bar` must have a ``work-commands`` entry named "echoint".

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --node bar --no-payload
    Result:  Job Started
    Unit ID: 87Vwqb6A

Remote work submission ultimately results in two work units running at the same time: a local work unit and the remote work unit, each with its own Unit ID. The local work unit's job is to monitor the remote work unit and stream its results back.

Sequence of events for remote work submission:

- `foo` starts a local work unit of work type "remote". This is a special work type that is built into receptor.
- This work unit attempts to connect to `bar`'s control service and issue a "work submit echoint" command. From `bar`'s perspective, this is exactly the same operation as a user connecting to `bar` directly and issuing a work submit command; `bar` is not aware that `foo` issued the command.
- Once the work is submitted, `foo` streams the work results back to itself and stores them on disk. It also periodically queries the ``work status`` of the unit running on `bar`. The status includes the work state and the stdout size.
- `foo` continues streaming stdout until the size stored on disk matches the ``StdoutSize`` reported in `bar`'s status.

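The stopping condition in the last step is a plain size comparison. The sketch below expresses it as a hypothetical ``stdout_caught_up`` helper that takes the path of the local ``stdout`` copy and the ``StdoutSize`` value from the remote status; it only illustrates the rule described above and is not taken from receptor's implementation.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper, not receptor code: succeed once the local copy of
    # stdout holds exactly as many bytes as the remote StdoutSize.
    stdout_caught_up() {
      local local_stdout=$1 remote_size=$2 local_size=0
      if [ -f "$local_stdout" ]; then
        local_size=$(( $(wc -c < "$local_stdout") ))
      fi
      [ "$local_size" -eq "$remote_size" ]
    }

    # Polling shape (get_remote_stdout_size is a placeholder, not a real command):
    #   until stdout_caught_up "$local_dir/stdout" "$(get_remote_stdout_size)"; do
    #     sleep 1
    #   done

A missing local file counts as zero bytes, so the check keeps failing until streaming has produced the full output.
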
.. _work_payload:

Payload
--------

In `bar.yml`:

.. code-block:: yaml

      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""

Here the bash command reads lines from stdin, echoes each line in uppercase, and sleeps for 3 seconds after each one.

Payloads can be passed into receptor using the "--payload" option.

.. code-block:: bash

    $ echo -e "hi\ni am foo\nwhat is your name" | receptorctl --socket /tmp/foo.sock work submit echopayload --node bar --payload - -f
    HI
    I AM FOO
    WHAT IS YOUR NAME

"--payload -" means the payload should be whatever the stdin is, which is piped in from the "echo -e ..." command.

Note: "-f" instructs receptorctl to follow the work unit immediately, i.e. stream results to stdout. One could also use "work results" to stream the results.

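The ``echopayload`` transformation can be checked locally, without receptor, by piping the same input through the command from ``bar.yml``. This sketch drops the ``sleep`` and quotes ``${line^^}``; the ``${var^^}`` uppercase expansion requires bash 4 or newer. ``preview_echopayload`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Local preview of the echopayload work type: each stdin line is echoed
    # back in uppercase. The sleep from bar.yml is omitted.
    preview_echopayload() {
      bash -c 'while read -r line; do echo "${line^^}"; done'
    }

    printf 'hi\ni am foo\nwhat is your name\n' | preview_echopayload

The output matches the "--payload -" example above, and its size is 30 bytes, the same ``StdoutSize`` that appears in the ``status`` files shown under "Units on disk".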

Runtime Parameters
-------------------

Work commands can be configured to allow parameters to be passed to commands when work is submitted:

.. code-block:: yaml

  work-commands:
    - workType: listcontents
      command: ls
      allowruntimeparams: true

The ``allowruntimeparams`` option will allow parameters to be passed to the work command by the
client submitting the work. The contents of a specific directory can be listed by passing the paths
to the receptor command as positional arguments immediately after the ``workType``:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit --node bar --no-payload -f listcontents /root/ /bin/
    /bin/:
    bash
    sh

    /root/:
    helloworld.sh

Options or flags for the work command must be passed with the ``--param`` parameter, which extends the ``params`` work command setting.
For example, the ``--all`` flag can be passed to the work command this way:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit --node bar --no-payload -f --param params='--all' listcontents /root/
    .
    ..
    .bash_logout
    .bash_profile
    .bashrc
    .cache
    helloworld.sh

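The ``listcontents`` submissions above amount to running ``ls`` on `bar` with extra arguments. The sketch below reproduces the two listings locally against a throwaway directory tree (``demo_dir`` and its contents are made up for illustration), so the effect of each argument is easy to see. How receptor combines ``params``, ``--param`` values, and positional arguments into the final command line is an assumption here, not taken from receptor's source.

.. code-block:: bash

    #!/usr/bin/env bash
    # Local stand-in for the listcontents examples. Assumption: the positional
    # paths and the --param params value end up as arguments to ls on bar.
    demo_dir="$(mktemp -d)"
    mkdir -p "$demo_dir/root" "$demo_dir/bin"
    touch "$demo_dir/root/helloworld.sh" "$demo_dir/root/.bashrc" "$demo_dir/bin/bash"

    # Like "work submit listcontents /root/ /bin/": one "dir:" heading per path.
    ls "$demo_dir/root/" "$demo_dir/bin/"

    # Like adding --param params='--all': dotfiles are listed too.
    # (-a is the short form of --all and is also accepted by BSD ls.)
    ls -a "$demo_dir/root/"

The temporary tree can be removed afterwards with ``rm -rf "$demo_dir"``.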

Work list
----------

"work list" returns information about all work units that have run on this receptor node. The following output shows two work units, ``12L8s8h2`` and ``T0oN0CAp``:

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work list
    {'12L8s8h2': {'Detail': 'exit status 0',
                  'ExtraData': None,
                  'State': 2,
                  'StateName': 'Succeeded',
                  'StdoutSize': 21,
                  'WorkType': 'echoint'},
     'T0oN0CAp': {'Detail': 'Running: PID 1700818',
                  'ExtraData': {'Expiration': '0001-01-01T00:00:00Z',
                                'LocalCancelled': False,
                                'LocalReleased': False,
                                'RemoteNode': 'bar',
                                'RemoteParams': {},
                                'RemoteStarted': True,
                                'RemoteUnitID': 'ATDzdViR',
                                'RemoteWorkType': 'echoint',
                                'TLSClient': ''},
                  'State': 1,
                  'StateName': 'Running',
                  'StdoutSize': 4,
                  'WorkType': 'remote'}}


Notice that ``T0oN0CAp`` was a remote work submission, therefore its work type is "remote". On `bar` there is a local unit ``ATDzdViR``, with the "echoint" work type.

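Because the ``ExtraData`` of a "remote" unit names the remote node and unit ID, ``work list`` output can be used to find where each remote submission actually runs. The sketch below is a hypothetical ``remote_units`` helper; it assumes the pretty-printed layout shown above (one key per line, ``'Detail'`` as the first key of each unit), so it is an inspection aid rather than a robust parser.

.. code-block:: bash

    #!/usr/bin/env bash
    # Print "localUnitID -> remoteNode/remoteUnitID" for each remote unit.
    # Splitting on single quotes turns  'Key': 'value'  into fields $2 and $4.
    remote_units() {
      awk -F"'" '
        $4 == "Detail"       { unit = $2; node = "" }  # first line of a unit entry
        $2 == "RemoteNode"   { node = $4 }
        $2 == "RemoteUnitID" { print unit " -> " node "/" $4 }
      '
    }

    # Example: receptorctl --socket /tmp/foo.sock work list | remote_units

Piping the ``work list`` output above through ``remote_units`` prints ``T0oN0CAp -> bar/ATDzdViR``; the completed local ``echoint`` unit has no remote counterpart and is skipped.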

Work cancel
------------

Cancel will stop any running work unit. Upon canceling a "remote" work unit, the local node will attempt to connect to the remote node's control service and issue a work cancel. If the remote node is down, receptor will periodically attempt to connect to the remote node to do the cancellation.

Work release
-------------

Release will cancel the work and then delete files on disk associated with that work unit. For remote work submission, release will attempt to delete files both locally and on the remote machine. Like work cancel, the release can be pending if the remote node is down. In that situation, the local files will remain on disk until the remote node can be contacted.

Work force-release
--------------------

If the remote node stays unreachable, it might be preferable to force a release with the ``work force-release`` command. This makes a single attempt to connect to the remote node and issue a work release there. After this one attempt, it proceeds to delete all local files associated with the work unit.

States
---------

A unit of work can be in the Pending, Running, Succeeded, or Failed state.

For local work, the unit transitions from Pending to Running the moment the ``command`` executable is started.

For remote work, the unit transitions from Pending to Running when the status reported by the remote node has a Running state.

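The ``work list`` output and the ``status`` files carry the state as a number (``State``), with ``StateName`` alongside it in ``work list``. A small lookup makes raw ``status`` files easier to read. This is a sketch: ``state_name`` and ``is_terminal_state`` are hypothetical helpers, and only the values ``1`` (Running) and ``2`` (Succeeded) are confirmed by the examples on this page; ``0`` for Pending and ``3`` for Failed are assumed from the order in which the states are listed above.

.. code-block:: bash

    #!/usr/bin/env bash
    # Map the numeric State field to its name. 1 and 2 appear in the examples on
    # this page; 0 (Pending) and 3 (Failed) are assumptions based on list order.
    state_name() {
      case "$1" in
        0) echo Pending ;;
        1) echo Running ;;
        2) echo Succeeded ;;
        3) echo Failed ;;
        *) echo "Unknown($1)" ;;
      esac
    }

    # Succeeded and Failed are final; Pending and Running can still change.
    is_terminal_state() {
      [ "$1" = 2 ] || [ "$1" = 3 ]
    }

    state_name 2   # prints Succeeded

Values outside the four listed states are reported as ``Unknown(...)`` rather than guessed.
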
Signed work
------------

Remote work submissions can be digitally signed by the sender. The target node will verify the signature of the work command before starting the work unit.

A *single* pair of RSA public and private keys is created offline and distributed to the nodes. Distribute the public key (PKIX format) to any node that should receive work. Distribute the private key (PKCS1 format) to any node that needs authority to submit work.

The following commands can be used to create keys for signing work:

.. code-block:: bash

    openssl genrsa -out signworkprivate.pem 2048
    openssl rsa -in signworkprivate.pem -pubout -out signworkpublic.pem

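Before distributing the keys, it can be worth confirming that the two files form a pair. The sketch below signs a throwaway message with the private key and verifies it with the public key using ``openssl dgst``; it checks only the key pair itself, not receptor's token format, and ``check_keypair`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sign a scratch message with the private key, then verify it with the
    # public key. Success means the two PEM files belong together.
    check_keypair() {
      local priv=$1 pub=$2 tmp rc
      tmp="$(mktemp -d)"
      printf 'receptor work signing test\n' > "$tmp/msg"
      openssl dgst -sha256 -sign "$priv" -out "$tmp/msg.sig" "$tmp/msg" &&
        openssl dgst -sha256 -verify "$pub" -signature "$tmp/msg.sig" "$tmp/msg"
      rc=$?
      rm -rf "$tmp"
      return "$rc"
    }

``check_keypair signworkprivate.pem signworkpublic.pem`` should print ``Verified OK``; with a mismatched pair, ``openssl`` reports a verification failure and the function returns non-zero.
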
In `bar.yml`:

.. code-block:: yaml

    # PKIX
    work-verification:
      publickey: /full/path/signworkpublic.pem

    work-commands:
      - workType: echopayload
        command: bash
        params: "-c \"while read -r line; do echo ${line^^}; sleep 3; done\""
        verifysignature: true

In `foo.yml`:

.. code-block:: yaml

    # PKCS1
    work-signing:
      privatekey: /full/path/signworkprivate.pem
      tokenexpiration: 30m

``tokenexpiration`` determines how long the signature is valid. This expiration directly corresponds to the "expiresAt" field in the generated JSON web token. Valid units include "h" and "m", e.g. ``1h30m`` for one hour and 30 minutes.

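As a worked example, ``30m`` is 30 * 60 = 1800 seconds and ``1h30m`` is 3600 + 1800 = 5400 seconds. The hypothetical ``duration_to_seconds`` helper below performs that conversion for the ``h`` and ``m`` units only; receptor parses ``tokenexpiration`` itself, so this is just a way to reason about how long a signed submission stays valid.

.. code-block:: bash

    #!/usr/bin/env bash
    # Convert an h/m duration such as "1h30m" to seconds.
    # Only "h" and "m" are handled; anything else makes the function fail.
    duration_to_seconds() {
      local rest=$1 total=0 num unit
      [[ -n $rest ]] || return 1
      # Consume one "<number><unit>" group per iteration, e.g. "1h" then "30m".
      while [[ $rest =~ ^([0-9]+)([hm])(.*)$ ]]; do
        num=${BASH_REMATCH[1]}
        unit=${BASH_REMATCH[2]}
        rest=${BASH_REMATCH[3]}
        if [[ $unit == h ]]; then
          total=$(( total + 10#$num * 3600 ))
        else
          total=$(( total + 10#$num * 60 ))
        fi
      done
      # Leftover text (e.g. "10s" or a trailing bare number) is rejected.
      [[ -z $rest ]] || return 1
      echo "$total"
    }

    duration_to_seconds 1h30m   # prints 5400

Inputs with other units (``10s``) or a dangling number (``1h30``) make it return non-zero.
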
Use the "--signwork" parameter to sign the work.

.. code-block:: bash

    $ receptorctl --socket /tmp/foo.sock work submit echoint --node bar --no-payload --signwork

Units on disk
--------------

Netceptor, the main component of receptor that handles mesh connectivity and traffic, operates entirely in memory; it does not store any state information on disk. Workceptor, however, is designed to persist across receptor restarts. Work units might run commands that take hours to complete, so Workceptor stores the relevant information on disk in case the receptor process restarts.

By default receptor stores data under ``/tmp/receptor``. This can be changed by setting the ``datadir`` parameter in the ``node`` section of the config file.

For a given work unit, receptor will store files in ``{datadir}/{nodeID}/{unitID}/``.

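Given that layout, the work units a node has stored can be listed with a short loop. In this sketch, ``list_units`` and the ``DATADIR`` variable are hypothetical; ``DATADIR`` stands in for the configured ``datadir`` and falls back to ``/tmp/receptor``.

.. code-block:: bash

    #!/usr/bin/env bash
    # List the unit IDs stored for one node, following the
    # {datadir}/{nodeID}/{unitID}/ layout described above.
    list_units() {
      local node=$1 datadir=${DATADIR:-/tmp/receptor} dir
      for dir in "$datadir/$node"/*/; do
        [ -d "$dir" ] || continue   # no units: the glob stays unexpanded
        basename "$dir"
      done
    }

For the tree shown below, ``list_units foo`` would print ``BsAjS4wi`` and ``list_units bar`` would print ``NImim5WA``.
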
Here is the receptor directory tree after running ``work submit echopayload`` described in :ref:`work_payload`.

.. code-block:: bash

    $ tree /tmp/receptor
    /tmp/receptor
    ├── bar
    │   └── NImim5WA
    │       ├── status
    │       ├── status.lock
    │       ├── stdin
    │       └── stdout
    └── foo
        └── BsAjS4wi
            ├── status
            ├── status.lock
            ├── stdin
            └── stdout

The main purpose of work unit ``BsAjS4wi`` on `foo` is to copy stdin, stdout, and status from ``NImim5WA`` on `bar` back to its own working directory.

``stdin`` is a copy of the submitted payload. The contents of this file are the same on both the local (`foo`) and remote (`bar`) machines.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/stdin
    hi
    i am foo
    what is your name

``stdout`` contains the work unit results, i.e. the stdout of the command execution. It is also the same on both the local and remote nodes.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/stdout
    HI
    I AM FOO
    WHAT IS YOUR NAME

``status`` contains additional information related to the work unit. The contents of status are different on `foo` and `bar`.

.. code-block:: bash

    $ cat /tmp/receptor/bar/NImim5WA/status
    {
       "State":2,
       "Detail":"exit status 0",
       "StdoutSize":30,
       "WorkType":"echopayload",
       "ExtraData":null
    }

.. code-block:: bash

    $ cat /tmp/receptor/foo/BsAjS4wi/status
    {
       "State":2,
       "Detail":"exit status 0",
       "StdoutSize":30,
       "WorkType":"remote",
       "ExtraData":{
          "RemoteNode":"bar",
          "RemoteWorkType":"echopayload",
          "RemoteParams":{},
          "RemoteUnitID":"NImim5WA",
          "RemoteStarted":true,
          "LocalCancelled":false,
          "LocalReleased":false,
          "TLSClient":"",
          "Expiration":"0001-01-01T00:00:00Z"
       }
    }

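Because ``stdout`` ends up identical on both nodes, comparing the ``StdoutSize`` recorded in a ``status`` file with the actual size of the unit's ``stdout`` is a quick consistency check. The sketch below uses hypothetical ``status_number`` and ``check_unit`` helpers; it pulls a top-level numeric field out of ``status`` with ``sed`` and assumes the field name appears once, so a JSON tool such as ``jq`` would be more robust.

.. code-block:: bash

    #!/usr/bin/env bash
    # Read a top-level numeric field (e.g. StdoutSize) from a unit's status file.
    status_number() {
      local file=$1 field=$2
      sed -n 's/.*"'"$field"'": *\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1
    }

    # Compare the StdoutSize recorded in status with the real size of stdout.
    check_unit() {
      local unit_dir=$1 reported actual
      reported="$(status_number "$unit_dir/status" StdoutSize)"
      actual=$(( $(wc -c < "$unit_dir/stdout") ))
      echo "reported=$reported actual=$actual"
      [ "$reported" = "$actual" ]
    }

    # Example: check_unit /tmp/receptor/foo/BsAjS4wi

For the two unit directories above, both ``check_unit /tmp/receptor/bar/NImim5WA`` and ``check_unit /tmp/receptor/foo/BsAjS4wi`` would be expected to report 30 bytes once the remote unit has finished streaming.
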
.. image:: remote.png
   :alt: sequence of events during work remote submission

The sequence of events during a remote work submission. Blue lines indicate moments when receptor writes files to disk.