.. _tutorial-checking-job-creation:

Validating jobs
===============

Validating job creation
-----------------------

Before submitting the task, the server will transform the :term:`ecf script` to a :term:`job file`.

This process, known as :term:`job creation`, is performed by the :term:`ecflow_server` when the task is
ready for submission, and includes the following steps:

* Locate and load the :term:`ecf script` -- see more about the :term:`location algorithm <ecf file location algorithm>`.
* Perform :term:`pre-processing` of :code:`%include` :term:`directives`.
* Perform :term:`variable substitution`.
* Store the resulting script in the :term:`job file`, with a :code:`.job` extension.

The resulting :term:`job file` is the script that the :term:`ecflow_server` will actually submit for execution.

Considering the :code:`$HOME/course/test/t1.ecf` file, defined in the previous section,
the generation of the :term:`job file` will include the following steps:

* :code:`%include "../head.h"` will be substituted by the content of the selected file.
* :code:`%include "../tail.h"` will be substituted by the content of the selected file.
* All variable occurrences (i.e. any text of the form :code:`%<VAR>%`) will be substituted by the value of the named variable. For example, :code:`%ECF_NAME%` will be replaced by the full path of the task, :code:`/test/t1`.
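
The steps above can be illustrated with a small, self-contained sketch. This is a toy model and not the ecFlow implementation: the file contents, the :code:`FILES` dictionary and the helper names (:code:`expand_includes`, :code:`substitute`, :code:`create_job`) are all invented for illustration, and the sketch ignores everything else the real pre-processor handles.

.. code-block:: python

    import re

    # Hypothetical in-memory stand-ins for the tutorial files; contents are invented.
    FILES = {
        "head.h": "echo 'begin %ECF_NAME%'",
        "tail.h": "echo 'end %ECF_NAME%'",
    }
    SCRIPT = '%include "../head.h"\necho "I am part of a suite"\n%include "../tail.h"'


    def expand_includes(text, files):
        """Replace each '%include "<path>"' line with the content of the named file."""
        out = []
        for line in text.splitlines():
            match = re.fullmatch(r'%include\s+"(.+)"', line.strip())
            if match:
                name = match.group(1).split("/")[-1]  # toy lookup, by base name only
                out.append(files[name])  # a KeyError here means a missing include file
            else:
                out.append(line)
        return "\n".join(out)


    def substitute(text, variables):
        """Replace every %VAR% with its value; fail on undefined variables."""
        def lookup(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"undefined variable: {name}")
            return variables[name]
        return re.sub(r"%(\w+)%", lookup, text)


    def create_job(script, files, variables):
        """Expand includes first, then substitute variables in the result."""
        return substitute(expand_includes(script, files), variables)


    job = create_job(SCRIPT, FILES, {"ECF_NAME": "/test/t1"})
    print(job)

In this sketch, a missing include file or an undefined variable raises an error instead of producing a job, which mirrors the kind of problem that checking job creation is meant to reveal early.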

For practical purposes, it is often useful to check the :term:`job creation` process even before loading the :term:`suite definition`.
This allows the early detection of potential problems, such as a missing :term:`ecf script` or include file, references to undefined variables, and other errors during :term:`pre-processing`.

Using the ecFlow Python API it is possible to execute the :term:`job creation` process locally.


    .. tabs::


        .. tab:: Python

            Consider the following regarding the :term:`job creation` process performed by the Python API:

            * The job creation is *independent* of the :term:`ecflow_server`, so default values will be used for server-specific
              variables such as :code:`ECF_PORT` and :code:`ECF_HOST`.
            * The resulting job files will use the extension :code:`.job0`, whereas the server will always generate jobs with the extension
              :code:`.job<N>` (where :code:`<N>` corresponds to :term:`ECF_TRYNO`, which is never zero).
            * The :term:`job file` is created in the same directory as the :term:`ecf script`.

            .. literalinclude:: src/checking-job-creation.py
                :language: python
                :caption: $HOME/course/validate.py

            The script above loads the suite definition from the :file:`$HOME/course/test/t1.ecf` file and
            performs the check via the call to :py:class:`ecflow.Defs.check_job_creation`. An all-in-one script
            could also create the suite definition programmatically, followed by the job creation check.
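
The all-in-one variant mentioned above might look like the following sketch. It assumes that the :code:`ecflow` Python module is installed and that the suite is called :code:`test`, with a single task :code:`t1` and :code:`ECF_HOME` set to :file:`$HOME/course`. The helper names :code:`build_defs` and :code:`main` are hypothetical, and the guarded import is only there so that the script fails gracefully where ecFlow is not available.

.. code-block:: python

    import os

    try:
        import ecflow  # ecFlow Python API; must be installed separately
    except ImportError:
        ecflow = None


    def build_defs(home):
        """Build a suite 'test' with a single task 't1', using ECF_HOME=home."""
        defs = ecflow.Defs()
        suite = defs.add_suite("test")
        suite.add_variable("ECF_HOME", home)
        suite.add_task("t1")
        return defs


    def main():
        if ecflow is None:
            print("ecflow module not available; nothing to check")
            return
        home = os.path.join(os.path.expanduser("~"), "course")
        defs = build_defs(home)
        print(defs)
        # Print the result of the check, which is assumed to report
        # any problems found while creating the job files.
        print(defs.check_job_creation())


    if __name__ == "__main__":
        main()

Building the definition in the same script avoids the need for a separate definition file, at the cost of mixing suite design and validation in one place.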

**What to do:**

#. Create the :code:`$HOME/course/validate.py` script as shown above, and execute it as follows:

    .. code-block:: shell

       cd $HOME/course

       # Either run by explicitly invoking python
       python3 ./validate.py

       # Or make the script executable, and run it directly
       chmod +x validate.py
       ./validate.py

#. Examine the job file :file:`$HOME/course/test/t1.job0`; in particular, note the values substituted for server-specific variables (e.g. :code:`ECF_PORT`, :code:`ECF_HOST`).

Validating job execution
------------------------

The previous section demonstrated how a task script can be transformed into a job script.

Unfortunately, trying to run this job script locally will fail, because the :code:`ecflow_client`
commands embedded in the script/job will not be able to communicate with the server.
In particular, the server-specific variables such as :code:`ECF_PORT` and :code:`ECF_HOST`
were generated by the Python API and will not typically correspond to an existing ecFlow server.
Even if a server were running on the specified host and port, the job would be rejected because
the server uses the :code:`ECF_PASSWD` variable to identify the specific task, and the locally
generated value would not match. When this happens,
i.e. a job uses an incorrect :code:`ECF_PASSWD`, the job is treated as a zombie and essentially ignored
by the server.

To disable the calls to :code:`ecflow_client`, and allow the job to be executed locally,
export the environment variable :code:`NO_ECF=1`. When :code:`NO_ECF` is set, the :code:`ecflow_client`
executable returns immediately with a success value, and allows the job to proceed uninterrupted.

.. code-block:: shell

    export NO_ECF=1
    $HOME/course/test/t1.job0
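
The same can be done from Python, for example as part of a test harness. The sketch below is only an illustration: :code:`run_job_locally` is a hypothetical helper, and it assumes the job file is executable and starts with a valid interpreter line, as required to run it directly from the shell.

.. code-block:: python

    import os
    import subprocess


    def run_job_locally(job_path):
        """Run a job file with NO_ECF=1 set in its environment (hypothetical helper)."""
        env = dict(os.environ, NO_ECF="1")  # copy the environment, then add NO_ECF
        return subprocess.run(
            [job_path], env=env, capture_output=True, text=True, check=False
        )


    if __name__ == "__main__":
        job = os.path.join(os.path.expanduser("~"), "course", "test", "t1.job0")
        if os.path.exists(job):
            result = run_job_locally(job)
            print("exit code:", result.returncode)
            print(result.stdout)

Setting :code:`NO_ECF` only in the environment of the child process keeps the variable from leaking into the calling shell or into other processes.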

.. warning::

    :code:`NO_ECF` can be used in any job script, regardless of whether it was generated using the Python API
    or by the ecFlow server itself, and is useful for testing and debugging purposes.

    However, :code:`NO_ECF` should **never** be used in a production environment.

**What to do:**

#. Run the job :code:`$HOME/course/test/t1.job0`, disabling the calls to :code:`ecflow_client`.