:index:`DAGMan Quick Reference<single: DAGMan; DAGMan Quick Reference>`

Quick Reference
===============

DAG Commands
------------

General
^^^^^^^

:dag-cmd-def:`INCLUDE` (see :ref:`Full Description<DAG Include cmd>`)
    Parse the provided file as if it were inline in the current file.

    .. code-block:: condor-dagman

        INCLUDE filename

:dag-cmd-def:`JOB` (see :ref:`Full Description<DAGMan JOB>`)
    Create a normal DAG node to execute specified HTCondor jobs.

    .. code-block:: condor-dagman

        JOB NodeName SubmitDescription [DIR directory] [NOOP] [DONE]

:dag-cmd-def:`PARENT/CHILD` (see :ref:`Full Description<DAG node dependencies>`)
    Create dependencies between two or more DAG nodes.

    .. code-block:: condor-dagman

        PARENT ParentNodeName [ParentNodeName2 ... ] CHILD  ChildNodeName [ChildNodeName2 ... ]

:dag-cmd-def:`SPLICE` (see :ref:`Full Description<DAG splicing>`)
    Incorporate the specified DAG file into the structure of another DAG.

    .. code-block:: condor-dagman

        SPLICE SpliceName DagFileName [DIR directory]

:dag-cmd-def:`SUBDAG` (see :ref:`Full Description<subdag-external>`)
    Specify a DAG workflow to be submitted by :tool:`condor_submit_dag` and managed
    by a parent DAG.

    .. code-block:: condor-dagman

        SUBDAG EXTERNAL JobName DagFileName [DIR directory] [NOOP] [DONE]

:dag-cmd-def:`SUBMIT-DESCRIPTION` (see :ref:`Full Description<DAG submit description cmd>`)
    Create an inline job submit description that can be applied to multiple
    DAG nodes.

    .. code-block:: condor-dagman

        SUBMIT-DESCRIPTION DescriptionName {
            # submit attributes go here
        }
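
As an illustration of how these commands combine, the following is a minimal
sketch of a DAG description file in which one inline submit description is
shared by three nodes. The names ``sleep_job``, ``Setup``, ``Work1`` and
``Work2`` are hypothetical, and the submit attributes are placeholders rather
than a recommended configuration.

.. code-block:: condor-dagman

    # Hypothetical inline description reused by every node below
    SUBMIT-DESCRIPTION sleep_job {
        executable = /bin/sleep
        arguments  = 30
    }

    # Three nodes that all use the shared description
    JOB Setup sleep_job
    JOB Work1 sleep_job
    JOB Work2 sleep_job

    # Setup must finish before Work1 and Work2 may start
    PARENT Setup CHILD Work1 Work2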

Node Behavior
^^^^^^^^^^^^^

:dag-cmd-def:`DONE`
    Mark a DAG node as done, so that neither its associated jobs nor its scripts execute.

    .. code-block:: condor-dagman

        DONE NodeName

:dag-cmd-def:`PRE_SKIP` (see :ref:`Full Description<Node pre skip cmd>`)
    Inform DAGMan to skip the rest of a node's execution if that node's PRE
    script exits with the specified code.

    .. code-block:: condor-dagman

        PRE_SKIP <NodeName | ALL_NODES> non-zero-exit-code

:dag-cmd-def:`PRIORITY` (see :ref:`Full Description<DAG Node Priorities>`)
    Assign a node priority to control DAGMan node submission.

    .. code-block:: condor-dagman

        PRIORITY <NodeName | ALL_NODES> PriorityValue

:dag-cmd-def:`RETRY` (see :ref:`Full Description<Retry DAG Nodes>`)
    Inform DAGMan to retry a node up to a specified number of times when a failure
    occurs.

    .. code-block:: condor-dagman

        RETRY <NodeName | ALL_NODES> NumberOfRetries [UNLESS-EXIT value]

:dag-cmd-def:`SCRIPT` (see :ref:`Full Description<DAG Node Scripts>`)
    Apply a script to be executed on the AP for a specified node.

    .. code-block:: condor-dagman

        # PRE-Script
        SCRIPT [DEFER status time] [DEBUG filename type] PRE <NodeName | ALL_NODES> ExecutableName [arguments]
        # POST-Script
        SCRIPT [DEFER status time] [DEBUG filename type] POST <NodeName | ALL_NODES> ExecutableName [arguments]
        # HOLD-Script
        SCRIPT [DEFER status time] [DEBUG filename type] HOLD <NodeName | ALL_NODES> ExecutableName [arguments]

:dag-cmd-def:`VARS` (see :ref:`Full Description<DAGMan VARS>`)
    Specify a list of **key="Value"** pairs of information to be applied to the
    specified node's jobs as referable submit macros.

    .. code-block:: condor-dagman

        VARS <NodeName | ALL_NODES> [PREPEND | APPEND] macroname="string" [macroname2="string2" ... ]
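
A possible combination of the node behavior commands on a single node is
sketched below. The node name ``Analyze``, the submit file ``analyze.sub``,
the script names, and all numeric values are assumptions chosen for
illustration only.

.. code-block:: condor-dagman

    JOB Analyze analyze.sub

    # Run a script on the AP before and after the node's jobs
    SCRIPT PRE  Analyze check_input.sh
    SCRIPT POST Analyze check_output.sh $RETURN

    # If the PRE script exits with 2, skip the rest of the node
    PRE_SKIP Analyze 2

    # Retry up to 3 times, unless the node exits with 42
    RETRY Analyze 3 UNLESS-EXIT 42

    # Assign a node priority of 10
    PRIORITY Analyze 10

    # Referable in analyze.sub as $(input) and $(mode)
    VARS Analyze input="run1.dat" mode="fast"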

Special Nodes
^^^^^^^^^^^^^

:dag-cmd-def:`FINAL` (see :ref:`Full Description<final-node>`)
    Create a DAG node guaranteed to run at the end of a DAG regardless
    of successful or failed execution.

    .. code-block:: condor-dagman

        FINAL NodeName SubmitDescription [DIR directory] [NOOP]

:dag-cmd-def:`PROVISIONER` (see :ref:`Full Description<DAG Provisioner Node>`)
    Create a DAG node responsible for provisioning resources to be utilized by other
    DAG nodes. Guaranteed to start before all other nodes.

    .. code-block:: condor-dagman

        PROVISIONER NodeName SubmitDescription

:dag-cmd-def:`SERVICE` (see :ref:`Full Description<DAG Service Node>`)
    Create a DAG node for specialized management/monitoring tasks. All service nodes
    are submitted prior to normal nodes.

    .. code-block:: condor-dagman

        SERVICE NodeName SubmitDescription
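
For example, a final node might be used to report on the outcome of a DAG.
The sketch below assumes hypothetical submit files ``work.sub`` and
``report.sub`` and a hypothetical script ``summarize.sh``; the two script
macros are those listed under Script Macros in this reference.

.. code-block:: condor-dagman

    JOB Work work.sub

    # Runs at the end of the DAG whether Work succeeded or failed
    FINAL Report report.sub

    # Pass the DAG status and failed node count to the final node's PRE script
    SCRIPT PRE Report summarize.sh $DAG_STATUS $FAILED_COUNT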

Throttling
^^^^^^^^^^

:dag-cmd-def:`CATEGORY` (see :ref:`Full Description<DAG throttling cmds>`)
    Assign a specified node to a DAG category.

    .. code-block:: condor-dagman

        CATEGORY <NodeName | ALL_NODES> CategoryName

:dag-cmd-def:`MAXJOBS` (see :ref:`Full Description<DAG throttling cmds>`)
    Set the maximum number of submitted lists of jobs for a specified :dag-cmd:`CATEGORY`.

    .. code-block:: condor-dagman

        MAXJOBS CategoryName MaxJobsValue
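
The two throttling commands are typically used together. In the following
sketch the category name ``network``, the node names, and the limit of 2 are
hypothetical.

.. code-block:: condor-dagman

    JOB Fetch1 fetch.sub
    JOB Fetch2 fetch.sub
    JOB Fetch3 fetch.sub

    # Place all three nodes in the same category
    CATEGORY Fetch1 network
    CATEGORY Fetch2 network
    CATEGORY Fetch3 network

    # Throttle the "network" category to a limit of 2
    MAXJOBS network 2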

DAG Control
^^^^^^^^^^^

:dag-cmd-def:`ABORT-DAG-ON` (see :ref:`Full Description<abort-dag-on>`)
    Inform DAGMan to write a rescue file and exit when the specified node exits with
    the specified value.

    .. code-block:: condor-dagman

        ABORT-DAG-ON <NodeName | ALL_NODES> AbortExitValue [RETURN DAGReturnValue]

:dag-cmd-def:`CONFIG` (see :ref:`Full Description<Per DAG Config>`)
    Specify a custom configuration file for DAGMan.

    .. code-block:: condor-dagman

        CONFIG filename

:dag-cmd-def:`ENV` (see :ref:`Full Description<DAG ENV cmd>`)
    Modify the DAGMan proper job's environment by explicitly setting environment
    variables or by filtering variables from :tool:`condor_submit_dag`\ 's environment
    at submit time.

    .. code-block:: condor-dagman

        ENV GET VAR-1 [VAR-2 ... ]
        #  or
        ENV SET Key=Value;Key=Value; ...

:dag-cmd-def:`SET_JOB_ATTR` (see :ref:`Full Description<DAG set-job-attrs>`)
    Set a ClassAd attribute in the DAGMan proper job's ad.

    .. code-block:: condor-dagman

        SET_JOB_ATTR AttributeName = AttributeValue

:dag-cmd-def:`REJECT`
    Mark the DAG description file as rejected to prevent execution.

    .. code-block:: condor-dagman

        REJECT
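
A sketch of several DAG control commands in one file follows. The node name
``Validate``, the file ``my_dagman.conf``, the environment variable names, and
the attribute ``Project`` are all hypothetical, and the values simply follow
the syntax summaries above.

.. code-block:: condor-dagman

    JOB Validate validate.sub

    # Write a rescue file and exit if Validate exits with 3; DAGMan returns 1
    ABORT-DAG-ON Validate 3 RETURN 1

    # Use a custom DAGMan configuration file (hypothetical name)
    CONFIG my_dagman.conf

    # Take PATH and HOME from the condor_submit_dag environment
    ENV GET PATH HOME
    # Explicitly set two variables in the DAGMan proper job's environment
    ENV SET SCRATCH_DIR=/tmp/scratch;LOG_LEVEL=debug

    # Add a ClassAd attribute to the DAGMan proper job's ad
    SET_JOB_ATTR Project = "demo"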

Special Files
^^^^^^^^^^^^^

:dag-cmd-def:`DOT` (see :ref:`Full Description<visualizing-dags-with-dot>`)
    Inform DAGMan to produce a Graphviz Dot file for visualizing a DAG.

    .. code-block:: condor-dagman

        DOT filename [UPDATE | DONT-UPDATE] [OVERWRITE | DONT-OVERWRITE] [INCLUDE <dot-file-header>]

:dag-cmd-def:`JOBSTATE_LOG` (see :ref:`Full Description<DAGMan Machine Readable History>`)
    Inform DAGMan to produce a machine-readable event history file.

    .. code-block:: condor-dagman

        JOBSTATE_LOG filename

:dag-cmd-def:`NODE_STATUS_FILE` (see :ref:`Full Description<node-status-file>`)
    Inform DAGMan to produce a snapshot status file for the DAG nodes.

    .. code-block:: condor-dagman

        NODE_STATUS_FILE filename [minimumUpdateTime] [ALWAYS-UPDATE]

:dag-cmd-def:`SAVE_POINT_FILE` (see :ref:`Full Description<DAG Save Files>`)
    Inform DAGMan to write a save file the first time the specified node starts.

    .. code-block:: condor-dagman

        SAVE_POINT_FILE NodeName [Filename]
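
The special file commands can be sketched together as follows. All file names
and the update time of 60 are hypothetical choices that follow the syntax
summaries above.

.. code-block:: condor-dagman

    JOB A a.sub

    # Dot file, with the UPDATE and DONT-OVERWRITE options
    DOT workflow.dot UPDATE DONT-OVERWRITE

    # Snapshot status file with a minimum update time of 60
    NODE_STATUS_FILE workflow.status 60

    # Machine-readable event history
    JOBSTATE_LOG workflow.jobstate.log

    # Write a save file the first time node A starts (default file name)
    SAVE_POINT_FILE A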

:index:`DAGMan Files<single: DAGMan; DAGMan Files>`

Produced Files
--------------

DAGMan always produces the following files automatically on execution, where the
primary DAG is the only or first DAG file specified at submit time.

#. :tool:`condor_dagman` scheduler universe job files:
    .. parsed-literal::

        <Primary DAG>.condor.sub | DAGMan proper job's submit description file.
        <Primary DAG>.dagman.log | DAGMan proper job's event :subcom:`log` file.
        <Primary DAG>.lib.out    | DAGMan proper job's :subcom:`output` file.
        <Primary DAG>.lib.err    | DAGMan proper job's :subcom:`error` file.

#. DAGMan informational files:
    .. parsed-literal::

        <Primary DAG>.dagman.out | DAGMan process's debug log file.
        <Primary DAG>.nodes.log  | Shared job event log file for all jobs managed by DAGMan (Heart of DAGMan).
        <Primary DAG>.metrics    | JSON-formatted file containing DAGMan metrics, written when DAGMan exits.

#. Other:
    .. parsed-literal::

        <Primary DAG>.rescue<XXX> | Rescue DAG file denoting completed work from a previous execution (see :ref:`Rescue DAG`).
        <Primary DAG>.lock        | DAGMan process lock file to prevent multiple executions of one DAG in the same directory.

Referable DAG Information
-------------------------

DAGMan provides various pieces of DAG information to scripts and jobs in the
form of special referable macros and job ClassAd attributes.

Job Macros
^^^^^^^^^^

Macros referable in a job submit description as ``$(<macro>)``.

.. parsed-literal::

    **JOB**              | Name of the node this job is associated with.
    **RETRY**            | Current node retry attempt value. Set to 0 on first execution.
    **FAILED_COUNT**     | Number of failed nodes currently in the DAG (intended for Final Node).
    **DAG_STATUS**       | Current :ad-attr:`DAG_Status` (intended for Final Node).
    **DAGManJobId**      | The job's :ad-attr:`DAGManJobId`.
    **DAG_PARENT_NAMES** | Comma-separated list of the names of the parent nodes of the node this job belongs to.
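
As a minimal sketch of how these macros might be referenced, the inline submit
description below uses ``$(JOB)`` and ``$(RETRY)`` to label each attempt's
output. The description name ``tagged``, the node name ``Step1``, and the file
naming scheme are hypothetical.

.. code-block:: condor-dagman

    SUBMIT-DESCRIPTION tagged {
        executable = /bin/echo
        # $(JOB) is the node name; $(RETRY) is 0 on first execution
        arguments  = "node=$(JOB) attempt=$(RETRY)"
        output     = $(JOB).$(RETRY).out
        error      = $(JOB).$(RETRY).err
    }

    JOB Step1 tagged
    RETRY Step1 2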

Job ClassAd Attributes
^^^^^^^^^^^^^^^^^^^^^^

ClassAd attributes added to the job ad of all jobs managed by DAGMan.

.. parsed-literal::

    :ad-attr:`DAGManJobId`        | Job-Id of the DAGMan job that submitted this job.
    :ad-attr:`DAGNodeName`        | Name of the node to which this job belongs.
    :ad-attr:`DAGManNodeRetry`    | The node's current retry number. First execution is 0.\
     This is only included if :macro:`DAGMAN_NODE_RECORD_INFO` includes ``Retry``.
    :ad-attr:`DAGParentNodeNames` | List of parent node names. Note that, depending on the number\
     of parent nodes, this may be left empty.
    :ad-attr:`DAG_Status`         | Current DAG status (Intended for Final Nodes).

Script Macros
^^^^^^^^^^^^^

Macros that can be passed to a script as optional arguments in the form ``$<macro>``.

.. parsed-literal::

    For All Scripts:
        **NODE**              | Name of the node this script is associated with.
        **RETRY**             | The node's current retry number. Set to 0 on first execution.
        **MAX_RETRIES**       | Maximum number of retries allowed for the node.
        **NODE_COUNT**        | The total number of nodes in the DAG (Including the :dag-cmd:`FINAL` node).
        **QUEUED_COUNT**      | The current number of nodes running jobs in the DAG.
        **DONE_COUNT**        | The current number of successfully completed nodes in the DAG.
        **FAILED_COUNT**      | The current number of failed nodes in the DAG.
        **FUTILE_COUNT**      | The current number of nodes that will never start in the DAG.
        **DAGID**             | The node's associated :ad-attr:`DAGManJobId`.
        **DAG_STATUS**        | The current :ad-attr:`DAG_Status`.
    Only for POST Scripts:
        **CLUSTERID**         | The :ad-attr:`ClusterId` of the list of jobs associated with the node.
        **JOBID**             | The Job-ID (:ad-attr:`ClusterId` & :ad-attr:`ProcId`) of the last job in\
     the node's associated list of jobs.
        **JOB_COUNT**         | The total number of jobs associated with the node.
        **JOB_ABORT_COUNT**   | The number of jobs associated with the node that got an abort event.
        **SUCCESS**           | A boolean string that represents whether the node has been successful\
        up to this point (PRE script and list of jobs succeeded) (``True`` or ``False``).
        **RETURN**            | The exit code of the first failed job in the set or 0 for a\
     successful list of jobs execution.
        **EXIT_CODES**        | An ordered comma-separated list of all :ad-attr:`ExitCode`\ s returned by\
     jobs associated with the node.
        **EXIT_CODE_COUNTS**  | An ordered comma-separated list of the number of jobs that exited with a particular\
     :ad-attr:`ExitCode` (``{ExitCode}:{Count}``).
        **PRE_SCRIPT_RETURN** | Return value of the associated node's PRE Script.
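
For instance, a POST script could receive several of these macros as
positional arguments. In the sketch below the node name ``Step1``, the submit
file ``step1.sub``, and the script ``post_check.sh`` are hypothetical; how the
script interprets its arguments is left to the script itself.

.. code-block:: condor-dagman

    JOB Step1 step1.sub
    RETRY Step1 2

    # Arguments arrive in the order written: node name, current retry number,
    # maximum retries, job return value, and the list of job exit codes
    SCRIPT POST Step1 post_check.sh $NODE $RETRY $MAX_RETRIES $RETURN $EXIT_CODES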


DAG Submission and Management
-----------------------------

.. sidebar:: Tip for Querying All Jobs in a DAG

    When doing job queries to the AP queue or history, the constraint
    **-const "DAGManJobId==<DAG Job Id>"** can be used to return job
    ads for only the jobs submitted and managed by the specified DAG.

    **<DAG Job Id>** should be replaced with the :ad-attr:`ClusterId`
    of the DAGMan proper job.

For a more in-depth explanation of controlling a DAG, see :ref:`DAG controls`.

DAG Submission
^^^^^^^^^^^^^^

To submit a DAGMan workflow, simply use :tool:`condor_submit_dag` on a DAG
description file.

.. code-block:: console

    $ condor_submit_dag diamond.dag
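
The contents of ``diamond.dag`` are not shown in this reference. A minimal
sketch of a diamond-shaped workflow, assuming four hypothetical submit
description files ``a.sub`` through ``d.sub``, might look like:

.. code-block:: condor-dagman

    # Four nodes, each with its own (hypothetical) submit description file
    JOB A a.sub
    JOB B b.sub
    JOB C c.sub
    JOB D d.sub

    # A runs first, B and C depend on A, and D runs after both B and C
    PARENT A CHILD B C
    PARENT B C CHILD D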

DAG Monitoring
^^^^^^^^^^^^^^

All the jobs managed by DAGMan and the DAGMan proper job itself can be monitored
with the tools listed below. :tool:`condor_q` by default returns a condensed overview
of jobs managed by DAGMan currently in the queue. To see all jobs individually use
the **-nobatch** flag.

+-----------------------------+-----------------------------+-----------------------------+
| :tool:`condor_q`            | :tool:`condor_watch_q`      | :tool:`htcondor dag status` |
+-----------------------------+-----------------------------+-----------------------------+

Stopping a DAG
^^^^^^^^^^^^^^

Pause/Restart
    A DAG can be stopped temporarily by using :tool:`condor_hold` on the DAGMan
    proper job. To restart the DAG, simply use :tool:`condor_release`.
Remove
    To remove a DAG simply use :tool:`condor_rm` on the DAGMan proper job.