:index:`DAGMan Quick Reference<single: DAGMan; DAGMan Quick Reference>`
Quick Reference
===============
DAG Commands
------------
General
^^^^^^^
:dag-cmd-def:`INCLUDE` (see :ref:`Full Description<DAG Include cmd>`)
Parse the provided file as if it were inline in the current file.
.. code-block:: condor-dagman
INCLUDE filename
:dag-cmd-def:`JOB` (see :ref:`Full Description<DAGMan JOB>`)
Create a normal DAG node to execute specified HTCondor jobs.
.. code-block:: condor-dagman
JOB NodeName SubmitDescription [DIR directory] [NOOP] [DONE]
:dag-cmd-def:`PARENT/CHILD` (see :ref:`Full Description<DAG node dependencies>`)
Create dependencies between two or more DAG nodes.
.. code-block:: condor-dagman
PARENT ParentNodeName [ParentNodeName2 ... ] CHILD ChildNodeName [ChildNodeName2 ... ]
:dag-cmd-def:`SPLICE` (see :ref:`Full Description<DAG splicing>`)
Incorporate the specified DAG file into the structure of another DAG.
.. code-block:: condor-dagman
SPLICE SpliceName DagFileName [DIR directory]
:dag-cmd-def:`SUBDAG` (see :ref:`Full Description<subdag-external>`)
Specify a DAG workflow to be submitted by :tool:`condor_submit_dag` and managed
by a parent DAG.
.. code-block:: condor-dagman
SUBDAG EXTERNAL JobName DagFileName [DIR directory] [NOOP] [DONE]
:dag-cmd-def:`SUBMIT-DESCRIPTION` (see :ref:`Full Description<DAG submit description cmd>`)
Create an inline job submit description that can be applied to multiple
DAG nodes.
.. code-block:: condor-dagman
SUBMIT-DESCRIPTION DescriptionName {
# submit attributes go here
}
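
As an illustration of how the general commands combine, the following is a
minimal sketch of a diamond-shaped DAG. The node names (``A`` through ``D``)
and the submit description file names are hypothetical placeholders, not files
shipped with HTCondor.

.. code-block:: condor-dagman

    # diamond.dag (hypothetical): B and C run after A; D runs after both B and C
    JOB A a.sub
    JOB B b.sub
    JOB C c.sub
    JOB D d.sub
    PARENT A CHILD B C
    PARENT B C CHILD D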
Node Behavior
^^^^^^^^^^^^^
:dag-cmd-def:`DONE`
Mark a DAG node as done, so that neither its associated jobs nor its scripts execute.
.. code-block:: condor-dagman
DONE NodeName
:dag-cmd-def:`PRE_SKIP` (see :ref:`Full Description<Node pre skip cmd>`)
Inform DAGMan to skip the remaining node execution if that node's specified PRE
script exits with a specified code.
.. code-block:: condor-dagman
PRE_SKIP <NodeName | ALL_NODES> non-zero-exit-code
:dag-cmd-def:`PRIORITY` (see :ref:`Full Description<DAG Node Priorities>`)
Assign a node priority to control DAGMan node submission.
.. code-block:: condor-dagman
PRIORITY <NodeName | ALL_NODES> PriorityValue
:dag-cmd-def:`RETRY` (see :ref:`Full Description<Retry DAG Nodes>`)
Inform DAGMan to retry a node up to a specified number of times when a failure
occurs.
.. code-block:: condor-dagman
RETRY <NodeName | ALL_NODES> NumberOfRetries [UNLESS-EXIT value]
:dag-cmd-def:`SCRIPT` (see :ref:`Full Description<DAG Node Scripts>`)
Apply a script to be executed on the AP for a specified node.
.. code-block:: condor-dagman
# PRE-Script
SCRIPT [DEFER status time] [DEBUG filename type] PRE <NodeName | ALL_NODES> ExecutableName [arguments]
# POST-Script
SCRIPT [DEFER status time] [DEBUG filename type] POST <NodeName | ALL_NODES> ExecutableName [arguments]
# HOLD-Script
SCRIPT [DEFER status time] [DEBUG filename type] HOLD <NodeName | ALL_NODES> ExecutableName [arguments]
:dag-cmd-def:`VARS` (see :ref:`Full Description<DAGMan VARS>`)
Specify a list of **key="Value"** pairs of information to be applied to the
specified node's jobs as referable submit macros.
.. code-block:: condor-dagman
VARS <NodeName | ALL_NODES> [PREPEND | APPEND] macroname="string" [macroname2="string2" ... ]
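
The node behavior commands are usually layered onto a node declared with
:dag-cmd:`JOB`. The sketch below assumes a single node ``A`` and hypothetical
script and file names (``setup.sh``, ``check.sh``, ``data.txt``); the exit codes
and numeric values are arbitrary choices for illustration.

.. code-block:: condor-dagman

    JOB A a.sub

    # Run hypothetical scripts on the AP before and after node A's jobs
    SCRIPT PRE  A setup.sh
    SCRIPT POST A check.sh

    # Skip the rest of node A if setup.sh exits with code 2
    PRE_SKIP A 2

    # Retry node A up to 3 times, unless it exits with code 42
    RETRY A 3 UNLESS-EXIT 42

    # Make $(input) referable in a.sub
    VARS A input="data.txt"

    # Assign node A a priority value of 10
    PRIORITY A 10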
Special Nodes
^^^^^^^^^^^^^
:dag-cmd-def:`FINAL` (see :ref:`Full Description<final-node>`)
Create a DAG node guaranteed to run at the end of a DAG regardless
of successful or failed execution.
.. code-block:: condor-dagman
FINAL NodeName SubmitDescription [DIR directory] [NOOP]
:dag-cmd-def:`PROVISIONER` (see :ref:`Full Description<DAG Provisioner Node>`)
Create a DAG node responsible for provisioning resources to be utilized by other
DAG nodes. Guaranteed to start before all other nodes.
.. code-block:: condor-dagman
PROVISIONER NodeName SubmitDescription
:dag-cmd-def:`SERVICE` (see :ref:`Full Description<DAG Service Node>`)
Create a DAG node for specialized management/monitoring tasks. All service nodes
are submitted prior to normal nodes.
.. code-block:: condor-dagman
SERVICE NodeName SubmitDescription
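
A minimal sketch of a :dag-cmd:`FINAL` node is shown here, assuming
hypothetical names (``Cleanup``, ``cleanup.sub``, ``report.sh``). The PRE script
receives the DAG status and failed node count through script macros.

.. code-block:: condor-dagman

    JOB A a.sub
    JOB B b.sub
    PARENT A CHILD B

    # Runs at the end of the DAG whether A and B succeed or fail
    FINAL Cleanup cleanup.sub
    SCRIPT PRE Cleanup report.sh $DAG_STATUS $FAILED_COUNT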
Throttling
^^^^^^^^^^
:dag-cmd-def:`CATEGORY` (see :ref:`Full Description<DAG throttling cmds>`)
Assign a specified node to a DAG category.
.. code-block:: condor-dagman
CATEGORY <NodeName | ALL_NODES> CategoryName
:dag-cmd-def:`MAXJOBS` (see :ref:`Full Description<DAG throttling cmds>`)
Set the maximum number of lists of jobs that may be submitted at one time for a specified :dag-cmd:`CATEGORY`.
.. code-block:: condor-dagman
MAXJOBS CategoryName MaxJobsValue
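
The two throttling commands work as a pair: nodes are first assigned to a
category, then the category is given a limit. In this sketch the category name
``transfer`` and the node names are hypothetical.

.. code-block:: condor-dagman

    JOB A a.sub
    JOB B b.sub
    JOB C c.sub

    # Place all three nodes in the same category
    CATEGORY A transfer
    CATEGORY B transfer
    CATEGORY C transfer

    # Allow at most 2 nodes from the category to have submitted jobs at once
    MAXJOBS transfer 2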
DAG Control
^^^^^^^^^^^
:dag-cmd-def:`ABORT-DAG-ON` (see :ref:`Full Description<abort-dag-on>`)
Inform DAGMan to write a rescue file and exit when the specified node exits with
the specified value.
.. code-block:: condor-dagman
ABORT-DAG-ON <NodeName | ALL_NODES> AbortExitValue [RETURN DAGReturnValue]
:dag-cmd-def:`CONFIG` (see :ref:`Full Description<Per DAG Config>`)
Specify a custom configuration file for DAGMan.
.. code-block:: condor-dagman
CONFIG filename
:dag-cmd-def:`ENV` (see :ref:`Full Description<DAG ENV cmd>`)
Modify the DAGMan proper job's environment by explicitly setting environment
variables or filtering variables from :tool:`condor_submit_dag`'s environment
at submit time.
.. code-block:: condor-dagman
ENV GET VAR-1 [VAR-2 ... ]
# or
ENV SET Key=Value;Key=Value; ...
:dag-cmd-def:`SET_JOB_ATTR` (see :ref:`Full Description<DAG set-job-attrs>`)
Set a ClassAd attribute in the DAGMan proper job's ad.
.. code-block:: condor-dagman
SET_JOB_ATTR AttributeName = AttributeValue
:dag-cmd-def:`REJECT`
Mark the DAG description file as rejected to prevent execution.
.. code-block:: condor-dagman
REJECT
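
The DAG control commands above might appear together near the top of a DAG
description file. The following is a sketch only: the configuration file name,
environment variable names, attribute name, and exit values are all
hypothetical.

.. code-block:: condor-dagman

    # Use a DAG-specific configuration file
    CONFIG diamond.config

    # Carry HOME and USER over from the condor_submit_dag environment
    ENV GET HOME USER
    # Explicitly set two variables in the DAGMan proper job's environment
    ENV SET PROJECT=demo;STAGE=test

    # Add an attribute to the DAGMan proper job's ad
    SET_JOB_ATTR TestNumber = 17

    JOB A a.sub
    # If node A exits with 10, write a rescue file and exit DAGMan with 1
    ABORT-DAG-ON A 10 RETURN 1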
Special Files
^^^^^^^^^^^^^
:dag-cmd-def:`DOT` (see :ref:`Full Description<visualizing-dags-with-dot>`)
Inform DAGMan to produce a Graphviz Dot file for visualizing a DAG.
.. code-block:: condor-dagman
DOT filename [UPDATE | DONT-UPDATE] [OVERWRITE | DONT-OVERWRITE] [INCLUDE <dot-file-header>]
:dag-cmd-def:`JOBSTATE_LOG` (see :ref:`Full Description<DAGMan Machine Readable History>`)
Inform DAGMan to produce a machine-readable event history file.
.. code-block:: condor-dagman
JOBSTATE_LOG filename
:dag-cmd-def:`NODE_STATUS_FILE` (see :ref:`Full Description<node-status-file>`)
Inform DAGMan to produce a snapshot status file for the DAG nodes.
.. code-block:: condor-dagman
NODE_STATUS_FILE filename [minimumUpdateTime] [ALWAYS-UPDATE]
:dag-cmd-def:`SAVE_POINT_FILE` (see :ref:`Full Description<DAG Save Files>`)
Inform DAGMan to write a save file the first time the specified node starts.
.. code-block:: condor-dagman
SAVE_POINT_FILE NodeName [Filename]
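
A short sketch of the special file commands in use follows. All file names are
hypothetical, and the minimum update time value is an arbitrary choice.

.. code-block:: condor-dagman

    JOB A a.sub
    JOB B b.sub
    PARENT A CHILD B

    # Keep a Dot file up to date, overwriting it on each update
    DOT diamond.dot UPDATE OVERWRITE

    # Record a machine-readable event history
    JOBSTATE_LOG diamond.jobstate.log

    # Write a node status snapshot with a minimum update time of 30
    NODE_STATUS_FILE diamond.status 30

    # Write a save file the first time node B starts
    SAVE_POINT_FILE B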
:index:`DAGMan Files<single: DAGMan; DAGMan Files>`
Produced Files
--------------
The following files are always produced automatically by DAGMan on execution, where the
primary DAG is the only or first DAG file specified at submit time.
#. :tool:`condor_dagman` scheduler universe job files:
.. parsed-literal::
<Primary DAG>.condor.sub | DAGMan proper job's submit description file.
<Primary DAG>.dagman.log | DAGMan proper job's event :subcom:`log` file.
<Primary DAG>.lib.out | DAGMan proper job's :subcom:`output` file.
<Primary DAG>.lib.err | DAGMan proper job's :subcom:`error` file.
#. DAGMan informational files:
.. parsed-literal::
<Primary DAG>.dagman.out | DAGMan process's debug log file.
<Primary DAG>.nodes.log | Shared job event log file for all jobs managed by DAGMan (Heart of DAGMan).
<Primary DAG>.metrics | JSON-formatted file containing DAGMan metrics written at DAGMan exit.
#. Other:
.. parsed-literal::
<Primary DAG>.rescue<XXX> | Rescue DAG file denoting completed work from previous execution (see :ref:`Rescue DAG`).
<Primary DAG>.lock | DAGMan process lock file to prevent multiple executions of one DAG in the same directory.
Referable DAG Information
-------------------------
DAGMan provides various pieces of DAG information to scripts and jobs in the
form of special referable macros and job ClassAd attributes.
Job Macros
^^^^^^^^^^
Macros that can be referenced in a job submit description as ``$(<macro>)``.
.. parsed-literal::
**JOB** | Name of the node this job is associated with.
**RETRY** | Current node retry attempt value. Set to 0 on first execution.
**FAILED_COUNT** | Number of failed nodes currently in the DAG (intended for Final Node).
**DAG_STATUS** | Current :ad-attr:`DAG_Status` (intended for Final Node).
**DAGManJobId** | The job's :ad-attr:`DAGManJobId`.
**DAG_PARENT_NAMES** | Comma-separated list of the names of the parent nodes of the node this job belongs to.
Job ClassAd Attributes
^^^^^^^^^^^^^^^^^^^^^^
ClassAd attributes added to the job ad of all jobs managed by DAGMan.
.. parsed-literal::
:ad-attr:`DAGManJobId` | Job-Id of the DAGMan job that submitted this job.
:ad-attr:`DAGNodeName` | The name of the node to which this job belongs.
:ad-attr:`DAGManNodeRetry` | The node's current retry number. First execution is 0.\
This is only included if :macro:`DAGMAN_NODE_RECORD_INFO` includes ``Retry``.
:ad-attr:`DAGParentNodeNames` | List of parent node names. Note that, depending on the number\
of parent nodes, this may be left empty.
:ad-attr:`DAG_Status` | Current DAG status (Intended for Final Nodes).
Script Macros
^^^^^^^^^^^^^
Macros that can be passed to a script as optional arguments like ``$<macro>``
.. parsed-literal::
For All Scripts:
**NODE** | Name of the node this script is associated with.
**RETRY** | The node's current retry number. Set to 0 on first execution.
**MAX_RETRIES** | Maximum number of retries allowed for the node.
**NODE_COUNT** | The total number of nodes in the DAG (Including the :dag-cmd:`FINAL` node).
**QUEUED_COUNT** | The current number of nodes running jobs in the DAG.
**DONE_COUNT** | The current number of successfully completed nodes in the DAG.
**FAILED_COUNT** | The current number of failed nodes in the DAG.
**FUTILE_COUNT** | The current number of nodes that will never start in the DAG.
**DAGID** | The node's associated :ad-attr:`DAGManJobId`.
**DAG_STATUS** | The current :ad-attr:`DAG_Status`.
Only for POST Scripts:
**CLUSTERID** | The :ad-attr:`ClusterId` of the list of jobs associated with the node.
**JOBID** | The Job-ID (:ad-attr:`ClusterId` & :ad-attr:`ProcId`) of the last job in\
the node's associated list of jobs.
**JOB_COUNT** | The total number of jobs associated with the node.
**JOB_ABORT_COUNT** | The number of jobs associated with the node that got an abort event.
**SUCCESS** | A boolean string that represents whether the node has been successful\
up to this point (PRE script and list of jobs succeeded) (``True`` or ``False``).
**RETURN** | The exit code of the first failed job in the set or 0 for a\
successful list of jobs execution.
**EXIT_CODES** | An ordered comma-separated list of all :ad-attr:`ExitCode`\ s returned by\
jobs associated with the node.
**EXIT_CODE_COUNTS** | An ordered comma-separated list of the number of jobs that exited with a particular\
:ad-attr:`ExitCode` (``{ExitCode}:{Count}``).
**PRE_SCRIPT_RETURN** | Return value of the associated node's PRE Script.
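
Script macros are passed on the :dag-cmd:`SCRIPT` line as ordinary arguments.
The sketch below assumes hypothetical script names (``pre.sh``, ``post.sh``);
the order of the macros is arbitrary and determines the positional arguments
the script receives.

.. code-block:: condor-dagman

    JOB A a.sub

    # Macros available to all scripts
    SCRIPT PRE  A pre.sh $NODE $RETRY $MAX_RETRIES

    # POST-only macros may be mixed with the general ones
    SCRIPT POST A post.sh $NODE $JOBID $RETURN $EXIT_CODE_COUNTS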
DAG Submission and Management
-----------------------------
.. sidebar:: Tip for Querying All Jobs in a DAG
When doing job queries to the AP queue or history, the constraint
**-const "DAGManJobId==<DAG Job Id>"** can be used to return job
ads for only the jobs submitted and managed by the specified DAG.
**<DAG Job Id>** should be replaced with the :ad-attr:`ClusterId`
of the DAGMan proper job.
For a more in-depth explanation of controlling a DAG, see :ref:`DAG controls`.
DAG Submission
^^^^^^^^^^^^^^
To submit a DAGMan workflow, use :tool:`condor_submit_dag` on a DAG
description file.
.. code-block:: console
$ condor_submit_dag diamond.dag
DAG Monitoring
^^^^^^^^^^^^^^
All the jobs managed by DAGMan and the DAGMan proper job itself can be monitored
with the tools listed below. :tool:`condor_q` by default returns a condensed overview
of jobs managed by DAGMan currently in the queue. To see all jobs individually use
the **-nobatch** flag.
+-----------------------------+-----------------------------+-----------------------------+
| :tool:`condor_q` | :tool:`condor_watch_q` | :tool:`htcondor dag status` |
+-----------------------------+-----------------------------+-----------------------------+
Stopping a DAG
^^^^^^^^^^^^^^
Pause/Restart
A DAG can temporarily be stopped by using :tool:`condor_hold` on the DAGMan
proper job. To restart the DAG simply use :tool:`condor_release`.
Remove
To remove a DAG simply use :tool:`condor_rm` on the DAGMan proper job.