.. _htcondor_command:

*htcondor*
===============

Manage HTCondor jobs, job sets, DAGs, event logs, and resources
:index:`htcondor<single: htcondor; HTCondor commands>`\ :index:`htcondor command`

Synopsis
--------

**htcondor** [ **-h** | **-\-help** ] [ **-v** | **-q** ]

| **htcondor** **job** *submit* [**-\-resource** *resource-type*] [**-\-runtime** *time-seconds*] [**-\-email** *email-address*] submit_file
| **htcondor** **job** *status* [**-\-resource** *resource-type*] [**-\-skip-history**] job_id
| **htcondor** **job** *out* [**-\-resource** *resource-type*] [**-\-skip-history**] job_id
| **htcondor** **job** *error* [**-\-resource** *resource-type*] [**-\-skip-history**] job_id
| **htcondor** **job** *log* [**-\-resource** *resource-type*] [**-\-skip-history**] job_id
| **htcondor** **job** *resources* [**-\-resource** *resource-type*] [**-\-skip-history**] job_id

| **htcondor** **jobset** *submit* description-file
| **htcondor** **jobset** *list* [**-\-allusers**]
| **htcondor** **jobset** *status* job-set-name [**-\-owner** *user-name*] [**-\-nobatch**] [**-\-skip-history**]
| **htcondor** **jobset** *remove* job-set-name [**-\-owner** *user-name*]

| **htcondor** **dag** *submit* dag-file
| **htcondor** **dag** *status* dagman-job-id

| **htcondor** **eventlog** *read* [**-csv** | **-json**] [**-\-groupby attribute**] eventlog [eventlog2 [eventlog3 ...]]
| **htcondor** **eventlog** *follow* [**-csv** | **-json**] [**-\-groupby attribute**] eventlog

| **htcondor** **annex** *create* [*description-options*] annex-name queue\@system
| **htcondor** **annex** *add* [*description-options*] annex-name queue\@system
| **htcondor** **annex** *status* annex-name
| **htcondor** **annex** *shutdown* annex-name
| **htcondor** **annex** *systems*

| **htcondor** **credential** *list*
| **htcondor** **credential** *add* password|kerberos|oauth2 credential-file [**-\-service service**] [**-\-handle handle**]
| **htcondor** **credential** *remove* password|kerberos|oauth2 [**-\-service service**] [**-\-handle handle**]

Description
-----------

*htcondor* is a tool for managing HTCondor jobs, job sets, resources, event
logs, DAGs, annexes, and credentials.  It can replace *condor_submit*,
*condor_submit_dag*, *condor_q*, *condor_status*, and *condor_userlog*, and
adds new functionality and features.  Its user interface is more consistent
than that of its predecessor tools.

The first argument of the *htcondor* command (ignoring any global options) is
the *noun* representing an object in the HTCondor system to be operated on.
The nouns are *job*, *jobset*, *dag*, *eventlog*, *annex*, and *credential*.
Each noun is followed by a noun-specific *verb* that describes the operation
to perform on that noun.

One of the following optional global options may appear before the noun:

Global Options
--------------

 **htcondor -h**, **htcondor -\-help**
     Display the help message.  Can also be specified after any
     noun or verb to display the options available for each noun or verb.
 **htcondor -q ...**
     Reduce verbosity of log messages.
 **htcondor -v ...**
     Increase verbosity of log messages.
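
For instance, the following sketch shows where the global options may be
placed.  The job ID ``123.0`` is a placeholder.

.. code-block:: console

    $ htcondor --help
    $ htcondor job --help
    $ htcondor job submit --help
    $ htcondor -v job status 123.0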

A noun-specific verb appears after each noun.  The sections below list the
verbs for each noun, along with their individual option flags.

Job Verbs
---------

 **htcondor job submit** *submit_file*
     Takes as an argument a submit file in the *condor_submit* job submit
     description language, and places a new job at an Access Point (AP).

     **htcondor job submit options**

          **htcondor job submit -\-resource** *resource_type submit_file*
            Resource type used to run this job. Currently supports ``Slurm`` and ``EC2``.
            Assumes the necessary setup is complete and that security tokens are available.
          **htcondor job submit -\-runtime** *runtime_in_seconds submit_file*
            Amount of time in seconds to allocate resources.
            Used in conjunction with the *-\-resource* flag.
          **htcondor job submit -\-email** *address submit_file*
            Email address to receive notification messages.
            Used in conjunction with the *-\-resource* flag.

 **htcondor job status**
     Takes as an argument a job id in the form of clusterid.procid,
     and returns a human-readable presentation of the status
     of that job.

     **htcondor job status options**

      **htcondor job status -\-skip-history** *job.id*

        Skips checking the history if the job is not found in the
        active job queue.

 **htcondor job out**
     Takes as an argument a job id in the form of clusterid.procid,
     and prints out the contents of that job's standard output
     file, assuming that it exists on the AP.

 **htcondor job err**
     Takes as an argument a job id in the form of clusterid.procid,
     and prints out the contents of that job's standard error
     file, assuming that it exists on the AP.

 **htcondor job log**
     Takes as an argument a job id in the form of clusterid.procid,
     and prints out the contents of that job's event log
     file.  If the job shared an event log file with other jobs,
     the complete event log file will be printed, which may contain
     events for other jobs.

 **htcondor job resources**
     Takes as an argument a job id in the form of clusterid.procid,
     and returns a human-readable presentation of the machine resources
     used by this job.
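
As an illustration, a typical session might look like the following sketch.
The submit file name ``job.sub``, the email address, and the job ID ``123.0``
are placeholders, and the ``--resource`` line assumes that access to Slurm has
already been set up.

.. code-block:: console

    $ htcondor job submit job.sub
    $ htcondor job submit --resource Slurm --runtime 3600 --email user@example.org job.sub
    $ htcondor job status 123.0
    $ htcondor job status --skip-history 123.0
    $ htcondor job out 123.0
    $ htcondor job log 123.0
    $ htcondor job resources 123.0

Here ``--runtime 3600`` requests the resources for one hour, since the value
is given in seconds.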

Jobset Verbs
------------

 **htcondor jobset submit** *submit_file*
     Takes as an argument a submit file in the *condor_submit* job submit
     description language, and places a new job set at an Access Point.

 **htcondor jobset list**
    Succinctly lists all the jobsets in the queue which are owned by the current user.

     **htcondor jobset list options**

          **htcondor jobset list -\-allusers**
            Shows job sets from all users, not just those owned by the current user.

 **htcondor jobset status** *job-set-name*
     Takes as an argument a job set name, and shows detailed information about
     that job set.

     **htcondor jobset status options**

          **htcondor jobset status -\-nobatch**
            Shows jobs in a more detailed view, one line per job.

          **htcondor jobset status -\-owner** *ownername*
            Shows jobs from the specified job owner.

          **htcondor jobset status -\-skiphistory**
            Shows detailed information only about active jobs in the queue, and
            ignores historical jobs which have left the queue.  This runs much
            faster.


 **htcondor jobset remove** *job-set-name*
     Takes as an argument the name of a job set in the queue, and removes it
     from the Access Point.

     **htcondor jobset remove options**

          **htcondor jobset remove -\-owner=owner_name**
            Removes all jobs owned by the given owner.
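
The job set verbs could be combined as in the following sketch.  The
description file name ``example.set``, the job set name ``example-set``, and
the user name ``alice`` are placeholders, not names defined by HTCondor.

.. code-block:: console

    $ htcondor jobset submit example.set
    $ htcondor jobset list
    $ htcondor jobset list --allusers
    $ htcondor jobset status example-set --nobatch
    $ htcondor jobset status example-set --owner alice
    $ htcondor jobset remove example-set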

Eventlog Verbs
--------------

 **htcondor eventlog read** *logfile* *optional-other-logfile*
     Takes one or more arguments, which are event log files to process.  Each
     may be a per-job or per-jobset event log, as specified by *log = some_file*
     in the submit description language.  For a DAG, it may also be the
     *nodes.log* file that all DAGs generate.  Or, if the global event log is
     enabled by an administrator with the *EVENT_LOG* configuration knob, it may
     be the global event log, containing information about all jobs on the
     Access Point.

     Given these files, *htcondor eventlog read* returns information about all
     the contained jobs and their status.  It runs much faster than
     *condor_history*, because these logs are more concise than the history
     files.  Unlike *condor_history*, it also shows information about
     jobs that have not yet left the queue.

 **htcondor eventlog follow** *logfile*
     Takes as an argument an event log to process, as above, but instead
     of processing that file to completion, it does the equivalent of
     *tail -f*, and runs until interruption, emitting information about
     jobs as it appears in the file.

     **Eventlog Options**

       **-\-csv**
          By default, *htcondor eventlog read* emits a table of information
          in human-readable format.  With this option, the output is in
          a comma-separated value format, suitable for ingestion by a spreadsheet
          or database.

       **-\-json**
          Emits output in the JSON format. Only one of **-csv** or **-json** should
          be given.

       **-\-group-by attributeName**
          With a job ad attribute name, instead of one line per job, emit one line
          summarizing all jobs that share the same value for the attribute name
          given.  In the OSG, the GLIDEIN_SITE attribute is injected into all jobs,
          so one can quickly get a count of all jobs running, idle, and exited
          per site by using this option.
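
As a sketch, the two verbs might be used as follows.  The file names
``job1.log`` and ``job2.log`` are placeholders.  The last command assumes
that the administrator has set *EVENT_LOG* and that the resulting file is
readable by the current user; it uses *condor_config_val* to look up the
configured path.

.. code-block:: console

    $ htcondor eventlog read job1.log job2.log
    $ htcondor eventlog follow job1.log
    $ htcondor eventlog read "$(condor_config_val EVENT_LOG)"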

Annex Verbs
-----------

An *annex* is a named set of leased resources.  If the AP's administrator
has enabled this command, any submitter who can run jobs on one of the
supported systems can use resources from that system to run jobs placed
at that AP.

  | **htcondor annex create** [*description-options*] *annex-name* *queue@system*
  | **htcondor annex add** [*description-options*] *annex-name* *queue@system*

    Create a new annex with the given *annex-name*, or add resources to an
    existing one, using resources from the specified *queue* at the
    specified *system*.  The description options are the same for creating
    a new annex and for adding more resources to the same annex.  You will
    be prompted to log in to the system.

    **Description Options**

        **-\-nodes** *nodes*
            Number of nodes to request.  Defaults to 1.
        **-\-lifetime** *lifetime*
            Annex lifetime (in seconds).  Defaults to 3600.  After this
            length of time, the annex terminates even if jobs are running.
        **-\-cpus** *cpus*
            Number of CPUs to request (shared queues only).  Unset by
            default.
        **-\-mem_mb** *memory*
            Memory (in MB) to request (shared queues only).  Unset by
            default.
        **-\-gpus** *gpu-count*
            Number of GPUs to request (GPU queues only).  Unset by default.
        **-\-gpu-type** *type*
            Type of GPU to request (GPU queues only).  Unset by default.
        **-\-idle-time** *seconds*
            The number of seconds to remain idle (not running any jobs)
            before shutting down.  Default and suggested minimum is
            300 seconds.
        **-\-login-name** *login*
            The (SSH) login name to use for this capacity request.
            Defaults to SSH's default.
        **-\-login-host** *host*
            The (SSH) login host to use for this capacity request.
            The default is system-specific.

  **htcondor annex status** *annex-name*

    Prints human-readable information about the state of the named annex.

  **htcondor annex shutdown** *annex-name*

    Shuts the named annex down, releasing its resources.

  **htcondor annex systems**

    Displays the list of supported systems and their queues.
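
The life cycle of an annex might look like the following sketch.  The annex
name ``example-annex`` is a placeholder, and ``QUEUE@SYSTEM`` stands for one
of the queue and system pairs reported by *htcondor annex systems*.

.. code-block:: console

    $ htcondor annex systems
    $ htcondor annex create --nodes 2 --lifetime 7200 example-annex QUEUE@SYSTEM
    $ htcondor annex status example-annex
    $ htcondor annex add --nodes 1 example-annex QUEUE@SYSTEM
    $ htcondor annex shutdown example-annex

Here ``--lifetime 7200`` asks for two hours, twice the default of 3600
seconds.  The annex terminates at the end of its lifetime even if jobs are
still running, so the value should cover the longest expected job.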

Credential Verbs
----------------

A *credential* is (part of) the authentication data necessary to verify
identity (or capability).  This noun refers to three different types of
credentials: ``password``, ``kerberos``, and ``oauth2``.  For this tool,
``password`` credentials are only useful on Windows, where they are
required to run a job as its submitter.  Likewise, ``kerberos``
credentials are only useful on APs which use Kerberos; HTCondor can run
jobs with the Kerberos credentials of their submitters, usually to allow
them to access files on AFS.  Finally, ``oauth2`` credentials refer to
a number of different kinds of credentials usually (but not always) obtained
via the OAuth2 protocol, but which HTCondor knows how to refresh and
distribute to jobs which request them.

  **htcondor credential list**

    Lists the credentials associated with the current user.  (To be precise,
    the identifier the current user authenticates as to HTCondor when they
    run this command.)  Windows passwords and Kerberos credentials are unique
    for each such identity, and only their presence (and last-refresh time)
    is reported.  A user may have multiple OAuth2 credentials, one or more
    from one or more different services, distinguished by their handles.  The
    service name, handle name, and file name in the ``$CONDOR_CREDS``
    directory are listed, in addition to the last-refresh time, for each
    OAuth2 credential.

  **htcondor credential add** **password|kerberos|oauth2** *credential-file* [**-\-service service**] [**-\-handle handle**]

    Sets the stored Windows password, Kerberos credential, or OAuth2
    credential to the contents of the named file.  For OAuth2 credentials,
    the service and handle will be derived from the file name unless
    specified with the corresponding flags.

  **htcondor credential remove** **password|kerberos|oauth2** [**-\-service service**] [**-\-handle handle**]

    Unsets the stored Windows password, Kerberos credential, or OAuth2
    credential(s).  If you specify a service, the credential from that
    service without a handle will be removed.  To remove a specific credential,
    you must specify both its service and its handle.  If you specify neither
    service nor handle, all OAuth2 tokens are removed.
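
For example, an OAuth2 credential could be stored, checked, and removed as in
the following sketch.  The file name ``example.token``, the service name
``example-service``, and the handle ``example-handle`` are placeholders.

.. code-block:: console

    $ htcondor credential list
    $ htcondor credential add oauth2 ./example.token --service example-service --handle example-handle
    $ htcondor credential list
    $ htcondor credential remove oauth2 --service example-service --handle example-handle

Giving both ``--service`` and ``--handle`` on removal targets that one
credential; omitting both would remove all OAuth2 tokens, as described above.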

Examples
--------

.. code-block:: console

    $ htcondor eventlog read logfile

    Job       Host            Start Time   Evict Time   Evictions   Wall Time     Good Time     CPU Usage
    19989.0   slot1_1@speedy  5/18 12:34   5/18 12:54   0           0+00:20:00    0+00:20:00    0+00:00:00
    19990.0   slot1_1@lumpy   5/22 18:51   5/22 18:51   1           0+00:02:00    0+00:00:00    0+00:00:43
    20003.0   slot1_1@chtc    8/9 23:33    8/9 23:37    1           0+00:04:00    0+00:00:00    0+00:00:00
    20004.0   slot1_1@wisc    8/9 23:38    8/9 23:58    0           0+00:20:00    0+00:20:00    0+00:00:00



Exit Status
-----------

*htcondor* will exit with a non-zero status value if it fails and
zero status if it succeeds.