Files, Directories and Logs
===========================

:index:`logging`

HTCondor records many types of information in a variety of logs.
Administration may require locating and using the contents of a log to
debug issues. Listed here are details of the logs, to aid in
identification.

Job and Daemon Logs
-------------------

job event log
    The job event log is an optional, chronological list of events that
    occur as a job runs. The submit description file for the job requests
    a job event log with the submit command
    :subcom:`log[definition]`. The log is created on, and remains on, the
    access point (the submit machine). Contents of the log are detailed
    in the :ref:`users-manual/managing-a-job:in the job event log file` section.
    Examples of events are that the job is running, that the job is placed on
    hold, or that the job completed.

daemon logs
    Each daemon configured to have a log writes events relevant to that
    daemon. Each event written consists of a timestamp and message. The
    name of the log file is set by the value of configuration variable
    :macro:`<SUBSYS>_LOG`, where :macro:`<SUBSYS>` is
    replaced by the name of the daemon. The log is not permitted to grow
    without bound; log rotation takes place after a configurable maximum
    size or length of time is encountered. This maximum is specified by
    configuration variable :macro:`MAX_<SUBSYS>_LOG`.

    Which events are logged for a particular daemon is determined by
    the value of configuration variable :macro:`<SUBSYS>_DEBUG`. The
    possible values for :macro:`<SUBSYS>_DEBUG` categorize events, such
    that it is possible to control the level and quantity of events
    written to the daemon's log.
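
    As a minimal sketch of how these variables fit together for one daemon,
    the fragment below configures the *condor_schedd* log. The values are
    illustrative assumptions, not recommendations, and the size is assumed
    to be given in bytes.

    .. code-block:: condor-config

          # Illustrative values only; adjust for the local pool.
          SCHEDD_LOG         = $(LOG)/SchedLog
          # Rotate once the log reaches this many bytes (here, 50 MiB).
          MAX_SCHEDD_LOG     = 52428800
          # Keep up to four rotated copies.
          MAX_NUM_SCHEDD_LOG = 4
          # Log more verbosely than the default level.
          SCHEDD_DEBUG       = D_FULLDEBUG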

    Configuration variables that affect daemon logs are

    +------------------------------------+
    |:macro:`MAX_NUM_<SUBSYS>_LOG`       |
    +------------------------------------+
    | :macro:`<SUBSYS>_LOG_KEEP_OPEN`    |
    +------------------------------------+
    |:macro:`TRUNC_<SUBSYS>_LOG_ON_OPEN` |
    +------------------------------------+
    | :macro:`TOUCH_LOG_INTERVAL`        |
    +------------------------------------+
    | :macro:`<SUBSYS>_LOCK`             |
    +------------------------------------+
    | :macro:`FILE_LOCK_VIA_MUTEX`       |
    +------------------------------------+
    | :macro:`LOGS_USE_TIMESTAMP`        |
    +------------------------------------+
    | :macro:`LOG_TO_SYSLOG`             |
    +------------------------------------+

    Daemon logs are often investigated to accomplish administrative
    debugging. :tool:`condor_config_val` can be used to determine the
    location and file name of the daemon log. For example, to display
    the location of the log for the *condor_collector* daemon, use

    .. code-block:: console

          $ condor_config_val COLLECTOR_LOG

job queue log
    The job queue log is a transactional representation of the current
    job queue. If the *condor_schedd* crashes, the job queue can be
    rebuilt using this log. The file name is set by configuration
    variable :macro:`JOB_QUEUE_LOG`, and defaults to ``$(SPOOL)/job_queue.log``.

    Within the log, each transaction is identified with an integer value
    and followed where appropriate with other values relevant to the
    transaction. To reduce the size of the log and remove any
    transactions that are no longer relevant, a copy of the log is kept
    by renaming the log at each time interval defined by configuration
    variable :macro:`QUEUE_CLEAN_INTERVAL`, and then a new log is written
    with only current and relevant transactions.
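
    The cleaning behavior described above might be tuned with a fragment
    such as the following. This is a sketch: the values are assumptions
    chosen for illustration, and the interval is assumed to be expressed
    in seconds.

    .. code-block:: condor-config

          # Rewrite the job queue log every 12 hours (assumed unit: seconds).
          QUEUE_CLEAN_INTERVAL = 43200
          # Keep two renamed copies of the previous log.
          MAX_JOB_QUEUE_LOG_ROTATIONS = 2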

    Configuration variables that affect the job queue log are

    +------------------------------+--------------------------------------+
    | :macro:`SCHEDD_BACKUP_SPOOL` | :macro:`MAX_JOB_QUEUE_LOG_ROTATIONS` |
    +------------------------------+--------------------------------------+
    | :macro:`QUEUE_CLEAN_INTERVAL`|                                      |
    +------------------------------+--------------------------------------+

*condor_schedd* audit log
    The optional *condor_schedd* audit log records user-initiated
    events that modify the job queue, such as invocations of
    :tool:`condor_submit`, :tool:`condor_rm`, :tool:`condor_hold` and
    :tool:`condor_release`. Each event has a time stamp and a message that
    describes details of the event.

    This log exists to help administrators track the activities of pool
    users.

    The file name is set by configuration variable :macro:`SCHEDD_AUDIT_LOG`.

    Configuration variables that affect the audit log are

    +-------------------------------+----------------------------------+
    | :macro:`MAX_SCHEDD_AUDIT_LOG` | :macro:`MAX_NUM_SCHEDD_AUDIT_LOG`|
    +-------------------------------+----------------------------------+

*condor_shared_port* audit log
    The optional *condor_shared_port* audit log records connections
    made through the :macro:`DAEMON_SOCKET_DIR`. Each record includes the source
    address, the socket file name, and the target process's PID, UID,
    GID, executable path, and command line.

    This log exists to help administrators track the activities of pool
    users.

    The file name is set by configuration variable :macro:`SHARED_PORT_AUDIT_LOG`.

    Configuration variables that affect the audit log are

    +------------------------------------+----------------------------------------+
    | :macro:`MAX_SHARED_PORT_AUDIT_LOG` | :macro:`MAX_NUM_SHARED_PORT_AUDIT_LOG` |
    +------------------------------------+----------------------------------------+

event log
    The event log is an optional, chronological list of events that
    occur for all jobs and all users. The events logged are the same as
    those that would go into a job event log. The file name is set by
    configuration variable :macro:`EVENT_LOG`. The
    log is created only if this configuration variable is set.
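
    For instance, a configuration along these lines would turn the event
    log on and bound its growth. The file name ``EventLog`` and the numeric
    values are assumptions made for illustration, and the size is assumed
    to be given in bytes.

    .. code-block:: condor-config

          # Setting EVENT_LOG is what enables the log.
          EVENT_LOG = $(LOG)/EventLog
          # Rotate at roughly 100 MB (assumed unit: bytes).
          EVENT_LOG_MAX_SIZE = 100000000
          # Keep three rotated copies.
          EVENT_LOG_MAX_ROTATIONS = 3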

    Configuration variables that affect the event log, setting details
    such as the maximum size to which this log may grow and details of
    file rotation and locking, are

    +------------------------------------+--------------------------------------------+
    | :macro:`EVENT_LOG_MAX_SIZE`        | :macro:`EVENT_LOG_MAX_ROTATIONS`           |
    +------------------------------------+--------------------------------------------+
    | :macro:`EVENT_LOG_LOCKING`         |  :macro:`EVENT_LOG_ROTATION_LOCK`          |
    +------------------------------------+--------------------------------------------+
    | :macro:`EVENT_LOG_FSYNC`           | :macro:`EVENT_LOG_JOB_AD_INFORMATION_ATTRS`|
    +------------------------------------+--------------------------------------------+
    | :macro:`EVENT_LOG_USE_XML`         |                                            |
    +------------------------------------+--------------------------------------------+

accountant log
    The accountant log is a transactional representation of the
    *condor_negotiator* daemon's database of accounting information,
    which consists of user priorities. The file name of the accountant log is
    ``$(SPOOL)/Accountantnew.log``. Within the log, users are identified
    by username@uid_domain.

    To reduce the size and remove information that is no longer
    relevant, a copy of the log is made when its size hits the number of
    bytes defined by configuration variable
    :macro:`MAX_ACCOUNTANT_DATABASE_SIZE`, and then a new log is written in a
    more compact form.

    Administrators can change user priorities kept in this log by using
    the command line tool :tool:`condor_userprio`.

negotiator match log
    The negotiator match log is a second daemon log from the
    *condor_negotiator* daemon. Events written to this log are those
    with debug level of ``D_MATCH``. The file name is set by
    configuration variable :macro:`NEGOTIATOR_MATCH_LOG`, and defaults to
    ``$(LOG)/MatchLog``.

history log
    This optional log contains information about all jobs that have been
    completed. It is written by the *condor_schedd* daemon. The file
    name is ``$(SPOOL)/history``.

    Administrators can view this historical information by using
    the command line tool :tool:`condor_history`.

    Configuration variables that affect the history log, setting details
    such as the maximum size to which this log may grow, are

    +----------------------------------+--------------------------------+
    | :macro:`ENABLE_HISTORY_ROTATION` |                                |
    +----------------------------------+--------------------------------+
    | :macro:`MAX_HISTORY_LOG`         | :macro:`MAX_HISTORY_ROTATIONS` |
    +----------------------------------+--------------------------------+
    | :macro:`ROTATE_HISTORY_MONTHLY`  | :macro:`ROTATE_HISTORY_DAILY`  |
    +----------------------------------+--------------------------------+
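
    A sketch of how the rotation variables in the table might be combined
    is shown below. The values are illustrative assumptions, and the size
    is assumed to be given in bytes.

    .. code-block:: condor-config

          ENABLE_HISTORY_ROTATION = True
          # Rotate once the history file reaches this many bytes (20 MiB).
          MAX_HISTORY_LOG = 20971520
          # Keep five rotated history files.
          MAX_HISTORY_ROTATIONS = 5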

DAGMan Logs
-----------

default node log
    A job event log of all node jobs within a single DAG. It is used to
    enforce the dependencies of the DAG.

    The file name is set by configuration variable
    :macro:`DAGMAN_DEFAULT_NODE_LOG`,
    and the full path name of this file must be unique while any and all
    submitted DAGs and other jobs from the submit host run. The syntax
    used in the definition of this configuration variable is different
    to enable the setting of a unique file name. See
    the :ref:`DAGMan Configuration` section for the complete definition.

the ``.dagman.out`` file
    A log that is created, or appended to, for each submitted DAG. It
    contains timestamped events and extra information about the
    configuration applied to the DAG. The name of this log is formed by
    appending ``.dagman.out`` to
    the name of the DAG input file. The file remains after the DAG
    completes.

    This log may be helpful in debugging what has happened in the
    execution of a DAG, as well as help to determine the final state of
    the DAG.

    Configuration variables that affect this log are

    +---------------------------+-----------------------------------------+
    | :macro:`DAGMAN_VERBOSITY` | :macro:`DAGMAN_PENDING_REPORT_INTERVAL` |
    +---------------------------+-----------------------------------------+
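
    As an illustration, the two variables in the table could be set as
    follows. The values are assumptions; the comments reflect the usual
    reading of these variables (a verbosity level and a reporting interval
    in seconds) and should be checked against the
    :ref:`DAGMan Configuration` section.

    .. code-block:: condor-config

          # Higher values write more detail to the .dagman.out file.
          DAGMAN_VERBOSITY = 4
          # Report on pending nodes every 5 minutes (assumed unit: seconds).
          DAGMAN_PENDING_REPORT_INTERVAL = 300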

the DAGMan job state log
    This optional, machine-readable log enables automated monitoring of
    a DAG. The page :ref:`DAGMan Machine Readable History` details this log.


Directories
-----------

HTCondor uses a few different directories, some of which are role-specific.
Do not use these directories for any other purpose, and do not share these
directories between machines.  The directories are listed here by the
name of the configuration option used to tell HTCondor where they are; you
will not normally need to change these.

Directories used by More than One Role
``````````````````````````````````````

 :macro:`LOG`
    Each HTCondor daemon writes its own log file, and each log file
    is placed in the :macro:`LOG` directory.  You can configure the name
    of each daemon's log by setting :macro:`<SUBSYS>_LOG`,
    although you should never need to do so.  You can also control the sizes
    of the log files or how often they rotate; see
    :ref:`admin-manual/configuration-macros:Daemon Logging Configuration File Entries`
    for details.  If you want to write your logs to a shared filesystem,
    we recommend including ``$(HOSTNAME)`` in the value of :macro:`LOG` rather
    than changing the names of each individual log to not collide.  If you
    set :macro:`LOG` to a shared filesystem, you should set :macro:`LOCK` to a local
    filesystem; see below.
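
    A minimal sketch of that recommendation follows. Both paths are
    hypothetical and stand in for whatever shared mount and local
    directory exist at the site.

    .. code-block:: condor-config

          # Hypothetical shared mount; $(HOSTNAME) keeps each machine's logs apart.
          LOG  = /shared/htcondor/$(HOSTNAME)/log
          # Keep lock files on a local disk (hypothetical path).
          LOCK = /var/lock/condor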

 :macro:`LOCK`
    HTCondor uses a small number of lock files to synchronize access
    to certain files that are shared between multiple daemons.
    Because of problems encountered with file locking and network
    file systems (particularly NFS), these lock files should be
    placed on a local filesystem on each machine.  By default, they
    are placed in the :macro:`LOG` directory.

Directories Used by the Submit Role
```````````````````````````````````

 :macro:`SPOOL`
    The :macro:`SPOOL` directory holds two types of files: system
    data and (user) job data.  The former includes the job queue and
    history files.  The latter includes:

    - the files transferred, if any, when a job which set
      ``when_to_transfer_files`` to ``EXIT_OR_EVICT`` is evicted.
    - the input and output files of remotely-submitted jobs.
    - the checkpoint files stored by self-checkpointing jobs.

    Disk usage therefore varies widely based on the job mix, but
    since the schedd will abort if it can't append to the job queue log,
    you want to make sure this directory is on a partition which
    won't run out of space.

    To help ensure this, you may set
    :macro:`JOB_QUEUE_LOG` to separate the job queue log (system data)
    from the (user) job data.  This can also be used to increase performance
    (or reliability) by moving the job queue log to specialized hardware (an
    SSD or a high-redundancy RAID, for example).
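
    For example, the job queue log could be moved out of :macro:`SPOOL`
    with a single setting. The path below is hypothetical and assumes a
    dedicated volume is already mounted there.

    .. code-block:: condor-config

          # Hypothetical mount point for a dedicated SSD or RAID volume.
          JOB_QUEUE_LOG = /ssd/htcondor/job_queue.log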

Directories Used by the Execute Role
````````````````````````````````````

 :macro:`EXECUTE`
    The :macro:`EXECUTE` directory is the parent directory of the
    current working directory for any HTCondor job that runs on a given
    execute-role machine.  HTCondor copies the executable and input files
    for a job to its subdirectory; the job's standard output and standard
    error streams are also logged here.  Jobs almost always
    generate their output here as well, so the :macro:`EXECUTE` directory should
    provide plenty of space.  :macro:`EXECUTE` should not be placed under /tmp
    or /var/tmp if possible, as HTCondor loses the ability to make /tmp and
    /var/tmp private to the job.  While not a requirement, ideally :macro:`EXECUTE`
    should be on a distinct filesystem, so that it is impossible for a rogue job
    to fill up non-HTCondor related partitions.

    Usually, the per-job scratch execute directory is created by the startd
    as a directory under :macro:`EXECUTE`.  However, on Linux machines where HTCondor
    has root privilege, it can be configured to make an ephemeral per-job scratch
    filesystem. For more information visit :ref:`LVM Description`.