Job Event Log Codes
===================
:index:`event codes for jobs<single: event codes for jobs; log files>`
Table `B.2 <#x182-12460022>`_ lists codes that appear as the first field of each event in a job event log.
These are all of the events that can show up in a job log file:
| **Event Number:** 000
| **Event Name:** Job submitted
| **Event Description:** This event occurs when a user submits a job. It
is the first event you will see for a job, and it should only occur
once.
| **Event Number:** 001
| **Event Name:** Job executing
| **Event Description:** This event occurs when a job starts running. It
might occur more than once.
| **Event Number:** 002
| **Event Name:** Error in executable
| **Event Description:** The job could not be run because the executable
was bad.
| **Event Number:** 003
| **Event Name:** Job was checkpointed
| **Event Description:** No longer used.
| **Event Number:** 004
| **Event Name:** Job evicted from machine
| **Event Description:** A job was removed from a machine before it
finished, usually for a policy reason. Perhaps an interactive user has
claimed the computer, or perhaps another job is higher priority.
| **Event Number:** 005
| **Event Name:** Job terminated
| **Event Description:** The job has completed.
| **Event Number:** 006
| **Event Name:** Image size of job updated
| **Event Description:** An informational event, to update the amount of
memory that the job is using while running. It does not reflect the
state of the job.
| **Event Number:** 007
| **Event Name:** Shadow exception
| **Event Description:** The *condor_shadow*, a program on the submit
computer that watches over the job and performs some services for the
job, failed for some catastrophic reason. The job will leave the machine
and go back into the queue.
| **Event Number:** 008
| **Event Name:** Generic log event
| **Event Description:** Not used.
| **Event Number:** 009
| **Event Name:** Job aborted
| **Event Description:** The user canceled the job.
| **Event Number:** 010
| **Event Name:** Job was suspended
| **Event Description:** The job is still on the computer, but it is no
longer executing. This is usually for a policy reason, such as an
interactive user using the computer.
| **Event Number:** 011
| **Event Name:** Job was unsuspended
| **Event Description:** The job has resumed execution, after being
suspended earlier.
| **Event Number:** 012
| **Event Name:** Job was held
| **Event Description:** The job has transitioned to the hold state.
This might happen if the user applies the :tool:`condor_hold` command to the
job.
| **Event Number:** 013
| **Event Name:** Job was released
| **Event Description:** The job was in the hold state and is to be
re-run.
| **Event Number:** 014
| **Event Name:** Parallel node executed
| **Event Description:** A parallel universe program is running on a
node.
| **Event Number:** 015
| **Event Name:** Parallel node terminated
| **Event Description:** A parallel universe program has completed on a
node.
| **Event Number:** 016
| **Event Name:** POST script terminated
| **Event Description:** A node in a DAGMan workflow has a script that
should be run after a job. The script is run on the submit host. This
event signals that the POST script has completed.
| **Event Number:** 021
| **Event Name:** Remote error
| **Event Description:** The *condor_starter* (which monitors the job
on the execution machine) has failed.
| **Event Number:** 022
| **Event Name:** Remote system call socket lost
| **Event Description:** The *condor_shadow* and *condor_starter*
(which communicate while the job runs) have lost contact.
| **Event Number:** 023
| **Event Name:** Remote system call socket reestablished
| **Event Description:** The *condor_shadow* and *condor_starter*
(which communicate while the job runs) have been able to resume contact
before the job lease expired.
| **Event Number:** 024
| **Event Name:** Remote system call reconnect failure
| **Event Description:** The *condor_shadow* and *condor_starter*
(which communicate while the job runs) were unable to resume contact
before the job lease expired.
| **Event Number:** 025
| **Event Name:** Grid Resource Back Up
| **Event Description:** A grid resource that was previously unavailable
is now available.
| **Event Number:** 026
| **Event Name:** Detected Down Grid Resource
| **Event Description:** The grid resource that a job is to run on is
unavailable.
| **Event Number:** 027
| **Event Name:** Job submitted to grid resource
| **Event Description:** A job has been submitted, and is under the
auspices of the grid resource.
| **Event Number:** 028
| **Event Name:** Job ad information event triggered
| **Event Description:** Extra job ClassAd attributes are noted. This
event is written as a supplement to other events when the configuration
parameter :macro:`EVENT_LOG_JOB_AD_INFORMATION_ATTRS` is set.
| **Event Number:** 029
| **Event Name:** The job's remote status is unknown
| **Event Description:** No updates of the job's remote status have been
received for 15 minutes.
| **Event Number:** 030
| **Event Name:** The job's remote status is known again
| **Event Description:** An update has been received for a job whose
remote status was previously logged as unknown.
| **Event Number:** 031
| **Event Name:** Job stage in
| **Event Description:** A grid universe job is staging in its input
files.
| **Event Number:** 032
| **Event Name:** Job stage out
| **Event Description:** A grid universe job is staging out its output
files.
| **Event Number:** 033
| **Event Name:** Job ClassAd attribute update
| **Event Description:** A Job ClassAd attribute is changed due to
action by the *condor_schedd* daemon. This includes changes by
:tool:`condor_prio`.
| **Event Number:** 034
| **Event Name:** Pre Skip event
| **Event Description:** For DAGMan, this event is logged if a PRE
script exits with the PRE_SKIP value defined in the DAG input file.
Without this event, the log would contain no event at all for the DAGMan
node to which the script belongs, and DAGMan's internal tables would
become corrupted during recovery. Logging the event makes it possible
for DAGMan to do recovery in such a workflow.
| **Event Number:** 035
| **Event Name:** Cluster Submit
| **Event Description:** This event occurs when a user submits a cluster
with multiple procs.
| **Event Number:** 036
| **Event Name:** Cluster Remove
| **Event Description:** This event occurs after all the jobs in a multi-proc
cluster have completed, or when the cluster is removed (by :tool:`condor_rm`).
| **Event Number:** 037
| **Event Name:** Factory Paused
| **Event Description:** This event occurs when job materialization for
a cluster has been paused.
| **Event Number:** 038
| **Event Name:** Factory Resumed
| **Event Description:** This event occurs when job materialization for
a cluster has been resumed.
| **Event Number:** 039
| **Event Name:** None
| **Event Description:** This event should never occur in a log but may
be returned by log reading code in certain situations (e.g., timing out
while waiting for a new event to appear in the log).
| **Event Number:** 040
| **Event Name:** File Transfer
| **Event Description:** This event is logged at each stage of a file
transfer: transfer queued, transfer started, or transfer finished, for
both the input and output sandboxes.
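
As a rough illustration of how these events relate to a job's state, the following sketch replays a sequence of event numbers and reports the state implied by the descriptions above. The state names, the ``STATE_AFTER`` mapping, and the ``replay`` helper are assumptions made for this example and are not part of HTCondor. Informational events such as 006 are skipped because they do not reflect the state of the job.

```python
# Hypothetical mapping from event number to the job state it implies,
# derived from the event descriptions in this section.
STATE_AFTER = {
    0: "idle",        # 000 Job submitted
    1: "running",     # 001 Job executing
    4: "idle",        # 004 Job evicted from machine
    5: "completed",   # 005 Job terminated
    7: "idle",        # 007 Shadow exception: job goes back into the queue
    9: "removed",     # 009 Job aborted
    10: "suspended",  # 010 Job was suspended
    11: "running",    # 011 Job was unsuspended
    12: "held",       # 012 Job was held
    13: "idle",       # 013 Job was released
}


def replay(event_numbers):
    """Return the state implied by the last state-changing event, or None."""
    state = None
    for number in event_numbers:
        # Events missing from the mapping (006, 028, 040, ...) keep the state.
        state = STATE_AFTER.get(number, state)
    return state


# Submitted, ran, evicted, ran again, finished.
print(replay([0, 1, 6, 4, 1, 6, 5]))
```

A real monitor would need to handle more cases (grid and parallel universe events, for instance); the point here is only that a small subset of the codes drives the state changes.
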
Table B.2: Event Codes in a Job Event Log
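+-------+---------------------------+---------------------------------------------------+
| Code  | Name                      | Description                                       |
+=======+===========================+===================================================+
| 000   | SUBMIT                    | Submit                                            |
+-------+---------------------------+---------------------------------------------------+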
| 001   | EXECUTE                   | Execute                                           |
+-------+---------------------------+---------------------------------------------------+
| 002   | EXECUTABLE_ERROR          | Executable error                                  |
+-------+---------------------------+---------------------------------------------------+
| 003   | CHECKPOINTED              | no longer used                                    |
+-------+---------------------------+---------------------------------------------------+
| 004   | JOB_EVICTED               | Job evicted                                       |
+-------+---------------------------+---------------------------------------------------+
| 005   | JOB_TERMINATED            | Job terminated                                    |
+-------+---------------------------+---------------------------------------------------+
| 006   | IMAGE_SIZE                | Image size                                        |
+-------+---------------------------+---------------------------------------------------+
| 007   | SHADOW_EXCEPTION          | Shadow exception                                  |
+-------+---------------------------+---------------------------------------------------+
| 009   | JOB_ABORTED               | Job aborted                                       |
+-------+---------------------------+---------------------------------------------------+
| 010   | JOB_SUSPENDED             | Job suspended                                     |
+-------+---------------------------+---------------------------------------------------+
| 011   | JOB_UNSUSPENDED           | Job unsuspended                                   |
+-------+---------------------------+---------------------------------------------------+
| 012   | JOB_HELD                  | Job held                                          |
+-------+---------------------------+---------------------------------------------------+
| 013   | JOB_RELEASED              | Job released                                      |
+-------+---------------------------+---------------------------------------------------+
| 014   | NODE_EXECUTE              | Node execute                                      |
+-------+---------------------------+---------------------------------------------------+
| 015   | NODE_TERMINATED           | Node terminated                                   |
+-------+---------------------------+---------------------------------------------------+
| 016   | POST_SCRIPT_TERMINATED    | Post script terminated                            |
+-------+---------------------------+---------------------------------------------------+
| 021   | REMOTE_ERROR              | Remote error                                      |
+-------+---------------------------+---------------------------------------------------+
| 022   | JOB_DISCONNECTED          | Job disconnected                                  |
+-------+---------------------------+---------------------------------------------------+
| 023   | JOB_RECONNECTED           | Job reconnected                                   |
+-------+---------------------------+---------------------------------------------------+
| 024   | JOB_RECONNECT_FAILED      | Job reconnect failed                              |
+-------+---------------------------+---------------------------------------------------+
| 025   | GRID_RESOURCE_UP          | Grid resource up                                  |
+-------+---------------------------+---------------------------------------------------+
| 026   | GRID_RESOURCE_DOWN        | Grid resource down                                |
+-------+---------------------------+---------------------------------------------------+
| 027   | GRID_SUBMIT               | Grid submit                                       |
+-------+---------------------------+---------------------------------------------------+
| 028   | JOB_AD_INFORMATION        | Job ClassAd attribute values added to event log   |
+-------+---------------------------+---------------------------------------------------+
| 029   | JOB_STATUS_UNKNOWN        | Job status unknown                                |
+-------+---------------------------+---------------------------------------------------+
| 030   | JOB_STATUS_KNOWN          | Job status known                                  |
+-------+---------------------------+---------------------------------------------------+
| 031   | JOB_STAGE_IN              | Grid job stage in                                 |
+-------+---------------------------+---------------------------------------------------+
| 032   | JOB_STAGE_OUT             | Grid job stage out                                |
+-------+---------------------------+---------------------------------------------------+
| 033   | ATTRIBUTE_UPDATE          | Job ClassAd attribute update                      |
+-------+---------------------------+---------------------------------------------------+
| 034   | PRESKIP                   | DAGMan PRE_SKIP defined                           |
+-------+---------------------------+---------------------------------------------------+
| 035   | CLUSTER_SUBMIT            | Cluster submitted                                 |
+-------+---------------------------+---------------------------------------------------+
| 036   | CLUSTER_REMOVE            | Cluster removed                                   |
+-------+---------------------------+---------------------------------------------------+
| 037   | FACTORY_PAUSED            | Factory paused                                    |
+-------+---------------------------+---------------------------------------------------+
| 038   | FACTORY_RESUMED           | Factory resumed                                   |
+-------+---------------------------+---------------------------------------------------+
| 039   | NONE                      | No event could be returned                        |
+-------+---------------------------+---------------------------------------------------+
| 040   | FILE_TRANSFER             | File transfer                                     |
+-------+---------------------------+---------------------------------------------------+
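
A log reader can recover the event code with a simple pattern match and then translate it using the names in the table. The sketch below assumes each event begins with a header line of the form ``NNN (cluster.proc.subproc) ...``. The sample text, the ``EVENT_NAMES`` subset, and the ``count_events`` helper are made up for illustration. For production use, HTCondor's own log-reading code is likely the safer choice.

```python
import re
from collections import Counter

# A few entries copied from the table above; extend as needed.
EVENT_NAMES = {
    1: "EXECUTE",
    5: "JOB_TERMINATED",
    6: "IMAGE_SIZE",
    12: "JOB_HELD",
}

# Assumed header shape: three-digit code, then "(cluster.proc.subproc)".
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")


def count_events(log_text):
    """Count events by name, falling back to the numeric code if unknown."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            code = int(match.group(1))
            counts[EVENT_NAMES.get(code, f"{code:03d}")] += 1
    return counts


# Fabricated sample for illustration only, not real log output.
SAMPLE = """\
000 (42.000.000) 2024-01-01 10:00:00 Job submitted from host: <example>
...
001 (42.000.000) 2024-01-01 10:01:00 Job executing on host: <example>
...
005 (42.000.000) 2024-01-01 10:05:00 Job terminated.
...
"""

print(count_events(SAMPLE))
```

Codes that are not in the lookup table are kept as zero-padded strings, so an unexpected event is still counted and stays visible.
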