'\" t
.\" Copyright (C), 2012, 2013  Dave Love, University of Liverpool
.\" You may distribute this file under the terms of the GNU Free
.\" Documentation License.
.de M		\" SGE man page reference
\\fI\\$1\\fR\\|(\\$2)\\$3
..
.TH sge_status 5 2013-04-07
.SH NAME
sge_status \- xxQS_NAMExx job status values
.SH DESCRIPTION
.SS "Job state"
The following table lists the job states shown by
.M qstat 1
and returned by
.M drmaa_jobcontrol 3 .
The DRMAA
.I state
corresponds to the
.BI DRMAA_PS_ state
value that may be returned by
.M drmaa_job_ps 3 .
.PP
.TS
tab(@), allbox;
cbcbcbcb
ltltltlt.
Category@State@SGE@DRMAA state
Pending@pending@qw, Rq@QUEUED_ACTIVE
\^@pending, user hold@hqw@USER_ON_HOLD
\^@pending, system hold@hqw@SYSTEM_ON_HOLD
\^@T{
.na
pending, user and system hold
T}@hqw@USER_SYSTEM_ON_HOLD
\^@T{
.na
pending, user hold, re-queue
T}@hRwq@USER_ON_HOLD
\^@T{
.na
pending, system hold, re-queue
T}@hRwq@SYSTEM_ON_HOLD
\^@T{
.na
pending, user and system hold, re-queue
T}@hRwq@USER_SYSTEM_ON_HOLD
T{
.na
Running / transferring
T}@running, transferring@r, hr, t@RUNNING
\^@T{
.na
running, re-run / transferring
T}@Rr, Rt@RUNNING
Suspended@job suspended@s, ts@USER_SUSPENDED
\^@queue suspended@S, tS@SYSTEM_SUSPENDED
\^@T{
.na
queue suspended by alarm
T}@T, tT@SYSTEM_SUSPENDED
\^@T{
.na
all suspended with re-run
T}@T{
.na
Rs, Rts, RS, RtS, RT, RtT
T}@SYSTEM_SUSPENDED
Error@T{
.na
all pending states with error
T}@T{
Eqw, Ehqw, EhRqw
T}@FAILED
Deleting@T{
.na
all running and suspended states with deletion
T}@T{
.na
dr, dt, dRr, dRt, ds, dS, dT, dRs, dRS, dRT
T}@T{
.na
same as equivalent DRMAA states without the "d"
T}
Finished@T{
.na
job finished normally
T}@z@DONE
Unknown@T{
.na
status cannot be determined
T}@@UNDETERMINED
.TE
.SS "\"Failed\" states"
The following table lists the "failed" values reported by
.M qacct 1
(see 
.M accounting 5 ),
their descriptions (also reported by
.IR qacct ),
whether the resource usage accounting data are valid for the job
("OK"), and an explanation.  The host's messages file or the shepherd
trace file (preserved with
.B execd_params
.B KEEP_ACTIVE
in
.M sge_conf 5 )
may provide more information about errors.
.\" See execution_states.c
.TS
tab(@), allbox;
lblblblb
ltltltlt.
Code@Description@OK@Explanation
0@no failure@Y@ran and exited normally
1@assumedly before job@N@failed early in execd
3@before writing config@N@failed before execd set up local spool
4@before writing PID@N@shepherd failed to record its pid \- filesystem problem?
.\" 5@on reading config file@N@
6@setting processor set@N@failed setting up processor set (obsolete)
7@before prolog@N@failed before prolog
8@in prolog@N@failed in prolog
9@before pestart@N@failed before starting PE
10@in pestart@N@failed in PE starter
11@before job@N@T{
.na
failed in shepherd before starting job
T}
12@before pestop@Y@T{
.na
ran, but failed before calling PE stop procedure
T}
13@in pestop@Y@T{
.na
ran, but PE stop procedure failed
T}
14@before epilog@Y@T{
.na
ran, but failed before calling epilog
T}
15@in epilog@Y@T{
.na
ran, but failed in epilog
T}
16@releasing processor set@Y@T{
.na
ran, but processor set could not be released (obsolete)
T}
17@through signal@Y@T{
.na
job killed by signal (possibly qdel)
T}
18@shepherd returned error@N@shepherd died somehow
19@before writing exit_status@N@T{
.na
shepherd didn't write reports correctly \- probably program or machine crash
T}
20@found unexpected error file@?@T{
.na
shepherd encountered a problem
T}
21@in recognizing job@N@T{
.na
qmaster asked about an unknown job (not in accounting?)
T}
24@T{
.na
migrating (checkpointing jobs)
T}@Y@ran, will be migrated
25@rescheduling@Y@T{
.na
ran, will be rescheduled
T}
26@opening output file@N@T{
.na
failed opening stderr/stdout file
T}
27@searching requested shell@N@failed finding specified shell
28@T{
.na
changing to working directory
T}@N@T{
.na
failed changing to start directory
T}
29@AFS setup@N@failed setting up AFS security
30@application error returned@Y@T{
.na
ran and exited 100 \- maybe re-scheduled
T}
31@accessing sgepasswd file@N@T{
.na
failed because sgepasswd not readable (MS Windows)
T}
32@T{
.na
entry is missing in password file
T}@N@T{
.na
failed because user not in sgepasswd (MS Windows)
T}
33@wrong password@N@T{
.na
failed because of wrong password against sgepasswd (MS Windows)
T}
34@T{
.na
communicating with Grid Engine Helper Service
T}@N@T{
.na
failed because of failure of helper service (MS Windows)
T}
35@T{
.na
before job in Grid Engine Helper Service
T}@N@T{
.na
failed because of failure running helper service (MS Windows)
T}
36@checking configured daemons@N@T{
.na
failed because of configured remote startup daemon
T}
37@T{
.na
qmaster enforced h_rt, h_cpu, or h_vmem limit
T}@Y@T{
.na
ran, but killed due to exceeding a run time, CPU time, or virtual memory limit
T}
38@adding supplementary group@N@T{
.na
failed adding supplementary gid to job
T}
100@assumedly after job@Y@T{
.na
ran, but killed by a signal (perhaps due to exceeding resources), task
died, shepherd died (e.g. node crash), etc.
T}
.TE
.PP
See
.M sge_shepherd 8
for the effect of non-zero return codes from the various methods
(prolog etc.) executed by the shepherd.
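.PP
When post-processing accounting records, the OK column of the table of
"failed" values can be encoded as sets of codes.
The following Python sketch is only an illustration under that
reading of the table; the names USAGE_VALID, USAGE_INVALID and
usage_ok are hypothetical, and how the "failed" value is obtained
from the accounting data is left to the caller.
.PP
.nf
```python
# Hypothetical helper; names are illustrative assumptions only.
# Codes are transcribed from the OK column of the "failed" table.
USAGE_VALID = {0, 12, 13, 14, 15, 16, 17, 24, 25, 30, 37, 100}
USAGE_INVALID = {
    1, 3, 4, 6, 7, 8, 9, 10, 11, 18, 19, 21, 26, 27, 28, 29,
    31, 32, 33, 34, 35, 36, 38,
}


def usage_ok(failed):
    """Return True if usage data are valid for this "failed" code,
    False if not, and None if the table marks it "?" (code 20) or
    does not list the code at all."""
    if failed in USAGE_VALID:
        return True
    if failed in USAGE_INVALID:
        return False
    return None
```
.fi
.PP
Returning None for code 20 and for unlisted codes keeps the uncertain
cases distinct from a definite "N", so a caller can decide whether to
keep or discard such records.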
.SH "SEE ALSO"
.M drmaa_jobcontrol 3 ,
.M qstat 1 ,
.M sge_shepherd 8 ,
.M accounting 5 .