1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
|
<HTML>
<BODY BGCOLOR=white>
<PRE>
<!-- Manpage converted by man2html 3.0.1 -->
NAME
sge_shepherd - Sun Grid Engine single job controlling agent
SYNOPSIS
sge_shepherd
DESCRIPTION
<I>sge</I>_<I>shepherd</I> provides the parent process functionality for a
single Sun Grid Engine job. The parent functionality is
necessary on UNIX systems to retrieve resource usage infor-
mation (see <B><A HREF="../htmlman2/getrusage.html?pathrev=V62u5_TAG">getrusage(2)</A></B>) after a job has finished. In addi-
tion, the <I>sge</I>_<I>shepherd</I> forwards signals to the job, such as
the signals for suspension, enabling, termination and the
Sun Grid Engine checkpointing signal (see <B><A HREF="../htmlman1/sge_ckpt.html?pathrev=V62u5_TAG">sge_ckpt(1)</A></B> for
details).
The <I>sge</I>_<I>shepherd</I> receives information about the job to be
started from the <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>. During the execution of the
job it actually starts up to 5 child processes. First a pro-
log script is run if this feature is enabled by the prolog
parameter in the cluster configuration. (See <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>.)
Next a parallel environment startup procedure is run if the
job is a parallel job. (See <B><A HREF="../htmlman5/sge_pe.html?pathrev=V62u5_TAG">sge_pe(5)</A></B> for more information.)
After that, the job itself is run, followed by a parallel
environment shutdown procedure for parallel jobs, and
finally an epilog script if requested by the epilog parame-
ter in the cluster configuration. The prolog and epilog
scripts as well as the parallel environment startup and
shutdown procedures are to be provided by the Sun Grid
Engine administrator and are intended for site-specific
actions to be taken before and after execution of the actual
user job.
After the job has finished and the epilog script is pro-
cessed, <I>sge</I>_<I>shepherd</I> retrieves resource usage statistics
about the job, places them in a job specific subdirectory of
the <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> spool directory for reporting through
<B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> and finishes.
<I>sge</I>_<I>shepherd</I> also places an exit status file in the spool
directory. This exit status can be viewed with qacct -j
JobId (see <B><A HREF="../htmlman1/qacct.html?pathrev=V62u5_TAG">qacct(1)</A></B>); it is not the exit status of
<I>sge</I>_<I>shepherd</I> itself but of one of the methods executed by
<I>sge</I>_<I>shepherd</I>. This exit status can have several meanings,
depending on in which method an error occurred (if any).
The possible methods are: prolog, parallel start, job,
parallel stop, epilog, suspend, restart, terminate, clean,
migrate, and checkpoint.
The following exit values are returned:
0 All methods: Operation was executed successfully.
99 Job script, prolog and epilog: When <I>FORBID</I>_<I>RESCHEDULE</I>
is not set in the configuration (see <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>),
the job gets re-queued. Otherwise see "Other".
100 Job script, prolog and epilog: When <I>FORBID</I>_<I>APPERROR</I>
is not set in the configuration (see <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>),
the job gets re-queued. Otherwise see "Other".
Other Job script: This is the exit status of the job
itself. No action is taken upon this exit status
because the meaning of this exit status is not known.
Prolog, epilog and parallel start: The queue is set
to error state and the job is re-queued.
Parallel stop: The queue is set to error state, but
the job is not re-queued. It is assumed that the job
itself ran successfully and only the clean up script
failed.
Suspend, restart, terminate, clean, and migrate:
Always successful.
Checkpoint: Success, except for kernel checkpointing:
checkpoint was not successful, did not happen (but
migration will happen by Sun Grid Engine).
RESTRICTIONS
<I>sge</I>_<I>shepherd</I> should not be invoked manually, but only by
<B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>.
FILES
sgepasswd contains a list of user names and their
corresponding encrypted passwords. If available, the pass-
word file will be used by sge_shepherd. To change
the contents of this file please use the sgepasswd command.
It is not advised to change that file manually.
<<I>execd</I>_<I>spool</I>>/<I>job</I>_<I>dir</I>/<<I>job</I>_<I>id</I>> job specific directory
SEE ALSO
<B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B>, <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>, <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>.
COPYRIGHT
See <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B> for a full statement of rights and permis-
sions.
</PRE>
<HR>
<ADDRESS>
Man(1) output converted with
<a href="http://www.oac.uci.edu/indiv/ehood/man2html.html">man2html</a>
</ADDRESS>
</BODY>
</HTML>
|