File: sge_shepherd.html

package info (click to toggle)
gridengine 6.2u5-7.1
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 57,216 kB
  • sloc: ansic: 438,030; java: 66,252; sh: 36,399; jsp: 7,757; xml: 5,850; makefile: 5,520; csh: 4,571; cpp: 2,848; perl: 2,401; tcl: 692; lisp: 669; yacc: 668; ruby: 642; lex: 344
file content (116 lines) | stat: -rw-r--r-- 5,808 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
<HTML>
<BODY BGCOLOR=white>
<PRE>
<!-- Manpage converted by man2html 3.0.1 -->
NAME
     sge_shepherd - Sun Grid Engine single job controlling agent

SYNOPSIS
     sge_shepherd

DESCRIPTION
     <I>sge</I>_<I>shepherd</I> provides the parent process functionality for a
     single  Sun  Grid  Engine  job.  The parent functionality is
     necessary on UNIX systems to retrieve resource usage  infor-
     mation (see <B><A HREF="../htmlman2/getrusage.html?pathrev=V62u5_TAG">getrusage(2)</A></B>) after a job has finished. In addi-
     tion, the <I>sge</I>_<I>shepherd</I> forwards signals to the job, such  as
     the  signals  for  suspension, enabling, termination and the
     Sun Grid Engine checkpointing signal  (see  <B><A HREF="../htmlman1/sge_ckpt.html?pathrev=V62u5_TAG">sge_ckpt(1)</A></B>  for
     details).

     The <I>sge</I>_<I>shepherd</I> receives information about the  job  to  be
     started  from the <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>.  During the execution of the
     job it actually starts up to 5 child processes. First a pro-
     log  script  is run if this feature is enabled by the prolog
     parameter in the cluster configuration.  (See  <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>.)
     Next  a parallel environment startup procedure is run if the
     job is a parallel job. (See <B><A HREF="../htmlman5/sge_pe.html?pathrev=V62u5_TAG">sge_pe(5)</A></B> for more information.)
     After  that,  the  job itself is run, followed by a parallel
     environment  shutdown  procedure  for  parallel  jobs,   and
     finally  an epilog script if requested by the epilog parame-
     ter in the cluster  configuration.  The  prolog  and  epilog
     scripts  as  well  as  the  parallel environment startup and
     shutdown procedures are to  be  provided  by  the  Sun  Grid
     Engine  administrator  and  are  intended  for site-specific
     actions to be taken before and after execution of the actual
     user job.

     After the job has finished and the  epilog  script  is  pro-
     cessed,  <I>sge</I>_<I>shepherd</I>  retrieves  resource  usage statistics
     about the job, places them in a job specific subdirectory of
     the  <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>  spool  directory  for  reporting  through
     <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> and finishes.

     <I>sge</I>_<I>shepherd</I> also places an exit status file  in  the  spool
     directory.  This  exit  status  can  be viewed with qacct -j
     JobId  (see  <B><A HREF="../htmlman1/qacct.html?pathrev=V62u5_TAG">qacct(1)</A></B>);  it  is  not  the  exit  status   of
     <I>sge</I>_<I>shepherd</I>  itself  but  of one of the methods executed by
     <I>sge</I>_<I>shepherd</I>. This exit status can  have  several  meanings,
     depending  on  in  which  method an error occurred (if any).
     The possible  methods  are:  prolog,  parallel  start,  job,
     parallel  stop,  epilog, suspend, restart, terminate, clean,
     migrate, and checkpoint.

     The following exit values are returned:

     0      All methods: Operation was executed successfully.
     99     Job script, prolog and epilog: When <I>FORBID</I>_<I>RESCHEDULE</I>
            is  not  set  in the configuration (see <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>),
            the job gets re-queued.  Otherwise see "Other".

     100    Job script, prolog and epilog:  When  <I>FORBID</I>_<I>APPERROR</I>
            is  not  set  in the configuration (see <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>),
            the job gets re-queued.  Otherwise see "Other".

     Other  Job script: This  is  the  exit  status  of  the  job
            itself.  No  action  is  taken  upon this exit status
            because the meaning of this exit status is not known.
            Prolog, epilog and parallel start: The queue  is  set
            to error state and the job is re-queued.
            Parallel stop: The queue is set to error  state,  but
            the  job is not re-queued. It is assumed that the job
            itself ran successfully and only the clean up  script
            failed.
            Suspend,  restart,  terminate,  clean,  and  migrate:
            Always successful.
            Checkpoint: Success, except for kernel checkpointing:
            checkpoint  was  not  successful, did not happen (but
            migration will happen by Sun Grid Engine).

RESTRICTIONS
     <I>sge</I>_<I>shepherd</I> should not be invoked  manually,  but  only  by
     <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>.

FILES
     sgepasswd  contains  a  list  of  user  names   and    their
     corresponding  encrypted  passwords. If available, the pass-
     word  file  will  be   used   by   sge_shepherd.  To  change
     the  contents of this file please use the sgepasswd command.
     It is not advised to  change that file manually.
     &lt;<I>execd</I>_<I>spool</I>&gt;/<I>job</I>_<I>dir</I>/&lt;<I>job</I>_<I>id</I>&gt;     job specific directory

SEE ALSO
     <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B>, <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>, <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>.

COPYRIGHT
     See <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B> for a full statement of rights and  permis-
     sions.











</PRE>
<HR>
<ADDRESS>
Man(1) output converted with
<a href="http://www.oac.uci.edu/indiv/ehood/man2html.html">man2html</a>
</ADDRESS>
</BODY>
</HTML>