File: README.qsched

package info (click to toggle)
gridengine 8.1.9%2Bdfsg-9
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 56,756 kB
  • sloc: ansic: 432,689; java: 87,068; cpp: 31,958; sh: 29,429; jsp: 7,757; perl: 6,336; xml: 5,828; makefile: 4,701; csh: 3,934; ruby: 2,221; tcl: 1,676; lisp: 669; yacc: 519; python: 503; lex: 361
file content (120 lines) | stat: -rw-r--r-- 3,578 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
qsched and tools
================

Grid Engine is a batch queue system with a mechanism to reserve resources
for jobs, thereby preventing them from being "starved" by other, easier to
schedule jobs.

Unfortunately, this has often been considered a black-box, with it being
difficult to divine what the scheduler is actually thinking.

The purpose of this collection of programs is to display what the scheduler's
resource reservation feature is thinking.


INSTALLATION
~~~~~~~~~~~~

1) Enable the scheduler logging feature:

  $ qconf -msconf

    Replace "params NONE" with "params MONITOR=1", save and exit the editor
    (or, if there is already a value for params, add it to the comma
    separated list).

  If you are beginning from the default configuration, the you should see
  the specific output from the following command:

  $ qconf -ssconf | egrep ^params
  params                            MONITOR=1

  This will create a file called $SGE_ROOT/$SGE_CELL/common/schedule

2) Start the schedule logfile processing program

  If you don't start it with the SGE_ROOT and SGE_CELL set suitably in
  the environment, look near the top of process-scheduler-log for the
  definition for variable $sge_common. Set it to the
  $SGE_ROOT/$SGE_CELL/common directory for your installation of Grid
  Engine.

  Execute the program from an account with write access to that
  directory.

  You will probably want to write a startup script appropriate for your
  operating system.

3) Install the qsched command

  Put file "qsched" somewhere in your path. Optionally edit $trim_string
  to shorten some of the output you may see.

  You might need to install the XML::Simple Perl module. On Red Hat
  Enterprise Linux or related operating systems, you will need to install
  the perl-XML-Simple RPM.


USAGE
~~~~~

Execute "qsched -a", which should report the list of jobs that resources
are currently being reserved for.

Options:

$ qsched -h
Usage: qsched [OPTION...]

  -u <user>      - view jobs belonging to user
  -a             - view jobs belonging to all users (synonym for '-u *')
  -j <jid_list>  - comma-separated list of JIDs to view
  -f             - print all available data for each job
  -s j           - sort by JID
  -q             - sort by queue
  -d             - print debugging info
  -h             - print this message

Example output:

$ qsched
job-ID     user     start                end                  slots  h_vmem
---------  -------  -------------------  -------------------  -----  ------
 347471.1  bob      03/06/2011 18:14:46  03/08/2011 18:19:45      8      1G

$ qsched -f -j 347471.1
Job 347471.1
    user      = bob
    start     = Mon Mar  7 19:40:18 2011
    end       = Wed Mar  9 19:45:17 2011
    slots     = 8
    h_vmem    = 1G
    pe        = iblocal (slots = 8)
    queue     = arc1.q@c2s3b0n0 (slots = 8)
    resources = c2s3b0n0 (exclusive = 8, h_vmem = 8G)


TIPS
~~~~

1) "Rotate" your schedule file

The schedule file is continually appended to by Grid Engine. Consider deleting
it once a day to prevent it from growing too large.

2) NFS performance problems with $SGE_ROOT

Our $SGE_ROOT is mounted from an NFS server onto the qmaster host, allowing
us to run a shadow qmaster. We recently found that Grid Engine was continually
making small writes to the schedule file, causing excessive I/O on the NFS
server, and poor responsiveness on the qmaster.

Replacing the schedule file with a symlink to an area of disk space local to
the qmaster resolved this issue.


AUTHOR
~~~~~~

Mark Dixon <m.c.dixon@leeds.ac.uk>