<HTML>
<BODY BGCOLOR=white>
<PRE>
<!-- Manpage converted by man2html 3.0.1 -->
NAME
queue_conf - Sun Grid Engine queue configuration file format
DESCRIPTION
This manual page describes the format of the template file
for the cluster queue configuration. Via the -aq and -mq
options of the <B><A HREF="../htmlman1/qconf.html?pathrev=V62u5_TAG">qconf(1)</A></B> command, you can add cluster queues
and modify the configuration of any queue in the cluster.
Any of these change operations can be rejected if the
integrity verification fails.
The queue configuration parameters take as values strings,
decimal integers, booleans, time and memory specifiers (see
<I>time_specifier</I> and <I>memory_specifier</I> in
<B><A HREF="../htmlman5/sge_types.html?pathrev=V62u5_TAG">sge_types(5)</A></B>), as well as comma-separated lists.
Note that Sun Grid Engine allows a backslash (\) to be used
to escape a newline character (\newline). The backslash and
the newline are replaced with a space (" ") character before
any interpretation.
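The following minimal Python sketch only illustrates the
continuation rule described above. join_continuations is a
hypothetical helper and is not part of Sun Grid Engine.

```python
def join_continuations(text):
    """Replace each backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# A hostlist entry continued on a second line.
raw = "hostlist   hostA,hostB,\\\n           hostC\n"
print(join_continuations(raw))
```

Because whitespace is also a list separator, the leading blanks
of the continuation line do no harm here.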
FORMAT
The following list of parameters specifies the content of
the queue configuration file:
qname
The name of the cluster queue as defined for <I>queue_name</I> in
<B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>. The template default is "template".
hostlist
A list of host identifiers as defined for <I>host_identifier</I> in
<B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>. For each host, Sun Grid Engine maintains a
queue instance for running jobs on that particular host.
Large numbers of hosts are easier to manage with host groups
than with individual host names. Whitespace and "," can be
used as list separators (template default: NONE).
If more than one host is specified, it can be desirable to
give certain hosts settings that diverge from those of the
parameters described below. Such divergences are expressed
with the enhanced queue configuration specifier syntax, which
extends the regular parameter specifier syntax separately for
each parameter:
"["<I>host</I>_<I>identifier</I>=<parameters_specifier_syntax>"]"
[,"["<I>host</I>_<I>identifier</I>=<parameters_specifier_syntax>"]" ]
Note that even in the enhanced queue configuration specifier
syntax, an entry without brackets that denotes the default
setting is required; it is used for all queue instances for
which no divergences are specified. Tuples with a host group
<I>host_identifier</I> override the default setting. Tuples with a
host name <I>host_identifier</I> override both the default and the
host group setting.
Because a default setting is always needed for each
configuration attribute, a queue configuration without one
is rejected. Ambiguous queue configurations with more than
one attribute setting for a particular host are rejected as
well. Configurations containing override values for hosts
not listed under hostlist are accepted but are reported by
the -sds option of <B><A HREF="../htmlman1/qconf.html?pathrev=V62u5_TAG">qconf(1)</A></B>. The cluster queue should
contain an unambiguous specification for each configuration
attribute of each queue instance listed under hostlist.
Ambiguous configurations with more than one attribute
setting resulting from overlapping host groups are reported
by the -explain c option of <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> and cause the affected
queue instance to enter the c(onfiguration ambiguous) state.
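As an illustration of the precedence rules above, the
following Python sketch resolves the value in effect for one
queue instance. It assumes an attribute value such as
1,[@linux_hosts=2],[hostA=4], where host group names start
with "@". The helpers split_top_level, parse_attribute and
resolve, the host names, and the choice to treat every group
overlap as ambiguous are illustrative assumptions, not Sun
Grid Engine's actual implementation.

```python
def split_top_level(value):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, ""
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def parse_attribute(value):
    """Split e.g. '1,[@linux_hosts=2],[hostA=4]' into the default value
    and a dict mapping host identifiers to override values."""
    default, overrides = None, {}
    for item in split_top_level(value):
        item = item.strip()
        if item.startswith("[") and item.endswith("]"):
            host, _, setting = item[1:-1].partition("=")
            overrides[host.strip()] = setting.strip()
        else:
            default = item
    if default is None:
        raise ValueError("a default setting without brackets is required")
    return default, overrides


def resolve(value, host, host_groups):
    """Value in effect for the queue instance on `host`.

    host_groups maps "@group" names to sets of host names. A host name
    override beats a host group override, which beats the default.
    """
    default, overrides = parse_attribute(value)
    if host in overrides:
        return overrides[host]
    groups = [g for g, members in host_groups.items()
              if g in overrides and host in members]
    if len(groups) > 1:
        # Simplification: any overlap is treated as ambiguous (c state).
        raise ValueError(f"ambiguous setting for {host}: {groups}")
    return overrides[groups[0]] if groups else default
```

With host_groups = {"@linux_hosts": {"hostA", "hostB"}}, hostA
gets "4", hostB gets "2" and any other host gets the default "1".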
seq_no
Together with the current load situation of the hosts, this
parameter determines this queue's position in the scheduling
order among the queues suitable for a job to be dispatched,
taking the queue_sort_method into account (see
<B><A HREF="../htmlman5/sched_conf.html?pathrev=V62u5_TAG">sched_conf(5)</A></B>).
Regardless of the queue_sort_method setting, <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B>
reports queue information in the order defined by the value
of seq_no. Set this parameter to a monotonically increasing
sequence (type number; template default: 0).
load_thresholds
load_thresholds is a list of load thresholds. As soon as any
one of the thresholds is exceeded, no further jobs are
scheduled to the queue and <B><A HREF="../htmlman1/qmon.html?pathrev=V62u5_TAG">qmon(1)</A></B> signals an overload
condition for this node. Any load value defined in the
"host" and "global" complexes (see <B><A HREF="../htmlman5/complex.html?pathrev=V62u5_TAG">complex(5)</A></B> for
details) can be used.
The syntax is that of a comma-separated list in which each
element consists of the <I>complex_name</I> (see <B><A HREF="../htmlman5/sge_types.html?pathrev=V62u5_TAG">sge_types(5)</A></B>) of
a load value, an equal sign, and the threshold value
intended to trigger the overload condition (e.g.
load_avg=1.75,users_logged_in=5).
Note: Load values as well as consumable resources may be
scaled differently for different hosts if specified in the
corresponding execution host definitions (refer to
<B><A HREF="../htmlman5/host_conf.html?pathrev=V62u5_TAG">host_conf(5)</A></B> for more information). Load thresholds are
compared against the scaled load and consumable values.
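The following hedged Python sketch shows how a load_thresholds
value could be parsed and evaluated. It assumes that
"exceeded" means strictly greater than the threshold (the real
comparison may depend on the complex definition, see
complex(5)) and that scaling multiplies the reported value.
The function names are hypothetical.

```python
def parse_thresholds(spec):
    """Parse 'load_avg=1.75,users_logged_in=5' into {name: threshold}."""
    if spec.strip().upper() == "NONE":
        return {}
    thresholds = {}
    for element in spec.split(","):
        name, _, value = element.partition("=")
        thresholds[name.strip()] = float(value)
    return thresholds


def exceeded_thresholds(spec, load, scaling=None):
    """Names of all thresholds exceeded by the (scaled) reported load.

    A non-empty result corresponds to the overload condition: no
    further jobs would be scheduled to the queue.
    """
    scaling = scaling or {}
    exceeded = []
    for name, threshold in parse_thresholds(spec).items():
        if name not in load:
            continue                      # value not reported: skip it
        scaled = load[name] * scaling.get(name, 1.0)
        if scaled > threshold:            # assumed meaning of "exceeded"
            exceeded.append(name)
    return exceeded
```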
suspend_thresholds
A list of load thresholds with the same semantics as the
load_thresholds parameter (see above), except that exceeding
one of the listed thresholds initiates the suspension of one
or more jobs in the queue. See the nsuspend parameter below
for details on the number of jobs that are suspended. There
is an important relationship between suspend_thresholds and
the scheduler's schedule_interval (see
<B><A HREF="../htmlman5/sched_conf.html?pathrev=V62u5_TAG">sched_conf(5)</A></B>). If, for example, a suspend threshold is
set on np_load_avg and the load exceeds it, this has no
immediate effect. Jobs continue running until the next
scheduling run, in which the scheduler detects that the
threshold has been exceeded and sends an order to qmaster to
suspend the job. The same applies to unsuspending.
nsuspend
The number of jobs that are suspended per time interval
while at least one of the load thresholds in the
suspend_thresholds list is exceeded, or enabled per time
interval once no suspend threshold is violated anymore.
nsuspend jobs are suspended in each time interval until no
suspend_thresholds are exceeded anymore or all jobs in the
queue are suspended. Jobs are enabled in the corresponding
way once the suspend_thresholds are no longer exceeded. The
time interval in which the suspensions occur is defined by
suspend_interval below.
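The interplay of nsuspend and the suspend interval can be
sketched as one step per interval. This Python sketch assumes
that jobs are simply taken from the end of a list. Which jobs
Sun Grid Engine actually picks is not specified here, and
suspend_step is a hypothetical name.

```python
def suspend_step(running, suspended, threshold_exceeded, nsuspend):
    """Apply one suspend_interval and return the new (running, suspended).

    While a suspend threshold is exceeded, up to nsuspend running jobs are
    suspended; otherwise up to nsuspend suspended jobs are enabled again.
    """
    running, suspended = list(running), list(suspended)
    if threshold_exceeded:
        for _ in range(min(nsuspend, len(running))):
            suspended.append(running.pop())
    else:
        for _ in range(min(nsuspend, len(suspended))):
            running.append(suspended.pop())
    return running, suspended
```

With five running jobs and nsuspend=2, three intervals of
overload suspend all five jobs (2 + 2 + 1). The first interval
without overload enables two of them again.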
suspend_interval
The time interval in which further nsuspend jobs are
suspended if one of the suspend_thresholds (see above for
both) is exceeded by the current load on the host on which
the queue is located. The time interval is also used when
enabling the jobs. The syntax is that of a <I>time</I>_<I>specifier</I> in
<B><A HREF="../htmlman5/sge_types.html?pathrev=V62u5_TAG">sge_types(5)</A></B>.
priority
The priority parameter specifies the <B><A HREF="../htmlman2/nice.html?pathrev=V62u5_TAG">nice(2)</A></B> value at which
jobs in this queue are run. The type is number and the
default is zero (which means that no nice value is set
explicitly). Negative values (down to -20) correspond to a
higher scheduling priority, positive values (up to +20) to a
lower scheduling priority.
Note that the value of priority has no effect if Sun Grid
Engine adjusts priorities dynamically to implement
ticket-based entitlement policy goals. Dynamic priority
adjustment is switched off by default, because reprioritize
in <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B> is set to false.
min_cpu_interval
The time between two automatic checkpoints of transparently
checkpointing jobs. The maximum of the time requested by the
user via <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> and the time defined by the queue
configuration is used as the checkpoint interval. Since
checkpoint files may be considerably large, and writing them
to the file system may therefore be expensive, users and
administrators are advised to choose sufficiently large time
intervals. min_cpu_interval is of type time and the default
is 5 minutes (which is usually suitable for test purposes
only). The syntax is that of a <I>time_specifier</I> in
<B><A HREF="../htmlman5/sge_types.html?pathrev=V62u5_TAG">sge_types(5)</A></B>.
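Here is a minimal sketch of how the effective checkpoint
interval could be computed. It assumes two time_specifier
forms: plain seconds, or "hours:minutes:seconds" with empty
fields counting as zero (e.g. 1::1). Other forms allowed by
sge_types(5) are not handled, and both function names are
hypothetical.

```python
def parse_time(spec):
    """Convert 'h:m:s' (empty fields count as 0) or plain seconds to seconds."""
    if ":" in spec:
        hours, minutes, seconds = (int(f) if f else 0 for f in spec.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return int(spec)


def checkpoint_interval(queue_min_cpu_interval, user_request=None):
    """The larger of the queue's min_cpu_interval and the user's request."""
    interval = parse_time(queue_min_cpu_interval)
    if user_request is not None:
        interval = max(interval, parse_time(user_request))
    return interval
```

A user request below the queue's min_cpu_interval has no
effect, because the larger of the two values is used.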
processors
On a multiprocessor execution host, a set of processors can
be defined to which the jobs executing in this queue are
bound. The value type of this parameter is a range
description like that of the -pe option of <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> (e.g.
1-4,8,10) denoting the processor numbers of the processor
group to be used. The interpretation of these values relies
on operating system specifics and is thus performed inside
<B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> running on the queue host. Therefore, the
parsing of the parameter has to be provided by the execution
daemon, and the parameter is only passed through
<B><A HREF="../htmlman8/sge_qmaster.html?pathrev=V62u5_TAG">sge_qmaster(8)</A></B> as a string.
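The real parsing is done by sge_execd(8) and depends on the
operating system. The following Python sketch therefore only
shows how the range notation expands into processor numbers.
parse_processors is a hypothetical helper.

```python
def parse_processors(spec):
    """Expand a range description such as '1-4,8,10' into processor numbers."""
    cpus = set()
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if sep:
            cpus.update(range(int(first), int(last) + 1))  # inclusive range
        else:
            cpus.add(int(first))
    return sorted(cpus)
```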
Currently, support is only provided for multiprocessor
machines running Solaris, SGI multiprocessor machines running
IRIX 6.2, and Digital UNIX multiprocessor machines. On
Solaris, the processor set must already exist when the
processors parameter is configured, so it has to be created
manually. On Digital UNIX, only one job per processor set is
allowed to execute at a time, i.e. slots (see below) should
be set to 1 for this queue.
qtype
The type of queue: <I>batch</I>, <I>interactive</I>, a comma-separated
combination of both, or <I>NONE</I>.
The formerly supported types parallel and checkpointing are
no longer allowed. A queue instance is implicitly of type
parallel or checkpointing if a parallel environment or a
checkpointing interface is specified for this queue instance
in pe_list or ckpt_list, respectively. For example, the
former setting
qtype PARALLEL
can be converted into
qtype NONE
pe_list pe_name
(type string; default: batch interactive).
pe_list
The list of names of administrator-defined parallel
environments (see <B><A HREF="../htmlman5/sge_pe.html?pathrev=V62u5_TAG">sge_pe(5)</A></B>) to be associated with the
queue. The default is <I>NONE</I>.
ckpt_list
The list of administrator-defined checkpointing interface
names (see <I>ckpt</I>_<I>name</I> in <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>) to be associated with
the queue. The default is <I>NONE</I>.
rerun
Defines the default behavior for jobs that are aborted by
system crashes or by a manual "violent" shutdown (via
<B><A HREF="../htmlman1/kill.html?pathrev=V62u5_TAG">kill(1)</A></B>) of the complete Sun Grid Engine system
(including the <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B> of the jobs and their process
hierarchy) on the queue host. As soon as <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> is
restarted and detects that a job has been aborted for such
reasons, the job can be restarted if it is restartable. A
job may not be restartable, for example, if it updates
databases (first reads, then writes to the same record of a
database or file), because aborting the job may have left
the database in an inconsistent state. The owner of a job
can override the queue's default behavior with the -r option
of <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>.
The type of this parameter is boolean, thus either TRUE or
FALSE can be specified. The default is FALSE, i.e. jobs are
not restarted automatically.
slots
The maximum number of concurrently executing jobs allowed in
the queue. The type is number; valid values are 0 to 9999999.
tmpdir
The tmpdir parameter specifies the absolute path to the base
of the temporary directory filesystem. When <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>
launches a job, it creates a uniquely-named directory in
this filesystem for the purpose of holding scratch files
during job execution. At job completion, this directory and
its contents are removed automatically. The environment
variables TMPDIR and TMP are set to the path of each job's
scratch directory (type string; default: /tmp).
shell
If either <I>posix_compliant</I> or <I>script_from_stdin</I> is specified
as the shell_start_mode parameter in <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>, the shell
parameter specifies the executable path of the command
interpreter (e.g. <B><A HREF="../htmlman1/sh.html?pathrev=V62u5_TAG">sh(1)</A></B> or <B><A HREF="../htmlman1/csh.html?pathrev=V62u5_TAG">csh(1)</A></B>) used to process
the job scripts executed in the queue. The job owner can
override the shell definition with the <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> -S
option.
The type of the parameter is string. The default is
/bin/csh.
shell_start_mode
This parameter defines the mechanisms which are used to
actually invoke the job scripts on the execution hosts. The
following values are recognized:
<I>unix_behavior</I>
If a user starts a job shell script under UNIX
interactively by invoking it with just the script name,
the operating system's executable loader uses the
information provided in a comment such as `#!/bin/csh'
in the first line of the script to detect which command
interpreter to start to interpret the script. Sun Grid
Engine uses this mechanism when starting jobs if
<I>unix_behavior</I> is defined as shell_start_mode.
<I>posix_compliant</I>
POSIX does not consider first-line script comments such
as `#!/bin/csh' significant. The POSIX standard for batch
queuing systems (P1003.2d) therefore requires a compliant
queuing system to ignore such lines and to use
user-specified or configured default command interpreters
instead. Thus, if shell_start_mode is set to
<I>posix_compliant</I>, Sun Grid Engine uses either the command
interpreter indicated by the -S option of the
<B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> command or the shell parameter of the queue
(see above).
<I>script_from_stdin</I>
Setting the shell_start_mode parameter to either
<I>posix_compliant</I> or <I>unix_behavior</I> requires you to set
the umask in use for <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> such that every user
has read access to the active_jobs directory in the
spool directory of the corresponding execution daemon.
If you have prolog and epilog scripts configured, they
also need to be readable by any user who may execute
jobs.
If this violates your site's security policies you may
want to set shell_start_mode to <I>script</I>_<I>from</I>_<I>stdin</I>. This
will force Sun Grid Engine to open the job script as
well as the epilogue and prologue scripts for reading
into STDIN as root (if <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> was started as
root) before changing to the job owner's user account.
The script is then fed into the STDIN stream of the
command interpreter indicated by the -S option of the
<B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> command or by the shell parameter of the
queue (see above).
Thus, setting shell_start_mode to <I>script_from_stdin</I> also
implies <I>posix_compliant</I> behavior. Note, however, that
feeding scripts into the STDIN stream of a command
interpreter may cause trouble if commands like <B><A HREF="../htmlman1/rsh.html?pathrev=V62u5_TAG">rsh(1)</A></B>
are invoked inside a job script, as they also process
the STDIN stream of the command interpreter. These
problems can usually be resolved by redirecting the
STDIN channel of those commands to come from /dev/null
(e.g. rsh host date < /dev/null). Note also that any
command-line options associated with the job are passed
to the executing shell. The shell only forwards them to
the job if they are not recognized as valid shell
options.
The default for shell_start_mode is <I>posix_compliant</I>. Note,
though, that shell_start_mode applies only to batch jobs
submitted by <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>, not to interactive jobs submitted by
<B><A HREF="../htmlman1/qrsh.html?pathrev=V62u5_TAG">qrsh(1)</A></B>, <B><A HREF="../htmlman1/qsh.html?pathrev=V62u5_TAG">qsh(1)</A></B>, or <B><A HREF="../htmlman1/qlogin.html?pathrev=V62u5_TAG">qlogin(1)</A></B>.
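The three modes differ mainly in the command line they
produce. The following Python sketch models this without
starting anything. It ignores login shells, job arguments
and the file-permission aspects described above, and
build_invocation is a hypothetical name.

```python
def build_invocation(mode, script, queue_shell, requested_shell=None):
    """Return (argv, stdin_file) for the given shell_start_mode.

    requested_shell models qsub -S; queue_shell models the shell parameter.
    """
    shell = requested_shell or queue_shell
    if mode == "unix_behavior":
        return [script], None          # the OS loader evaluates the #! line
    if mode == "posix_compliant":
        return [shell, script], None   # the #! line is ignored
    if mode == "script_from_stdin":
        return [shell], script         # the script is fed into STDIN
    raise ValueError(f"unknown shell_start_mode: {mode}")
```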
prolog
The executable path of a shell script that is started before
execution of Sun Grid Engine jobs, with the same environment
settings as the Sun Grid Engine jobs to be started
afterwards. An optional prefix "user@" specifies the user
under which this procedure is started. The procedure's
standard output and error output streams are written to the
same file used for the standard output and error output of
each job. This procedure is intended as a means for the Sun
Grid Engine administrator to automate general site-specific
tasks, such as preparing temporary file systems, that need
the same context information as the job. This queue
configuration entry overrides cluster-global or
execution-host-specific prolog definitions (see
<B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>).
The default for prolog is the special value NONE, which
prevents the execution of a prologue script. The special
variables for constituting a command line are the same as in
the prolog definition of the cluster configuration (see
<B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>).
The exit code of the prolog is interpreted as follows:
0: Success
99: Reschedule job
100: Put job in error state
Anything else: Put queue in error state
epilog
The executable path of a shell script that is started after
execution of Sun Grid Engine jobs, with the same environment
settings as the Sun Grid Engine jobs that have just
completed. An optional prefix "user@" specifies the user
under which this procedure is started. The procedure's
standard output and error output streams are written to the
same file used for the standard output and error output of
each job. This procedure is intended as a means for the Sun
Grid Engine administrator to automate general site-specific
tasks, such as cleaning up temporary file systems, that need
the same context information as the job. This queue
configuration entry overrides cluster-global or
execution-host-specific epilog definitions (see
<B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>).
The default for epilog is the special value NONE, which
prevents the execution of an epilogue script. The special
variables for constituting a command line are the same as in
the prolog definition of the cluster configuration (see
<B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>).
The exit code of the epilog is interpreted as follows:
0: Success
99: Reschedule job
100: Put job in error state
Anything else: Put queue in error state
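The prolog and epilog exit-code tables above are identical and
can be expressed as one small lookup. The sketch also shows a
hypothetical prolog check that asks for rescheduling when a
scratch file system is short of space. The function names and
the free-space criterion are assumptions for illustration
only.

```python
import shutil

# Exit codes documented for the prolog and epilog attributes.
ACTIONS = {0: "success", 99: "reschedule job", 100: "put job in error state"}


def interpret_exit_code(code):
    """Action taken for a prolog/epilog exit code; any other code
    puts the queue in error state."""
    return ACTIONS.get(code, "put queue in error state")


def scratch_prolog(path, required_bytes):
    """Hypothetical prolog check: request rescheduling (99) if the
    file system holding `path` lacks space, otherwise succeed (0)."""
    free = shutil.disk_usage(path).free
    return 0 if free >= required_bytes else 99
```

A real prolog script would end with something like
sys.exit(scratch_prolog("/scratch", 10 * 2**30)), where
/scratch is a hypothetical path.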
starter_method
The specified executable path is used as a job starter
facility responsible for starting batch jobs. It is executed
instead of the configured shell to start the job, and the
job arguments are passed to it as arguments. The following
environment variables are used to pass information to the
job starter about the shell environment that was configured
or requested to start the job:
<I>SGE_STARTER_SHELL_PATH</I>
The name of the requested shell to start the job.
<I>SGE_STARTER_SHELL_START_MODE</I>
The configured shell_start_mode.
<I>SGE_STARTER_USE_LOGIN_SHELL</I>
Set to "true" if the shell is supposed to be used as a
login shell (see login_shells in <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>).
The starter_method will not be invoked for qsh, qlogin or
qrsh acting as rlogin.
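A minimal starter_method written in Python might look as
follows. The sketch makes three assumptions. The job script
and its arguments arrive as the starter's command-line
arguments. A login shell is requested, by the usual UNIX
convention, through a "-" prefix on argv[0]. If no shell is
named, /bin/sh is used. The /bin/sh fallback and the function
name are not documented behavior.

```python
#!/usr/bin/env python3
"""Hypothetical starter_method: start the job via the requested shell."""
import os
import sys


def starter_argv(env, job_args):
    """Return (path, argv) for exec, derived from the SGE_STARTER_* variables."""
    shell = env.get("SGE_STARTER_SHELL_PATH", "/bin/sh")  # fallback is assumed
    login = env.get("SGE_STARTER_USE_LOGIN_SHELL") == "true"
    # By convention, a login shell is started with "-" prefixed to argv[0].
    argv0 = "-" + os.path.basename(shell) if login else shell
    return shell, [argv0] + list(job_args)


if __name__ == "__main__" and "SGE_STARTER_SHELL_PATH" in os.environ:
    # Only act when launched by sge_execd; site-specific setup goes here.
    path, argv = starter_argv(os.environ, sys.argv[1:])
    os.execv(path, argv)
```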
suspend_method
resume_method
terminate_method
These parameters can be used to override the default
methods used by Sun Grid Engine for suspending a job,
resuming a suspended job, and terminating a job. By
default, the signals SIGSTOP, SIGCONT and SIGKILL are
delivered to the job to perform these actions. However,
for some applications this is not appropriate.
If no executable path is given, Sun Grid Engine takes the
specified parameter entry as the signal to be delivered
instead of the default signal. A signal must be either a
positive number or a signal name consisting of the prefix
"SIG" followed by the signal name as printed by <I>kill -l</I>
(e.g. SIGTERM).
If an executable path is given (it must be an <I>absolute path</I>
starting with a "/"), this command together with its
arguments is started by Sun Grid Engine to perform the
appropriate action. The following special variables are
expanded at runtime and can be used (besides any other
strings that have to be interpreted by the procedures) to
constitute a command line:
$<I>host</I>
The name of the host on which the procedure is started.
$<I>job</I>_<I>owner</I>
The user name of the job owner.
$<I>job</I>_<I>id</I>
Sun Grid Engine's unique job identification number.
$<I>job</I>_<I>name</I>
The name of the job.
$<I>queue</I>
The name of the queue.
$<I>job</I>_<I>pid</I>
The pid of the job.
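The following Python sketch tells the two accepted forms of a
method value apart. It uses the standard signal module to
validate signal names and string.Template to expand the
variables listed above. resolve_method and the example script
path are hypothetical.

```python
import signal
from string import Template


def resolve_method(spec, context):
    """Classify a suspend/resume/terminate method value.

    Returns ("command", expanded_command_line) for an absolute path,
    or ("signal", signal_number) for a signal number or SIG... name.
    """
    spec = spec.strip()
    if spec.startswith("/"):
        # Expand $host, $job_owner, $job_id, $job_name, $queue, $job_pid;
        # unknown $words are left untouched.
        return "command", Template(spec).safe_substitute(context)
    if spec.isdigit() and int(spec) > 0:
        return "signal", int(spec)
    if spec.startswith("SIG"):
        return "signal", int(signal.Signals[spec])  # KeyError if unknown
    raise ValueError(f"neither an absolute path nor a signal: {spec!r}")
```

For example, "/opt/site/suspend.sh $job_id $job_pid" with job
42 and pid 4711 expands to "/opt/site/suspend.sh 42 4711".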
notify
The time to wait between delivery of the SIGUSR1/SIGUSR2
notification signals and the suspend/kill signals if the
job was submitted with the <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> -<I>notify</I> option.
owner_list
The owner_list contains a comma-separated list of the
<B><A HREF="../htmlman1/login.html?pathrev=V62u5_TAG">login(1)</A></B> user names (see <I>user_name</I> in <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>) of
those users who are authorized to disable and suspend this
queue through <B><A HREF="../htmlman1/qmod.html?pathrev=V62u5_TAG">qmod(1)</A></B> (Sun Grid Engine operators and
managers can do this by default). It is customary to set
this field for queues on interactive workstations where the
computing resources are shared between interactive sessions
and Sun Grid Engine jobs, allowing the workstation owner to
have priority access (default: NONE).
user_lists
The user_lists parameter contains a comma-separated list of
Sun Grid Engine user access list names as described in
<B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>. Each user contained in at least one of the
listed access lists has access to the queue. If the
user_lists parameter is set to NONE (the default), any user
who is not explicitly excluded via the xuser_lists parameter
described below has access. If a user is contained both in
an access list listed in xuser_lists and in one listed in
user_lists, the user is denied access to the queue.
xuser_lists
The xuser_lists parameter contains a comma-separated list of
Sun Grid Engine user access list names as described in
<B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>. Each user contained in at least one of the
listed access lists is denied access to the queue. If the
xuser_lists parameter is set to NONE (the default), any user
has access. If a user is contained both in an access list
listed in xuser_lists and in one listed in user_lists, the
user is denied access to the queue.
projects
The projects parameter contains a comma-separated list of
Sun Grid Engine projects (see <B><A HREF="../htmlman5/project.html?pathrev=V62u5_TAG">project(5)</A></B>) that have access
to the queue. Any project not in this list is denied access
to the queue. If set to NONE (the default), any project that
is not specifically excluded via the xprojects parameter
described below has access. If a project is in both the
projects and xprojects parameters, the project is denied
access to the queue.
xprojects
The xprojects parameter contains a comma-separated list of
Sun Grid Engine projects (see <B><A HREF="../htmlman5/project.html?pathrev=V62u5_TAG">project(5)</A></B>) that are denied
access to the queue. If set to NONE (the default), no
projects are denied access other than those denied access
based on the projects parameter described above. If a
project is in both the projects and xprojects parameters,
the project is denied access to the queue.
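The access rules for user_lists/xuser_lists and
projects/xprojects have the same shape. The exclusion list
always wins, and NONE in the inclusion list admits everyone
who is not excluded. The Python sketch below represents NONE
as None and treats each access list as a plain set of user
names. Both function names are hypothetical.

```python
def expand_access_lists(list_names, access_lists):
    """Union of the members of the named access lists; None (NONE) stays None."""
    if list_names is None:
        return None
    members = set()
    for list_name in list_names:
        members |= access_lists[list_name]
    return members


def is_allowed(name, allow, deny):
    """Shared rule of user_lists/xuser_lists and projects/xprojects.

    allow and deny are sets of names, or None for the value NONE.
    """
    if deny is not None and name in deny:
        return False  # exclusion always wins
    if allow is None:
        return True   # NONE: everyone not excluded has access
    return name in allow
```

For users, pass expand_access_lists(...) results as allow and
deny. For projects, pass the project name sets directly.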
subordinate_list
There are two different types of subordination:
1. Queuewise subordination
A list of Sun Grid Engine queue names as defined for
<I>queue_name</I> in <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>. Subordinate relationships are
in effect only between queue instances residing on the same
host. The relationship does not apply and is ignored when
jobs are running in queue instances on other hosts.
Subordinated queue instances residing on the same host are
suspended when a specified number of jobs is running in this
queue instance. The list specification is the same as that
of the load_thresholds parameter above, e.g.
low_pri_q=5,small_q. The numbers denote the job slots of the
queue that have to be filled in the superordinated queue to
trigger the suspension of the subordinated queue. If no
value is assigned, suspension is triggered when all slots of
the queue are filled.
On nodes that host more than one queue, you might wish to
accord better service to certain classes of jobs (e.g.
queues dedicated to parallel processing might need priority
over low-priority production queues). The default is NONE.
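Here is a hedged Python sketch of the queuewise trigger
described above. It makes two assumptions. "Filled" means that
the number of running jobs is at least the given count. When
no count is given, the slot count of the superordinated queue
is used. Both helper names are hypothetical.

```python
def parse_subordinates(spec):
    """'low_pri_q=5,small_q' -> {'low_pri_q': 5, 'small_q': None}."""
    result = {}
    for element in spec.split(","):
        name, sep, count = element.strip().partition("=")
        result[name] = int(count) if sep else None
    return result


def queues_to_suspend(spec, running_jobs, superordinated_slots):
    """Subordinated queues to suspend on this host, given the number of
    jobs running in the superordinated queue instance."""
    suspend = []
    for name, count in parse_subordinates(spec).items():
        trigger = superordinated_slots if count is None else count
        if running_jobs >= trigger:
            suspend.append(name)
    return suspend
```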
2. Slotwise preemption
Slotwise preemption provides a means to ensure that high
priority jobs get the resources they need, while low
priority jobs on the same host are not unnecessarily
preempted, which maximizes host utilization. Slotwise
preemption is designed to provide different preemption
actions, but the current implementation only provides
suspension. This means that a subordination relationship is
defined between queues similar to queuewise subordination,
but if the suspend threshold is exceeded, only single tasks
running in single slots are suspended rather than the whole
subordinated queue.
As with queuewise subordination, the subordination
relationships are in effect only between queue instances
residing on the same host. The relationship does not apply
and is ignored when jobs and tasks are running in queue
instances on other hosts.
The syntax is:
slots=<threshold>(<queue_list>)
where
<threshold> =a positive integer number
<queue_list>=<queue_def>[,<queue_list>]
<queue_def> =<queue>[:<seq_no>][:<action>]
<queue> =a Sun Grid Engine queue name as defined for
<I>queue</I>_<I>name</I> in <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>.
<seq_no> =sequence number among all subordinated queues
of the same depth in the tree. The higher the
sequence number, the lower is the priority of
the queue.
Default is 0, which is the highest priority.
<action> =the action to be taken if the threshold is
exceeded. Supported actions are:
"sr": Suspend the task with the shortest run
time.
"lr": Suspend the task with the longest run
time.
Default is "sr".
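The grammar and selection rules above can be sketched in
Python. parse_slotwise is lenient and accepts seq_no and
action in any order. select_suspensions makes three
assumptions. All running tasks of the superordinated queue and
its subordinated queues on one host are counted. The
subordinated queue with the highest seq_no is searched first.
Tasks are suspended one at a time until the threshold is no
longer exceeded. All names are hypothetical, and the example
configurations that follow can be replayed with this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class SubQueue:
    name: str
    seq_no: int = 0      # lower number means higher priority
    action: str = "sr"   # "sr": shortest run time, "lr": longest run time


def parse_slotwise(spec):
    """Parse e.g. 'slots=2(B.q:1, C.q:2)' into (threshold, [SubQueue, ...])."""
    match = re.fullmatch(r"\s*slots=(\d+)\((.*)\)\s*", spec)
    if not match or int(match.group(1)) == 0:
        raise ValueError(f"not a slotwise specification: {spec!r}")
    queues = []
    for queue_def in match.group(2).split(","):
        name, *fields = [f.strip() for f in queue_def.split(":")]
        sub = SubQueue(name)
        for field in fields:
            if field.isdigit():
                sub.seq_no = int(field)
            elif field in ("sr", "lr"):
                sub.action = field
            else:
                raise ValueError(f"unexpected field {field!r} in {queue_def!r}")
        queues.append(sub)
    return int(match.group(1)), queues


def select_suspensions(threshold, subqueues, running):
    """Return the tasks to suspend so that no more than `threshold` run.

    running: (queue_name, task_id, run_time) tuples for all running
    tasks of the superordinated queue and its subordinated queues.
    """
    # Highest seq_no = lowest priority, so it is searched first.
    order = sorted(subqueues, key=lambda q: q.seq_no, reverse=True)
    running = list(running)
    suspended = []
    while len(running) > threshold:
        for sub in order:
            candidates = [t for t in running if t[0] == sub.name]
            if candidates:
                pick = min if sub.action == "sr" else max
                victim = pick(candidates, key=lambda t: t[2])
                break
        else:
            break  # only superordinated tasks left: nothing to preempt
        running.remove(victim)
        suspended.append(victim)
    return suspended
```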
Some example configurations and their effects:
a) The simplest configuration
subordinate_list slots=2(B.q)
which means that the queue "B.q" is subordinated to the
current queue (let's call it "A.q"), the suspend threshold
for all tasks running in "A.q" and "B.q" on the current host
is two, the sequence number of "B.q" is "0", and the action
is "suspend the task with the shortest run time first". This
subordination relationship looks like this:
A.q
|
B.q
This could be a typical configuration for a host with a
dual-core CPU. This subordination configuration ensures that
tasks scheduled to "A.q" always get a CPU core for them-
selves, while tasks in "B.q" are not preempted as long as no
more than two tasks are running in "A.q" and "B.q" together.
If there is no task running in "A.q", two tasks are running
in "B.q" and a new task is scheduled to "A.q", the sum of
tasks running in "A.q" and "B.q" is three. Because three is
greater than two, the defined action is triggered: the task
with the shortest run time in the subordinated queue "B.q"
is suspended. After suspension, there is one task running in
"A.q", one task running in "B.q" and one task suspended in
"B.q".
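The arithmetic of example a) can be replayed with a small Python model. This is a simplified sketch, not Sun Grid Engine code: Task and enforce_threshold are hypothetical names, the task list is assumed to hold only the tasks of "A.q" and "B.q" on one host, and run times are arbitrary values in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    queue: str
    run_time: int            # seconds since the task started (illustrative)
    suspended: bool = False


def enforce_threshold(tasks: List[Task], threshold: int,
                      subordinated: str, action: str = "sr") -> List[str]:
    """Suspend tasks of the subordinated queue until the running tasks of
    both queues no longer exceed the threshold; return their names."""
    running = [t for t in tasks if not t.suspended]
    excess = len(running) - threshold
    candidates = [t for t in running if t.queue == subordinated]
    # "sr": shortest run time first; "lr": longest run time first.
    candidates.sort(key=lambda t: t.run_time, reverse=(action == "lr"))
    victims = candidates[:max(excess, 0)]
    for t in victims:
        t.suspended = True
    return [t.name for t in victims]


# Two tasks run in B.q, then a new task is scheduled to A.q.
tasks = [Task("b1", "B.q", 600), Task("b2", "B.q", 120), Task("a1", "A.q", 0)]
print(enforce_threshold(tasks, threshold=2, subordinated="B.q"))  # ['b2']
```

With action="lr" the same call would suspend b1, the task with the longest run time.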
b) A simple tree
subordinate_list slots=2(B.q:1, C.q:2)
This defines a small tree that looks like this:
    A.q
   /   \
 B.q   C.q
A use case for this configuration could be a host with a
dual-core CPU and queues "B.q" and "C.q" for jobs with
different requirements, e.g. "B.q" for interactive jobs,
"C.q" for batch jobs. Again, the tasks in "A.q" always get a
CPU core, while tasks in "B.q" and "C.q" are suspended only
if the threshold of running tasks is exceeded. Here the
sequence number among the queues of the same depth comes
into play. Tasks scheduled to "B.q" cannot directly trigger
the suspension of tasks in "C.q", but if a task has to be
suspended, "C.q" is searched for a suitable task first.
If there is one task running in "A.q", one in "C.q" and a
new task is scheduled to "B.q", the three running tasks in
"A.q", "B.q" and "C.q" exceed the threshold of 2 defined in
"A.q". This triggers the suspension of one task in either
"B.q" or "C.q". The sequence numbers give "B.q" a higher
priority than "C.q", therefore the task in "C.q" is
suspended. After suspension, there is one task running in
"A.q", one task running in "B.q" and one task suspended in
"C.q".
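The role of the sequence number in example b) can be sketched as follows. The sketch assumes (a simplification, not Sun Grid Engine's implementation) that the subordinated queues are searched from the highest sequence number, i.e. the lowest priority, downward and that the first queue with a running task supplies the task to suspend; pick_victim is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# (task name, queue name, run time in seconds) for each running task.
RunningTask = Tuple[str, str, int]


def pick_victim(running: List[RunningTask], seq_no: Dict[str, int],
                action: str = "sr") -> Optional[str]:
    """Return the task to suspend among the subordinated queues in seq_no.

    Queues are visited from the lowest priority (highest seq_no) to the
    highest; in the first queue with a running task, "sr" picks the
    shortest run time and "lr" the longest.
    """
    for queue in sorted(seq_no, key=seq_no.get, reverse=True):
        in_queue = [t for t in running if t[1] == queue]
        if in_queue:
            choose = min if action == "sr" else max
            return choose(in_queue, key=lambda t: t[2])[0]
    return None


# Example b): one task in A.q, one in C.q, a new task scheduled to B.q.
threshold = 2
running = [("a1", "A.q", 300), ("c1", "C.q", 200), ("b1", "B.q", 0)]
victim = None
if len(running) > threshold:
    victim = pick_victim(running, {"B.q": 1, "C.q": 2})
print(victim)  # c1
```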
c) More than two levels
Configuration of A.q: subordinate_list slots=2(B.q)
Configuration of B.q: subordinate_list slots=2(C.q)
The resulting tree looks like this:
A.q
|
B.q
|
C.q
These are three queues with high, medium and low priority.
If a task is scheduled to "C.q", first the subtree
consisting of "B.q" and "C.q" is checked and the number of
tasks running there is counted. If the threshold defined in
"B.q" is exceeded, the task in "C.q" is suspended. Then the
whole tree is checked: if the number of tasks running in
"A.q", "B.q" and "C.q" exceeds the threshold defined in
"A.q", the task in "C.q" is suspended. This means that the
effective threshold of any subtree is not higher than the
threshold of the root node of the tree. If in this example a
task is scheduled to "A.q", the number of tasks running in
"A.q", "B.q" and "C.q" is immediately checked against the
threshold defined in "A.q".
d) Any tree
         A.q
        /   \
     B.q     C.q
     /       /  \
   D.q     E.q  F.q
                  \
                   G.q
The computation of the tasks that are to be (un)suspended
always starts at the queue instance that is modified, i.e.
the queue instance a task is scheduled to or ends in, whose
configuration is modified, or for which a manual or other
automatic (un)suspend is issued. The exception is a leaf
node, like "D.q", "E.q" and "G.q" in this example: then the
computation starts at its parent queue instance (like "B.q",
"C.q" or "F.q" in this example). From there, first all
running tasks in the whole subtree of this queue instance
are counted. If the sum exceeds the threshold configured in
its subordinate_list, a task to be suspended is searched for
in this subtree. Then the algorithm proceeds to the parent
of this queue instance, counts all running tasks in the
whole subtree below the parent and checks whether the number
exceeds the threshold configured in the parent's
subordinate_list. If so, it searches for a task to suspend
in the whole subtree below the parent. This continues until
the computation has been done for the root node of the tree.
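The walk described above can be modeled in Python as a rough sketch. It is not the Sun Grid Engine implementation: QueueInstance, recompute and schedule are hypothetical names, only suspension (not unsuspension) is modeled, the order in which candidate tasks are chosen inside a subtree (deepest level first, then highest sequence number, then shortest run time) is an assumption, and tasks are suspended until the count fits, whereas the text above speaks of searching for one task. The usage at the end replays example c).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(eq=False)
class QueueInstance:
    name: str
    threshold: int = 0       # slots=N of its own subordinate_list; 0 = none
    seq_no: int = 0          # sequence number given in the parent's list
    parent: Optional["QueueInstance"] = None
    children: List["QueueInstance"] = field(default_factory=list)
    running: Dict[str, int] = field(default_factory=dict)    # task -> run time
    suspended: Dict[str, int] = field(default_factory=dict)  # task -> run time

    def add(self, child: "QueueInstance", seq_no: int = 0) -> "QueueInstance":
        child.parent, child.seq_no = self, seq_no
        self.children.append(child)
        return child

def subtree(q: QueueInstance) -> Iterator[QueueInstance]:
    yield q
    for c in q.children:
        yield from subtree(c)

def depth(q: QueueInstance) -> int:
    d = 0
    while q.parent is not None:
        q, d = q.parent, d + 1
    return d

def running_in(q: QueueInstance) -> int:
    return sum(len(n.running) for n in subtree(q))

def pick_victim(root: QueueInstance) -> Optional[Tuple[QueueInstance, str]]:
    # Only queue instances below root are subordinated to it.
    cands = [(n, task, rt) for n in subtree(root) if n is not root
             for task, rt in n.running.items()]
    if not cands:
        return None
    # ASSUMED order: deepest level, then highest seq_no, then shortest run time.
    n, task, _ = min(cands, key=lambda c: (-depth(c[0]), -c[0].seq_no, c[2]))
    return n, task

def recompute(modified: QueueInstance) -> List[str]:
    """Check thresholds from the modified queue instance up to the root."""
    # A leaf has no subordinate_list, so the walk starts at its parent.
    node = modified if modified.children else modified.parent
    newly_suspended = []
    while node is not None:
        while node.threshold and running_in(node) > node.threshold:
            victim = pick_victim(node)
            if victim is None:
                break
            n, task = victim
            n.suspended[task] = n.running.pop(task)
            newly_suspended.append(task)
        node = node.parent
    return newly_suspended

def schedule(q: QueueInstance, task: str, run_time: int = 0) -> List[str]:
    q.running[task] = run_time
    return recompute(q)

# Example c): A.q -> B.q -> C.q, both thresholds 2.
a = QueueInstance("A.q", threshold=2)
b = a.add(QueueInstance("B.q", threshold=2))
c = b.add(QueueInstance("C.q"))
schedule(a, "a1")
schedule(b, "b1")
print(schedule(c, "c1"))  # ['c1']
```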
complex_values
complex_values defines quotas for resource attributes
managed via this queue. The syntax is the same as for
load_thresholds (see above). The quotas are related to the
resource consumption of all jobs in a queue in the case of
consumable resources (see <B><A HREF="../htmlman5/complex.html?pathrev=V62u5_TAG">complex(5)</A></B> for details on consum-
able resources) or they are interpreted on a per queue slot
(see slots above) basis in the case of non-consumable
resources. Consumable resource attributes are commonly used
to manage free memory, free disk space or available floating
software licenses while non-consumable attributes usually
define distinctive characteristics like type of hardware
installed.
For consumable resource attributes an available resource
amount is determined by subtracting the current resource
consumption of all running jobs in the queue from the quota
in the complex_values list. Jobs can only be dispatched to a
queue if no resource requests exceed any corresponding
resource availability obtained by this scheme. The quota
definition in the complex_values list is automatically
replaced by the current load value reported for this attri-
bute, if load is monitored for this resource and if the
reported load value is more stringent than the quota. This
effectively avoids oversubscription of resources.
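The availability calculation for consumables can be sketched like this. It is a simplified illustration with hypothetical function and attribute names (mem_gb, lic_x); it considers serial jobs only and does not model the replacement of a quota by a more stringent load value.

```python
from typing import Dict, List

Amounts = Dict[str, float]


def availability(quotas: Amounts, running_jobs: List[Amounts]) -> Amounts:
    """Quota from complex_values minus the consumption of all running jobs."""
    avail = dict(quotas)
    for job in running_jobs:
        for attr, amount in job.items():
            if attr in avail:
                avail[attr] -= amount
    return avail


def can_dispatch(request: Amounts, avail: Amounts) -> bool:
    # No request may exceed the corresponding availability; attributes not
    # listed in this queue's complex_values are not checked here.
    return all(amount <= avail[attr]
               for attr, amount in request.items() if attr in avail)


# Hypothetical consumables: "mem_gb" (memory in GB) and "lic_x" (licenses).
quotas = {"mem_gb": 16, "lic_x": 4}
avail = availability(quotas, [{"mem_gb": 6, "lic_x": 1}, {"mem_gb": 4}])
print(avail)                                           # {'mem_gb': 6, 'lic_x': 3}
print(can_dispatch({"mem_gb": 8}, avail))              # False
print(can_dispatch({"mem_gb": 6, "lic_x": 2}, avail))  # True
```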
Note: Load values replacing the quota specifications may
have become more stringent because they have been scaled
(see <B><A HREF="../htmlman5/host_conf.html?pathrev=V62u5_TAG">host_conf(5)</A></B>) and/or load adjusted (see <B><A HREF="../htmlman5/sched_conf.html?pathrev=V62u5_TAG">sched_conf(5)</A></B>).
The -<I>F</I> option of <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> and the load display in the
<B><A HREF="../htmlman1/qmon.html?pathrev=V62u5_TAG">qmon(1)</A></B> queue control dialog (activated by clicking on a
queue icon while the "Shift" key is pressed) provide
detailed information on the actual availability of consum-
able resources and on the origin of the values currently
taken into account.
Note also: The resource consumption of running jobs (used
for the availability calculation) as well as the resource
requests of the jobs waiting to be dispatched may be derived
either from explicit user requests during job submission
(see the -<I>l</I> option to <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>) or from a "default" value
configured for an attribute by the administrator (see
<B><A HREF="../htmlman5/complex.html?pathrev=V62u5_TAG">complex(5)</A></B>). The -<I>r</I> option to <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> can be used for
retrieving full detail on the actual resource requests of
all jobs in the system.
For non-consumable resources Sun Grid Engine simply compares
the job's attribute requests with the corresponding specifi-
cation in complex_values taking the relation operator of the
complex attribute definition into account (see <B><A HREF="../htmlman5/complex.html?pathrev=V62u5_TAG">complex(5)</A></B>).
If the result of the comparison is "true", the queue is
suitable for the job with respect to the particular attri-
bute. For parallel jobs each queue slot to be occupied by a
parallel task is meant to provide the same resource attri-
bute value.
Note: Only numeric complex attributes can be defined as con-
sumable resources and hence non-numeric attributes are
always handled on a per queue slot basis.
The default value for this parameter is NONE, i.e. no
administrator defined resource attribute quotas are associ-
ated with the queue.
calendar
specifies the calendar to be valid for this queue or con-
tains NONE (the default). A calendar defines the availabil-
ity of a queue depending on time of day, week and year.
Please refer to <B><A HREF="../htmlman5/calendar_conf.html?pathrev=V62u5_TAG">calendar_conf(5)</A></B> for details on the Sun Grid
Engine calendar facility.
Note: Jobs can request queues with a certain calendar model
via a "-<I>l</I> <I>c</I>=<<I>cal</I>_<I>name</I>>" option to <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>.
initial_state
defines an initial state for the queue either when adding
the queue to the system for the first time or on start-up of
the <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B> on the host on which the queue resides.
Possible values are:
default The queue is enabled when adding the queue or is
reset to the previous status when <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>
comes up (this corresponds to the behavior in ear-
lier Sun Grid Engine releases not supporting
initial_state).
enabled The queue is enabled in either case. This is
equivalent to a manual and explicit '<I>qmod</I> -<I>e</I>' com-
mand (see <B><A HREF="../htmlman1/qmod.html?pathrev=V62u5_TAG">qmod(1)</A></B>).
disabled The queue is disabled in either case. This is
equivalent to a manual and explicit '<I>qmod</I> -<I>d</I>' com-
mand (see <B><A HREF="../htmlman1/qmod.html?pathrev=V62u5_TAG">qmod(1)</A></B>).
RESOURCE LIMITS
The first two resource limit parameters, s_rt and h_rt, are
implemented by Sun Grid Engine. They define the "real time",
also called "elapsed" or "wall clock" time, that has passed
since the start of the job. If h_rt is exceeded by a job
running in the queue, it is aborted via the SIGKILL signal
(see <B><A HREF="../htmlman1/kill.html?pathrev=V62u5_TAG">kill(1)</A></B>). If s_rt is exceeded, the job is first
"warned" via the SIGUSR1 signal (which can be caught by the
job) and finally aborted after the notification time defined
in the queue configuration parameter notify (see above) has
passed. In cases where s_rt is used in combination with job
notification, it might be necessary to configure a signal
other than SIGUSR1 using the NOTIFY_KILL and NOTIFY_SUSP
execd_params (see <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>) so that the jobs' signal-
catching mechanism can distinguish the cases and react
accordingly.
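The s_rt/h_rt behavior can be summarized as a function of the elapsed wall clock time. This hypothetical sketch (rt_action is an invented name) only classifies which stage a job has reached; it sends no signals, treats "exceeded" as strictly greater than the limit (an assumption), and takes all values in seconds.

```python
from typing import Optional


def rt_action(elapsed: int, notify: int,
              s_rt: Optional[int] = None,
              h_rt: Optional[int] = None) -> Optional[str]:
    """Most severe action due after `elapsed` seconds of wall clock time.

    None for s_rt or h_rt means the limit is not set (infinity).
    """
    if h_rt is not None and elapsed > h_rt:
        return "SIGKILL"   # hard limit exceeded: abort immediately
    if s_rt is not None and elapsed > s_rt + notify:
        return "abort"     # soft limit exceeded and notify time has passed
    if s_rt is not None and elapsed > s_rt:
        return "SIGUSR1"   # soft limit exceeded: warn the job
    return None


for t in (3000, 3650, 3700, 7300):
    print(t, rt_action(t, notify=60, s_rt=3600, h_rt=7200))
```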
The resource limit parameters s_cpu and h_cpu are imple-
mented by Sun Grid Engine as a job limit. They impose a
limit on the amount of combined CPU time consumed by all the
processes in the job. If h_cpu is exceeded by a job running
in the queue, it is aborted via a SIGKILL signal (see
<B><A HREF="../htmlman1/kill.html?pathrev=V62u5_TAG">kill(1)</A></B>). If s_cpu is exceeded, the job is sent a SIGXCPU
signal which can be caught by the job. If you wish to allow
a job to be "warned" so it can exit gracefully before it is
killed, you should set the s_cpu limit to a lower value
than h_cpu. For parallel processes, the limit is applied
per slot, which means that the limit is multiplied by the
number of slots used by the job before being applied.
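The per-slot scaling of s_cpu and h_cpu can be expressed as below. cpu_limit_action is a hypothetical name; it classifies a job's combined CPU time in seconds, sends no signals, and treats "exceeded" as strictly greater than the scaled limit (an assumption). The same pattern would apply to s_vmem and h_vmem, which according to the following paragraph also use SIGKILL and SIGXCPU.

```python
from typing import Optional


def cpu_limit_action(cpu_used: float, slots: int,
                     s_cpu: Optional[float] = None,
                     h_cpu: Optional[float] = None) -> Optional[str]:
    """Signal due for a job's combined CPU time; None means no limit."""
    # The configured limits are per slot, so scale them by the slot count.
    hard = None if h_cpu is None else h_cpu * slots
    soft = None if s_cpu is None else s_cpu * slots
    if hard is not None and cpu_used > hard:
        return "SIGKILL"
    if soft is not None and cpu_used > soft:
        return "SIGXCPU"
    return None


# A 4-slot job with s_cpu=3000 and h_cpu=3600 seconds per slot.
print(cpu_limit_action(13000, slots=4, s_cpu=3000, h_cpu=3600))
```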
The resource limit parameters s_vmem and h_vmem are imple-
mented by Sun Grid Engine as a job limit. They impose a
limit on the amount of combined virtual memory consumed by
all the processes in the job. If h_vmem is exceeded by a job
running in the queue, it is aborted via a SIGKILL signal
(see <B><A HREF="../htmlman1/kill.html?pathrev=V62u5_TAG">kill(1)</A></B>). If s_vmem is exceeded, the job is sent a
SIGXCPU signal which can be caught by the job. If you wish
to allow a job to be "warned" so it can exit gracefully
before it is killed, you should set the s_vmem limit to
a lower value than h_vmem. For parallel processes, the
limit is applied per slot, which means that the limit is
multiplied by the number of slots used by the job before
being applied.
The remaining parameters in the queue configuration template
specify per job soft and hard resource limits as implemented
by the <B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B> system call. See this manual page on
your system for more information. By default, each limit
field is set to infinity (which means RLIM_INFINITY as
described in the <B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B> manual page). The value type
for the CPU-time limits s_cpu and h_cpu is time. The value
type for the other limits is memory. Note: Not all systems
support <B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B>.
Note also: s_vmem and h_vmem (virtual memory) are only
available on systems supporting RLIMIT_VMEM (see
<B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B> on your operating system).
The UNICOS operating system supplied by SGI/Cray does not
support the <B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B> system call; it uses its own
resource limit-setting system call instead. For UNICOS
systems only, the following meanings apply:
s_cpu The per-process CPU time limit in seconds.
s_core The per-process maximum core file size in bytes.
s_data The per-process maximum memory limit in bytes.
s_vmem The same as s_data (if both are set the minimum is
used).
h_cpu The per-job CPU time limit in seconds.
h_data The per-job maximum memory limit in bytes.
h_vmem The same as h_data (if both are set the minimum is
used).
h_fsize The total number of disk blocks that this job can
create.
SEE ALSO
<B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B>, <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>, <B><A HREF="../htmlman1/csh.html?pathrev=V62u5_TAG">csh(1)</A></B>, <B><A HREF="../htmlman1/qconf.html?pathrev=V62u5_TAG">qconf(1)</A></B>, <B><A HREF="../htmlman1/qmon.html?pathrev=V62u5_TAG">qmon(1)</A></B>,
<B><A HREF="../htmlman1/qrestart.html?pathrev=V62u5_TAG">qrestart(1)</A></B>, <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B>, <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>, <B><A HREF="../htmlman1/sh.html?pathrev=V62u5_TAG">sh(1)</A></B>, <B><A HREF="../htmlman2/nice.html?pathrev=V62u5_TAG">nice(2)</A></B>, <B><A HREF="../htmlman2/setrlimit.html?pathrev=V62u5_TAG">setrlimit(2)</A></B>,
<B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>, <B><A HREF="../htmlman5/calendar_conf.html?pathrev=V62u5_TAG">calendar_conf(5)</A></B>, <B><A HREF="../htmlman5/sge_conf.html?pathrev=V62u5_TAG">sge_conf(5)</A></B>, <B><A HREF="../htmlman5/complex.html?pathrev=V62u5_TAG">complex(5)</A></B>,
<B><A HREF="../htmlman5/host_conf.html?pathrev=V62u5_TAG">host_conf(5)</A></B>, <B><A HREF="../htmlman5/sched_conf.html?pathrev=V62u5_TAG">sched_conf(5)</A></B>, <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>, <B><A HREF="../htmlman8/sge_qmaster.html?pathrev=V62u5_TAG">sge_qmaster(8)</A></B>,
<B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B>.
COPYRIGHT
See <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B> for a full statement of rights and permis-
sions.
</PRE>
<HR>
<ADDRESS>
Man(1) output converted with
<a href="http://www.oac.uci.edu/indiv/ehood/man2html.html">man2html</a>
</ADDRESS>
</BODY>
</HTML>