1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562
|
Specification of 6.0 changes in Grid Engine Enterprise Edition
policy module as preparation for resource reservation
and for the SGE 5.3 mimic mode
1. Concepts
a) New job priority based on urgency and ticket sub-policy
The new Enterprise Edition priority used by the scheduler
to determine pending job list order unites the existing
5.3 ticket policy, the new urgency policy and the policy
that is implemented through the priority controlled via
-p <priority> submit option. The latter one is also called
POSIX priority which indicates it's origin. This term is
used to delimit this priority from priorty concept that is
used by the scheduler to finally decide about pending job
list order. For each policy a weighting factor can be specified
determining to degree it shall be in effect. To facilitate
easier control on each policies values range a normalized
ticket amount ("ntckts"), a normalized urgency value ("nurg")
and a normalized POSIX priority ("pprio") is used rather than
the raw values ticket amount resp. urgency value ("tckts"/"urg"):
prio = weight_urgency * nurg +
weight_ticket * ntckts +
weight_priority * pprio
nurg = normalized(urg)
ntckts = normalized(tckts)
pprio = normalized(-p <priority>)
b) New urgency sub-policy
The urgency policy defines a so-called urgency value ("urg") for each
job. The urgency value consists of resource requirement contribution
("rrcontr"), waiting time contribution ("wtcontr") and deadline
contribution ("dlcontr")
urg = rrcontr + wtcontr + dlcontr
The resource requirement contribution is a sum with one addend
for each hard resource request ("rraddend")
rrcontr = Sum over all(rraddend)
Depending on the resource type two different methods are used
to determine the resource_addend. For numeric type resource
requests the addend is a product of the resources' urgency value
("rurg") from complex(5), the assumed slot allocation of the job and the
per slot request specified using -l submit(1) option:
rraddend = "rurg" * assumed_slot_allocation * request
For string type requests the resources' urgency value is directly
used as addend:
rraddend = "rurg"
The waiting time contribution is the product of the jobs waiting time
in seconds ("waiting_time") and the "waiting_weight" value specified
in sched_conf(5).
wtcontr = waiting_time * waiting_weight
The deadline contribution is 0 for jobs without a deadline
dlcontr = 0
or the quotient of the 'weight_deadline' value from sched_conf(5) and
the free time in seconds until deadline initiation time.
The urgency policy provides a multitude of means for different resource
dependent prioritization schemes. Amongst them are general preference for
parallel jobs to implement "largest jobs first" filling or preference for
jobs that request particular resources in order to keep expensive licenses
utilized.
c) 6.0 Changes with the ticket policy
The ticket policy represents the three sub-policies functional
policy, override policy and share tree policy. The ticket value
("tckts") is defined as the sum of the specific ticket values
for each sub-policy ("ftckt"/"otckt"/"stckt")
tckts = ftckt + otckt + stckt
The ticket policies provide a broad range of means for influencing
both jobs pending-time and run-time priority on a per job, per user, per
project, per department and job class basis. Irrespective of the
disappearance of the deadline ticket policy (as reflected by this
document) the 5.3 documentation is still applicable.
d) 6.0 Changes to unite SGE 5.3 -p priority concept with new GEEE 6.0
priority concept
The -p <priority> as described in qsub(1), qalter(1), etc. can be
used for implementing site specific priority policies. Priorities can
be specified within a range -1023 to 1024 whereas higher numbers
always mean higher priority.
2. Use cases
The following new use cases show how problems for known problems
can be solved based on the changes with job priority in GEEE 6.0:
a) Resource-based priority classes
* use three resources each one representing a priority class
#name shortcut type relop requestable consumable default urgency
low lw BOOL == YES NO 0 0
medium md BOOL == YES NO 0 100
high hg BOOL == YES NO 0 10000
* don't consider slots with urgency computation
slots s INT <= YES YES 1 0
* configure these three complex attributes in the global host
qconf -rattr exechost complex_values low=true,medium=true,high=true global
* use 'weight_urgency' of '1' and 'weight_ticket' of '0'
weight_ticket 0
weight_urgency 1.0
--> -l high jobs always get dispatched before -l medium and -l low jobs
--> -l medium jobs always get dispatched before -l low jobs
b) Resource-based priority classes and use of functional policy to implement 50:50
project-sort
* same resources as in a)
* use 'weight_urgency' of '1' and 'weight_ticket' of '0.001'
* use 'weight_tickets_functional' of '10000' and 'weight_project' of '1' (others 0)
* use two project 'p1' and 'p2' with 50 functional shares each
--> -l high jobs always get dispatched before -l medium and -l low jobs
--> -l medium jobs always get dispatched before -l low jobs
--> functional policy must cause 'p1' and 'p2' jobs be dispatched with a relation
50:50 at any time
c) Resource-based priority classes and use of share-tree policy to implement over time
50:50 project-sort
* same resources as in a)
* use 'weight_urgency' of '1' and 'weight_ticket' of '0.001'
* use 'weight_tickets_share' of '10000'
* use a share tree and assign 50 shares to the projects 'p1' and 'p2'
--> -l high jobs always must be dispatched before -l medium and -l low jobs
--> -l medium jobs always must be dispatched before -l low jobs
--> share-tree policy must cause 'p1' and 'p2' jobs be dispatched with a resource
usage relation 50:50 over time
d) Strict -p priority classes overlaying resource-based urgency overlaying
functional policy to implement equal share user sort
* use 1 as 'weight_priority', 0.001 as 'weight_urgency' and 0.00001 as
'weight_ticket' in sched_conf(5)
* use auto_user_fshare of '100' in sge_conf(5)
* use auto_user_delete_time of '720:0:0' in sge_conf(5)
* use enforce_user of 'auto' in sge_conf(5)
--> for jobs with the same -p priority that request the same resources
the equal share user sort will be in effect
--> for jobs with the same -p priority that differ regarding their requested
resources the urgency policy will be in effect whereas the equal share user
sort will not be determining
--> for jobs with different -p priority the priority determines which one is
dispatched first and other policies will not be determining
3. Changes with command line interface and configuration file formats
qstat(1)
>
> -urg This option is only supported in case of a Sun Grid
> Engine, Enterprise Edition system. It is not available
> for Sun Grid Engine systems. Displays additional
> information for each job related to the job urgency policy
> scheme.
>
< -ext This option is only supported in case of a Sun Grid
< Engine, Enterprise Edition system. It is not available
< for Sun Grid Engine systems.
< Displays additional Sun Grid Engine, Enterprise Edition
< relevant information for each job (see OUTPUT FORMATS
< below).
<
>
> -ext This option is only supported in case of a Sun Grid
> Engine, Enterprise Edition system. It is not available
> for Sun Grid Engine systems.
> Displays additional information for each job related to
> the job ticket policy scheme.
>
>
> -g t Displays for running parallel job subtask information
> in multiple lines. By default, parallel job tasks are
> displayed in a single line. See under OUTPUT FORMATS for
> more details on -g t output format differences.
>
< -t Prints extended information about the controlled sub-
< tasks of the displayed parallel jobs. Please refer to
< the OUTPUT FORMATS sub-section Expanded Format below
< for detailed information. Sub-tasks of parallel jobs
< should not be confused with array job tasks (see -g
< option above and -t option to qsub(1)).
> -t Prints extended information about the controlled sub-
> tasks of the displayed parallel jobs. Please refer to
> the OUTPUT FORMATS sub-section Reduced Format below
> for detailed information. Sub-tasks of parallel jobs
> should not be confused with array job tasks (see -g
> option above and -t option to qsub(1)).
< Reduced Format (without -f and -F)
< Following the header line a line is printed for each job
< consisting of
<
< o the job ID.
<
< o the priority of the jobs as assigned to them via the -p
< option to qsub(1) or qalter(1) determining the order of
< the pending jobs list.
<
> Reduced Format (without -f and -F)
> Following the header line a line is printed for each job
> consisting of
>
> o the job ID.
>
> o the priority of the jobs determining the order of the pending list.
> In case of a Sun Grid Engine system this is the priority
> as assigned to a job via the -p option to qsub(1) or qalter(1). In case
> of Sun Grid Engine, Enterprise Edition system the priority value is
> determined dynamically by sge_schedd(8) as described in sched_conf(5).
< o the function of the running jobs (MASTER or SLAVE - the
< latter for parallel jobs only).
> o The function of the running jobs or the number of slots depending
> whether -g t is specified or not:
>
> If a single line is printed for a parallel job the number of slots
> occupied or requested by the job is diplayed. For pending parallel that
> got a slot range assigned via -pe submit(1) option a preliminary assumed
> slot amount is displayed. This is always the range minimum in a Sun Grid
> Engine, in a Sun Grid Engine, Enterprise Edition system system this slot
> number can be controlled by the urgency_slots parameter in sge_pe(5).
>
> If multiple lines are printed for running parallel jobs (see -g t) the
> function of the running parallel job subtask (MASTER or SLAVE - the
> latter for parallel jobs only) is displayed.
< If the -t option is supplied, each job status line also con-
< tains
> If the -t option is supplied, each status line always contains
> parallel job subtask information as if -g t were specified and
> each line contains the following parallel job subtask information:
< Expanded Format (with -r)
< If the -r option was specified together with qstat, the fol-
< lowing information for each displayed job is printed (a sin-
< gle line for each of the following job characteristics):
<
< o The hard and soft resource requirements of the job as
< specified with the qsub(1) -l option.
<
< o The requested parallel environment including the desired
< queue slot range (see -pe option of qsub(1)).
<
> Expanded Format (with -r)
> If the -r option was specified together with qstat, the fol-
> lowing information for each displayed job is printed (a sin-
> gle line for each of the following job characteristics):
>
> o The hard and soft resource requirements of the job as
> specified with the qsub(1) -l option. In a Sun Grid Engine,
> Enterprise Edition system also the per resource addend used
> to determine the urgency contribution rrcontr value is printed.
>
> o The requested parallel environment including the desired
> queue slot range (see -pe option of qsub(1)).
>
< Enhanced Sun Grid Engine, Enterprise Edition Output (with -ext)
< For each job the following additional items are displayed:
<
< project
< The project to which the job is assigned as specified
< in the qsub(1) -P option.
<
< department
< The department, to which the user belongs (use the -sul
< and -su options of qconf(1) to display the current
< department definitions).
<
< deadline
< The deadline initiation time of the job as specified
< with the qsub(1) -dl option.
<
< cpu The current accumulated CPU usage of the job.
<
< mem The current accumulated memory usage of the job.
<
< io The current accumulated IO usage of the job.
<
< tckts
< The total number of tickets assigned to the job
< currently
<
< ovrts
< The override tickets as assigned by the -ot option of
< qalter(1).
<
< otckt
< The override portion of the total number of tickets
< assigned to the job currently
<
< dtckt
< The deadline portion of the total number of tickets
< assigned to the job currently
<
< ftckt
< The functional portion of the total number of tickets
< assigned to the job currently
<
< stckt
< The share portion of the total number of tickets
< assigned to the job currently
<
< share
< The share of the total system to which the job is enti-
< tled currently.
<
> Enhanced Sun Grid Engine, Enterprise Edition Output (with -ext)
> For each job the following additional items are displayed:
>
> ntckts
> The normalized ticket value.
>
> project
> The project to which the job is assigned as specified
> in the qsub(1) -P option.
>
> department
> The department, to which the user belongs (use the -sul
> and -su options of qconf(1) to display the current
> department definitions).
>
> cpu The current accumulated CPU usage of the job.
>
> mem The current accumulated memory usage of the job.
>
> io The current accumulated IO usage of the job.
>
> tckts
> The total number of tickets assigned to the job
> currently
>
> ovrts
> The override tickets as assigned by the -ot option of
> qalter(1).
>
> otckt
> The override portion of the total number of tickets
> assigned to the job currently
>
> ftckt
> The functional portion of the total number of tickets
> assigned to the job currently
>
> stckt
> The share portion of the total number of tickets
> assigned to the job currently
>
> share
> The share of the total system to which the job is enti-
> tled currently.
>
> Enhanced Sun Grid Engine, Enterprise Edition Output (with -urg)
> For each job the following additional urgency policy related
> items are displayed:
>
> nurg
> The normalized urgency value.
>
> urg
> The urgency value that is a sum of resource request contribution
> (rrcontr), waiting time contribution (wtcntr) and deadline
> contribution (dlcontr).
>
> rrcontr
> The urgency contribution that reflects the urgency
> that is related to the jobs overall resource requirement. The
> resource requirement contribution is a sum consisting
> of one addend per resource request (hard requests only). For
> resource requests of numeric type (i.e. INT, DOUBLE, TIME,
> MEMORY, BOOL) the addend is computed is a product of the jobs
> request, the preliminarily assumed slot allocation (see
> urgency_slots in sge_pe(5)) and the per resource urgency
> weighting factor specified in complex(5) number. For resource
> requests of string type (i.e. STRING, CSTRING, RSTRING, HOST)
> the resources ugency value as specified in complex(5) is
> directly used as added. The addend for each resource request in
> this formula can be investigated with the -r option.
>
> wtcontr
> The urgency contribution that reflects the urgency related to
> the jobs waiting time. The waiting time contribution is
> the product of the jobs waiting time (in seconds ??) and the
> 'weight_waiting_time' value as specified in sched_conf(5).
>
> dlcontr
> The urgency contribution that reflects the urgency related to
> the jobs deadline initiation time, if specified with -dl
> submit(1) option. For jobs with a deadline initiation time the
> deadline contribution is the quotient of 'weight_deadline' value
> as specified in sched_conf(5) and the free time until deadline
> initiation time (in seconds ??).
>
> deadline
> The deadline initiation time of the job as specified with the
> qsub(1) -dl option.
>
sched_conf(5)
< weight_tickets_deadline (was wrongly documented as weight_deadline!)
< This parameter is only available in a Sun Grid Engine,
< Enterprise Edition system. Sun Grid Engine does not support
< this parameter.
<
< The maximum number of deadline tickets available for distri-
< bution by Sun Grid Engine, Enterprise Edition. Determines
< the relative importance of the deadline policy.
> weight_deadline
> This parameter is only available in a Sun Grid Engine,
> Enterprise Edition system. Sun Grid Engine does not support
> this parameter.
>
> The weight applied on the remaining time until latest deadline
> initiation.
>
> weight_waiting_time
> This parameter is only available in a Sun Grid Engine,
> Enterprise Edition system. Sun Grid Engine does not support
> this parameter.
>
> The weight applied on the waiting time since job submission.
>
> weight_urgency
> This parameter is only available in a Sun Grid Engine,
> Enterprise Edition system. Sun Grid Engine does not support
> this parameter.
>
> The weight applied on normalized urgency (nurg) when
> determining priority (prio) finally used.
>
> weight_ticket
> This parameter is only available in a Sun Grid Engine,
> Enterprise Edition system. Sun Grid Engine does not support
> this parameter.
>
> The weight applied on normalized ticket amount (ntix) when
> determining priority (prio) finally used.
>
sge_conf(5)
< SHARE_DEADLINE_TICKETS
< If set to "true" or "1", the total deadline tickets are
< shared among all deadline jobs. If set to "false" or
< "0", each deadline job will receive the total deadline
< tickets once the job deadline has been reached. The
< default value is "true". This parameter is only valid
< in a SGEEE system.
< The "POLICY_HIERARCHY" parameter can be a up to 4
< letter combination of the first letters of the 4 poli-
< cies S(hare-based), F(unctional), D(eadline) and
< O(verride). So a value "OFSD" means that the override
< policy takes precedence over the functional policy,
< which influences the share-based policy and which
< finaly preceeds the deadline policy. Less than 4
< letters mean that some of the policies do not influence
< other policies and also are not influenced by other
< policies. So a value of "FS" means that the functional
< policy influences the share-based policy and that there
< is no interference with the other policies.
> The "POLICY_HIERARCHY" parameter can be a up to 3
> letter combination of the first letters of the 3 poli-
> cies S(hare-based), F(unctional) and
> O(verride). So a value "OFS" means that the override
> policy takes precedence over the functional policy,
> which finally preceeds the share-based policy. Less
> than 3 letters mean that some of the policies do not
> influence other policies and also are not influenced
> by other policies. So a value of "FS" means that the
> functional policy influences the share-based policy and
> that there is no interference with the other policies.
complex(5)
> urgency
> Sun Grid Engine does not support this parameter.
> The urgency weight is applied by the sge_schedd(8) on the
> resources total requirement of a job when determining the
> resource request contribution (rrcontr) to the urgency (urg).
> Qstat(1) -r can be used to monitor all resource request
> contributions of a job affecting it's urgency (urg).
>
sge_pe(5)
> urgency_slots
> This parameter is only available in a Sun Grid Engine,
> Enterprise Edition system. Sun Grid Engine does not support
> this parameter.
>
> For jobs that are submitted with a slot range PE request the
> number of slots can't be determined without considering possible
> assignments. This setting specifies what method used by
> sge_schedd(8) to determine the estimated number slots the job
> will get finally. The slot amount is then assumed when determining
> the total resource requirement is computed to for each resource.
> When qstat(1) is run without -g t option the slot amount displayed
> is determined using the method specified with urgency_slots.
>
> Please note, when wildcards and slot ranges are used with qsub(1) -pe
> option the method used to determine the assumed slot amount is
> ambiguous. In this case it is recommended to use the same urgency_slots
> settings with all PEs. (??? check this with -w e ???)
>
> The following methods are supported:
>
> <int> The specified integer number is assumed as prospective
> slot amount.
>
> 'min' The minimum of the slot range is assumed as prospective
> slot amount. If no lower bound is specified with the range
> 1 is assumed.
>
> 'max' The maximum of the slot range is assumed as prospective
> slot amount. If no upper bound is specified with the range
> the absolute maximum possible due to the PE's 'slot' setting
> is assumed.
>
> 'avg' The average of all numbers within the jobs PE range request
> is assumed.
>
|