Recipe: How to Fill a Pool Breadth-First
========================================

Some pool administrators prefer a policy where, when there are fewer jobs
than total cores in their pool, those jobs are "spread out" as much as
possible, so that each machine runs as few jobs as possible.  If all
machines are identical, such a policy may yield better performance than one
in which each machine is "filled up" before jobs are assigned to the next
machine, but it may also use more power.

For a Pool with Partitionable Slots
-----------------------------------

HTCondor uses partitionable slots by default.

The following recipe assumes that
:ref:`consumption policies<consumption-policy>`
have not been enabled.
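
For reference, the default layout can also be written out explicitly; a
minimal sketch of a single partitionable slot that owns all of the
machine's resources, using the standard slot-configuration macros (shown
only for illustration, since this mirrors the default):

.. code-block:: condor-config

    # Explicit equivalent of the default: one partitionable slot
    # containing 100% of the machine's resources.
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = 100%
    SLOT_TYPE_1_PARTITIONABLE = true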

For efficiency reasons, HTCondor usually tries to match as many jobs to a
single machine as it can.  The main idea is to configure HTCondor to instead
spread the available jobs across as many machines as it can.  The downside
of doing so is that each machine can only match one new job per negotiation
cycle, so it could take much longer to get all of the jobs started.

On the schedd, you need to disable :macro:`CLAIM_PARTITIONABLE_LEFTOVERS`,
which is set by default.  This macro tells the schedd to try to start as
many jobs as it can on each match given to it by the negotiator.  Since the
negotiator matches jobs to entire partitionable slots, that could be a
large number.

On the schedd, make the following configuration change.
This requires a ``condor_reconfig`` of the schedd to take effect.

.. code-block:: condor-config

    CLAIM_PARTITIONABLE_LEFTOVERS = false

On the central manager, you need to disable :macro:`NEGOTIATOR_DEPTH_FIRST`,
which is set by default.  This macro tells the negotiator to try to match as
many jobs as it can to the same slot.  Since the negotiator matches jobs to
entire partitionable slots, that could be a large number.

On the central manager, make the following configuration change.
This requires a ``condor_reconfig`` of the negotiator to take effect.

.. code-block:: condor-config

    NEGOTIATOR_DEPTH_FIRST = false
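
If the schedd and the negotiator happen to read the same configuration
files (for example, on a single-host pool), the two changes above can be
made in one place; a sketch:

.. code-block:: condor-config

    # Breadth-first matching for partitionable slots: the schedd
    # claims only one job per match, and the negotiator spreads
    # matches across machines instead of packing one slot.
    CLAIM_PARTITIONABLE_LEFTOVERS = false
    NEGOTIATOR_DEPTH_FIRST = false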

For a Pool with Static Slots
----------------------------

If you've configured your pool with static slots, the situation is much
simpler.
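
For context, a common static-slot layout assigns one single-core slot to
each detected core; a hypothetical sketch (adjust the slot definition to
match your pool):

.. code-block:: condor-config

    # Hypothetical layout: one single-core static slot per core.
    NUM_SLOTS_TYPE_1 = $(DETECTED_CORES)
    SLOT_TYPE_1 = cpus=1
    SLOT_TYPE_1_PARTITIONABLE = false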

The main idea is to set the :macro:`NEGOTIATOR_PRE_JOB_RANK` expression in
the negotiator to prefer to hand the schedds those machines that are already
running the fewest jobs.  We use :macro:`NEGOTIATOR_PRE_JOB_RANK` instead of
:macro:`NEGOTIATOR_POST_JOB_RANK` so that the job's ``RANK`` expression
doesn't come into play.  If you trust your users enough to let them override
this policy, you could use :macro:`NEGOTIATOR_POST_JOB_RANK` instead, as
sketched below.  (Note that because this policy happens in the negotiator,
if :macro:`CLAIM_WORKLIFE` is set to a high value, the schedds are free to
reuse the slots they have been assigned for some time, which may leave the
pool out of balance for the :macro:`CLAIM_WORKLIFE` duration.)

.. code-block:: condor-config

    NEGOTIATOR_PRE_JOB_RANK = isUndefined(RemoteOwner) * (- SlotId)

In this expression, ``isUndefined(RemoteOwner)`` limits the preference to
unclaimed slots, and ``(- SlotId)`` ranks lower-numbered slots higher.
Since every machine's slot 1 is then handed out before any machine's
slot 2, jobs spread across the pool breadth-first.  Changing this requires
a ``condor_reconfig`` of the negotiator to take effect.
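
As noted above, if you want the job's ``RANK`` expression to be able to
override the breadth-first preference, the same expression can be moved to
the post-job rank instead; a sketch:

.. code-block:: condor-config

    # Variant: apply the breadth-first preference after the job's
    # RANK expression, so users' jobs may override it.
    NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (- SlotId)

Relatedly, if claim reuse leaves the pool unbalanced for too long, one
option is to shorten :macro:`CLAIM_WORKLIFE` on the execute machines; a
sketch, with a hypothetical ten-minute value:

.. code-block:: condor-config

    # Shorter claim lifetime (in seconds), so slots are returned to
    # the negotiator sooner and the balance is re-evaluated.
    CLAIM_WORKLIFE = 600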