*condor_drain*
===============

Control draining of an execute machine
:index:`condor_drain<single: condor_drain; HTCondor commands>`\ :index:`condor_drain command`

Synopsis
--------

**condor_drain** [**-help** ]

**condor_drain** [**-debug** ] [**-pool** *pool-name*]
[**-graceful | -quick | -fast**] [**-reason** *reason-text*]
[**-resume-on-completion | -restart-on-completion | -reconfig-on-completion | -exit-on-completion**]
[**-check** *expr*] [**-start** *expr*] *machine-name*

**condor_drain** [**-debug** ] [**-pool** *pool-name*] **-cancel**
[**-request-id** *id*] *machine-name*
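
The synopsis groups the drain options into mutually exclusive sets: at
most one draining schedule and at most one completion action. The
following minimal Python sketch assembles an argument list for the drain
form and enforces those groups. The helper name ``build_drain_argv`` and
its parameters are hypothetical, and the function only builds the list;
it does not contact any daemon.

.. code-block:: python

    from typing import List, Optional

    # Mutually exclusive option groups taken from the condor_drain synopsis.
    SCHEDULES = ("graceful", "quick", "fast")
    COMPLETIONS = ("resume", "restart", "reconfig", "exit")


    def build_drain_argv(
        machine: str,
        schedule: Optional[str] = None,
        reason: Optional[str] = None,
        on_completion: Optional[str] = None,
        check: Optional[str] = None,
        start: Optional[str] = None,
        pool: Optional[str] = None,
    ) -> List[str]:
        """Hypothetical helper: build (but do not run) a condor_drain command."""
        if schedule is not None and schedule not in SCHEDULES:
            raise ValueError(f"unknown schedule: {schedule!r}")
        if on_completion is not None and on_completion not in COMPLETIONS:
            raise ValueError(f"unknown completion action: {on_completion!r}")
        # -start only applies to a graceful drain (the default schedule).
        if start is not None and schedule not in (None, "graceful"):
            raise ValueError("-start may only be combined with -graceful")

        argv = ["condor_drain"]
        if pool is not None:
            argv += ["-pool", pool]
        if schedule is not None:
            argv.append("-" + schedule)
        if reason is not None:
            argv += ["-reason", reason]
        if on_completion is not None:
            argv.append(f"-{on_completion}-on-completion")
        if check is not None:
            argv += ["-check", check]
        if start is not None:
            argv += ["-start", start]
        argv.append(machine)
        return argv

For example, ``build_drain_argv("node1.example.com", schedule="quick",
reason="kernel update", on_completion="restart")`` returns
``["condor_drain", "-quick", "-reason", "kernel update",
"-restart-on-completion", "node1.example.com"]``. Passing such a list to
``subprocess.run`` avoids shell quoting problems with *reason-text* and
with ClassAd expressions that contain spaces.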

Description
-----------

*condor_drain* is an administrative command used to control the
draining of all slots on an execute machine. While a machine is draining,
it will not accept any new jobs unless the **-start** expression
specifies otherwise. The machine to drain is given by the
*machine-name* argument, which should match the machine ClassAd
attribute ``Machine``.

How currently running jobs are treated depends on the draining schedule
that is chosen with a command-line option:

 **-graceful**
    Initiate a graceful eviction of the jobs. All promises that have
    been made to the jobs are honored, including
    ``MaxJobRetirementTime``. The eviction of jobs is coordinated to
    reduce idle time: if one slot has a job with a long retirement time
    and the other slots have jobs with shorter retirement times, the
    effective retirement time for all of the jobs is the longer one. If
    no draining schedule is specified, **-graceful** is chosen by default.
 **-quick**
    ``MaxJobRetirementTime`` is not honored. Eviction of jobs is
    initiated immediately. Jobs are given time to shut down
    according to the usual policy, as set by
    :macro:`MachineMaxVacateTime`.
 **-fast**
    Jobs are immediately hard-killed, with no chance to gracefully shut
    down.
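
The three schedules differ in which policy limits still apply. The
Python sketch below is a deliberately simplified model, not HTCondor's
actual scheduling logic: it estimates an upper bound on how long a drain
can take, given each slot's remaining retirement time and a single
vacate time standing in for :macro:`MachineMaxVacateTime`. The function
name and its inputs are hypothetical, and real drains often finish
sooner because jobs exit on their own.

.. code-block:: python

    from typing import Sequence


    def drain_time_upper_bound(
        schedule: str,
        remaining_retirement: Sequence[float],
        max_vacate_time: float,
    ) -> float:
        """Rough upper bound, in seconds, on how long draining may take.

        Simplified model (an assumption, not HTCondor's implementation):
        - graceful: coordinated retirement lasts as long as the longest
          remaining retirement time, then jobs get up to max_vacate_time
          to shut down;
        - quick: retirement is skipped, only the vacate time applies;
        - fast: jobs are hard-killed immediately.
        """
        if schedule == "graceful":
            longest = max(remaining_retirement, default=0.0)
            return longest + max_vacate_time
        if schedule == "quick":
            return max_vacate_time
        if schedule == "fast":
            return 0.0
        raise ValueError(f"unknown schedule: {schedule!r}")

With two slots whose jobs have 600 and 3600 seconds of retirement time
left and a vacate time of 600 seconds, the model gives 4200 seconds for
**-graceful**, 600 for **-quick**, and 0 for **-fast**. Taking the
maximum over slots reflects the coordinated retirement described for
**-graceful**.
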

If you specify **-graceful**, you may also specify **-start**. On a
gracefully draining machine, some jobs may finish retiring before
others. By default, the resources used by the newly retired jobs do not
become available for use by other jobs until the machine exits the
draining state (see below). The **-start** expression you supply
replaces the draining machine's normal :macro:`START` expression for the
duration of the draining state, potentially making those resources
available. See the
:ref:`admin-manual/ep-policy-configuration:*condor_startd* Policy Configuration`
section for more information.

Once draining is complete, the machine will enter the Drained/Idle
state. To resume normal operation (negotiation) at that point, or at
any earlier time during draining, use the **-cancel** option. The
**-resume-on-completion** option, given when initiating draining,
automatically resumes normal operation once draining has completed.
This is useful for forcing a machine with partitionable slots to join
all of its resources back together into one slot, facilitating
de-fragmentation and whole-machine negotiation.

Options
-------

 **-help**
    Display brief usage information and exit.
 **-debug**
    Causes debugging information to be sent to ``stderr``, based on the
    value of the configuration variable :macro:`TOOL_DEBUG`.
 **-pool** *pool-name*
    Specify an alternate HTCondor pool, if the default one is not
    desired.
 **-graceful**
    (the default) Honor the maximum vacate and retirement time policy.
 **-quick**
    Honor the maximum vacate time, but not the retirement time policy.
 **-fast**
    Honor neither the maximum vacate time policy nor the retirement time
    policy.
 **-reason** *reason-text*
    Set the drain reason to *reason-text*. While the *condor_startd* is draining,
    it will advertise the given reason. If this option is not used, the
    reason defaults to the name of the user that started the drain.
 **-resume-on-completion**
    When done draining, resume normal operation, such that potentially
    the whole machine could be claimed.
 **-restart-on-completion**
    When done draining, restart the *condor_startd* daemon so that
    configuration changes will take effect.
 **-reconfig-on-completion**
    When done draining, reconfigure the *condor_startd* and then resume normal
    operation. A reconfig will not change the resources assigned to slots, but
    most other configuration changes will be applied, including changes to the
    :macro:`START` expression, to offline GPUs, and to offline universes.
 **-exit-on-completion**
    When done draining, shut down the *condor_startd* daemon and tell
    the *condor_master* not to restart it automatically.
 **-check** *expr*
    Abort draining if *expr* is not true for all slots to be drained.
 **-start** *expr*
    The :macro:`START` expression to use while the machine is draining. This
    expression cannot reference the machine's existing :macro:`START` expression.
 **-cancel**
    Cancel a prior draining request, to permit the *condor_negotiator*
    to use the machine again.
 **-request-id** *id*
    Cancel only the specified draining request, where *id* is the value of
    the :ad-attr:`DrainingRequestId` machine ClassAd attribute.
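
The cancel form takes only the pool, the optional request id, and the
machine name. A minimal Python sketch that builds that command line is
shown below; the helper name ``build_cancel_argv`` is hypothetical, and
the request id is assumed to have been read beforehand from the
machine's ``DrainingRequestId`` attribute.

.. code-block:: python

    from typing import List, Optional, Union


    def build_cancel_argv(
        machine: str,
        request_id: Optional[Union[int, str]] = None,
        pool: Optional[str] = None,
    ) -> List[str]:
        """Hypothetical helper: build a condor_drain -cancel command line."""
        argv = ["condor_drain"]
        if pool is not None:
            argv += ["-pool", pool]
        argv.append("-cancel")
        # -request-id is optional; it names one specific draining request.
        if request_id is not None:
            argv += ["-request-id", str(request_id)]
        argv.append(machine)
        return argv

Converting the id with ``str()`` lets callers pass either the raw
attribute value or a string without changing the resulting command.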

Exit Status
-----------

*condor_drain* exits with a status of zero if it succeeds and a
non-zero status if it fails.
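
Because the exit status is the documented success signal, a script that
wraps *condor_drain* can rely on the return code alone. The following
minimal Python sketch assumes ``condor_drain`` is on the ``PATH`` when it
is given a real command line; the function name ``run_drain_command`` is
hypothetical.

.. code-block:: python

    import subprocess
    from typing import Sequence


    def run_drain_command(argv: Sequence[str]) -> bool:
        """Run a condor_drain command line and report success.

        A zero exit status means success; any non-zero status means failure.
        """
        result = subprocess.run(list(argv), capture_output=True, text=True)
        if result.returncode != 0 and result.stderr:
            # Surface the tool's own error output for the administrator.
            print(result.stderr.strip())
        return result.returncode == 0

For example, ``run_drain_command(["condor_drain", "-cancel",
"node1.example.com"])`` returns ``True`` only if the cancel request
succeeded.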