= README for the ipp-os-shutdown and early shutdown features in IPP - UNIX

== General information

This version of IPP - Unix includes enhanced scripts and configuration
that make it possible to configure "early shutdown" functionality on
chosen hosts. This allows selected systems to power themselves off once
fewer than `$MINSUPPLIES` power sources have been protected by UPSes that
are fully "ONLINE" for a certain time.

To support this, a new configuration variable, `SHUTDOWN_TIMER`, was
introduced in the `ipp.conf` file. It specifies the number of minutes
that the power protection may remain insufficient before the irreversible
shutdown of this host begins. The default value of this variable is `-1`,
which retains the standard behavior of staying up as long as possible,
and shutting down only when the UPS sends an alert that too little
battery runtime remains (as configured by default or customized by the
`shutdown_duration` option for the `netxml-ups` driver).

[NOTE]
NOTE: If this feature is used, it is recommended to specify the timeout
for an early-shutdown feature of at least 5 minutes (`SHUTDOWN_TIMER=5`).
Reason: When an NMC in the UPS is rebooted, it can announce dummy values
about the UPS and battery status for some time, while it is collecting
real data from the UPS hardware, and this data can cause a protected host
to begin an early shutdown needlessly. A sufficiently long `SHUTDOWN_TIMER`
value allows the host to receive real data from the UPS and so to cancel
the pending shutdown (if the battery is indeed well charged and the UPS
is really online).
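
As an illustration, an `ipp.conf` fragment for a host that should shut
down early might look like the sketch below. It assumes that the file
uses shell-style `NAME=value` assignments with `#` comments, as the
`SHUTDOWN_TIMER=5` notation above suggests; the value itself is only
an example.

----
# /usr/local/ups/etc/ipp.conf (fragment, illustrative)
# Begin the irreversible shutdown after 5 minutes of insufficient
# power protection. The default of -1 disables early shutdown.
SHUTDOWN_TIMER=5
----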

Another added option is `POWERDOWNFLAG_USER`, which may be pre-set to
`enforce` or `forbid` to enable or disable UPS power-cycling at the
end of the shutdown procedure. Generally it does not need to be pre-set
in the `ipp.conf` file, as the script relies on the `POWERDOWNFLAG` file
managed by `upsmon` (this in turn depends on whether `upsmon` triggered
the shutdown due to an alarm, such as a low-battery condition, or whether
the delayed early shutdown was merely scheduled by a "notification").
However, it is possible to configure a specific behavior on this host,
e.g. to be sure to avoid power-cycling when early shutdown hosts are
shutting down for any powerfail-driven reason.

These options should be configured by an administrator of the particular
host in the `/usr/local/ups/etc/ipp.conf` file.

When UPS events are processed by `upsmon` with its configured `NOTIFYCMD`
(the packaged `ipp-notifier.sh` script, enhanced for this delivery), the
script can detect how many UPSes are `ONLINE` and how many are
required by the `MINSUPPLIES` setting. UPSes whose state is currently
`unknown` are not considered, in order to avoid erroneous shutdowns
while the communications are just starting up. If the protection is
deemed insufficient, the new `ipp-os-shutdown` script is launched,
which can invoke a customized copy of `ipp-host-shutdown.sample` script
for host-specific procedures (such as clusterware shutdown).
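
The supply check described above can be sketched roughly as follows.
This is not the code of `ipp-notifier.sh`: the function name
`protection_sufficient` and the status words passed as arguments are
hypothetical, and the sketch assumes that "not considered" means a UPS
in the `unknown` state is never counted against the host.

----
# Hypothetical helper, not part of IPP - Unix.
# Usage: protection_sufficient MINSUPPLIES STATUS...
# Returns 0 if enough UPSes are ONLINE (or not yet known), 1 otherwise.
protection_sufficient() {
    minsupplies=$1
    shift
    usable=0
    for status in "$@"; do
        case "$status" in
            # Assumption: an "unknown" UPS must not trigger a shutdown,
            # so it is counted like an ONLINE one until real data arrives.
            ONLINE|unknown) usable=$((usable + 1)) ;;
        esac
    done
    [ "$usable" -ge "$minsupplies" ]
}

# Example: two supplies required, but one UPS is on battery.
if protection_sufficient 2 ONLINE ONBATT; then
    echo "protection sufficient"
else
    echo "protection insufficient"
fi
----

In this example only one UPS is usable while two are required, so the
second branch is taken; in the real scripts this is the point at which
the delayed shutdown would be scheduled.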

The `ipp-os-shutdown` script has several roles:

* it manages the delayed shutdown (which can be canceled before the timer
expires, but cannot be aborted after the timer expires, so it then proceeds
to the end)
* it wraps the call to a customized shutdown script, if one is configured
and detected
* it detects whether the UPSes should be told to power-cycle or not,
based on the `killpower` flag-file from `upsmon`, a setting in `ipp.conf`
file, or an explicit request from the caller of this shutdown script
* in the end it detects whether the UPSes came back online, so the host
should try to `reboot` rather than `poweroff` (unless a specific action
was requested by the caller of this shutdown script) and calls the OS
shutdown program to complete this activity
* the same script (with an option of zero delay) is configured as the
`SHUTDOWNCMD` in `upsmon.conf` so the same logic is executed in all
the different supported shutdown scenarios.
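
The final choice between a reboot and a power-off, as described in the
list above, can be sketched as below. The function `choose_final_action`
and its arguments are hypothetical and are not taken from
`ipp-os-shutdown`; the sketch also assumes that "came back online" means
at least `MINSUPPLIES` UPSes report `ONLINE` again.

----
# Hypothetical sketch, not the actual ipp-os-shutdown code.
# Usage: choose_final_action REQUESTED ONLINE_COUNT MINSUPPLIES
#   REQUESTED is the action explicitly requested by the caller
#   (for example "reboot", "poweroff" or "halt"), or an empty string.
choose_final_action() {
    requested=$1
    online_count=$2
    minsupplies=$3
    if [ -n "$requested" ]; then
        # An explicit request from the caller always wins.
        echo "$requested"
    elif [ "$online_count" -ge "$minsupplies" ]; then
        # Power is sufficient again: come back up right away.
        echo reboot
    else
        # Power is still insufficient: stay down.
        echo poweroff
    fi
}

choose_final_action "" 1 2
----

The last line asks for the action when no explicit request was made and
only one of two required UPSes is online, which selects `poweroff`.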

It is recommended to configure suitable `SHUTDOWN_TIMER` values for the
hosts which should shut down early and leave more battery runtime
for the more important hosts. Note that if the external power
returns after the early-shutdown hosts have powered off, they will stay
down until an administrator boots them. However, if the external power
becomes sufficient again during the shutdown procedure (checked between
the clusterware shutdown and the OS shutdown steps), then a reboot of the
host is requested instead of a power-off.

It is recommended to configure `SHUTDOWN_TIMER=-1` (default) on those
more important hosts which should stay up as long as possible and only
shut down if all required UPSes have posted a low-battery status or
forced-shutdown command. These hosts would by default schedule delayed
UPS power-cycling. To be on the safe side, sufficient `shutdown_duration`
seconds should be configured in their `netxml-ups` driver blocks in the
`/usr/local/ups/etc/ups.conf` file on the host.
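
A driver block with such a setting might look like the following
sketch. The section name `ups1`, the URL and the value of 300 seconds
are placeholders chosen for illustration, not recommended values.

----
# /usr/local/ups/etc/ups.conf (fragment, illustrative)
[ups1]
    driver = netxml-ups
    # Hypothetical address of the UPS network management card
    port = http://ups1.example.com
    # Time (in seconds) this host needs for a clean shutdown
    shutdown_duration = 300
----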

If the customer elects to use the early shutdown strategy for all hosts,
`POWERDOWNFLAG_USER=enforce` should be configured on the hosts with
the highest `SHUTDOWN_TIMER` value, so that they cause UPS power-cycling.
Do not forget to define sufficient `DELAY` value for the OS shutdown to
complete before the UPS turns itself off. The UPS would turn on the load
automatically after some time, when external power is back and it has
charged the battery sufficiently (configurable in the Network Management
web-interface for the UPS).

== Installation

* Install the package as usual

- Un-compress the `ipp-*.tar.gz` archive for your OS

- Change into the resulting `ipp-*` subdirectory

- (optional) If a complete re-installation is desired (including removal
of old configuration files so they do not conflict with the new delivery),
please also execute `IPP_WIPE_OLD_CONFIG=yes; export IPP_WIPE_OLD_CONFIG`
in the shell before running the installation script

- Launch the `install.sh` script and follow the installer instructions

* Configure the early shutdown timer

You can later change the early shutdown timer for the cluster nodes
by editing `/usr/local/ups/etc/ipp.conf` and setting `SHUTDOWN_TIMER` to
a suitable value (in minutes).

* Configure clusterware shutdown command

You can configure the variable `SHUTDOWNSCRIPT_CUSTOM` in `ipp.conf` to
point at a custom complementary shutdown script with a shutdown routine
required by this particular host, which will be called by the master
powerfail shutdown script (`/usr/local/ups/sbin/ipp-os-shutdown`).
This variable is not set by default.
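
For example, assuming a customized copy of `ipp-host-shutdown.sample`
was saved under the hypothetical path shown below, the setting could
look like this:

----
# /usr/local/ups/etc/ipp.conf (fragment, illustrative)
# Hypothetical location of the host-specific shutdown script
SHUTDOWNSCRIPT_CUSTOM=/usr/local/ups/sbin/ipp-host-shutdown
----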

* Configure operating system shutdown command and options

Optionally, modify the operating system shutdown command and type.
To modify the shutdown command (or the specific options for the various
types of shutdown), edit the file `/usr/local/ups/etc/ipp-os-shutdown.conf`
and set or adapt the variables:

- `CMD_SHUTDOWN` to point at the shutdown command. This may include
the basic mandatory options, such as the non-interactive flag
(`-y`) on some OSes such as HP-UX or Solaris,
- `SDFLAG_*` to point at the right option for poweroff, reboot or halt.

Note that depending on the OS and hardware features, there may be no
difference between "halt" (stop the OS and keep the hardware running)
and a "poweroff" (halt and instruct the server's PSU to cut the power
going to the motherboard -- if that is supported; might be unavailable
in e.g. virtualized environments or older hardware).

To modify the default shutdown option to halt or reboot, edit the file
`/usr/local/ups/etc/ipp.conf` and set `SDFLAG_POWERSTATE_DEFAULT` to
either `$SDFLAG_HALT` (halt) or `$SDFLAG_REBOOT` (reboot).
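
As a rough sketch, these settings might look as follows on a
Solaris-style system. The values are illustrative only (check the
`shutdown` manual page of your OS), and the variable name
`SDFLAG_POWEROFF` is an assumption derived from the `SDFLAG_*` pattern;
only `SDFLAG_HALT` and `SDFLAG_REBOOT` are named in this document.

----
# /usr/local/ups/etc/ipp-os-shutdown.conf (fragment, illustrative)
# Non-interactive shutdown without a grace period
CMD_SHUTDOWN="/usr/sbin/shutdown -y -g 0"
# Target init states: 5 powers off, 6 reboots, 0 halts
SDFLAG_POWEROFF="-i 5"
SDFLAG_REBOOT="-i 6"
SDFLAG_HALT="-i 0"

# /usr/local/ups/etc/ipp.conf (fragment, illustrative)
# Reboot by default instead of powering off
SDFLAG_POWERSTATE_DEFAULT="$SDFLAG_REBOOT"
----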

* (optionally) Configure UPS power-cycling

If the customer elects to use the early shutdown strategy for all hosts,
the `POWERDOWNFLAG_USER=enforce` should be configured in the `ipp.conf`
file on hosts with the highest `SHUTDOWN_TIMER` value so they would cause
UPS power-cycling explicitly. By default, it may be enabled or forbidden
depending on the cause of shutdown.

Make sure to define a sufficient `DELAY` value (in seconds) as well, for
the OS shutdown to complete safely before the UPS(es) power is cut.
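
Put together, the `ipp.conf` file of the last host to shut down might
contain something like the sketch below. The numbers are placeholders,
and placing `DELAY` in `ipp.conf` next to the other two variables is an
assumption; verify where your installation expects it.

----
# /usr/local/ups/etc/ipp.conf (fragment, illustrative)
# Highest early-shutdown timeout (minutes) among the protected hosts
SHUTDOWN_TIMER=30
# This host explicitly requests UPS power-cycling
POWERDOWNFLAG_USER=enforce
# Seconds allowed for the OS shutdown before the UPS cuts power
DELAY=120
----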


== Testing

The following points about this delivery were verified on AIX 7.1 and
Solaris 11. It was tested:

- to be installable (and runnable via init-script)

- to work as a `SHUTDOWNCMD` handler, including interaction with the
killpower flag-file maintained by `upsmon` (with the default setting
of `SHUTDOWN_TIMER=-1` in the `ipp.conf` file)

- to report progress of the powerfail shutdown onto the system consoles
(with `wall`) and into the `syslog` (note: default syslog priority from
IPP - Unix is `user.notice` which may be ignored by default `syslog`
configuration, check with `/etc/syslog.conf` or equivalent in your OS).

- to execute early shutdown when `SHUTDOWN_TIMER=5` (after 5 minutes
ONBATT) is defined manually in the `ipp.conf` file

- to cancel shutdown if power returns before the timeout expires

- to not cancel shutdown if power returns after the timeout expires
and the shutdown routine has started (and reported to be irreversible)

- to report that the shutdown is in irreversible stage, as reaction
to CTRL+C during console-run invocations like `ipp-os-shutdown -t now`

- to execute custom shutdown script if it is found, and to skip it
if not available

- to power-off the protected host if power remains lost at the moment
when we are about to proceed to `/sbin/shutdown`

- to reboot the host if power is already back when we are about to
proceed to `/sbin/shutdown`

- to set the `netxml-ups` driver argument `shutdown_timer` during
installation to a value which matches the chosen `SHUTDOWN_TIMER`
(if it is non-negative) so the selected ONBATT timeout is shown in
the Eaton NMC Web-GUI

- to implement numerous fixes and improvements in the `install.sh`
script, including integration of new settings for early shutdown
and UPS power-cycling strategy
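
Regarding the `syslog` note in the list above: with a classic `syslogd`,
messages of priority `user.notice` can be captured with a rule like the
one sketched below. The target file is an example only, and the selector
and the action are separated by a tab character in this syntax.

----
# /etc/syslog.conf (fragment, illustrative)
user.notice	/var/adm/messages
----

After the syslog daemon has re-read its configuration, a test message
sent with `logger -p user.notice "IPP test message"` should appear in
the chosen file.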


=== A few important notes helpful during testing

* currently running IPP - Unix processes, UPS states and the pending
shutdown status can be queried with the following command:

----
:; ps -ef | grep -v grep | egrep 'ipp|ups|nut|shut|sleep' ; \
   ls -la /usr/local/ups/etc/killpower ; \
   /usr/local/ups/bin/ipp-status; \
   /usr/local/ups/sbin/ipp-os-shutdown -s; date
----

* a pending shutdown that is not yet irreversible can be aborted
manually with:

----
:; /usr/local/ups/sbin/ipp-os-shutdown -c
----

* the administrator can create a special file to abort the script
just before proceeding to irreversible shutdown; this is automated
in the `ipp-os-shutdown` script (undocumented option):

----
:; /usr/local/ups/sbin/ipp-os-shutdown block
----

Do not forget to remove this file when testing is completed to
allow actual shutdowns to happen:

----
:; /usr/local/ups/sbin/ipp-os-shutdown unblock
----

* also note that if the host is booted with an administrative
action while the remaining UPS battery runtime is under the
threshold set with `shutdown_duration`, an emergency powerfail
can be triggered by the `netxml-ups` driver as soon as IPP - Unix
services are initialized, even if the battery state is "CHARGING".

To avoid such shutdowns, an administrator can log in and quickly
create the special file described above (temporarily).

The recommended procedure is to wait for the hosts to boot up in
due time, when the batteries are charged enough to survive another
power failure (if one occurs) at least for as long as it takes to
shut down the server gracefully.