= README for the ipp-os-shutdown and early shutdown features in IPP - Unix
== General information
This version of IPP - Unix includes enhanced scripts and configuration
that make it possible to configure "early shutdown" functionality on
chosen hosts. This allows selected systems to power themselves off once
fewer than `$MINSUPPLIES` power sources have been protected by fully
"ONLINE" UPSes for a certain time.
To support this, a new configuration variable, `SHUTDOWN_TIMER`, was
introduced in the `ipp.conf` file. It specifies the number of minutes
that the power protection may remain insufficient before the
irreversible shutdown of this host begins. The default value is `-1`,
which retains the standard behavior of staying up as long as possible
and shutting down only when the UPS sends an alert that too little
battery runtime remains (as configured by default or customized by the
`shutdown_duration` option for the `netxml-ups` driver).
[NOTE]
If this feature is used, it is recommended to set the early-shutdown
timeout to at least 5 minutes (`SHUTDOWN_TIMER=5`).
Reason: When an NMC in the UPS is rebooted, it can announce dummy values
about the UPS and battery status for some time, while it is collecting
real data from the UPS hardware, and this data can cause a protected host
to begin an early shutdown needlessly. A sufficiently long `SHUTDOWN_TIMER`
value allows the host to receive real data from the UPS and so to cancel
the pending shutdown (if the battery is indeed well charged and the UPS
is really online).
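The timer semantics described above can be illustrated with a minimal
sketch. The helper `early_shutdown_due` is hypothetical and is not part of
IPP - Unix; it only assumes that `SHUTDOWN_TIMER` is a whole number of
minutes and that a negative value disables early shutdown.

```shell
# Hypothetical helper, not part of IPP - Unix: tell whether the early
# shutdown countdown has expired.
#   $1 = SHUTDOWN_TIMER value in minutes (-1 disables early shutdown)
#   $2 = minutes elapsed since the power protection became insufficient
early_shutdown_due() {
    timer="$1"
    elapsed="$2"
    # A negative timer keeps the standard behavior: never shut down early.
    [ "$timer" -ge 0 ] || return 1
    # Otherwise the shutdown is due once the configured time has passed.
    [ "$elapsed" -ge "$timer" ]
}
```

With `SHUTDOWN_TIMER=5`, a call such as `early_shutdown_due 5 4` fails
(the host keeps waiting, so an NMC reboot has time to settle), while
`early_shutdown_due 5 5` succeeds.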
Another added option is `POWERDOWNFLAG_USER`, which may be pre-set to
`enforce` or `forbid` to enable or disable UPS power-cycling at the
end of the shutdown procedure. Generally it does not need to be pre-set
in the `ipp.conf` file, as the script relies on the `POWERDOWNFLAG` file
managed by `upsmon` (this in turn depends on whether `upsmon` triggered
the shutdown due to an alarm, such as a low-battery condition, or whether
the delayed early shutdown was scheduled by a "notification").
However, it is possible to configure a specific behavior on this host,
e.g. to be sure to avoid power-cycling when early shutdown hosts are
shutting down for any powerfail-driven reason.
These options should be configured by an administrator of the particular
host in the `/usr/local/ups/etc/ipp.conf` file.
When UPS events are processed by `upsmon` with its configured `NOTIFYCMD`
(the packaged `ipp-notifier.sh` script, enhanced for this delivery), the
script detects how many UPSes are `ONLINE` and how many are
required by the `MINSUPPLIES` setting. UPSes whose state is currently
`unknown` are not considered, in order to avoid erroneous shutdowns
while the communications are just starting up. If the protection is
deemed insufficient, the new `ipp-os-shutdown` script is launched,
which can invoke a customized copy of `ipp-host-shutdown.sample` script
for host-specific procedures (such as clusterware shutdown).
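The check described above can be sketched as follows. This is only an
illustration under stated assumptions: the function name
`protection_insufficient` and the status words passed to it are
hypothetical, and the sketch interprets "not considered" as "an `unknown`
UPS is never counted as failed". The real logic lives in `ipp-notifier.sh`
and may differ.

```shell
# Hypothetical helper illustrating the protection check; not the shipped code.
#   $1     = MINSUPPLIES (number of power sources that must be protected)
#   $2...  = one status word per UPS, e.g. "ONLINE", "ONBATT" or "unknown"
# Succeeds (exit code 0) when the protection is deemed insufficient.
protection_insufficient() {
    minsupplies="$1"
    shift
    online=0
    unknown=0
    for status in "$@"; do
        case "$status" in
            ONLINE)  online=$((online + 1)) ;;
            unknown) unknown=$((unknown + 1)) ;;
        esac
    done
    # Assumption: a UPS in "unknown" state is never counted as failed,
    # so it cannot push the host into a shutdown on its own.
    [ $((online + unknown)) -lt "$minsupplies" ]
}
```

For example, with `MINSUPPLIES=2` the call
`protection_insufficient 2 ONLINE ONBATT` succeeds, while
`protection_insufficient 2 ONLINE unknown` fails because the second UPS
has not reported a real state yet.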
The `ipp-os-shutdown` script has several roles:
* it manages the delayed shutdown (which can be canceled before the timer
expires, but cannot be aborted after the timer expires, so it then
proceeds to the end)
* it wraps the call to a customized shutdown script, if one is configured
and detected
* it detects whether the UPSes should be told to power-cycle or not,
based on the `killpower` flag-file from `upsmon`, a setting in `ipp.conf`
file, or an explicit request from the caller of this shutdown script
* in the end it detects whether the UPSes came back online, so the host
should try to `reboot` rather than `poweroff` (unless a specific action
was requested by the caller of this shutdown script) and calls the OS
shutdown program to complete this activity
* the same script (with an option of zero delay) is configured as the
`SHUTDOWNCMD` in `upsmon.conf` so the same logic is executed in all
the different supported shutdown scenarios.
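The power-cycling decision listed among the roles above can be sketched as
a small function. The name `should_powercycle` and the order of precedence
(caller request first, then the `ipp.conf` setting, then the flag-file) are
assumptions made for illustration; the shipped `ipp-os-shutdown` script may
resolve conflicts differently.

```shell
# Hypothetical sketch of the power-cycling decision; the names and the
# precedence are assumptions, not taken from the shipped script.
#   $1 = explicit request from the caller: "enforce", "forbid" or ""
#   $2 = POWERDOWNFLAG_USER from ipp.conf:  "enforce", "forbid" or ""
#   $3 = path of the killpower flag-file maintained by upsmon
# Prints "yes" if the UPSes should be told to power-cycle, else "no".
should_powercycle() {
    for choice in "$1" "$2"; do
        case "$choice" in
            enforce) echo yes; return 0 ;;
            forbid)  echo no;  return 0 ;;
        esac
    done
    # No explicit preference: follow the flag-file set by upsmon.
    if [ -f "$3" ]; then echo yes; else echo no; fi
}
```

A call such as
`should_powercycle "" forbid /usr/local/ups/etc/killpower` prints `no`
even when the flag-file exists, which matches the use case of hosts that
must never trigger power-cycling.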
It is recommended to configure specific `SHUTDOWN_TIMER` values for the
hosts which should shut down early and leave more battery runtime
for the more important hosts. Note that if the external power
returns after the early-shutdown hosts have powered off, they will stay
down until an administrator boots them. However, if the external power
becomes sufficient again during the shutdown procedure (checked between
the clusterware shutdown and the OS shutdown steps), then a reboot of the
host is requested instead of a power-off.
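The choice between a reboot and a power-off at the last step can be
sketched as below. The function `final_action` and its arguments are
hypothetical; the sketch assumes that "sufficient" means at least
`MINSUPPLIES` UPSes are `ONLINE` again, and that an action requested by the
caller always takes priority.

```shell
# Hypothetical helper; the shipped script may decide differently.
#   $1 = action explicitly requested by the caller ("" if none)
#   $2 = number of UPSes that are ONLINE at this point
#   $3 = MINSUPPLIES
# Prints the action to use for the OS shutdown step.
final_action() {
    if [ -n "$1" ]; then
        echo "$1"            # an explicit request always wins
    elif [ "$2" -ge "$3" ]; then
        echo reboot          # protection is sufficient again
    else
        echo poweroff        # power is still lost
    fi
}
```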
It is recommended to configure `SHUTDOWN_TIMER=-1` (default) on those
more important hosts which should stay up as long as possible and only
shut down if all required UPSes have posted a low-battery status or
forced-shutdown command. These hosts would by default schedule delayed
UPS power-cycling. To be on the safe side, a sufficient `shutdown_duration`
(in seconds) should be configured in their `netxml-ups` driver blocks in
the `/usr/local/ups/etc/ups.conf` file on the host.
If the customer elects to use the early shutdown strategy for all hosts,
`POWERDOWNFLAG_USER=enforce` should be configured on the hosts with
the highest `SHUTDOWN_TIMER` value so that they trigger UPS power-cycling.
Do not forget to define a sufficient `DELAY` value for the OS shutdown to
complete before the UPS turns itself off. The UPS turns the load back on
automatically after some time, once external power is back and it has
charged the battery sufficiently (configurable in the Network Management
web-interface for the UPS).
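As an illustration of the strategies above, the following fragments show
how `/usr/local/ups/etc/ipp.conf` might look on three kinds of hosts. This
is a sketch only: it assumes that `ipp.conf` uses shell-style `VAR=value`
lines and that `DELAY` is set in the same file, and the numeric values are
placeholders to be adapted to the actual shutdown times on site.

```shell
# --- ipp.conf on a host that should shut down early ---
SHUTDOWN_TIMER=5             # minutes of insufficient protection

# --- ipp.conf on the host with the highest timer, when ALL hosts use
# --- early shutdown (values are illustrative only)
SHUTDOWN_TIMER=30
POWERDOWNFLAG_USER=enforce   # this host triggers UPS power-cycling
DELAY=120                    # assumed location; must cover OS shutdown

# --- ipp.conf on an important host (the default behavior) ---
SHUTDOWN_TIMER=-1            # stay up until the UPS reports low battery
```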
== Installation
* Install the package as usual
- Un-compress the `ipp-*.tar.gz` archive for your OS
- Change into the resulting `ipp-*` subdirectory
- (optional) If a complete re-installation is desired (including removal
of old configuration files so they do not conflict with the new delivery),
please also execute `IPP_WIPE_OLD_CONFIG=yes; export IPP_WIPE_OLD_CONFIG`
in the shell before running the installation script
- Launch the `install.sh` script and follow the installer instructions
* Configure the early shutdown timer
You can later change the early shutdown timer for the cluster nodes
by editing `/usr/local/ups/etc/ipp.conf` and setting `SHUTDOWN_TIMER` to
a suitable value (in minutes).
* Configure clusterware shutdown command
You can configure the variable `SHUTDOWNSCRIPT_CUSTOM` in `ipp.conf` to
point at a custom complementary shutdown script with a shutdown routine
required by this particular host, which will be called by the master
powerfail shutdown script (`/usr/local/ups/sbin/ipp-os-shutdown`).
This variable is not set by default.
* Configure operating system shutdown command and options
Optionally, modify the operating system shutdown command and type.
To modify the shutdown command or the specific options for the various
types of shutdown, edit the file `/usr/local/ups/etc/ipp-os-shutdown.conf`
and set or adapt the variables:
- `CMD_SHUTDOWN` to point at the shutdown command. This may include
the basic mandatory options, such as the non-interactive flag
(`-y`) on some OSes such as HP-UX or Solaris,
- `SDFLAG_*` to point at the right option for poweroff, reboot or halt.
Note that depending on the OS and hardware features, there may be no
difference between "halt" (stop the OS and keep the hardware running)
and a "poweroff" (halt and instruct the server's PSU to cut the power
going to the motherboard -- if that is supported; might be unavailable
in e.g. virtualized environments or older hardware).
To modify the default shutdown option to halt or reboot, edit the file
`/usr/local/ups/etc/ipp.conf` and set `SDFLAG_POWERSTATE_DEFAULT` to
either `$SDFLAG_HALT` (halt) or `$SDFLAG_REBOOT` (reboot).
* (optionally) Configure UPS power-cycling
If the customer elects to use the early shutdown strategy for all hosts,
`POWERDOWNFLAG_USER=enforce` should be configured in the `ipp.conf`
file on the hosts with the highest `SHUTDOWN_TIMER` value so that they
trigger UPS power-cycling explicitly. By default, power-cycling may be
enabled or forbidden depending on the cause of the shutdown.
Make sure to define a sufficient `DELAY` value (in seconds) as well, so
that the OS shutdown completes safely before the UPS output power is cut.
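After editing, it can help to review which of the settings discussed in
this section are actually active. The helper below is a minimal sketch and
is not shipped with IPP - Unix; it assumes that the configuration file uses
shell-style `VAR=value` lines, and the name `show_ipp_settings` is
hypothetical.

```shell
# Hypothetical helper, not shipped with IPP - Unix: print the settings
# discussed in this section as they appear in a configuration file.
#   $1 = path to the file (defaults to the ipp.conf location used above)
# Commented-out lines are skipped because they do not start with a name.
show_ipp_settings() {
    conf="${1:-/usr/local/ups/etc/ipp.conf}"
    grep -E '^[[:space:]]*(SHUTDOWN_TIMER|SHUTDOWNSCRIPT_CUSTOM|SDFLAG_POWERSTATE_DEFAULT|POWERDOWNFLAG_USER|DELAY)=' "$conf"
}
```

Running `show_ipp_settings` without arguments inspects
`/usr/local/ups/etc/ipp.conf`; a different path can be passed as the first
argument.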
== Testing
The following points about this delivery were verified on AIX 7.1 and
Solaris 11. It was tested:
- to be installable (and runnable via init-script)
- to work as a `SHUTDOWNCMD` handler, including interaction with the
killpower flag-file maintained by `upsmon` (with the default setting
of `SHUTDOWN_TIMER=-1` in the `ipp.conf` file)
- to report progress of the powerfail shutdown onto the system consoles
(with `wall`) and into the `syslog` (note: default syslog priority from
IPP - Unix is `user.notice` which may be ignored by default `syslog`
configuration, check with `/etc/syslog.conf` or equivalent in your OS).
- to execute early shutdown when `SHUTDOWN_TIMER=5` (after 5 minutes
ONBATT) is defined manually in the `ipp.conf` file
- to cancel the shutdown if power returns before the timeout expires
- to not cancel the shutdown if power returns after the timeout expires
and the shutdown routine has started (and was reported to be irreversible)
- to report that the shutdown is in irreversible stage, as reaction
to CTRL+C during console-run invocations like `ipp-os-shutdown -t now`
- to execute custom shutdown script if it is found, and to skip it
if not available
- to power-off the protected host if power remains lost at the moment
when we are about to proceed to `/sbin/shutdown`
- to reboot the host if power is already back when we are about to
proceed to `/sbin/shutdown`
- to set the `netxml-ups` driver argument `shutdown_timer` during
installation to a value which matches the chosen `SHUTDOWN_TIMER`
(if it is non-negative) so the selected ONBATT timeout is shown in
the Eaton NMC Web-GUI
- to implement numerous fixes and improvements in the `install.sh`
script, including integration of new settings for early shutdown
and UPS power-cycling strategy
=== A few important notes helpful during testing
* currently running IPP - Unix processes, UPS states and the pending
shutdown status can be queried with the following command:
----
:; ps -ef | grep -v grep | egrep 'ipp|ups|nut|shut|sleep' ; \
ls -la /usr/local/ups/etc/killpower ; \
/usr/local/ups/bin/ipp-status; \
/usr/local/ups/sbin/ipp-os-shutdown -s; date
----
* a pending shutdown that is not yet irreversible can be aborted
manually with:
----
:; /usr/local/ups/sbin/ipp-os-shutdown -c
----
* the administrator can create a special file to abort the script
just before proceeding to irreversible shutdown; this is automated
in the `ipp-os-shutdown` script (undocumented option):
----
:; /usr/local/ups/sbin/ipp-os-shutdown block
----
Do not forget to remove this file when testing is completed to
allow actual shutdowns to happen:
----
:; /usr/local/ups/sbin/ipp-os-shutdown unblock
----
* also note that if the host is booted with an administrative
action while the remaining UPS battery runtime is under the
threshold set with `shutdown_duration`, an emergency powerfail
can be triggered by the `netxml-ups` driver as soon as IPP - Unix
services are initialized, even if the battery state is "CHARGING".
To avoid such shutdowns, an administrator can log in and quickly
create the special file described above (temporarily).
The recommended procedure is to wait for the hosts to boot up in
due time, when the batteries are charged enough to survive another
power failure (if one occurs) at least for as long as it takes to
shut down the server gracefully.