@sbd
Feature: configure sbd delay start correctly
Tag @clean means the cluster service needs to be stopped if it is running
@clean
Scenario: disk-based SBD with small sbd_watchdog_timeout
Given Run "test -f /etc/crm/profiles.yml" OK
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "15"
Given Has disk "/dev/sda1" on "hanode1"
Given Cluster service is "stopped" on "hanode1"
When Run "crm cluster init -s /dev/sda1 -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
And Service "sbd" is "started" on "hanode1"
And Resource "stonith-sbd" type "fence_sbd" is "Started"
And SBD option "SBD_DELAY_START" value is "no"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
# The original value was 43, calculated by the external/sbd RA.
# fence_sbd no longer calculates it, so this value is the default
# one from pacemaker.
And Cluster property "stonith-timeout" is "60"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
Given Has disk "/dev/sda1" on "hanode2"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Service "sbd" is "started" on "hanode2"
# SBD_DELAY_START >= (token + consensus + pcmk_delay_max + msgwait) # for disk-based sbd
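# worked example, assuming corosync's default consensus (1.2 * token = 6s)
# and crmsh's default pcmk_delay_max (30s): 5 + 6 + 30 + 30 = 71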
And SBD option "SBD_DELAY_START" value is "71"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
# value_from_sbd >= 1.2 * msgwait # for disk-based sbd
# stonith_timeout >= max(value_from_sbd, constants.STONITH_TIMEOUT_DEFAULT) + token + consensus
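# worked example, assuming constants.STONITH_TIMEOUT_DEFAULT is 60 (matching
# the default asserted above): value_from_sbd = 1.2 * 30 = 36; max(36, 60) + 5 + 6 = 71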
And Cluster property "stonith-timeout" is "71"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
Given Has disk "/dev/sda1" on "hanode3"
Given Cluster service is "stopped" on "hanode3"
When Run "crm cluster join -c hanode1 -y" on "hanode3"
Then Cluster service is "started" on "hanode3"
And Service "sbd" is "started" on "hanode3"
# SBD_DELAY_START >= (token + consensus + pcmk_delay_max + msgwait) # for disk-based sbd
# runtime value is "41", we keep the larger one here
And SBD option "SBD_DELAY_START" value is "71"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
# value_from_sbd >= 1.2 * msgwait # for disk-based sbd
# stonith_timeout >= max(value_from_sbd, constants.STONITH_TIMEOUT_DEFAULT) + token + consensus
# runtime value is "71", we keep ther larger one here
And Cluster property "stonith-timeout" is "71"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
When Run "crm cluster remove hanode3 -y" on "hanode1"
Then Cluster service is "stopped" on "hanode3"
And Service "sbd" is "stopped" on "hanode3"
And SBD option "SBD_DELAY_START" value is "71"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
And Cluster property "stonith-timeout" is "71"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
@clean
Scenario: disk-less SBD with small sbd_watchdog_timeout
Given Run "test -f /etc/crm/profiles.yml" OK
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "15"
Given Cluster service is "stopped" on "hanode1"
When Run "crm cluster init -S -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
And SBD option "SBD_DELAY_START" value is "no"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "15"
And Cluster property "stonith-timeout" is "60"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
# SBD_DELAY_START >= (token + consensus + 2*SBD_WATCHDOG_TIMEOUT) # for disk-less sbd
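# worked example, assuming consensus = 1.2 * token = 6s: 5 + 6 + 2*15 = 41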
And SBD option "SBD_DELAY_START" value is "41"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "15"
# value_from_sbd >= 1.2 * max(stonith_watchdog_timeout, 2*SBD_WATCHDOG_TIMEOUT) # for disk-less sbd
# stonith_timeout >= max(value_from_sbd, constants.STONITH_TIMEOUT_DEFAULT) + token + consensus
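# worked example, with stonith_watchdog_timeout = 2*15 = 30:
# value_from_sbd = 1.2 * max(30, 30) = 36; max(36, 60) + 5 + 6 = 71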
And Cluster property "stonith-timeout" is "71"
Given Cluster service is "stopped" on "hanode3"
When Run "crm cluster join -c hanode1 -y" on "hanode3"
Then Cluster service is "started" on "hanode3"
And SBD option "SBD_DELAY_START" value is "41"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "15"
And Cluster property "stonith-timeout" is "71"
When Run "crm cluster remove hanode3 -y" on "hanode1"
Then Cluster service is "stopped" on "hanode3"
And SBD option "SBD_DELAY_START" value is "41"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "15"
And Cluster property "stonith-timeout" is "71"
@clean
Scenario: disk-based SBD with big sbd_watchdog_timeout
When Run "sed -i 's/watchdog_timeout: 15/watchdog_timeout: 60/' /etc/crm/profiles.yml" on "hanode1"
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "60"
Given Has disk "/dev/sda1" on "hanode1"
Given Cluster service is "stopped" on "hanode1"
When Run "crm cluster init -s /dev/sda1 -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
And Service "sbd" is "started" on "hanode1"
And Resource "stonith-sbd" type "fence_sbd" is "Started"
And SBD option "SBD_DELAY_START" value is "no"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "120"
# The original value was 172, calculated by the external/sbd RA.
# fence_sbd no longer calculates it, so this value is the default
# one from pacemaker.
And Cluster property "stonith-timeout" is "60"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
Given Has disk "/dev/sda1" on "hanode2"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Service "sbd" is "started" on "hanode2"
# SBD_DELAY_START >= (token + consensus + pcmk_delay_max + msgwait) # for disk-based sbd
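# worked example, with the same consensus/pcmk_delay_max assumptions as in
# the first scenario: 5 + 6 + 30 + 120 = 161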
And SBD option "SBD_DELAY_START" value is "161"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "120"
# value_from_sbd >= 1.2 * msgwait # for disk-based sbd
# stonith_timeout >= max(value_from_sbd, constants.STONITH_TIMEOUT_DEFAULT) + token + consensus
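# worked example: value_from_sbd = 1.2 * 120 = 144; max(144, 60) + 5 + 6 = 155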
And Cluster property "stonith-timeout" is "155"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
# since the SBD_DELAY_START value (161s) exceeds the default systemd start timeout (1min 30s)
And Run "test -f /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
# TimeoutSec = 1.2 * SBD_DELAY_START (1.2 * 161 = 193.2 -> 193)
And Run "grep 'TimeoutSec=193' /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
Given Has disk "/dev/sda1" on "hanode3"
Given Cluster service is "stopped" on "hanode3"
When Run "crm cluster join -c hanode1 -y" on "hanode3"
Then Cluster service is "started" on "hanode3"
And Service "sbd" is "started" on "hanode3"
And SBD option "SBD_DELAY_START" value is "161"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "120"
And Cluster property "stonith-timeout" is "155"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
And Run "test -f /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
And Run "grep 'TimeoutSec=193' /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
When Run "crm cluster remove hanode3 -y" on "hanode1"
Then Cluster service is "stopped" on "hanode3"
And Service "sbd" is "stopped" on "hanode3"
And SBD option "SBD_DELAY_START" value is "161"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "120"
And Cluster property "stonith-timeout" is "155"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
And Run "test -f /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
And Run "grep 'TimeoutSec=193' /etc/systemd/system/sbd.service.d/sbd_delay_start.conf" OK
When Run "sed -i 's/watchdog_timeout: 60/watchdog_timeout: 15/g' /etc/crm/profiles.yml" on "hanode1"
@clean
Scenario: Add sbd via stage on a running cluster
Given Run "test -f /etc/crm/profiles.yml" OK
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "15"
Given Has disk "/dev/sda1" on "hanode1"
Given Has disk "/dev/sda1" on "hanode2"
Given Cluster service is "stopped" on "hanode1"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster init -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
When Run "crm cluster init sbd -s /dev/sda1 -y" on "hanode1"
Then Service "sbd" is "started" on "hanode1"
Then Service "sbd" is "started" on "hanode2"
And SBD option "SBD_DELAY_START" value is "71"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
And Cluster property "stonith-timeout" is "71"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
@clean
Scenario: Add disk-based sbd with qdevice
Given Run "test -f /etc/crm/profiles.yml" OK
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "15"
Given Has disk "/dev/sda1" on "hanode1"
Given Has disk "/dev/sda1" on "hanode2"
Given Cluster service is "stopped" on "hanode1"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster init -s /dev/sda1 --qnetd-hostname=qnetd-node -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Service "corosync-qdevice" is "started" on "hanode1"
And Service "corosync-qdevice" is "started" on "hanode2"
And Service "sbd" is "started" on "hanode1"
And Service "sbd" is "started" on "hanode2"
And SBD option "SBD_DELAY_START" value is "41"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "5"
And SBD option "msgwait" value for "/dev/sda1" is "30"
And Cluster property "stonith-timeout" is "71"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
@clean
Scenario: Add disk-less sbd with qdevice
Given Run "test -f /etc/crm/profiles.yml" OK
Given Yaml "default:corosync.totem.token" value is "5000"
Given Yaml "default:sbd.watchdog_timeout" value is "15"
Given Cluster service is "stopped" on "hanode1"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster init -S --qnetd-hostname=qnetd-node -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Service "corosync-qdevice" is "started" on "hanode1"
And Service "corosync-qdevice" is "started" on "hanode2"
And Service "sbd" is "started" on "hanode1"
And Service "sbd" is "started" on "hanode2"
And SBD option "SBD_DELAY_START" value is "81"
And SBD option "SBD_WATCHDOG_TIMEOUT" value is "35"
And Cluster property "stonith-timeout" is "95"
And Cluster property "stonith-watchdog-timeout" is "70"
@clean
Scenario: Add and remove qdevice from cluster with sbd running
Given Cluster service is "stopped" on "hanode1"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster init -s /dev/sda1 -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Service "sbd" is "started" on "hanode1"
And Service "sbd" is "started" on "hanode2"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
When Run "crm cluster init qdevice --qnetd-hostname=qnetd-node -y" on "hanode1"
Then Service "corosync-qdevice" is "started" on "hanode1"
And Service "corosync-qdevice" is "started" on "hanode2"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"
When Run "crm cluster remove --qdevice -y" on "hanode1"
Then Service "corosync-qdevice" is "stopped" on "hanode1"
And Service "corosync-qdevice" is "stopped" on "hanode2"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
@clean
Scenario: Test priority-fencing-delay and priority
Given Cluster service is "stopped" on "hanode1"
Given Cluster service is "stopped" on "hanode2"
When Run "crm cluster init -y" on "hanode1"
Then Cluster service is "started" on "hanode1"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Property "priority" in "rsc_defaults" is "1"
When Run "crm cluster remove hanode2 -y" on "hanode1"
Then Cluster service is "stopped" on "hanode2"
And Property "priority" in "rsc_defaults" is "0"
When Run "crm cluster join -c hanode1 -y" on "hanode2"
Then Cluster service is "started" on "hanode2"
And Property "priority" in "rsc_defaults" is "1"
When Run "crm cluster init qdevice --qnetd-hostname=qnetd-node -y" on "hanode1"
Then Service "corosync-qdevice" is "started" on "hanode1"
And Service "corosync-qdevice" is "started" on "hanode2"
And Property "priority" in "rsc_defaults" is "0"
When Run "crm cluster remove --qdevice -y" on "hanode1"
Then Service "corosync-qdevice" is "stopped" on "hanode1"
And Service "corosync-qdevice" is "stopped" on "hanode2"
And Property "priority" in "rsc_defaults" is "1"
When Run "crm cluster init sbd -s /dev/sda1 -y" on "hanode1"
Then Service "sbd" is "started" on "hanode1"
And Service "sbd" is "started" on "hanode2"
And Parameter "pcmk_delay_max" configured in "stonith-sbd"
And Cluster property "stonith-timeout" is "71"
And Cluster property "priority-fencing-delay" is "60"
When Run "crm cluster remove hanode2 -y" on "hanode1"
Then Cluster service is "stopped" on "hanode2"
And Property "priority" in "rsc_defaults" is "0"
And Cluster property "priority-fencing-delay" is "0"
And Parameter "pcmk_delay_max" not configured in "stonith-sbd"