Getting Started with Linux-HA (heartbeat)
Intro
Let me preface this document by saying that most of this is _not_ original
work. My purpose in writing it is simply to contribute in some way to
helping those who REALLY get things done. The "work" I am contributing is
mostly compiling bits and pieces from other HA documents (such as Volker
Wiegand's Hardware Installation Guide) into a document that can help
novices get started on HA without pestering Alan (like I did!) and cut down
on repeat questions on the mailing list.
Getting Started
The first thing you'll need is two computers. You need not have identical
hardware in both machines (or the same amount of memory, etc.), but if you
do, it will make your life that much easier when a component fails.
Now you have to make some implementation decisions. Your "cluster" is
established via a "heartbeat" between the two computers (nodes) generated by
the software package of the same name. However, this heartbeat needs one or
more media paths (serial via a null modem cable, ethernet via a crossover
cable, etc.) between the nodes.
At this point, you're actually ready to begin hardware-wise. Of course,
since you're looking into HA, you'll most likely want to avoid having a
single point of failure. In this case, that would be your null modem
cable/serial port or network interface card (NIC)/crossover cable. So, you
need to decide whether you wish to add a second serial/null modem connection
or a second NIC/crossover connection to each node. See Appendix A for
instructions on how to build a Cat-5 crossover cable. My heartbeat path
setup uses one serial port and one extra NIC because I only had one null
modem cable, had an extra NIC on hand, and thought it was good to have two
media types for the heartbeats.
Once your hardware is in order, you must install your OS and configure your
networking (I used Red Hat). Assuming you have 2 NICs, one should be
configured for your "normal" network and the other as a private network
between your clustered nodes (via the crossover cable). As an example, we
will assume that our cluster will have the following addresses:
Node 1 (linuxha1): 192.168.85.1   (normal 192x net)
                   10.0.0.1       (private 10x net for heartbeat)
Node 2 (linuxha2): 192.168.85.2   (192x)
                   10.0.0.2       (10x)
Note: None of these addresses should be your "cluster address" - the
address handled by heartbeat and failed over between nodes!
Most *nix distributions make this easy during installation. However, if you
are having any problems, refer to either the Ethernet HOWTO or the
documentation for your distribution. To check your configuration, type:
ifconfig
This will show your network interfaces and their configuration. You can
obtain your network routing information from "netstat -nr".
If it looks good, make sure you can ping between both nodes on all
interfaces.
Next, if you're using one, you'll need to test your serial connection. On
one node, which will be the receiver, type:
cat </dev/ttyS0
On the other node, type:
echo hello >/dev/ttyS0
You should see the text on the receiver node. If it works, change their
roles and try again. If it doesn't, it may be as simple as having the wrong
device file. Volker's HA Hardware Guide and the Serial HOWTO are two good
resources for troubleshooting your serial connection.
Installing Heartbeat
You can now install the heartbeat package. If you're reading this, you
already have it, but in any case it's available at:
[1]http://linux-ha.org/download
There are binary RPMs at the website, or you can build heartbeat from
source. Grab the tarball (or install the source RPM). Untar it into your
favorite source directory. From the top of the source tree, type
"./ConfigureMe configure", followed by "make" and "make install". If you
have problems installing the RPMs found at the website and want a way to
make your own, there may be help in the [2]FAQ.
Configuring Heartbeat
Configuring ha.cf
There are three files you will need to configure before starting up
heartbeat. The first is ha.cf, which goes in the /etc/ha.d directory
created during installation. It tells heartbeat what types of media paths
to use and how to configure them. The ha.cf in the source directory
contains all the various options you can use; I'll go through it line by
line...
serial /dev/ttyS0
Use a serial heartbeat - if you don't use a serial heartbeat, you
must use another medium, such as a bcast (ethernet) heartbeat.
Replace /dev/ttyS0 with the appropriate device file for your required
serial heartbeat.
watchdog /dev/watchdog
Optional. The watchdog function provides a way to have a system that
is still minimally functioning, but not providing a heartbeat, reboot
itself after a minute of being sick. This could help to avoid a
scenario where the machine recovers its heartbeat after being
pronounced dead. If that happened and a disk mount failed over, you
could have two nodes mounting a disk simultaneously. If you wish to
use this feature, then in addition to this line, you will need to
load the "softdog" kernel module and create the actual device file.
To do this, first type "insmod softdog" to load the module. Then,
type "grep misc /proc/devices" and note the number it reports (should
be 10). Next, type "cat /proc/misc | grep watchdog" and note that
number (should be 130). Now you can create the device file with that
info by typing "mknod /dev/watchdog c 10 130".
bcast eth1
Specifies to use a broadcast heartbeat over the eth1 interface
(replace with eth0, eth2, or whatever you use).
keepalive 2
Sets the time between heartbeats to 2 seconds.
warntime 10
Time in seconds before issuing a "late heartbeat" warning in the
logs.
deadtime 30
Node is pronounced dead after 30 seconds.
initdead 120
With some configurations, the network takes some time to start
working after a reboot. This is a separate "deadtime" to handle
that case. It should be at least twice the normal deadtime.
hopfudge 1
Optional. For ring topologies, number of hops allowed in addition to
the number of nodes in the cluster.
baud 19200
Speed at which to run the serial line (bps).
udpport 694
Use port number 694 for bcast or ucast communication. This is the
default, and the official IANA registered port number.
auto_failback on
Required. For those familiar with Tru64 Unix, heartbeat acts as if in
"favored member" mode. The master listed in the haresources
file holds all the resources until a failover, at which time
the slave takes over. When auto_failback is set to on, once the
master comes back online it will take everything back from the
slave. When set to off, this option prevents the master node
from re-acquiring cluster resources after a failover. The off
setting is similar to the obsolete nice_failback option. If you
want to upgrade a cluster that had nice_failback set off to
this or a later version, special considerations apply in order
to avoid requiring a flash cut. Please see the [3]FAQ for
details on how to deal with this situation.
node linuxha1.linux-ha.org
Mandatory. Hostname of machine in cluster as described by `uname
-n`.
node linuxha2.linux-ha.org
Mandatory. Hostname of machine in cluster as described by `uname
-n`.
respawn userid cmd
Optional: Lists a command to be spawned and monitored. E.g., to
spawn the ccm daemon, add the following line:
respawn hacluster /usr/lib/heartbeat/ccm
This tells heartbeat to spawn the command with the credentials of
userid (hacluster, in this example), monitor the health of the
process, and respawn it if it dies. For ipfail, the line would be:
respawn hacluster /usr/lib/heartbeat/ipfail
NOTE: If the process dies with exit code 100, the process is not
respawned.
ping ping1.linux-ha.org ping2.linux-ha.org ....
Optional: Specify ping nodes. These nodes are not considered as
cluster nodes. They are used to check network connectivity for
modules like ipfail.
ping_group name ping1.linux-ha.org ping2.linux-ha.org ....
Optional: Specify a group of ping nodes. These are similar to ping
nodes, but if any node in a group is available then the group is
considered available. The group name can be any string and is used to
uniquely identify the group. Each group must appear on a separate
line. Like ping nodes, a group is not considered to be a cluster
node; groups are used to check network connectivity for modules
like ipfail.
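Putting the directives above together, the example cluster's ha.cf might look like the file written by this minimal sketch. The helper names write_ha_cf and check_timings are hypothetical, the file goes to ./ha.cf so nothing under /etc is touched until you copy it, and the only timing rule taken from this document is that initdead should be at least twice deadtime; the keepalive < warntime < deadtime ordering is our own sanity assumption. We list udpport before bcast and baud before serial as a precaution, on the assumption that some heartbeat versions read these settings in order.

```shell
# Sketch: write the example ha.cf described above (hypothetical helper).
# Default target is ./ha.cf; copy it to /etc/ha.d/ha.cf on BOTH nodes.
write_ha_cf() {
    target=${1:-./ha.cf}
    cat > "$target" <<'EOF'
# Heartbeat media paths
baud 19200
serial /dev/ttyS0
udpport 694
bcast eth1
# Timing (seconds)
keepalive 2
warntime 10
deadtime 30
initdead 120
auto_failback on
# Optional; needs the softdog module and /dev/watchdog
watchdog /dev/watchdog
node linuxha1.linux-ha.org
node linuxha2.linux-ha.org
EOF
}

# Check the timing values in an ha.cf file.
# Rule from the text: initdead >= 2 * deadtime.
# Our own assumption: keepalive < warntime < deadtime.
check_timings() {
    awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2 }
        END {
            if (i < 2 * d) { print "initdead should be at least 2 * deadtime"; bad = 1 }
            if (!(k < w && w < d)) { print "expected keepalive < warntime < deadtime"; bad = 1 }
            exit bad
        }' "$1"
}
```

After adjusting the heredoc, `write_ha_cf && check_timings ./ha.cf` should succeed silently.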
Configuring haresources
Once you've got your ha.cf set up, you need to configure haresources. This
file specifies the services for the cluster and who the default owner is.
Note: This file must be the same on both nodes!
For our example, we'll assume the high availability services are Apache and
Samba. The IP for the cluster is mandatory, and don't configure the cluster
IP outside of the haresources file! The haresources file will need one line:
linuxha1.linux-ha.org 192.168.85.3 httpd smb
So, this line tells heartbeat that on startup linuxha1 should serve the IP
192.168.85.3 and start Apache and Samba as well.
On shutdown, heartbeat will first stop smb, then apache, then give up the
IP. This assumes that the command "uname -n" spits out
"linuxha1.linux-ha.org" - yours may well produce "linuxha1" and if it does,
use that instead!
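Because this name mismatch is easy to miss, here is a small sketch (check_owners is a hypothetical helper) that checks every owner named in haresources against the node lines in ha.cf and prints what uname -n reports on the machine you run it on. It assumes the default /etc/ha.d paths and one resource group per line, without continuation lines.

```shell
# Sketch: each resource-group owner in haresources should appear on a
# "node" line in ha.cf, spelled exactly as `uname -n` prints it there.
check_owners() {
    hacf=${1:-/etc/ha.d/ha.cf}
    hares=${2:-/etc/ha.d/haresources}
    status=0
    # The first field of each non-blank, non-comment line is the owner.
    for owner in $(awk '$1 !~ /^#/ && NF { print $1 }' "$hares"); do
        # A "node" line may list one or more names; look for an exact match.
        if ! awk -v n="$owner" '
                $1 == "node" { for (f = 2; f <= NF; f++) if ($f == n) found = 1 }
                END { exit !found }' "$hacf"; then
            echo "owner $owner is not a node in $hacf" >&2
            status=1
        fi
    done
    echo "this machine is: $(uname -n)"
    return $status
}
```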
Note: httpd and smb are the names of the startup scripts for Apache and Samba,
respectively. Heartbeat will look for startup scripts of the same name in
the following paths:
/etc/ha.d/resource.d
/etc/rc.d/init.d
These scripts must start services via "scriptname start" and stop them via
"scriptname stop".
So you can use any services as long as they conform to the above standard.
Should you need to pass arguments to a custom script, the format would be:
scriptname::argument
So, if we added a service "maid" which needed the argument "vacuum", our
haresources line would change to the following:
linuxha1 192.168.85.3 httpd smb maid::vacuum
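To make the naming rules concrete, the sketch below (resource_cmd is a hypothetical helper, not part of heartbeat) turns one haresources entry into the command line heartbeat would run. It follows the pattern visible in the sample ha-log later in this document, where datadisk::drbd0 is run as "datadisk drbd0 start": script name first, then the argument, then start or stop. Rewriting a bare IP to IPaddr::<ip> follows the shorthand this section describes; treating every further "::" field as an extra argument is our assumption. RESOURCE_PATH is a hypothetical override for the two search directories.

```shell
# Sketch: expand one haresources entry into "script [args...] action".
resource_cmd() {
    spec=$1 action=$2
    # A bare IP address is shorthand for IPaddr::<address>.
    case $spec in
        [0-9]*.[0-9]*.[0-9]*.[0-9]*) spec="IPaddr::$spec" ;;
    esac
    # The script is everything before the first "::"; each further
    # "::"-separated field becomes one argument (our assumption).
    script=${spec%%::*}
    args=
    rest=$spec
    while [ "$rest" != "${rest#*::}" ]; do
        rest=${rest#*::}
        args="$args ${rest%%::*}"
    done
    # Search the two directories the text lists, in that order.
    for dir in ${RESOURCE_PATH:-/etc/ha.d/resource.d /etc/rc.d/init.d}; do
        if [ -x "$dir/$script" ]; then script="$dir/$script"; break; fi
    done
    echo "$script$args $action"
}
```

For instance, `resource_cmd maid::vacuum start` prints `maid vacuum start` when no maid script is installed, or the full path when one is found.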
This brings us to some added flexibility with the service IP address. We
are actually using a shorthand notation above. The actual line could have
read (we've canned the maid):
linuxha1 IPaddr::192.168.85.3 httpd smb
Where IPaddr is the name of our service script, taking the argument
192.168.85.3. Sure enough, if you look in the directory
/etc/ha.d/resource.d, you will find a script called IPaddr. This script
will also allow you to manipulate the netmask, broadcast address and base
interface of this IP service. To specify a subnet with 32 addresses, you
could define the service as (leaving off the IPaddr because we can!):
linuxha1 192.168.85.3/27 httpd smb
This sets the IP service address to 192.168.85.3, the netmask to
255.255.255.224 and the broadcast address would default to 192.168.85.31
(which is the highest address on the subnet). The last parameter you can
set is the broadcast address. To override the default and set it to
192.168.85.16, your entry would read:
linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
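The arithmetic behind those numbers is easy to check. The sketch below (ip2int, int2ip and cidr_info are hypothetical helper names) uses only shell arithmetic to derive the netmask and default broadcast as this section describes, and honors an explicit broadcast given after a second slash. It assumes 64-bit shell arithmetic, which bash and dash provide on common platforms, and an entry that contains at least one slash.

```shell
# Sketch: derive netmask and broadcast from an "address/bits[/broadcast]" entry.
ip2int() {
    oldifs=$IFS; IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

cidr_info() {
    addr=${1%%/*}
    rest=${1#*/}
    bits=${rest%%/*}
    ip=$(ip2int "$addr")
    # Keep the top "bits" bits set, truncated to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    if [ "$rest" != "$bits" ]; then
        bcast=${rest#*/}     # explicit broadcast after a second "/"
    else
        # Default: the highest address in the subnet (all host bits set).
        bcast=$(int2ip $(( ip | (~mask & 0xFFFFFFFF) )))
    fi
    echo "netmask $(int2ip $mask) broadcast $bcast"
}
```

`cidr_info 192.168.85.3/27` prints `netmask 255.255.255.224 broadcast 192.168.85.31`, matching the values given above.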
You may be wondering whether any of the above is necessary for you. It
depends. If you've properly established a net route (independent of
heartbeat) for the service's IP address, with the correct netmask and
broadcast address, then no, it's not necessary for you. However, this case
won't fit everybody and that's why the option's there! In addition, you may
have more than one possible interface that could be used for the service
IP. Read on to see how heartbeat treats this...
Once you straighten out your haresources file, copy ha.cf and haresources to
/etc/ha.d and you're ready to start!
Configuring ipfail
The ipfail plugin attempts to provide detection of network failures, and
then intelligently react, directing the cluster to failover resources as
necessary. In order to accomplish this goal, it uses ping nodes or ping
groups which work as "dumb" third parties in the cluster. Provided both HA
nodes can communicate with each other, ipfail can reliably detect when one
of their network links has become unusable, and compensate.
To configure ipfail, the following steps must be performed.
1. Select good ping node candidates.
It is essential that good strategic ping nodes be selected. The better
your choices, the stronger your HA cluster becomes. Choosing solid
network devices like switches and routers is a good idea. Do not choose
either of the members of the HA cluster. Nor should you select someone's
workstation. It is also important to select ping nodes that reflect the
connectivity of your HA nodes. If you wish to monitor the connectivity
of two interfaces, it is wise to select a ping node for each interface,
that is reachable exclusively from said interface. Consult
[4]ipfail-diagram.pdf for a graphical representation of this idea.
2. Set auto_failback to on or off.
ipfail will only operate if auto_failback has been set to something
other than legacy. In ha.cf, set the auto_failback option to "on" or
"off" like so:
auto_failback on
or
auto_failback off
3. Configure your ha.cf to start ipfail.
Add a line like the following to ha.cf (assuming your compile PREFIX is
/usr):
respawn hacluster /usr/lib/heartbeat/ipfail
4. Add the ping nodes to ha.cf.
The ping nodes can be added to the cluster by using a line like the
following:
ping pnode1 pnode2 pnodeN
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your
ping nodes.
Ensure that the above configuration directives are added to the ha.cf on
both members of the cluster, and that they are identical.
NOTE: You will want to check on the availability of the ping nodes prior
to using them. If you cannot ping them from both of the HA nodes, they are
useless.
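A quick way to follow that advice is the sketch below. The function names ping_nodes and check_ping_nodes are hypothetical: the first lists the hosts named on the ping and ping_group lines of ha.cf (skipping each group's name), and the second pings each one three times. Run it on both HA nodes.

```shell
# Sketch: list the ping nodes configured in ha.cf.
ping_nodes() {
    awk '$1 == "ping"       { for (i = 2; i <= NF; i++) print $i }
         $1 == "ping_group" { for (i = 3; i <= NF; i++) print $i }' "${1:-/etc/ha.d/ha.cf}"
}

# Ping each configured node; return non-zero if any is unreachable.
check_ping_nodes() {
    status=0
    for host in $(ping_nodes "$1"); do
        if ping -c 3 "$host" >/dev/null 2>&1; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            status=1
        fi
    done
    return $status
}
```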
Selecting an Interface
One important aspect of configuring the haresources file for a machine which
has multiple ethernet interfaces is to know how heartbeat selects which
interface will wind up supporting the service addresses that are configured
in haresources. After all, no interface was specified in the haresources
file.
Heartbeat decides which interface will be used by looking at the routing
table. It tries to select the lowest cost route to the IP address to be
taken over. In the case of a tie, it chooses the first route found. For
most configurations this means the default route will be least preferred.
If you don't specify a netmask for the IP address in the haresources file,
the netmask associated with the selected route will be used. Similarly, if
an interface is not specified, then the virtual IP address will be added to
the interface associated with the selected route. If the broadcast address
is omitted, then the highest address in the subnet is used.
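To preview which interface the rule above would pick, the sketch below applies it to the routing table as printed by `route -n`, which, unlike `netstat -nr`, shows a Metric column. It is not heartbeat's code: pick_iface is a hypothetical name, "lowest cost" is taken to mean the lowest Metric, ties go to the first matching line, and netmasks are assumed to be contiguous.

```shell
# Sketch: read "route -n" output on stdin and print the interface whose
# route contains the given address with the lowest Metric (first on a tie).
# route -n columns: Destination Gateway Genmask Flags Metric Ref Use Iface
pick_iface() {
    awk -v ip="$1" '
        # Apply a contiguous netmask octet by octet (no bitwise AND in awk).
        function masked(a, m,   i, out, x, y, k) {
            split(a, x, "."); split(m, y, ".")
            out = ""
            for (i = 1; i <= 4; i++) {
                k = 256 - y[i]       # 2^(host bits in this octet)
                out = out (i > 1 ? "." : "") (x[i] - x[i] % k)
            }
            return out
        }
        # Skip the two header lines; keep the best matching route.
        NR > 2 && masked(ip, $3) == masked($1, $3) {
            if (best == "" || $5 + 0 < best) { best = $5 + 0; iface = $8 }
        }
        END { if (iface == "") exit 1; print iface }'
}
```

On a node, `route -n | pick_iface 192.168.85.3` prints the chosen interface, or fails if no route matches.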
Configuring Authkeys
The third file to configure determines your authentication keys. There are
three types of authentication methods available: crc, md5, and sha1.
"Well, which should I use?", you ask. Since this document is called
"Getting Started", we'll keep it simple......
If your heartbeat runs over a secure network, such as the crossover cable in
our example, you'll want to use crc. This is the cheapest method from a
resources perspective. If the network is insecure, but you're either not
very paranoid or concerned about minimizing CPU resources, use md5.
Finally, if you want the best authentication without regard for CPU
resources, use sha1. It's the hardest to crack.
The format of the file is as follows:
auth <number>
<number> <authmethod> [<authkey>]
So, for sha1, a sample /etc/ha.d/authkeys could be:
auth 1
1 sha1 key-for-sha1-any-text-you-want
For md5, you could use the same as the above, but replace "sha1" with "md5".
Finally, for crc, a sample might be:
auth 2
2 crc
Whatever index you put after the keyword auth must be found below in the
keys listed in the file. If you put "auth 4", then there must be a "4
signaturetype" line in the list below.
Make sure the file's permissions are safe, such as 600. And "any text you
want" is not quite right: there's a limit to the number of characters you
can use.
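The sketch below writes the sha1 example with a random key instead of hand-typed text, applies the 600 permissions, and checks the index rule described above. make_authkeys and check_authkeys are hypothetical names, and the 32-hex-digit key length is our own choice, deliberately short given the character limit mentioned above. Pass a scratch path to try it before writing /etc/ha.d/authkeys.

```shell
# Sketch: create an authkeys file with a random sha1 key and mode 600.
make_authkeys() {
    target=${1:-/etc/ha.d/authkeys}
    # 16 random bytes as 32 hex digits.
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file with a restrictive umask, then force 600 in case
    # it already existed with looser permissions.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Check that the index on the "auth" line has a matching key line.
check_authkeys() {
    awk '$1 == "auth"        { want = $2 }
         $1 ~ /^[0-9]+$/     { have[$1] = 1 }
         END {
             if (want == "" || !(want in have)) {
                 print "no key line for auth " want
                 exit 1
             }
         }' "$1"
}
```

Generate the file once and copy the same file to the other node, since both ends must share the key.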
That's it!
Starting and testing heartbeat
On Red Hat, or other distributions which use /etc/init.d startup files,
simply type /etc/init.d/heartbeat start on both nodes. I would recommend
starting on the system master (in our example linuxha1) first.
If you want heartbeat to run on startup, how to set that up depends on your
distribution. You may need to place links to the startup script in the
appropriate init level directories, but the RPM versions will do this for
you. I have heartbeat start at its default sequential priority (75, which
means it starts after services 74 and lower and before services with
priority 76-99), end at its default sequential priority (05), and only care
about the 0(halt), 6(reboot), 3(text-only), 5(X) run levels.
So, if I had to do it by hand, I'd need to type in the following (as root,
of course):
cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat
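If you prefer, the four commands above can be wrapped in a small function. install_rc_links is a hypothetical name, and its optional argument is a scratch root directory so you can try it out first; run it as root with no argument for real use. It uses ln -sf so that rerunning it replaces existing links instead of failing.

```shell
# Sketch: create the heartbeat start/kill links for run levels 0, 3, 5, 6.
# $1 is an optional root prefix (empty means the real /etc/rc.d).
install_rc_links() {
    root=${1:-}
    # Start at priority 75 in run levels 3 and 5.
    for lvl in 3 5; do
        ln -sf ../init.d/heartbeat "$root/etc/rc.d/rc$lvl.d/S75heartbeat" || return 1
    done
    # Stop at priority 05 on halt (0) and reboot (6).
    for lvl in 0 6; do
        ln -sf ../init.d/heartbeat "$root/etc/rc.d/rc$lvl.d/K05heartbeat" || return 1
    done
}
```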
The last time I ran Slackware, there was no /etc/rc.d/init.d directory (this
may have changed by now), and to do the same thing, I would have placed this
in /etc/rc.d/rc.local:
/etc/ha.d/heartbeat start
Note: This assumes you copy the file ha.rc to /etc/ha.d/heartbeat. If you
can't find /etc/rc.d/init.d with your distribution and you're unsure of how
processes start, you can use the rc.local method. But you're on your own
for shutdown; I just don't remember...
Note: If you use the watchdog function, you'll need to load its module at
bootup as well. You can put the following command at the bottom of the
/etc/rc.d/rc.sysinit file:
/sbin/insmod softdog
For the rc.local method, just put the same line right above where you start
heartbeat.
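Here is a sketch of doing the watchdog steps from the ha.cf section at boot without reading the numbers by hand. The functions watchdog_numbers and setup_watchdog are hypothetical: watchdog_numbers reads the misc major number from /proc/devices and the watchdog minor number from /proc/misc (the paths can be overridden for testing), and setup_watchdog loads softdog only if /proc/modules does not already list it, then creates /dev/watchdog only if it is missing. It must run as root before heartbeat starts.

```shell
# Sketch: print "<major> <minor>" for the watchdog device.
watchdog_numbers() {
    devices=${1:-/proc/devices}
    misc=${2:-/proc/misc}
    # /proc/devices has lines like " 10 misc": field 1 is the major number.
    major=$(awk '$2 == "misc" { print $1; exit }' "$devices")
    # /proc/misc has lines like "130 watchdog": field 1 is the minor number.
    minor=$(awk '$2 == "watchdog" { print $1; exit }' "$misc")
    [ -n "$major" ] && [ -n "$minor" ] || return 1
    echo "$major $minor"
}

# Load softdog and create /dev/watchdog (run as root at boot).
setup_watchdog() {
    grep -q '^softdog ' /proc/modules || /sbin/insmod softdog || return 1
    nums=$(watchdog_numbers) || { echo "no misc/watchdog entry found" >&2; return 1; }
    # $nums is intentionally unquoted so it splits into major and minor.
    [ -e /dev/watchdog ] || mknod /dev/watchdog c $nums
}
```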
Once you've started heartbeat, take a peek at your log file (default is
/var/log/ha-log) before testing it. If all is peachy, the service owner's
log (linuxha1 in our example) should look something like this:
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
heartbeat: 2003/02/10_13:52:22 info: **************************
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0 broadcast 192.168.85.255
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror start
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.
NOTE: Your log may differ depending on when you started heartbeat on
linuxha2! I started heartbeat on linuxha2 at 13:56:30.
_____________________________________
OK, now try to ping your cluster's IP (192.168.85.3 in the example). If this
works, ssh to it and verify you're on linuxha1.
Next, make sure your services are tied to the .3 address. Bring up Netscape
and type in 192.168.85.3 for the URL. For Samba, try to map the drive
"\\192.168.85.3\test" assuming you set up a share called "test". See Samba
docs to get that going. As an aside, however, you'll want to use the
"netbios name" parameter to have your Samba share listed under the cluster
name and not the hostname of your cluster member!
NOTE: If you can't bring up the service IP address and you get ha-log
entries similar to this:
SIOCSIFADDR: No such device
SIOCSIFFLAGS: No such device
SIOCSIFNETMASK: No such device
SIOCSIFBRDADDR: No such device
SIOCSIFFLAGS: No such device
SIOCADDRT: No such device
It may mean that you need to enable IP aliasing in your kernel build.
Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y". If you don't have it,
you'll have the line "# CONFIG_IP_ALIAS is not set" instead. Rebuild your
kernel with IP aliasing enabled.
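The same check can be scripted. check_ip_alias is a hypothetical helper: it returns 0 when the option is set, 1 when the .config explicitly says it is not set, and 2 when the option does not appear in the file at all, so that last case is reported rather than guessed at.

```shell
# Sketch: report whether a kernel .config enables IP aliasing.
check_ip_alias() {
    cfg=${1:-/usr/src/linux/.config}
    if grep -q '^CONFIG_IP_ALIAS=y$' "$cfg"; then
        echo "IP aliasing is enabled"
    elif grep -q '^# CONFIG_IP_ALIAS is not set' "$cfg"; then
        echo "IP aliasing is disabled: rebuild the kernel with CONFIG_IP_ALIAS=y" >&2
        return 1
    else
        echo "CONFIG_IP_ALIAS not mentioned in $cfg" >&2
        return 2
    fi
}
```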
If this all works, you've got availability. Now let's see if we have High
Availability :-)
Take down linuxha1. Kill power, kill heartbeat, whatever you have the
stomach for, but don't just yank both the serial and eth1 heartbeat cables.
If you do that, you'll have services running on both nodes and when you
re-connect the heartbeat, a bit of chaos....
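Now ping the cluster IP. If you stopped heartbeat cleanly on linuxha1,
linuxha2 should take over within a few seconds. If you killed the power,
linuxha2 waits until linuxha1 has been silent for deadtime (30 seconds in
our example) before taking over, so expect the address to answer again
shortly after that. ssh in again and verify you're on linuxha2. If it takes
much longer than deadtime, something is wrong.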
If you get this far, it's probably working, but you should probably check
all your heartbeats, too.
First, check your serial heartbeat. Unplug the crossover cable from your
eth1 NIC that you're using for your bcast heartbeat. Wait a little longer
than deadtime (30 seconds in our example).
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line like
this:
1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
If you get that, your serial heartbeat isn't working and your second node is
taking over. To avoid any problems, shut down heartbeat on the first node,
then test your null modem cable. Run the above serial tests again.
If your log is clean, great. Re-connect the crossover cable. Once that's
done, disconnect the serial cable, wait longer than deadtime again and check the linuxha2
log again.
If it's clean, congrats! If not, you can check /var/log/ha-log and
/var/log/ha-debug for more clues.
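Because ha-log keeps growing, "is dead" lines left over from earlier tests can mislead you. The sketch below (log_mark and new_dead_reports are hypothetical names) records the log's length before you pull a cable and afterwards shows only the new "is dead" reports.

```shell
# Sketch: print the current number of lines in the log.
log_mark() {
    wc -l < "${1:-/var/log/ha-log}" | tr -d ' '
}

# Print "is dead" lines written after line $1 of the log.
# Exit status is 0 only if at least one such line exists.
new_dead_reports() {
    mark=$1
    log=${2:-/var/log/ha-log}
    tail -n +"$((mark + 1))" "$log" | grep ': is dead'
}
```

On linuxha2, run `m=$(log_mark)` before unplugging a cable, wait longer than deadtime, then run `new_dead_reports "$m"`; any output means the remaining heartbeat path did not keep linuxha1 alive.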
Appendix A - Ethernet Crossover Cable Construction
Your cable diagram should be as follows:
Connector A    Connector B
Pin #          Pin #
1              3
2              6
3              1
6              2
4              7
5              8
7              4
8              5
Rev 1.2.0
(c) 2003 Rudy Pawul
[5]rpawul@iso-ne.com
References
1. http://linux-ha.org/download
2. linux-ha/doc/faqntips.html (in the heartbeat source tree)
3. http://linux-ha.org/download/faqnstuff.html
4. linux-ha/doc/ipfail-diagram.pdf (in the heartbeat source tree)
5. mailto:rpawul@iso-ne.com