File: GettingStarted.txt


                   Getting Started with Linux-HA (heartbeat)

Intro

   Let me preface this document by saying that most of this is _not_ original
   work.  My purpose in writing it is simply to contribute something that might
   help those who REALLY get things done.  The "work" I am contributing is
   mostly compiling bits and pieces from other HA documents (such as Volker
   Wiegand's Hardware Installation Guide) into a document that can help novices
   get started on HA without pestering Alan (like I did!) and cut down on
   repeat questions on the mailing list.


Getting Started

   The first thing you'll need is two computers.  The machines need not have
   identical hardware (or the same amount of memory, etc.), but if they do,
   your life will be that much easier when a component fails.

   Now you have to decide on some implementation details.  Your "cluster" is
   established via a "heartbeat" between the two computers (nodes) generated by
   the software package of the same name.  However, this heartbeat needs one or
   more media paths (serial via a null modem cable, ethernet via a crossover
   cable, etc.) between the nodes.

   At this point, you're actually ready to begin hardware-wise.  Of course,
   since you're looking into HA, you'll most likely want to avoid having a
   single point of failure.  In this case, that would be your null modem
   cable/serial port or network interface card (NIC)/crossover cable.  So, you
   need to decide whether you wish to add a second serial/null modem connection
   or a second NIC/crossover connection to each node.  See Appendix A for
   instructions on how to build a Cat-5 crossover cable.  My heartbeat path
   setup uses one serial port and one extra NIC because I only had one null
   modem cable, had an extra NIC on hand, and thought it was good to have two
   media types for the heartbeats.

   Once your hardware is in order, you must install your OS (I used Red Hat)
   and configure your networking.  Assuming you have 2 NICs, one should be
   configured for your "normal" network and the other as a private network
   between your clustered nodes (via the crossover cable).  As an example, we
   will assume that our cluster will have the following addresses:

   Node 1 (linuxha1):   192.168.85.1  (normal 192x net)
                        10.0.0.1 (private 10x net for heartbeat)
   Node 2 (linuxha2):   192.168.85.2  (192x)
                        10.0.0.2  (10x)
   Note:  None of these addresses should be your "cluster address" - the
   address handled by heartbeat and failed over between nodes!

   Most *nix distributions make this easy during installation; however, if
   you are having any problems, refer to either the Ethernet HOWTO or the
   documentation for your distribution.  To check your configuration, type:

            ifconfig

   This will show your network interfaces and their configuration.  You can
   obtain your network routing information from "netstat -nr".

   If it looks good, make sure you can ping between both nodes on all
   interfaces.

   Next, if you're using one, you'll need to test your serial connection.  On
   one node, which will be the receiver, type:
              cat </dev/ttyS0

   On the other node, type:
              echo hello >/dev/ttyS0

   You should see the text on the receiver node.  If it works, change their
   roles and try again.  If it doesn't, it may be as simple as having the wrong
   device file.  Volker's HA Hardware Guide and the Serial HOWTO are two good
   resources for troubleshooting your serial connection.

Installing Heartbeat

   You can now install the heartbeat package.  If you're reading this, you
   already have it, but in any case it's available at:

          [1]http://linux-ha.org/download

   There are binary RPMs at the website, or you can build heartbeat from
   source.  Grab the tarball (or install the source RPM) and untar it into
   your favorite source directory.  From the top of the source tree, type
   "./ConfigureMe configure", followed by "make" and "make install".  If you
   have problems installing the RPMs found at the website and want a way to
   make your own, there may be help in the [2]FAQ.

Configuring Heartbeat

   Configuring ha.cf
   There are three files you will need to configure before starting up
   heartbeat.  The first is ha.cf, which goes in the /etc/ha.d directory
   created during installation.  It tells heartbeat what types of media paths
   to use and how to configure them.  The ha.cf in the source directory
   contains all the various options you can use; I'll go through it line by
   line...

   serial /dev/ttyS0
          Use a serial heartbeat - if you don't use a serial heartbeat, you
          must use another medium, such as a bcast (ethernet) heartbeat.
          Replace /dev/ttyS0 with the appropriate device file for your required
          serial heartbeat.

   watchdog /dev/watchdog
          Optional.  The watchdog function provides a way to have a system that
          is still minimally functioning, but not providing a heartbeat, reboot
          itself after a minute of being sick.  This could help to avoid a
          scenario where the machine recovers its heartbeat after being
          pronounced dead.  If that happened and a disk mount failed over, you
          could have two nodes mounting a disk simultaneously. If you wish to
          use this feature, then in addition to this line, you will need to
          load the "softdog" kernel module and create the actual device file.
          To do this, first type "insmod softdog" to load the module. Then,
          type "grep misc /proc/devices" and note the number it reports (should
          be 10).  Next, type "cat /proc/misc | grep watchdog" and note that
          number (should be 130).  Now you can create the device file with that
          information by typing "mknod /dev/watchdog c 10 130".
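   The two lookups above can also be scripted.  The following is a minimal
   Python sketch, not part of heartbeat, that reads the misc major number from
   /proc/devices and the watchdog minor number from /proc/misc.  The function
   names are hypothetical, and the 0600 mode given to the device node is an
   assumption (plain mknod uses 0666 minus your umask).

```python
import os
import stat


def misc_major(proc_devices: str) -> int:
    """Return the major number of the 'misc' character device in /proc/devices text."""
    in_char_section = False
    for line in proc_devices.splitlines():
        if line.startswith("Character devices:"):
            in_char_section = True
        elif line.startswith("Block devices:"):
            in_char_section = False
        elif in_char_section:
            fields = line.split()
            if len(fields) == 2 and fields[1] == "misc":
                return int(fields[0])
    raise LookupError("no 'misc' entry in the character-device list")


def watchdog_minor(proc_misc: str) -> int:
    """Return the minor number registered as 'watchdog' in /proc/misc text."""
    for line in proc_misc.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "watchdog":
            return int(fields[0])
    raise LookupError("no 'watchdog' entry; is the softdog module loaded?")


def make_watchdog_node(path: str = "/dev/watchdog") -> None:
    """Rough equivalent of 'mknod /dev/watchdog c <major> <minor>' (run as root).

    The 0o600 permission bits are an assumption of this sketch.
    """
    with open("/proc/devices") as f:
        major = misc_major(f.read())
    with open("/proc/misc") as f:
        minor = watchdog_minor(f.read())
    os.mknod(path, 0o600 | stat.S_IFCHR, os.makedev(major, minor))
```

   Call make_watchdog_node() as root after "insmod softdog"; on the systems
   described above it should create the same c 10 130 device as the manual
   commands.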

   bcast eth1
          Specifies to use a broadcast heartbeat over the eth1 interface
          (replace with eth0, eth2, or whatever you use).

   keepalive 2
          Sets the time between heartbeats to 2 seconds.

   warntime 10
          Time in seconds before issuing a "late heartbeat" warning in the
          logs.

   deadtime 30
          Node is pronounced dead after 30 seconds.

   initdead 120
          With some configurations, the network takes some time to start
          working after a reboot.   This is a separate "deadtime" to handle
          that case.  It should be at least twice the normal deadtime.
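   A quick way to catch inconsistent timing values is to check them against
   each other.  This Python sketch uses a hypothetical check_timings helper.
   The initdead rule comes from the text above; the ordering
   keepalive < warntime < deadtime is an assumption inferred from what each
   option means, not a documented heartbeat requirement.

```python
from typing import List


def check_timings(keepalive: int, warntime: int, deadtime: int, initdead: int) -> List[str]:
    """Return human-readable warnings; an empty list means the values look sane.

    initdead >= 2 * deadtime comes from the guide; the ordering
    keepalive < warntime < deadtime is an assumption of this sketch.
    """
    problems: List[str] = []
    if not keepalive < warntime:
        problems.append("keepalive should be shorter than warntime")
    if not warntime < deadtime:
        problems.append("warntime should be shorter than deadtime")
    if initdead < 2 * deadtime:
        problems.append("initdead should be at least twice deadtime")
    return problems


if __name__ == "__main__":
    # The values used in the example ha.cf in this guide.
    print(check_timings(keepalive=2, warntime=10, deadtime=30, initdead=120))  # []
```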

   hopfudge 1
          Optional.  For ring topologies, number of hops allowed in addition to
          the number of nodes in the cluster.

   baud 19200
          Speed at which to run the serial line (bps).

   udpport 694
          Use port number 694 for bcast or ucast communication. This is the
          default, and the official IANA registered port number.

   auto_failback on
          Required.  For those familiar with Tru64 Unix, heartbeat acts as if in
          "favored member" mode.  The master listed in the haresources file
          holds all the resources until a failover, at which time the slave
          takes over.  When auto_failback is set to on, the master takes
          everything back from the slave once it comes back online.  When set
          to off, this option prevents the master node from re-acquiring
          cluster resources after a failover.  This option is similar to the
          obsolete nice_failback option.  If you want to upgrade from a cluster
          which had nice_failback set off to this or later versions, special
          considerations apply in order to avoid requiring a flash cut.  Please
          see the [3]FAQ for details on how to deal with this situation.

   node linuxha1.linux-ha.org
          Mandatory.  Hostname of machine in cluster as described by `uname
          -n`.

   node linuxha2.linux-ha.org
          Mandatory.  Hostname of machine in cluster as described by `uname
          -n`.

   respawn  userid  cmd
          Optional.  Lists a command to be spawned and monitored.  E.g., to
          spawn the ccm daemon, add the following line:
                  respawn hacluster /usr/lib/heartbeat/ccm
          This tells heartbeat to spawn the command with the credentials of
          userid (hacluster, in this example) and to monitor the health of the
          process, respawning it if it dies.  For ipfail, the line would be:
                  respawn hacluster /usr/lib/heartbeat/ipfail
          NOTE: If the process dies with exit code 100, the process is not
          respawned.

   ping    ping1.linux-ha.org  ping2.linux-ha.org ...
          Optional: Specify ping nodes.  These nodes are not considered cluster
          nodes.  They are used to check network connectivity for modules like
          ipfail.

   ping_group    name  ping1.linux-ha.org  ping2.linux-ha.org ...
          Optional: Specify a group of ping nodes.  These are similar to ping
          nodes, but if any node in a group is available then the whole group
          is considered available.  The group name can be any string and is
          used to uniquely identify the group.  Each group must appear on a
          separate line.  Like a ping node, a group is not considered a cluster
          node; groups are used to check network connectivity for modules like
          ipfail.
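   The difference between ping nodes and ping groups comes down to one rule: a
   group is up if ANY of its members answers.  The Python sketch below models
   that rule.  The function name is hypothetical, the "reachable" set stands in
   for real ping results (heartbeat does not expose such a set), and the group
   name and addresses in the example are made up.

```python
from typing import Dict, Iterable, List, Set


def target_status(ping_nodes: Iterable[str],
                  ping_groups: Dict[str, List[str]],
                  reachable: Set[str]) -> Dict[str, bool]:
    """Map each ping target (single node or named group) to up (True) or down (False).

    `reachable` is a stand-in for real ICMP results in this sketch.
    """
    status = {node: node in reachable for node in ping_nodes}
    for name, members in ping_groups.items():
        # A group counts as available if any one of its members answers.
        status[name] = any(member in reachable for member in members)
    return status


if __name__ == "__main__":
    print(target_status(
        ping_nodes=["ping1.linux-ha.org"],
        ping_groups={"routers": ["10.0.0.254", "10.0.1.254"]},  # hypothetical group
        reachable={"10.0.1.254"},
    ))
    # {'ping1.linux-ha.org': False, 'routers': True}
```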

   Configuring haresources
   Once you've got your ha.cf set up, you need to configure haresources.  This
   file specifies the services for the cluster and who the default owner is.
   Note:  This file must be the same on both nodes!

   For our example, we'll assume the high availability services are Apache and
   Samba.  The IP for the cluster is mandatory, and don't configure the cluster
   IP outside of the haresources file!  The haresources file needs one line:
                  linuxha1.linux-ha.org 192.168.85.3 httpd smb

   So, this line dictates that on startup, linuxha1 should serve the IP
   192.168.85.3 and start Apache and Samba as well.  On shutdown, heartbeat
   will first stop smb, then Apache, then give up the IP.  This assumes that
   the command "uname -n" prints "linuxha1.linux-ha.org"; yours may well
   produce "linuxha1", and if it does, use that instead!
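   The start/stop ordering just described (left to right on startup, right to
   left on shutdown) can be sketched in a few lines of Python.  The function
   name is hypothetical, and the sketch only handles a single line of
   whitespace-separated fields; it ignores comments and any other haresources
   syntax.

```python
from typing import List, Tuple


def resource_order(line: str) -> Tuple[str, List[str], List[str]]:
    """Split a haresources line into (node, start_order, stop_order).

    Resources are started left to right and stopped right to left.
    """
    fields = line.split()  # first field is the node, the rest are resources
    node, resources = fields[0], fields[1:]
    return node, resources, resources[::-1]


if __name__ == "__main__":
    node, start, stop = resource_order("linuxha1.linux-ha.org 192.168.85.3 httpd smb")
    print(node)   # linuxha1.linux-ha.org
    print(start)  # ['192.168.85.3', 'httpd', 'smb']
    print(stop)   # ['smb', 'httpd', '192.168.85.3']
```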

   Note:  httpd and smb are the names of the startup scripts for Apache and
   Samba, respectively.  Heartbeat will look for startup scripts of the same
   name in the following paths:
       /etc/ha.d/resource.d
       /etc/rc.d/init.d

   These scripts must start services via "scriptname start" and stop them via
   "scriptname stop", so you can use any service as long as its script
   conforms to this standard.
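   To see which script heartbeat would pick up for a resource name, you can
   search the two directories yourself.  This Python sketch assumes the
   directories are searched in the order listed and that the first executable
   match wins; both the lookup order and the executable-bit requirement are
   assumptions, and the function name is hypothetical.

```python
import os
from typing import Optional, Sequence

# Directories named in the text; searching them in this order is an assumption.
SEARCH_PATH = ("/etc/ha.d/resource.d", "/etc/rc.d/init.d")


def find_resource_script(name: str, search_path: Sequence[str] = SEARCH_PATH) -> Optional[str]:
    """Return the first executable file called `name` in search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for script in ("IPaddr", "httpd", "smb"):
        print(script, "->", find_resource_script(script))
```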

   Should you need to pass arguments to a custom script, the format would be:
                scriptname::argument

   So, if we added a service "maid" which needed the argument "vacuum", our
   haresources line would change to the following:
                linuxha1 192.168.85.3 httpd smb maid::vacuum
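   The sample log later in this document shows how such an entry is run: for
   datadisk::drbd0 heartbeat executes "datadisk drbd0 start".  The Python sketch
   below builds that argument list.  The function names are hypothetical, and
   splitting on every "::" to allow several arguments is an assumption; this
   guide only shows a single argument.

```python
from typing import List, Tuple


def parse_resource(token: str) -> Tuple[str, List[str]]:
    """Split 'script::arg1::arg2' into ('script', ['arg1', 'arg2'])."""
    script, *args = token.split("::")
    return script, args


def command_for(token: str, action: str) -> List[str]:
    """Build the argument list: script name, its arguments, then start/stop."""
    script, args = parse_resource(token)
    return [script, *args, action]


if __name__ == "__main__":
    print(command_for("maid::vacuum", "start"))  # ['maid', 'vacuum', 'start']
    print(command_for("httpd", "stop"))          # ['httpd', 'stop']
```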

   This brings us to some added flexibility with the service IP address.  We
   are actually using a shorthand notation above.  The actual line could have
   read (we've canned the maid):
                linuxha1 IPaddr::192.168.85.3 httpd smb

   Where IPaddr is the name of our service script, taking the argument
   192.168.85.3.  Sure enough, if you look in the directory
   /etc/ha.d/resource.d, you will find a script called IPaddr.  This script
   will also allow you to manipulate the netmask, broadcast address and base
   interface of this IP service.  To specify a subnet with 32 addresses, you
   could define the service as (leaving off the IPaddr because we can!):
                linuxha1 192.168.85.3/27 httpd smb

   This sets the IP service address to 192.168.85.3 and the netmask to
   255.255.255.224, and the broadcast address defaults to 192.168.85.31 (the
   highest address on the subnet).  The last parameter you can set is the
   broadcast address.  To override the default and set it to 192.168.85.16,
   your entry would read:
                linuxha1 192.168.85.3/27/192.168.85.16 httpd smb
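   The netmask and default broadcast arithmetic above can be checked with
   Python's standard ipaddress module.  This sketch follows the
   address/prefix[/broadcast] form used in the examples above; the function
   name is hypothetical, and the base-interface setting mentioned in the text
   is not modelled.

```python
import ipaddress
from typing import Tuple


def ipaddr_settings(spec: str) -> Tuple[str, str, str]:
    """Expand 'address/prefix[/broadcast]' into (address, netmask, broadcast).

    If the broadcast part is omitted, the highest address in the subnet is
    used.  A spec without a prefix is rejected, because heartbeat then takes
    the netmask from the routing table instead.
    """
    parts = spec.split("/")
    if len(parts) < 2:
        raise ValueError("this sketch needs an explicit prefix length")
    address, prefix = parts[0], parts[1]
    network = ipaddress.IPv4Network(f"{address}/{prefix}", strict=False)
    broadcast = parts[2] if len(parts) > 2 else str(network.broadcast_address)
    return address, str(network.netmask), broadcast


if __name__ == "__main__":
    print(ipaddr_settings("192.168.85.3/27"))
    # ('192.168.85.3', '255.255.255.224', '192.168.85.31')
    print(ipaddr_settings("192.168.85.3/27/192.168.85.16"))
    # ('192.168.85.3', '255.255.255.224', '192.168.85.16')
```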

   You may be wondering whether any of the above is necessary for you.  It
   depends.  If you've properly established a net route (independent of
   heartbeat) for the service's IP address, with the correct netmask and
   broadcast address, then no, it's not necessary for you.  However, this case
   won't fit everybody and that's why the option's there!  In addition, you may
   have more than one possible interface that could be used for the service
   IP.  Read on to see how heartbeat treats this...

   Once you straighten out your haresources file, copy ha.cf and haresources to
   /etc/ha.d; once authkeys is configured as well (see below), you're ready to
   start!

   Configuring ipfail
   The ipfail plugin attempts to provide detection of network failures, and
   then intelligently react, directing the cluster to failover resources as
   necessary. In order to accomplish this goal, it uses ping nodes or ping
   groups which work as "dumb" third parties in the cluster. Provided both HA
   nodes can communicate with each other, ipfail can reliably detect when one
   of their network links has become unusable, and compensate.
   To configure ipfail, the following steps must be performed.
    1. Select good ping node candidates.
       It is essential that good strategic ping nodes be selected. The better
       your choices, the stronger your HA cluster becomes. Choosing solid
       network devices like switches and routers is a good idea. Do not choose
       either of the members of the HA cluster. Nor should you select someone's
       workstation. It is also important to select ping nodes that reflect the
       connectivity of your HA nodes. If you wish to monitor the connectivity
       of two interfaces, it is wise to select a ping node for each interface,
       that is reachable exclusively from said interface. Consult
       [4]ipfail-diagram.pdf for a graphical representation of this idea.
    2. Set auto_failback to on or off.
       ipfail will only operate if heartbeat has been configured to something
       other than legacy mode.  In ha.cf, set the auto_failback option to "on"
       or "off" like so:

     auto_failback on
       or

     auto_failback off
    3. Configure your ha.cf to start ipfail.
       Add a line like the following to ha.cf (assuming your compile PREFIX is
       /usr)

     respawn hacluster /usr/lib/heartbeat/ipfail
    4. Add the ping nodes to ha.cf.
       The ping nodes can be added to the cluster by using a line like the
       following:

     ping pnode1 pnode2 pnodeN
       Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your
       ping nodes.

   Ensure that the above configuration directives are added to the ha.cf on
   both members of the cluster, and that they are identical.

     NOTE: You will want to check on the availability of the ping nodes prior
     to using them. If you cannot ping them from both of the HA nodes, they are
     useless.
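   The requirement that the ipfail directives exist and match on both nodes is
   easy to check mechanically.  This Python sketch compares the two ha.cf files'
   auto_failback, respawn, ping and ping_group lines.  The function names are
   hypothetical, and treating the order of these lines as irrelevant and
   recognising the ipfail respawn line by its "/ipfail" suffix are assumptions.

```python
from typing import List

# ha.cf keywords touched by the ipfail steps above.
IPFAIL_KEYWORDS = ("auto_failback", "respawn", "ping", "ping_group")


def ipfail_directives(ha_cf: str) -> List[str]:
    """Return ipfail-related directives, whitespace-normalised and sorted."""
    found = []
    for raw in ha_cf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] in IPFAIL_KEYWORDS:
            found.append(" ".join(words))
    return sorted(found)  # order is treated as irrelevant (assumption)


def ipfail_problems(node1_cf: str, node2_cf: str) -> List[str]:
    """List obvious problems with the ipfail setup across the two nodes."""
    problems = []
    d1, d2 = ipfail_directives(node1_cf), ipfail_directives(node2_cf)
    if d1 != d2:
        problems.append("ipfail directives differ between the nodes")
    if not any(d.startswith("respawn ") and d.endswith("/ipfail") for d in d1):
        problems.append("no respawn line for ipfail")
    if not any(d.startswith(("ping ", "ping_group ")) for d in d1):
        problems.append("no ping nodes or ping groups")
    if not any(d in ("auto_failback on", "auto_failback off") for d in d1):
        problems.append("auto_failback must be 'on' or 'off'")
    return problems


if __name__ == "__main__":
    cf = "auto_failback on\nrespawn hacluster /usr/lib/heartbeat/ipfail\nping 10.0.0.254\n"
    print(ipfail_problems(cf, cf))  # []
```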

Selecting an Interface

   One important aspect of configuring the haresources file for a machine which
   has multiple ethernet interfaces is to know how heartbeat selects which
   interface will wind up supporting the service addresses that are configured
   in haresources.  After all, no interface was specified in the haresources
   file.

   Heartbeat decides which interface will be used by looking at the routing
   table.  It tries to select the lowest cost route to the IP address to be
   taken over.  In the case of a tie, it chooses the first route found.  For
   most configurations this means the default route will be least preferred.

   If you don't specify a netmask for the IP address in the haresources file,
   the netmask associated with the selected route will be used.  Similarly, if
   an interface is not specified, then the virtual IP address will be added to
   the interface associated with the selected route.  If the broadcast address
   is omitted, then the highest address in the subnet is used.
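   The selection rule above (lowest-cost route that covers the address, first
   route wins on a tie) can be modelled in Python as follows.  The Route type
   and the function name are hypothetical, and using the route metric as the
   "cost" is an assumption of this sketch.  With the example addresses it
   picks eth0 with a 255.255.255.0 netmask, which matches the ifconfig line in
   the sample log later in this document.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    """One routing-table entry (a simplified, hypothetical representation)."""
    destination: str  # e.g. "192.168.85.0/24", or "0.0.0.0/0" for the default route
    interface: str
    metric: int       # treated as the route "cost" (assumption)


def select_interface(service_ip: str, routes: List[Route]) -> Optional[Route]:
    """Pick the lowest-metric route covering service_ip; ties go to the first listed."""
    ip = ipaddress.IPv4Address(service_ip)
    best: Optional[Route] = None
    for route in routes:
        if ip in ipaddress.IPv4Network(route.destination):
            # Strict '<' keeps the earlier route when metrics are equal.
            if best is None or route.metric < best.metric:
                best = route
    return best


if __name__ == "__main__":
    table = [
        Route("192.168.85.0/24", "eth0", 0),
        Route("10.0.0.0/24", "eth1", 0),
        Route("0.0.0.0/0", "eth0", 0),  # default route, listed last in this example
    ]
    chosen = select_interface("192.168.85.3", table)
    print(chosen.interface, ipaddress.IPv4Network(chosen.destination).netmask)
    # eth0 255.255.255.0
```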

   Configuring Authkeys

   The third file to configure determines your authentication keys.  There are
   three types of authentication methods available:  crc, md5, and sha1.
   "Well, which should I use?", you ask.  Since this document is called
   "Getting Started", we'll keep it simple...

   If your heartbeat runs over a secure network, such as the crossover cable in
   our example, you'll want to use crc.  This is the cheapest method from a
   resources perspective.  If the network is insecure, but you're either not
   very paranoid or concerned about minimizing CPU resources, use md5.
   Finally, if you want the best authentication without regard for CPU
   resources, use sha1.  It's the hardest to crack.

   The format of the file is as follows:
   auth <number>
   <number> <authmethod> [<authkey>]

   So, for sha1, a sample /etc/ha.d/authkeys could be:
   auth 1
   1 sha1 key-for-sha1-any-text-you-want

   For md5, you could use the same as the above, but replace "sha1" with "md5".

   Finally, for crc, a sample might be:
   auth 2
   2 crc

   Whatever index you put after the keyword auth must be found below in the
   keys listed in the file.  If you put "auth 4", then there must be a
   "4 <authmethod>" line in the list below.
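   The rule that the "auth" index must match one of the key lines is the most
   common authkeys mistake, so it is worth checking.  The Python sketch below
   parses the format shown above.  The function name is hypothetical, and
   skipping "#" comment lines and requiring a key for md5 and sha1 are
   assumptions rather than rules stated in this guide.

```python
from typing import Dict, Optional, Tuple

METHODS = ("crc", "md5", "sha1")


def parse_authkeys(text: str) -> Tuple[int, Dict[int, Tuple[str, str]]]:
    """Parse authkeys text into (active_index, {index: (method, key)}).

    Raises ValueError when the index after 'auth' has no matching key line.
    Skipping '#' lines and requiring a key for md5/sha1 are assumptions.
    """
    active: Optional[int] = None
    keys: Dict[int, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)  # the key itself may contain spaces
        if fields[0] == "auth":
            active = int(fields[1])
            continue
        index, method = int(fields[0]), fields[1]
        key = fields[2] if len(fields) > 2 else ""
        if method not in METHODS:
            raise ValueError(f"unknown auth method {method!r}")
        if method != "crc" and not key:
            raise ValueError(f"{method} needs a key")
        keys[index] = (method, key)
    if active is None or active not in keys:
        raise ValueError(f"'auth {active}' has no matching key line")
    return active, keys


if __name__ == "__main__":
    print(parse_authkeys("auth 2\n2 crc\n"))  # (2, {2: ('crc', '')})
```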

   Make sure the file's permissions are safe, such as 600.  And "any text you
   want" is not quite right: there's a limit to the number of characters you
   can use.  That's it!

Starting and testing heartbeat

   From Red Hat, or other distributions which use /etc/init.d startup files,
   simply type /etc/init.d/heartbeat start on both nodes.  I would recommend
   starting on the system master (in our example linuxha1) first.

   If you want heartbeat to run on startup, what to do will depend on your
   distribution.  You may need to place links to the startup script in the
   appropriate init level directories, but the RPM versions will do this for
   you.  I have heartbeat start at its default sequential priority (75, which
   means it starts after services with priority 74 and lower and before
   services with priority 76-99), stop at its default sequential priority
   (05), and only care about the 0 (halt), 6 (reboot), 3 (text-only) and
   5 (X) run levels.

   So, if I had to do it by hand, I'd need to type in the following (as root,
   of course):

       cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
       cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
       cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
       cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat

   The last time I ran Slackware, there was no /etc/rc.d/init.d directory (this
   may have changed by now), and to do the same thing I would have placed the
   following in /etc/rc.d/rc.local:
       /etc/ha.d/heartbeat start
   This assumes you copied the file ha.rc to /etc/ha.d/heartbeat.  If you
   can't find /etc/rc.d/init.d in your distribution and you're unsure of how
   processes start, you can use the rc.local method.  But you're on your own
   for shutdown; I just don't remember...

   Note:  If you use the watchdog function, you'll need to load its module at
   bootup as well.  You can put the following command at the bottom of the
   /etc/rc.d/rc.sysinit file:
       /sbin/insmod softdog
   For the rc.local method, just put the same line right above where you start
   heartbeat.

   Once you've started heartbeat, take a peek at your log file (default is
   /var/log/ha-log) before testing it.  If all is peachy, the service owner's
   log (linuxha1 in our example) should look something like this:
   heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found.
   heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
   heartbeat: 2003/02/10_13:52:22 info: **************************
   heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
   heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
   heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
   heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
   heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
   heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
   heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
   heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
   heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
   heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
   heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
   heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
   heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
   heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
   heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
   heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
   heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
   heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
   heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
   heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
   heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
   heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0  broadcast 192.168.85.255
   heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
   heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
   heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
   heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror start
   heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
   heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
   heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
   heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
   heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
   heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
   heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
   heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
   heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
   heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.
   NOTE:  Your log may differ depending on when you started heartbeat on
   linuxha2!  I started heartbeat on linuxha2 at 13:56:30...
                   _____________________________________

   OK, now try to ping your cluster's IP (192.168.85.3 in the example).  If
   this works, ssh to it and verify you're on linuxha1.  Next, make sure your
   services are tied to the .3 address.  Bring up your web browser and type in
   192.168.85.3 for the URL.  For Samba, try to map the drive
   "\\192.168.85.3\test", assuming you set up a share called "test" (see the
   Samba docs to get that going).  As an aside, however, you'll want to use the
   "netbios name" parameter to have your Samba share listed under the cluster
   name and not the hostname of your cluster member!

   NOTE: If you can't bring up the service IP address and you get ha-log
   entries similar to this:

             SIOCSIFADDR: No such device
             SIOCSIFFLAGS: No such device
             SIOCSIFNETMASK: No such device
             SIOCSIFBRDADDR: No such device
             SIOCSIFFLAGS: No such device
             SIOCADDRT: No such device

     It may mean that you need to enable IP aliasing in your kernel build.
     Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y"; if you don't have
     it, you'll have the line "# CONFIG_IP_ALIAS is not set".  Rebuild your
     kernel with IP aliasing enabled.

   If this all works, you've got availability.  Now let's see if we have High
   Availability :-)

   Take down linuxha1.  Kill power, kill heartbeat, whatever you have the
   stomach for, but don't just yank both the serial and eth1 heartbeat cables.
   If you do that, you'll have services running on both nodes and when you
   re-connect the heartbeat, a bit of chaos....
   Now ping the cluster IP. Approximately 5-10 seconds later it should start
   responding again.  Use ssh again to verify you're on linuxha2.  If it happens
   but takes more like 30 seconds, something is wrong.

   If you get this far, it's probably working, but you should check all your
   heartbeats, too.  With linuxha1 back up and running heartbeat, first check
   your serial heartbeat.  Unplug the crossover cable from the eth1 NIC that
   you're using for your bcast heartbeat.  Wait a little longer than your
   deadtime (30 seconds in our example).  Now, look at /var/log/ha-log on
   linuxha2 and make sure there's no line like this:
       1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead
   If you get that, your serial heartbeat isn't working and your second node is
   taking over.  To avoid any problems, shut down heartbeat on the first node,
   then test your null modem cable by running the above serial tests again.

   If your log is clean, great.  Re-connect the crossover cable.  Once that's
   done, disconnect the serial cable, wait the same amount of time and check
   the linuxha2 log again.  If it's clean, congrats!  If not, you can check
   /var/log/ha-log and /var/log/ha-debug for more clues.

   Appendix A - Ethernet Crossover Cable Construction

   Your cable diagram should be as follows:

       Connector A     Connector B
          Pin #           Pin #
            1               3
            2               6
            3               1
            6               2
            4               7
            5               8
            7               4
            8               5
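   A correctly built crossover cable is symmetric (either end can be
   connector A) and routes the transmit pair on pins 1 and 2 to the receive
   pair on pins 3 and 6 of a standard 10/100 Ethernet port.  This small Python
   sketch checks the table above for both properties; the names are
   hypothetical.

```python
from typing import Dict

# Connector A pin -> connector B pin, copied from the table above.
CROSSOVER: Dict[int, int] = {1: 3, 2: 6, 3: 1, 6: 2, 4: 7, 5: 8, 7: 4, 8: 5}
PINS = list(range(1, 9))


def is_valid_crossover(wiring: Dict[int, int]) -> bool:
    """Check that every pin is used once, the mapping is its own inverse,
    and the transmit pair (1, 2) lands on the receive pair (3, 6)."""
    if sorted(wiring) != PINS or sorted(wiring.values()) != PINS:
        return False
    symmetric = all(wiring[wiring[pin]] == pin for pin in wiring)
    tx_to_rx = {wiring[1], wiring[2]} == {3, 6}
    return symmetric and tx_to_rx


if __name__ == "__main__":
    print(is_valid_crossover(CROSSOVER))  # True
```

   A straight-through mapping (every pin to itself) fails the second check,
   which is exactly the mistake this test is meant to catch.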

   Rev 1.2.0
   (c) 2003 Rudy Pawul
   [5]rpawul@iso-ne.com

References

   1. http://linux-ha.org/download
   2. linux-ha/doc/faqntips.html (in the heartbeat source distribution)
   3. http://linux-ha.org/download/faqnstuff.html
   4. linux-ha/doc/ipfail-diagram.pdf (in the heartbeat source distribution)
   5. mailto:rpawul@iso-ne.com