<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="Content-Type"
 content="text/html; charset=iso-8859-1">
  <meta name="Description"
 content="This document is a short description of how to get started with Linux-HA (heartbeat), especially from the software perspective.">
  <meta name="Author" content="Rudy Pawul - rpawul@iso-ne.com">
  <meta name="GENERATOR"
 content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]">
  <title>Getting Started with Linux-HA (heartbeat)</title>
</head>
<body>
<h1> Getting Started with Linux-HA (heartbeat)</h1>
<h2> Intro</h2>
Let me preface this document by saying that most of it is <i>not</i>
original work.&nbsp; My purpose in writing it is to contribute in some
way, and possibly to help those who REALLY get things done.&nbsp; The
"work" I am contributing is mostly compiling bits and pieces from other
HA documents (such as Volker Wiegand's Hardware Installation Guide) into
a document that can help novices get started on HA without pestering
Alan (like I did!), and that cuts down on repeat questions on the
mailing list. <br>
&nbsp;
<h2> Getting Started</h2>
The first thing you'll need is two computers.&nbsp; You need not have
identical hardware in both machines (or amount of memory, etc.), but if
you did, it would make your life that much easier when a component
fails.
<p>Now you have to make some implementation decisions.&nbsp; Your
"cluster" is established via a "heartbeat" between the two computers
(nodes), generated by the software package of the same name.&nbsp;
However, this heartbeat needs one or more media paths (serial via a null
modem cable, ethernet via a crossover cable, etc.) between the nodes. </p>
<p>At this point, you're actually ready to begin hardware-wise.&nbsp;
Of course, since you're looking into HA, you'll most likely want to
avoid having a single point of failure.&nbsp; In this case, that would
be your null modem cable/serial port or network interface
card (NIC)/crossover cable.&nbsp; So, you need to decide whether you wish
to add a second serial/null modem connection or a second
NIC/crossover connection to each node.&nbsp; See
Appendix A for instructions on how to build a Cat-5 crossover
cable.&nbsp; My heartbeat path setup uses one serial port and one extra
NIC because I only had one null modem cable, had an extra NIC on hand,
and thought it was good to have two medium types for the heartbeats. </p>
<p>Once your hardware is in order, you must install your OS and
configure your networking (I used Red Hat).&nbsp; Assuming you have 2
NICs, one should be configured for your "normal" network and the other
as a private network between your clustered nodes (via the crossover
cable).&nbsp; For an example, we will assume that our cluster will have
the following addresses: </p>
<p>Node 1 (linuxha1):&nbsp;&nbsp; 192.168.85.1&nbsp; (normal 192x net) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
10.0.0.1 (private 10x net for heartbeat) <br>
Node 2 (linuxha2):&nbsp;&nbsp; 192.168.85.2&nbsp; (192x) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
10.0.0.2&nbsp; (10x) <br>
<i><font color="#ff0000">Note:&nbsp; None of these addresses should be
your "cluster address" - the address handled by heartbeat and failed
over between nodes!</font></i><br>
</p>
<p>Most *nix distributions make this easy during installation.&nbsp; However, if
you are having any problems, refer to either the Ethernet HOWTO or the
documentation for your distribution.&nbsp; To check
your configuration, type: </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <b><tt>ifconfig</tt></b> </p>
<p>This will show your network interfaces and their
configuration.&nbsp; You can obtain your network routing information
from "netstat -nr". </p>
<p>If it looks good, make sure you can ping between both nodes on all
interfaces. </p>
<p>Next, if you're using one, you'll need to test your serial
connection.&nbsp; On one node, which will be the receiver, type: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <b><tt>cat
&lt;/dev/ttyS0</tt></b> </p>
<p>On the other node, type: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <b><tt>echo
hello &gt;/dev/ttyS0</tt></b> </p>
<p>You should see the text on the receiver node.&nbsp; If it works,
change their roles and try again.&nbsp; If it doesn't, it may be as
simple as having the wrong device file.&nbsp; Volker's HA Hardware Guide
and the Serial HOWTO are two good resources for troubleshooting your
serial connection. </p>
<h2> Installing Heartbeat</h2>
You can now install the heartbeat package.&nbsp; If you're reading
this, you already have it, but in any case it's available at:
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a
 href="http://linux-ha.org/download">http://linux-ha.org/download</a> </p>
<p>There are binary RPMs at the website, or you can build heartbeat
from source.&nbsp; Grab the tarball (or install the source RPM).&nbsp;
Untar it into your favorite source directory.&nbsp; From the
top of the source tree, type "<b><tt>./ConfigureMe configure</tt></b>",
followed by "<b><tt>make</tt></b>" and "<b><tt>make install</tt></b>".&nbsp; If you
have problems installing the RPMs found at the website and want a
way to make your own, there may be help in the <a
 href="./faqntips.html">FAQ</a>. </p>
<h2> Configuring Heartbeat</h2>
<b><font size="+1">Configuring ha.cf</font></b> <br>
There are three files you will need to configure before starting up
heartbeat.&nbsp; The first is <i>ha.cf</i>.&nbsp; It will be placed in
the /etc/ha.d directory that is created after installation.&nbsp; It
tells heartbeat what types of media paths to use and how to configure
them.&nbsp; The ha.cf in the source directory contains all the
various options you can use; I'll go through it line by line...
<dl>
  <dt> <b><tt><font size="+1">serial /dev/ttyS0</font></tt></b></dt>
  <dd> Use a serial heartbeat - if you don't use a serial heartbeat, you
must use another medium, such as a bcast (ethernet) heartbeat.
&nbsp;Replace /dev/ttyS0 with the appropriate device file for your
required serial heartbeat.</dd>
  <dt> <b><tt><font size="+1">watchdog /dev/watchdog</font></tt></b></dt>
  <dd> <i>Optional</i>.&nbsp; The watchdog function provides a way to have a
system that is still minimally functioning, but not providing a
heartbeat, reboot itself after a minute of being sick.&nbsp; This could
help to avoid a scenario where the machine recovers its heartbeat after
being pronounced dead.&nbsp; If that happened and a disk mount failed
over, you could have two nodes mounting a disk simultaneously.&nbsp; If you
wish to use this feature, then in addition to this line, you will need
to load the "softdog" kernel module and create the actual device
file.&nbsp; To do this, first type "<b><tt>insmod softdog</tt></b>" to load the
module.&nbsp; Then type "<b><tt>grep misc /proc/devices</tt></b>" and note the number it
reports (should be 10).&nbsp; Next, type "<b><tt>cat /proc/misc | grep
watchdog</tt></b>" and note that number (should be 130).&nbsp; Now you
can create the device file from those two numbers by typing "<b><tt>mknod
/dev/watchdog c 10 130</tt></b>".</dd>
  <dt> <b><tt><font size="+1">bcast eth1</font></tt></b></dt>
  <dd> Specifies to use a broadcast heartbeat over the  eth1 interface
(replace with eth0, eth2, or whatever you use).</dd>
  <dt> <b><tt><font size="+1">keepalive 2</font></tt></b></dt>
  <dd> Sets the time between heartbeats to 2 seconds.</dd>
  <dt> <b><tt><font size="+1">warntime 10</font></tt></b></dt>
  <dd>Time in seconds before issuing a "late heartbeat" warning in the
logs.</dd>
  <dt> <b><tt><font size="+1">deadtime 30</font></tt></b></dt>
  <dd> Node is pronounced dead after 30 seconds.</dd>
  <dt> <b><tt><font size="+1">initdead 120</font></tt></b></dt>
  <dd>With some configurations, the network takes some time to start
working after a reboot. &nbsp; This is a separate "deadtime" to handle
that case. &nbsp;It should be at least twice the normal deadtime.</dd>
  <dt><b><tt><font size="+1">hopfudge 1</font></tt></b></dt>
  <dd> <i>Optional</i>.&nbsp; For ring topologies, number of hops
allowed in addition to the number of nodes in the cluster.</dd>
  <dt> <b><tt><font size="+1">baud 19200</font></tt></b></dt>
  <dd> Speed at which to run the serial line (bps).</dd>
  <dt> <b><tt><font size="+1">udpport 694</font></tt></b></dt>
  <dd> Use port number 694 for bcast or ucast communication.
	This is the default, and the official IANA registered port number.</dd>
  <dt> <b><tt><font size="+1">auto_failback on</font></tt></b></dt>
  <dd> <i>Required.</i>&nbsp; For those familiar with Tru64 Unix,
heartbeat acts as if in "favored member" mode.&nbsp; The master listed
in the haresources file holds all
the resources until a failover, at which time the slave takes
over.&nbsp; When <i>auto_failback</i> is set to <b>on</b>,
the master takes everything back from the slave once it comes
back online.&nbsp;
When set to <b>off</b>, this option prevents the master node from
re-acquiring cluster resources after a failover.
This option is similar to the obsolete <i>nice_failback</i> option.
If you want to upgrade from a cluster which had <i>nice_failback</i>
set <b>off</b> to this or later versions, special considerations apply
if you want to avoid requiring a flash cut.&nbsp; Please see the
<a href="http://linux-ha.org/download/faqnstuff.html">FAQ</a> for details
on how to deal with this situation.
</dd>
  <dt> <b><tt><font size="+1">node linuxha1.linux-ha.org</font></tt></b></dt>
  <dd> <i>Mandatory</i>.&nbsp; Hostname of machine in cluster as
described by `uname -n`.</dd>
  <dt> <b><tt><font size="+1">node linuxha2.linux-ha.org</font></tt></b></dt>
  <dd> <i>Mandatory</i>.&nbsp; Hostname of machine in cluster as
described by `uname -n`.<br>
  </dd>
  <dt> <b><tt><font size="+1">respawn&nbsp; userid&nbsp; cmd</font></tt></b></dt>
  <dd> <i>Optional</i>:&nbsp; Lists a command to be spawned&nbsp; and
monitored.&nbsp; Eg:&nbsp; To spawn ccm daemons the following line has
to be added:</dd>
  <dd> <b>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; respawn hacluster
/usr/lib/heartbeat/ccm</b><br>
This tells heartbeat to spawn the command with the credentials of
userid (hacluster, in this example) and to monitor the health of the
process, respawning it if it dies. &nbsp;For ipfail, the line would be:<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; <span
 style="font-weight: bold;">respawn hacluster /usr/lib/heartbeat/ipfail</span><span
 style="font-weight: bold;"><br>
NOTE</span>:&nbsp;If the process dies with exit code 100, the process
is not respawned.</dd>
  <dd> <br>
  </dd>
  <dt> <b><tt><font size="+1">ping&nbsp;&nbsp;&nbsp;
ping1.linux-ha.org&nbsp; ping2.linux-ha.org ....</font></tt></b></dt>
  <dd> <i>Optional</i>: Specify ping nodes.&nbsp; These nodes are not
considered as cluster nodes.&nbsp; They are used to check&nbsp; network
connectivity for modules like ipfail.</dd>
  <br>
  <dd><br>
  </dd>
  <dt> <b><tt><font size="+1">ping_group&nbsp;&nbsp;&nbsp;
name&nbsp; ping1.linux-ha.org&nbsp; ping2.linux-ha.org ....</font></tt></b></dt>
  <dd> <i>Optional</i>: Specify a group of ping nodes.&nbsp; These are
  similar to ping nodes, but if any node in a group is available
  then the group is considered available. The group name can
  be any string and is used to uniquely identify the group.
  Each group must appear on a separate line.
  Like ping nodes, the group is not considered to be a cluster node.
  Ping groups otherwise behave like ping nodes and are used to check network
  connectivity for modules like ipfail.</dd>
  <br>
  <dd><br>
  </dd>
</dl>
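<p>To see how the timing directives above fit together, here is a
minimal Python sketch that parses a ha.cf-style text and checks the one
rule stated above (initdead at least twice deadtime).&nbsp; The helper
names, the sample text assembled from the directives above, and the
extra warntime and node-count checks are assumptions made for
illustration; none of this is part of heartbeat itself.</p>

```python
# Hypothetical helper, not part of heartbeat. It only understands
# simple "directive value" lines like the ones described above.
SAMPLE_HA_CF = """\
serial /dev/ttyS0
watchdog /dev/watchdog
bcast eth1
keepalive 2
warntime 10
deadtime 30
initdead 120
baud 19200
udpport 694
auto_failback on
node linuxha1.linux-ha.org
node linuxha2.linux-ha.org
"""


def parse_ha_cf(text):
    """Return a dict mapping each directive to the list of its values."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) == 2 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def check_timing(directives):
    """Return a list of problems found (empty if none).

    Raises KeyError if deadtime, initdead or warntime is missing.
    """
    problems = []
    deadtime = int(directives["deadtime"][0])
    initdead = int(directives["initdead"][0])
    warntime = int(directives["warntime"][0])
    if initdead < 2 * deadtime:
        problems.append("initdead should be at least twice deadtime")
    # Assumption: a late-heartbeat warning should fire before deadtime.
    if warntime >= deadtime:
        problems.append("warntime should be smaller than deadtime")
    # Assumption: this guide describes a two-node cluster.
    if len(directives.get("node", [])) < 2:
        problems.append("expected at least two node lines")
    return problems


if __name__ == "__main__":
    print(check_timing(parse_ha_cf(SAMPLE_HA_CF)) or "timing looks consistent")
```

<p>Running the same check against the ha.cf on each node is a cheap way
to catch a typo before you start heartbeat.</p>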
<b><font size="+1">Configuring haresources</font></b> <br>
Once you've got your ha.cf set up, you need to configure <i>haresources</i>.&nbsp;
This file specifies the services for the cluster and who the default
owner is. <br>
<br>
<big><b><i><font color="#ff0000">Note:&nbsp; This file must be the same
on both nodes!</font></i></b></big>
<p>For our example, we'll assume the high availability services are
Apache and Samba.&nbsp; The IP for the cluster is mandatory, and <b>don't
configure the cluster IP outside of the haresources file!</b>&nbsp;
The haresources file will need one line: </p>
<pre>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <b><tt>linuxha1.linux-ha.org 192.168.85.3 httpd smb</tt></b></pre>
<tt>This line dictates that on startup, linuxha1 will serve the IP
192.168.85.3 and start apache and samba as well.</tt> <br>
<tt>On shutdown, heartbeat will first stop smb, then apache, then give
up the IP.&nbsp; This assumes that the command "uname -n" spits out
"linuxha1.linux-ha.org" - yours may well produce "linuxha1" and if it
does, use that instead!</tt>
<p><tt><i>Note</i>:&nbsp; httpd and smb are the names of the startup scripts
for Apache and Samba, respectively.&nbsp; Heartbeat will look for
startup scripts of the same name in the following paths:</tt> <br>
<tt>&nbsp;&nbsp;&nbsp; /etc/ha.d/resource.d</tt> <br>
<tt>&nbsp;&nbsp;&nbsp; /etc/rc.d/init.d</tt> </p>
<p><tt>These scripts must start services via "scriptname start" and
stop them via "scriptname stop".</tt> <br>
<tt>So you can use any services as long as they conform to the above
standard.</tt> </p>
<p>Should you need to pass arguments to a custom script, the format
would be: </p>
<pre>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <b>scriptname::argument</b></pre>
So, if we added a service "maid" which needed the argument "vacuum",
our haresources line would change to the following:
<pre><b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; linuxha1 192.168.85.3 httpd smb maid::vacuum</b></pre>
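<p>To make the line format concrete, here is a small Python sketch
that splits a haresources line into the owning node and its resources,
and derives the shutdown order described earlier (the reverse of the
startup order).&nbsp; The function names are made up for this example,
and the sketch ignores comments and continuation lines; it is not how
heartbeat itself reads the file.</p>

```python
def parse_haresources_line(line):
    """Split one haresources line into (node, [(script, [args]), ...])."""
    fields = line.split()
    node, resources = fields[0], []
    for field in fields[1:]:
        # "scriptname::argument" separates the script from its arguments.
        script, *args = field.split("::")
        resources.append((script, args))
    return node, resources


def stop_order(resources):
    """Resources are stopped in the reverse of their start order."""
    return list(reversed(resources))


if __name__ == "__main__":
    node, resources = parse_haresources_line(
        "linuxha1 192.168.85.3 httpd smb maid::vacuum")
    print("owner:", node)
    for script, args in resources:
        print("start:", script, *args)
    for script, args in stop_order(resources):
        print("stop: ", script, *args)
```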
<p><br>
This brings us to some added flexibility with
the service IP address.&nbsp; We are actually using a shorthand notation
above.&nbsp; The actual line could have read (we've canned the maid): </p>
<pre><b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; linuxha1 IPaddr::192.168.85.3 httpd smb</b></pre>
Where <b><i>IPaddr</i></b> is the name of our service script, taking
the argument 192.168.85.3.&nbsp; Sure enough, if you look in the
directory /etc/ha.d/resource.d, you will find a script called
IPaddr.&nbsp; This script will also allow you to manipulate the netmask,
broadcast address and base interface of this IP service.&nbsp; To specify a subnet with
32 addresses, you could define the service as (leaving off the IPaddr
because we can!):
<pre><b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; linuxha1 192.168.85.3/27 httpd smb</b></pre>
This sets the IP service address to 192.168.85.3 and the netmask to
255.255.255.224; the broadcast address defaults to 192.168.85.31
(which is the highest address on the subnet).&nbsp; The last parameter
you can set is the broadcast address.&nbsp; To override the
default and set it to 192.168.85.16, your entry would read:
<pre><b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; linuxha1 192.168.85.3/27/192.168.85.16 httpd smb</b></pre>
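<p>If you want to double-check the netmask and broadcast arithmetic,
Python's standard <tt>ipaddress</tt> module can do it for you.&nbsp; The
sketch below handles only the two forms shown above (IP/bits and
IP/bits/broadcast); the function name is hypothetical and this is not
the IPaddr script's own parsing.</p>

```python
import ipaddress


def describe_service_address(spec):
    """Return (address, netmask, broadcast) for 'IP/bits' or 'IP/bits/broadcast'."""
    parts = spec.split("/")
    if len(parts) not in (2, 3):
        raise ValueError("expected IP/bits or IP/bits/broadcast")
    iface = ipaddress.ip_interface(parts[0] + "/" + parts[1])
    if len(parts) == 3:
        broadcast = parts[2]  # explicit override
    else:
        # Default: the highest address on the subnet.
        broadcast = str(iface.network.broadcast_address)
    return str(iface.ip), str(iface.network.netmask), broadcast


if __name__ == "__main__":
    print(describe_service_address("192.168.85.3/27"))
    # ('192.168.85.3', '255.255.255.224', '192.168.85.31')
    print(describe_service_address("192.168.85.3/27/192.168.85.16"))
    # ('192.168.85.3', '255.255.255.224', '192.168.85.16')
```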
You may be wondering whether any of the above is necessary for
you.&nbsp; It depends.&nbsp; If you've properly established a net route
(independent of heartbeat) for the service's IP address, with the
correct netmask and broadcast address, then no, it's not necessary for
you.&nbsp; However, this case won't fit everybody and that's why the
option's there!&nbsp; In addition, you may have more than one possible
interface that could be used for the service IP.&nbsp; Read on to see
how heartbeat treats this...
<p>Once you straighten out your haresources file, copy ha.cf and
haresources to /etc/ha.d.&nbsp; Only the third file, authkeys (described
below), remains before you're ready to start! <br>
&nbsp; </p>

<b><font size="+1">Configuring ipfail</font></b><br>
The ipfail plugin attempts to detect network failures and
react intelligently, directing the cluster to fail over resources as
necessary.  To accomplish this, it uses ping nodes or ping
groups, which work as "dumb" third parties in the cluster.  Provided both HA
nodes can communicate with each other, ipfail can reliably detect when one
of their network links has become unusable, and compensate.<br>
<br>
To configure ipfail, the following steps must be performed.
<ol>
<li><b>Select good ping node candidates.</b><br>
It is essential that good strategic ping nodes be selected.  The better your
choices, the stronger your HA cluster becomes.  Choosing solid network devices
like switches and routers is a good idea.  Do not choose either of the 
members of the HA cluster.  Nor should you select someone's workstation.  It 
is also important to select ping nodes that reflect the connectivity of your
HA nodes.  If you wish to monitor the connectivity of two interfaces, it is 
wise to select a ping node for each interface, that is reachable exclusively 
from said interface.  Consult 
<a href="ipfail-diagram.pdf">ipfail-diagram.pdf</a> for a graphical 
representation of this idea.
<br><br></li>
<li><b>Set auto_failback to <i>on</i> or <i>off</i>.</b><br>
ipfail will only operate if auto_failback has been set to something
other than <i>legacy</i>.
In ha.cf, set the auto_failback option to "on" or "off" like so:
<blockquote>
<tt>auto_failback on</tt>
</blockquote>
or
<blockquote>
<tt>auto_failback off</tt>
</blockquote>
</li>
<li><b>Configure your ha.cf to start ipfail.</b><br>
Add a line like the following to ha.cf (assuming your compile PREFIX is /usr):
<blockquote>
respawn hacluster /usr/lib/heartbeat/ipfail
</blockquote>
</li>
<li><b>Add the ping nodes to ha.cf.</b><br>
The ping nodes can be added to the cluster by using a line like the following:
<blockquote>
ping pnode1 pnode2 pnodeN
</blockquote>
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping 
nodes.
</li>
</ol>
Ensure that the above configuration directives are added to the ha.cf on 
both members of the cluster, and that they are identical.<br>

<blockquote>
<b>NOTE:</b> You will want to check on the availability of the ping nodes 
prior to using them.  If you cannot ping them from both of the HA nodes, 
they are useless.
</blockquote>

<h2> Selecting an Interface</h2>
One important aspect of configuring the haresources file for a machine
which has multiple ethernet interfaces is to know how heartbeat selects
which interface will wind up supporting the service addresses that are
configured in haresources.&nbsp; After all, no interface was specified
in the haresources file.
<p>Heartbeat decides which interface will be used by looking at the
routing table.&nbsp; It tries to select the lowest cost route to the IP
address to be taken over.&nbsp; In the case of a tie, it chooses the
first route found.&nbsp; For most configurations this means the default
route will be least preferred. </p>
<p>If you don't specify a netmask for the IP address in the haresources
file, the netmask associated with the selected route will be used.
Similarly, if an interface is not specified, then the virtual IP address
will be added to the interface associated with the selected route.
If the broadcast address is omitted, then the highest address in
the subnet is used.<br>
&nbsp; </p>
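<p>The selection rule above can be sketched in a few lines of
Python.&nbsp; This only illustrates the rule as stated here and is not
heartbeat's actual code: the routing table is a made-up list, treating
"cost" as a numeric route metric is an assumption, and the default
route is given a higher metric so that it comes out least
preferred.</p>

```python
import ipaddress

# Hypothetical routing table: (destination network, interface, metric).
ROUTES = [
    ("0.0.0.0/0", "eth0", 10),       # default route
    ("192.168.85.0/24", "eth0", 0),  # normal network
    ("10.0.0.0/24", "eth1", 0),      # private heartbeat network
]


def select_route(ip, routes):
    """Pick the lowest-metric route containing ip; the first one wins a tie.

    Returns (interface, netmask, broadcast) or None if nothing matches.
    """
    target = ipaddress.ip_address(ip)
    candidates = []
    for dest, iface, metric in routes:
        network = ipaddress.ip_network(dest)
        if target in network:
            candidates.append((network, iface, metric))
    if not candidates:
        return None
    # min() returns the first minimal element, which gives the tie rule.
    network, iface, _ = min(candidates, key=lambda route: route[2])
    return iface, str(network.netmask), str(network.broadcast_address)


if __name__ == "__main__":
    print(select_route("192.168.85.3", ROUTES))
    # ('eth0', '255.255.255.0', '192.168.85.255')
```

<p>With this table the service address 192.168.85.3 lands on eth0 with
the netmask and broadcast address taken from the matching route.</p>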
<p><b><font size="+2">Configuring Authkeys</font></b> </p>
<p>The third file to configure determines your authentication
keys.&nbsp; There are three types of authentication methods
available:&nbsp; crc, md5, and sha1.&nbsp; "Well, which should I use?",
you ask.&nbsp; Since this document is called "Getting <i>Started</i>",
we'll keep it simple...... </p>
<p>If your heartbeat runs over a secure network, such as the crossover
cable in our example, you'll want to use crc.&nbsp; This is the cheapest
method from a resources perspective.&nbsp; If the network is insecure,
but you're either not very paranoid or concerned about minimizing CPU
resources, use md5.&nbsp; Finally, if you want the best authentication
without regard for CPU resources, use sha1.&nbsp; It's the hardest to
crack. </p>
<p>The format of the file is as follows: <br>
auth &lt;number&gt; <br>
&lt;number&gt; &lt;authmethod&gt; [&lt;authkey&gt;] </p>
<p>So, for sha1, a sample /etc/ha.d/authkeys could be: <br>
auth 1 <br>
1 sha1 key-for-sha1-any-text-you-want </p>
<p>For md5, you could use the same as the above, but replace "sha1"
with "md5". </p>
<p>Finally, for crc, a sample might be: <br>
auth 2 <br>
2 crc </p>
<p> Whatever index you put after the keyword <b>auth</b> must be found
below in the keys listed in the file.  If you put "auth 4", then there
must be a "4 signaturetype" line in the list below. </p>
<p>Make sure its permissions are safe, like 600.&nbsp; And "any text
you want" is not <i>quite</i> right.&nbsp; There's a limit to the number
of characters you can use. <br>
That's it! </p>
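<p>If you would rather generate the file than type it, the following
Python sketch renders an authkeys file in the format shown above,
checks that the <b>auth</b> index has a matching key line, and writes
the file with mode 600.&nbsp; The function names are hypothetical, and
the example writes to a scratch directory rather than /etc/ha.d.</p>

```python
import os
import tempfile


def render_authkeys(active, keys):
    """Render authkeys text.

    keys maps index to (method, key or None); active is the 'auth' index.
    """
    if active not in keys:
        raise ValueError("auth %d has no matching key line" % active)
    lines = ["auth %d" % active]
    for index in sorted(keys):
        method, key = keys[index]
        if method not in ("crc", "md5", "sha1"):
            raise ValueError("unknown method: %s" % method)
        fields = [str(index), method]
        if key is not None:  # crc takes no key
            fields.append(key)
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def write_authkeys(path, content):
    """Create the file with mode 600 so only its owner can read the key."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # also tightens a pre-existing file


if __name__ == "__main__":
    text = render_authkeys(1, {1: ("sha1", "key-for-sha1-any-text-you-want")})
    target = os.path.join(tempfile.mkdtemp(), "authkeys")  # scratch location
    write_authkeys(target, text)
    print(text, end="")
```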
<h2> Starting and testing heartbeat</h2>
On Red Hat, or other distributions which use /etc/init.d startup
files, simply type <b><tt>/etc/init.d/heartbeat start</tt></b> on both nodes.&nbsp; I
would recommend starting on the system master (in our example linuxha1)
first.
<p>If you want heartbeat to run on startup, what to do depends on
your distribution.&nbsp; You may need to place links to the startup
script in the appropriate init level directories, but the RPM versions
will do this for you.&nbsp; I have heartbeat start at its default
sequential priority (75, which means it starts after services 74 and
lower and before services with priority 76-99), stop at its default
sequential priority (05), and only care about the 0 (halt), 6 (reboot),
3 (text-only), and 5 (X) run levels.&nbsp;</p>
<p>So, if I had to do it by hand, I'd need to type in the following (as
root, of course): </p>
<p><b>&nbsp;&nbsp;&nbsp; cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat
K05heartbeat</b> <br>
<b>&nbsp;&nbsp;&nbsp; cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat
S75heartbeat</b> <br>
<b>&nbsp;&nbsp;&nbsp; cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat
S75heartbeat</b> <br>
&nbsp;<b>&nbsp;&nbsp; cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat
K05heartbeat</b> </p>
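<p>The naming pattern behind those four commands (S or K, a two-digit
priority, then the service name) can be spelled out in a short Python
sketch.&nbsp; The function names are invented for this example, and
the demonstration creates the links under a scratch directory instead
of the real /etc/rc.d.</p>

```python
import os
import tempfile


def rc_links(service="heartbeat", start=75, kill=5,
             start_levels=(3, 5), kill_levels=(0, 6)):
    """Return sorted (run-level directory, link name) pairs for the service."""
    links = [("rc%d.d" % level, "K%02d%s" % (kill, service))
             for level in kill_levels]
    links += [("rc%d.d" % level, "S%02d%s" % (start, service))
              for level in start_levels]
    return sorted(links)


def create_links(root, links, service="heartbeat"):
    """Create the symlinks below root (for example a scratch copy of /etc/rc.d)."""
    for directory, name in links:
        os.makedirs(os.path.join(root, directory), exist_ok=True)
        os.symlink(os.path.join("..", "init.d", service),
                   os.path.join(root, directory, name))


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    create_links(scratch, rc_links())
    for directory, name in rc_links():
        print(os.path.join(directory, name))
```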
<p>The last time I ran slackware, there was no /etc/rc.d/init.d
directory (may have changed by now) and to do the same thing, I would
have placed in /etc/rc.d/rc.local: <br>
&nbsp;&nbsp;&nbsp; <b>/etc/ha.d/heartbeat start</b> <br>
***This assumes you copy the file ha.rc to /etc/ha.d/heartbeat.&nbsp;
If you can't find /etc/rc.d/init.d with your distribution and you're
unsure of how processes start, you can use the rc.local method.&nbsp;
But you're on your own for shutdown, I just don't remember... </p>
<p><i>Note:&nbsp; </i>If you use the watchdog function, you'll need to
load its module at bootup as well.&nbsp; You can put the following
command at the bottom of the /etc/rc.d/rc.sysinit file: <br>
&nbsp;&nbsp;&nbsp; <b>/sbin/insmod softdog</b> <br>
For the rc.local method, just put the same line right above where you
start heartbeat. <br>
&nbsp; </p>
<p>Once you've started heartbeat, take a peek at your log file (default
is /var/log/ha-log) before testing it.&nbsp; If all is peachy, the
service owner's log (linuxha1 in our example) should look something like
this: <br>
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility
found.<br>
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to
/var/log/ha-log<br>
heartbeat: 2003/02/10_13:52:22 info: **************************<br>
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting
heartbeat 0.4.9f<br>
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.<br>
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f<br>
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17<br>
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty
/dev/ttyS0 (19200 baud)<br>
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on
port 694 (694) interface eth1<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.<br>
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device:
/dev/watchdog<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.<br>
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.<br>
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.<br>
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.<br>
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead<br>
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org
held no resources.<br>
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from
linuxha2.linux-ha.org.<br>
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'<br>
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status
status<br>
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down:
nice_failback: acquiring foreign resources<br>
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.<br>
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for
node linuxha2.linux-ha.org.<br>
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group:
linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror<br>
heartbeat: 2003/02/10_13:53:23 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.85.3 start<br>
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3
netmask 255.255.255.0&nbsp; broadcast 192.168.85.255<br>
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for
192.168.85.3 on eth0:0 [eth0]<br>
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
heartbeat: 2003/02/10_13:53:24 info: Running
/etc/ha.d/resource.d/datadisk drbd0 start<br>
heartbeat: 2003/02/10_13:53:24 info: Running
/etc/ha.d/resource.d/datadisk drbd1 start<br>
heartbeat: 2003/02/10_13:53:25 info: Running
/etc/ha.d/resource.d/mirror&nbsp; start<br>
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.<br>
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition
completed. (none)<br>
heartbeat: 2003/02/10_13:53:33 info: local resource transition
completed.<br>
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.<br>
heartbeat: 2003/02/10_13:56:30 info: Status update for node
linuxha2.linux-ha.org: status up<br>
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status
status<br>
heartbeat: 2003/02/10_13:56:30 info: Status update for node
linuxha2.linux-ha.org: status active<br>
heartbeat: 2003/02/10_13:56:30 info: remote resource transition
completed.<br>
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status
status<br>
heartbeat: 2003/02/10_13:56:31 info: Link
linuxha2.linux-ha.org:/dev/ttyS0 up.<br>
<b>NOTE:</b>&nbsp; Your log may differ depending on when you started
heartbeat on linuxha2!&nbsp; I started heartbeat on linuxha2
at 13:56:30...</p>
<p> </p>
<hr width="54%">
<p><b>OK, </b>now try to ping your cluster's IP (192.168.85.3 in the
example). If this works, ssh to it and verify you're on linuxha1. <br>
Next, make sure your services are tied to the .3 address.&nbsp; Bring
up netscape and type in 192.168.85.3 for the URL.&nbsp; For Samba, try
to map the drive "\\192.168.85.3\test"&nbsp; assuming you set up a share
called "test".&nbsp; See Samba docs to get that going.&nbsp; As an
aside, however, you'll want to use the "netbios name" parameter to have
your Samba share listed under the cluster name and not the hostname of
your cluster member! </p>
<p><b><font color="#ff0000">NOTE</font>: </b>If you can't bring up the
service IP address and you get ha-log entries similar to this: </p>
<blockquote>
  <blockquote>
    <blockquote>
      <blockquote><i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
SIOCSIFADDR: No such device</i> <br>
        <i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIOCSIFFLAGS: No
such device</i> <br>
        <i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIOCSIFNETMASK:
No such device</i> <br>
        <i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIOCSIFBRDADDR:
No such device</i> <br>
        <i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIOCSIFFLAGS: No
such device</i> <br>
        <i>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIOCADDRT: No
such device</i></blockquote>
    </blockquote>
  </blockquote>
It <i>may</i> mean that you need to enable IP aliasing in your kernel
build.&nbsp; Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y".&nbsp; If you
don't have it, you'll have the line "CONFIG_IP_ALIAS is not set";
in that case, rebuild your kernel with IP aliasing enabled.</blockquote>
If this all works, you've got availability.&nbsp; Now let's see if we
have High Availability :-)
<p>Take down linuxha1.&nbsp; Kill power, kill heartbeat, whatever you
have the stomach for, but <b>don't just yank</b> both the serial and
eth1 heartbeat cables.&nbsp; If you do that, you'll have services
running on both nodes and when you re-connect the heartbeat, a bit of
chaos.... <br>
Now ping the cluster IP. Approximately 5-10 seconds later it should
start responding again. Log in again and verify you're on
linuxha2.&nbsp; If it happens but takes more like 30 seconds, something
is wrong. </p>
<p>If you get this far, it's probably working, but you should probably
check all your heartbeats, too. <br>
First, check your serial heartbeat.&nbsp; Unplug the crossover cable
from your eth1 NIC that you're using for your bcast heartbeat.&nbsp; Wait
about 10 seconds. <br>
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line
like this: <br>
&nbsp;&nbsp;&nbsp; <b>1999/08/16_12:40:58 node linuxha1.linux-ha.org:
is dead</b> <br>
If you get that, your serial heartbeat isn't working and your second
node is taking over.&nbsp; To avoid any problems, shut down heartbeat on
the first node, then test your null modem cable.&nbsp; Run the above
serial tests again. </p>
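<p>If you'd rather not eyeball the log, a few lines of Python can look
for the "is dead" message for you.&nbsp; This is a minimal sketch: the
function name is made up, and the pattern is based only on the log
lines shown in this document.</p>

```python
import re

# Matches lines such as "... node linuxha1.linux-ha.org: is dead".
DEAD_NODE = re.compile(r"node (\S+): is dead")


def dead_nodes(log_lines):
    """Return the node names the log reports as dead, in the order first seen."""
    found = []
    for line in log_lines:
        match = DEAD_NODE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = [
        "heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.",
        "heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead",
    ]
    print(dead_nodes(sample))
```

<p>To check a real log, pass it the lines of /var/log/ha-log; an empty
result after unplugging one heartbeat path means the other path kept
the nodes talking.</p>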
<p>If your log is clean, great.&nbsp; Re-connect the crossover
cable.&nbsp; Once that's done, disconnect the serial cable, wait 10
seconds and check the linuxha2 log again. <br>
If it's clean, congrats!&nbsp; If not, you can check /var/log/ha-log
and /var/log/ha-debug for more clues. <br>
&nbsp; </p>
<p><b><font size="+1">Appendix A - Ethernet Crossover Cable Construction</font></b> </p>
<p>Your cable diagram should be as follows: </p>
<p>
<table border="1" cols="2" width="30%">
  <tbody>
    <tr align="center">
      <td>Connector A</td>
      <td>Connector B</td>
    </tr>
    <tr>
      <td align="center">Pin #</td>
      <td align="center">Pin #</td>
    </tr>
    <tr align="center">
      <td>1</td>
      <td>3</td>
    </tr>
    <tr align="center">
      <td>2</td>
      <td>6</td>
    </tr>
    <tr align="center">
      <td>3</td>
      <td>1</td>
    </tr>
    <tr align="center">
      <td>6</td>
      <td>2</td>
    </tr>
    <tr align="center">
      <td>4</td>
      <td>7</td>
    </tr>
    <tr align="center">
      <td>5</td>
      <td>8</td>
    </tr>
    <tr align="center">
      <td>7</td>
      <td>4</td>
    </tr>
    <tr align="center">
      <td>8</td>
      <td>5</td>
    </tr>
  </tbody>
</table>
</p>
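<p>As a quick sanity check on the table above, this small Python
sketch confirms that the pin mapping is symmetric (so either end can
be "Connector A") and that all eight pins are used exactly once.&nbsp;
The helper names are hypothetical.</p>

```python
# Pin mapping from the table above: connector A pin -> connector B pin.
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2, 4: 7, 5: 8, 7: 4, 8: 5}


def is_symmetric(mapping):
    """True if following the mapping twice returns to the starting pin."""
    return all(mapping.get(mapping[pin]) == pin for pin in mapping)


def covers_all_pins(mapping, pins=range(1, 9)):
    """True if every pin appears exactly once on each connector."""
    wanted = list(pins)
    return sorted(mapping) == wanted and sorted(mapping.values()) == wanted


if __name__ == "__main__":
    print(is_symmetric(CROSSOVER), covers_all_pins(CROSSOVER))
```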
<p>Rev 1.2.0 <br>
(c) 2003 Rudy Pawul <br>
<a href="mailto:rpawul@iso-ne.com">rpawul@iso-ne.com</a> </p>
</body>
</html>