Name
    ganglia - distributed monitoring system

Version
    ganglia 3.6.0

    The latest version of this software and document will always be found at
    http://ganglia.sourceforge.net/.

Synopsis
         ______                  ___
        / ____/___ _____  ____ _/ (_)___ _
       / / __/ __ `/ __ \/ __ `/ / / __ `/
      / /_/ / /_/ / / / / /_/ / / / /_/ /
      \____/\__,_/_/ /_/\__, /_/_/\__,_/
                       /____/ Distributed Monitoring System

    Ganglia is a scalable distributed monitoring system for high-performance
    computing systems such as clusters and Grids. It is based on a
    hierarchical design targeted at federations of clusters. It relies on a
    multicast-based listen/announce protocol to monitor state within
    clusters and uses a tree of point-to-point connections amongst
    representative cluster nodes to federate clusters and aggregate their
    state. It leverages widely used technologies such as XML for data
    representation, XDR for compact, portable data transport, and RRDtool
    for data storage and visualization. It uses carefully engineered data
    structures and algorithms to achieve very low per-node overheads and
    high concurrency. The implementation is robust, has been ported to an
    extensive set of operating systems and processor architectures, and is
    currently in use on over 500 clusters around the world. It has been used
    to link clusters across university campuses and around the world and can
    scale to handle clusters with 2000 nodes.

    The ganglia system consists of two daemons, a PHP-based web frontend
    and a few small utility programs.

    Ganglia Monitoring Daemon (gmond)
        Gmond is a multi-threaded daemon which runs on each cluster node you
        want to monitor. Installation is easy. You don't need a common NFS
        filesystem or a database backend, and you don't have to install
        special accounts, maintain configuration files or deal with other
        annoying hassles.

        Gmond has four main responsibilities: monitor changes in host state,
        announce relevant changes, listen to the state of all other ganglia
        nodes via a unicast or multicast channel and answer requests for an
        XML description of the cluster state.

        Each gmond transmits information in two different ways: by
        unicasting/multicasting host state in external data representation
        (XDR) format using UDP messages, or by sending XML over a TCP
        connection.

    Ganglia Meta Daemon (gmetad)
        Federation in Ganglia is achieved using a tree of point-to-point
        connections amongst representative cluster nodes to aggregate the
        state of multiple clusters. At each node in the tree, a Ganglia Meta
        Daemon ("gmetad") periodically polls a collection of child data
        sources, parses the collected XML, saves all numeric, volatile
        metrics to round-robin databases and exports the aggregated XML over
        a TCP socket to clients. Data sources may be either "gmond"
        daemons, representing specific clusters, or other "gmetad" daemons,
        representing sets of clusters. Data sources use source IP addresses
        for access control and can be specified using multiple IP addresses
        for failover. The latter capability is natural for aggregating data
        from clusters since each "gmond" daemon contains the entire state of
        its cluster.

    Ganglia PHP Web Frontend
        The Ganglia web frontend provides a view of the gathered information
        via real-time dynamic web pages. Most importantly, it displays
        Ganglia data in a meaningful way for system administrators and
        computer users. Although the web frontend to ganglia started as a
        simple HTML view of the XML tree, it has evolved into a system that
        keeps a colorful history of all collected data.

        The Ganglia web frontend caters to system administrators and users.
        For example, one can view the CPU utilization over the past hour,
        day, week, month, or year. The web frontend shows similar graphs for
        Memory usage, disk usage, network statistics, number of running
        processes, and all other Ganglia metrics.

        The web frontend depends on the existence of "gmetad", which
        provides it with data from several Ganglia sources. Specifically,
        the web frontend connects to local port 8651 (by default) and
        expects to receive a Ganglia XML tree. The web pages themselves are
        highly dynamic; any change to the Ganglia data appears immediately
        on the site. This behavior leads to a very responsive site, but
        requires that the full XML tree be parsed on every page access.
        Therefore, the Ganglia web frontend should run on a fairly powerful,
        dedicated machine if it presents a large amount of data.

        The Ganglia web frontend is written in the PHP scripting language,
        and uses graphs generated by "gmetad" to display history
        information. It has been tested on many flavours of Unix (primarily
        Linux) with the Apache webserver and the PHP module (5.0.0 or
        later). The GD graphics library for PHP is used to generate pie
        charts in the frontend and needs to be installed separately. On
        RPM-based systems, it is usually provided by the php-gd package.

Installation
    The latest version of all ganglia software can always be downloaded from
    http://ganglia.info/

    Ganglia runs on Linux (i386, x86_64, ia64, sparc, alpha, powerpc, m68k,
    mips, arm, hppa, s390), FreeBSD, NetBSD, OpenBSD, DragonflyBSD, MacOS X,
    Solaris, AIX, IRIX, Tru64, HPUX and Windows NT/XP/2000/2003/2008 making
    it as portable as it is scalable.

  Monitoring Core Installation
    If you use the Linux RPMs provided on the ganglia web site, you can skip
    to the end of this section.

    Ganglia uses GNU autoconf, so compilation and installation of the
    monitoring core is basically

      % ./configure
      % make
      % make install

    but there are some issues that you need to take a look at first.

    Kernel multicast support
        If you use the ganglia multicast support, you must have a kernel
        that supports multicast. The vast majority of machines have
        multicast support by default, but if ganglia does not work, missing
        multicast support is one of the first things to check.

    Gmetad is not installed by default
        Since "gmetad" relies on the Round-Robin Database Tool (see
        http://www.rrdtool.org/) it will not be compiled unless you
        explicitly request it by using the --with-gmetad flag.

          % ./configure --with-gmetad

        The configure script will fail if it cannot find the rrdtool library
        and header files. By default, it expects to find them at
        /usr/include/rrd.h and /usr/lib/librrd.so. If you installed them in
        different locations then you need to instruct configure where to
        find them using:

          % ./configure --with-librrd=/rrd/path --with-gmetad

        Of course, you need to substitute "/rrd/path" with the real location
        of the rrd tool directory where the header file can be located
        inside an include subdirectory and the library can be located inside
        a lib subdirectory. As an alternative you could set "-L" in LDFLAGS,
        and "-I" in CFLAGS and CPPFLAGS for the library path and the header
        path respectively.

    AIX should not be compiled with shared libraries
        You must add the "--disable-shared" configure flag if you are
        running on AIX. For more details refer to the README.AIX file.

          % ./configure --disable-shared

    Solaris dependencies could be problematic
        This is not really a Solaris-specific problem, but since Solaris has
        several different package repositories, all of them unofficial, it
        is difficult to be sure that all possible permutations have been
        confirmed to work reliably.

        Be sure to have all dependencies covered, as explained in the
        INSTALL file and to use GNU make and a gcc compiler that builds
        32bit binaries with all other libraries matching that ISA.

        When in doubt, build the problematic dependency from source and
        remember to distribute it together with your ganglia build as
        everything is dynamically linked by default.

        Be particularly careful with libConfuse, especially if using the old
        2.5 version. LibConfuse 2.5 is known to be incorrectly packaged and
        to compile by default as a static library which will fail to link
        with ganglia.

    Proprietary *NIX systems might not work at all
        The good news is that the libmetrics code that used to work before
        3.1 is still most likely working fine and so there is nothing
        fundamentally broken about it.

        But the bad news is that in order to add the dynamic metric
        functionality, the build system and the way gmond used to locate its
        metrics had to be changed significantly. Therefore getting gmond to
        build and work again required fixes to be implemented for all
        platforms.

        Since none of the developers had access to HPUX, IRIX, Tru64
        (OSF/1), or Darwin (MacOS X) those platforms might not be able to
        build or run a 3.1 gmond yet. If you have access to any of these
        platforms and want to run ganglia 3.1, feel free to drop by the
        ganglia-developers list with suggestions or, even better, patches.

    GEXEC confusion
        GEXEC is a scalable cluster remote execution system which provides
        fast, RSA authenticated remote execution of parallel and distributed
        jobs. It provides transparent forwarding of stdin, stdout, stderr,
        and signals to and from remote processes, provides local environment
        propagation, and is designed to be robust and to scale to systems
        over 1000 nodes. Internally, GEXEC operates by building an n-ary
        tree of TCP sockets and threads between gexec daemons and
        propagating control information up and down the tree. By using
        hierarchical control, GEXEC distributes both the work and resource
        usage associated with massive amounts of parallelism across multiple
        nodes, thereby eliminating problems associated with single node
        resource limits (e.g., limits on the number of file descriptors on
        front-end nodes). (from http://www.theether.org/gexec )

        "gexec" is a great cluster execution tool but integrating it with
        ganglia is a bit clumsy. GEXEC can run standalone without access to
        a ganglia "gmond". In standalone mode gexec will use the hosts
        listed in your GEXEC_SVRS variable to run on. For example, say I
        want to run "hostname" on three machines in my cluster: "host1",
        "host2" and "host3". I use the following command line.

          % GEXEC_SVRS="host1 host2 host3" gexec -n 3 hostname

        and gexec would build an n-ary tree (binary tree by default) of TCP
        sockets to those machines and run the command "hostname".
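
        To make the tree idea concrete, here is a rough Python sketch of
        how a host list could be arranged into an n-ary tree. The numbering
        scheme (host i forwards to hosts fanout*i + 1 through
        fanout*i + fanout) and the helper name build_nary_tree are
        assumptions made for illustration; gexec's real layout may differ.

```python
def build_nary_tree(hosts, fanout=2):
    """Map each host to the list of hosts it would forward to.

    Hypothetical layout (not taken from gexec): hosts are numbered in
    list order and host i forwards to hosts fanout*i + 1 .. fanout*i + fanout.
    Host names are assumed to be unique.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    tree = {}
    for i, host in enumerate(hosts):
        first = fanout * i + 1
        # Slicing past the end of the list simply yields fewer (or no) children.
        tree[host] = hosts[first:first + fanout]
    return tree


if __name__ == "__main__":
    layout = build_nary_tree(["host1", "host2", "host3"])
    for parent, children in layout.items():
        print(parent, "->", children)
```

        With three hosts and the default fanout of 2, the first host is the
        root and forwards to the other two.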

        As an added feature, you can have "gexec" pull a host list from a
        locally running gmond and use that as the host list instead of
        GEXEC_SVRS. The list is load balanced and "gexec" will start the job
        on the *n* least-loaded machines.

        For example:

          % gexec -n 5 hostname

        will run the command "hostname" on the five least-loaded machines in
        a cluster.
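
        The selection step can be pictured with a short Python sketch. It
        assumes the load of each host is a single number (for example the
        load_one metric) and simply picks the n smallest values; the
        function name least_loaded is hypothetical and the real balancing
        rule used by gexec and gmond may take more into account.

```python
import heapq


def least_loaded(load_by_host, n):
    """Return the names of the n hosts with the smallest load.

    load_by_host maps a host name to one load figure. Ties are broken by
    host name so that the result is deterministic.
    """
    smallest = heapq.nsmallest(
        n, load_by_host.items(), key=lambda item: (item[1], item[0])
    )
    return [host for host, _load in smallest]


if __name__ == "__main__":
    loads = {"host1": 0.50, "host2": 0.10, "host3": 2.00, "host4": 0.30}
    print(least_loaded(loads, 2))
```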

        To turn on the "gexec" feature in ganglia you must configure ganglia
        with the "--enable-gexec" flag

          % ./configure --enable-gexec

        Enabling "gexec" means that by default any host running gmond will
        send a special message announcing that gexec is installed on it and
        open for requests.

        Now the question is, what if I don't want gexec to run on every host
        in my cluster? For example, you may not want to have "gexec" run
        jobs on your cluster frontend nodes.

        You simply add the following line to your "gmond" configuration file
        ("/etc/ganglia/gmond.conf" by default)

          no_gexec on

        Simple huh? I know the configuration file option, "no_gexec", seems
        crazy (and it is). Why have an option that says "yes to no gexec"?
        The early versions of gmond didn't use a configuration file but
        instead commandline options. One of the commandline options was
        simply "--no-gexec" and the default was to announce gexec as on.

    Once you have successfully run

      % ./configure <options>
      % make
      % make install

    you should find the following files installed in "/usr" (by default).

      /usr/bin/gstat
      /usr/bin/gmetric
      /usr/sbin/gmond
      /usr/sbin/gmetad

    If you installed ganglia using RPMs then these files will be installed
    when you install the RPM. The RPM is installed simply by running

      % rpm -Uvh ganglia-gmond-3.6.0.i386.rpm
      % rpm -Uvh ganglia-gmetad-3.6.0.i386.rpm

    Once you have the necessary binaries installed, you can test your
    installation by running

       % ./gmond

    This will start the ganglia monitoring daemon. You should then be able
    to run

       % telnet localhost 8649

    And get an XML description of the state of your machine (and any other
    hosts running gmond at the time).
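
    The same check can be scripted. The Python sketch below reads the XML
    from the gmond TCP port until the connection closes and collects the
    metrics of each host. The element and attribute names used here (HOST,
    METRIC, NAME, VAL) and the hand-written SAMPLE document are assumptions
    about the gmond output; compare them with what telnet shows on your
    system before relying on them.

```python
import socket
import xml.etree.ElementTree as ET

# Hand-written sample; element and attribute names are assumptions.
SAMPLE = b"""<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="host1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond sends on its TCP port until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Return {host name: {metric name: value string}} from the XML."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result


if __name__ == "__main__":
    # Swap SAMPLE for fetch_gmond_xml() to query a running gmond.
    print(metrics_by_host(SAMPLE))
```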

    If you are installing from source on Linux, scripts are provided to
    start "gmetad" and "gmond" at system startup. They are easy to install
    from the source root.

       % cp ./gmond/gmond.init /etc/rc.d/init.d/gmond
       % chkconfig --add gmond
       % chkconfig --list gmond
         gmond              0:off   1:off   2:on    3:on    4:on    5:on    6:off
       % /etc/rc.d/init.d/gmond start
         Starting GANGLIA gmond:                                    [  OK  ]

    Repeat this step with gmetad.

  PHP Web Frontend Installation
    1.  The ./web directory of the ganglia distribution contains all the
        necessary PHP files for running your web frontend. Copy those files
        to a "ganglia/" directory under your webserver's document root,
        which is often "/var/www/html"; look for the variable
        "DocumentRoot" in your Apache configuration files to be sure. All
        the PHP script files use relative URLs in their links, so you may
        place the "ganglia/" directory anywhere convenient.

    2.  Ensure your webserver understands how to process PHP script files.
        Currently, the web frontend uses PHP language features that
        require PHP version 5 or greater. Processing PHP script files
        usually requires a webserver module, such as the "mod_php" for the
        popular Apache webserver. In RedHat Linux, the RPM package that
        provides this module is called simply "php".

        For Apache, the "mod_php" module must be enabled. The following lines
        should appear somewhere in Apache's *conf files. This example
        applies to Red Hat Linux (and clones). The actual filenames may vary
        on your system. If you installed the php module using an RPM
        package, this work will have been done automatically.

          LoadModule php5_module modules/libphp5.so

          AddHandler php5-script .php
          AddType text/html .php

    3.  The web frontend requires the existence of the gmetad package on the
        webserver. Follow the installation instructions on the gmetad page.
        Specifically, the web frontend requires rrdtool and the "rrds/"
        directory from gmetad. If you are a power user, you may use NFS to
        simulate the local existence of the rrds.

    4.  Test your installation by visiting the following URL with a web
        browser, replacing localhost with the address of your webserver:

          http://localhost/ganglia/

    Installation of the web frontend is simplified on Linux by using rpm.

      % rpm -Uvh ganglia-web-3.6.0-1.noarch.rpm
      Preparing...                ########################################### [100%]
         1:ganglia-web            ########################################### [100%]

Configuration
  Gmond Configuration
    The configuration file format has changed between gmond version 2.5.x
    and version 3.x. The change was necessary in order to allow more complex
    configuration options.

    Gmond has a default configuration it will use if it does not find the
    default configuration file /etc/ganglia/gmond.conf. To see the default
    configuration simply run the command:

      % gmond --default_config

    and gmond will output its default configuration to stdout. This default
    configuration can serve as a good starting place for building a more
    custom configuration.

      % gmond --default_config > gmond.conf

    would create a file gmond.conf which you can then edit to taste and copy
    to /etc/ganglia/gmond.conf or elsewhere.

    To start gmond with a configuration file other than
    /etc/ganglia/gmond.conf, simply specify the configuration file location
    by running

      % gmond --config /my/ganglia/configs/custom.conf

    If you want to convert a 2.5.x configuration file to 3.x file format,
    run the following command

      % gmond --convert ./old_25_config.conf

    and gmond will output the equivalent 3.x configuration file to stdout.
    You can then redirect that output to a new configuration file which can
    serve as a starting point for your configuration.

      % gmond --convert ./old_25_config.conf > ./new_3x_config.conf

    For details about gmond configuration options, simply run

      % man gmond.conf

    for a complete listing of options with detailed explanations.

  Gmetad Configuration
    The behavior of the Ganglia Meta Daemon is completely controlled by a
    single configuration file which is by default
    "/etc/ganglia/gmetad.conf". For gmetad to do anything useful you must
    specify at least one "data_source" in the configuration. The format of
    the data_source line is as follows

      data_source "Cluster A" 127.0.0.1  1.2.3.4:8655  1.2.3.5:8625
      data_source "Cluster B" 1.2.4.4:8655

    In this example, there are two unique data sources: "Cluster A" and
    "Cluster B". The Cluster A data source has three redundant sources. If
    gmetad cannot pull the data from the first source, it will continue
    trying the other sources in order.

    If you do not specify a port number, gmetad will assume the default
    ganglia port, which is 8649 (U*N*I*X on a phone keypad).
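
    As an illustration of the data_source format and of the failover
    behavior, here is a minimal Python sketch. It only handles the form
    shown above (a quoted name followed by address or address:port
    entries); the helper names parse_data_source and poll_with_failover
    are hypothetical, and the fetch argument stands in for whatever
    actually reads the XML from a source. This is not how gmetad itself is
    implemented.

```python
import shlex

DEFAULT_PORT = 8649  # default ganglia port, as stated above


def parse_data_source(line):
    """Split a data_source line into (name, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex keeps "Cluster A" as one token
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError('expected: data_source "name" address[:port] ...')
    name = tokens[1]
    sources = []
    for token in tokens[2:]:
        host, sep, port = token.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_PORT))
    return name, sources


def poll_with_failover(sources, fetch):
    """Try each (host, port) in order and return the first answer."""
    last_error = None
    for host, port in sources:
        try:
            return fetch(host, port)
        except OSError as exc:  # unreachable source: try the next one
            last_error = exc
    raise ConnectionError("no data source answered") from last_error


if __name__ == "__main__":
    line = 'data_source "Cluster A" 127.0.0.1  1.2.3.4:8655  1.2.3.5:8625'
    print(parse_data_source(line))
```

    Passing the fetch function in as an argument keeps the failover loop
    independent of the network code, which also makes it easy to try out
    with a stand-in function.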

    For a sample gmetad configuration file with comments, look at the
    gmetad.conf file provided as part of the distribution package in the
    gmetad directory.

    "gmetad" has a "--conf" option to allow you to specify alternate
    configuration files

      % ./gmetad --conf=/tmp/my_custom_config.conf

  PHP Web Frontend Configuration
    Most configuration parameters reside in the "ganglia/conf.php" file.
    Here you may alter the template, gmetad location, RRDtool location, and
    set the default time range and metrics for graphs.

    The static portions of the Ganglia website are themable. This means you
    can alter elements such as section labels, some links, and images to
    suit your individual tastes and environment. The "template_name"
    variable names a directory containing the current theme. Ganglia uses
    TemplatePower to implement themes. A user-defined skin must conform to
    the template interface as defined by the default theme. Essentially, the
    variable names and START/END blocks in a custom theme must remain the
    same as the default, but all other HTML elements may be changed.

    Other configuration variables in "conf.php" specify the location of
    gmetad's files, and where to find the rrdtool program. These locations
    need only be changed if you do not run gmetad on the webserver.
    Otherwise the default locations should work fine. The "default_range"
    variable specifies what range of time to show on the graphs by default,
    with possible values of hour, day, week, month, year. The
    "default_metric" parameter specifies which metric to show on the cluster
    view page by default.

Commandline Tools
    There are two commandline tools that work with "gmond" to add custom
    metrics and query the current state of a cluster: "gmetric" and "gstat"
    respectively.

  Gmetric
    The Ganglia Metric Tool (gmetric) allows you to easily monitor any
    arbitrary host metrics that you like, expanding on the core metrics
    that gmond measures by default.

    If you want help with the gmetric syntax, simply use the "--help"
    commandline option

      % gmetric --help
      gmetric 3.6.0

      Purpose:
        The Ganglia Metric Client (gmetric) announces a metric
        on the list of defined send channels defined in a configuration file

      Usage: gmetric [OPTIONS]...

        -h, --help          Print help and exit
        -V, --version       Print version and exit
        -c, --conf=STRING   The configuration file to use for finding send channels
                            (default=`/etc/ganglia/gmond.conf')
        -n, --name=STRING   Name of the metric
        -v, --value=STRING  Value of the metric
        -t, --type=STRING   Either
                            string|int8|uint8|int16|uint16|int32|uint32|float|double
        -u, --units=STRING  Unit of measure for the value e.g. Kilobytes, Celcius
                            (default=`')
        -s, --slope=STRING  Either zero|positive|negative|both  (default=`both')
        -x, --tmax=INT      The maximum time in seconds between gmetric calls
                            (default=`60')
        -d, --dmax=INT      The lifetime in seconds of this metric  (default=`0')
        -S, --spoof=STRING  IP address and name of host/device (colon separated) we
                              are spoofing  (default='')
        -H, --heartbeat     spoof a heartbeat message (use with spoof option)

    Gmetric sends the metric specified on the commandline to all
    udp_send_channels specified in the configuration file, which is
    /etc/ganglia/gmond.conf by default. If you want to send a metric to
    alternate udp_send_channels, you can specify a different configuration
    file as such:

      % gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"

    All metrics in ganglia have a name, value, type and optionally units.
    For example, say I wanted to measure the temperature of my CPU
    (something gmond doesn't do by default) then I could send this metric
    with name="temperature", value="63", type="int16" and units="Celsius".

    Assume I have a program called "cputemp" which outputs in text the
    temperature of the CPU

      % cputemp
      63

    I could easily send this data to all listening gmonds by running

      % gmetric --name temperature --value `cputemp` --type int16 --units Celsius

    Check the exit value of gmetric to see if it successfully sent the data:
    0 on success and -1 on failure.

    To constantly sample this temperature metric, you just need to add this
    command to your cron table.
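
    If you would rather drive gmetric from a script than from a shell
    one-liner, the call can be wrapped as in the Python sketch below. It
    uses only the options listed in the help output above; the function
    names gmetric_command and send_metric are hypothetical, and the sketch
    assumes gmetric is on the PATH.

```python
import subprocess


def gmetric_command(name, value, metric_type, units=None, conf=None):
    """Build the argument list for one gmetric invocation."""
    argv = ["gmetric", "--name", name, "--value", str(value),
            "--type", metric_type]
    if units:
        argv += ["--units", units]
    if conf:
        argv.append("--conf=" + conf)
    return argv


def send_metric(name, value, metric_type, units=None, conf=None):
    """Run gmetric and report success based on its exit status."""
    argv = gmetric_command(name, value, metric_type, units, conf)
    return subprocess.run(argv, check=False).returncode == 0


if __name__ == "__main__":
    # Only print the command here; call send_metric() to actually send.
    print(" ".join(gmetric_command("temperature", 63, "int16", "Celsius")))
```

    Passing the arguments as a list avoids shell quoting problems when a
    value contains spaces, as in the "it works" example above.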

  Gstat
    The Ganglia Cluster Status Tool (gstat) is a commandline utility that
    allows you to get a status report for your cluster.

    To get help with the commandline options, simply pass "gstat" the
    "--help" option

      % gstat --help
      gstat 3.6.0

      Purpose:
        The Ganglia Status Client (gstat) connects with a
        Ganglia Monitoring Daemon (gmond) and output a load-balanced list
        of cluster hosts

      Usage: gstat [OPTIONS]...
         -h         --help             Print help and exit
         -V         --version          Print version and exit
         -a         --all              List all hosts.  Not just hosts running gexec (default=off)
         -d         --dead             Print only the hosts which are dead (default=off)
         -m         --mpifile          Print a load-balanced mpifile (default=off)
         -1         --single_line      Print host and information all on one line (default=off)
         -l         --list             Print ONLY the host list (default=off)
         -n         --numeric          Print numeric addresses instead of hostnames (default=off)
         -iSTRING   --gmond_ip=STRING  Specify the ip address of the gmond to query (default='127.0.0.1')
         -pINT      --gmond_port=INT   Specify the gmond port to query (default=8649)

    Note: gstat with no option will only show gexec-enabled hosts. To see
    all hosts that are UP (regardless of their gexec state) you need to add
    the --all flag.

      % gstat --all

Extending Ganglia through metric modules
    There are currently two ways in which metric modules can be written and
    plugged into Gmond in order to extend the types of metrics that Ganglia
    is able to monitor. As of Ganglia 3.1, a pluggable interface has been
    added to allow the Gmond metric gathering agent to collect any type of
    metric that can be acquired through programmatic means. The primary
    metric module interface is C with a secondary python interface. This
    means that pluggable modules can either be written and compiled into
    dynamically loadable C based language modules or written and deployed as
    python pluggable modules.

    The basic steps when writing a pluggable module, either in C or in
    python, are as follows:

    1. Create a module definition structure that contains callback data and
    metric information
    2. Implement 3 callback functions that will serve as the links between
    the Gmond metric gathering agent and the metric module. These callback
    functions include module initialization, metric handler and module
    cleanup.

    There are simple metric module examples for both a C based and a python
    based module under the gmond/modules and gmond/python_modules source
    code sub-trees. Please see these module examples for more details.
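
    For orientation, a python module following these steps might look
    roughly like the sketch below. The function names metric_init and
    metric_cleanup and the descriptor keys are written from memory of the
    example module and should be treated as assumptions; the metric name
    example_random is made up. Check everything against the example under
    gmond/python_modules before using it.

```python
import random


def random_number(name):
    """Metric handler: assumed to be called by gmond with the metric name."""
    return random.randint(0, 100)


def metric_init(params):
    """Module initialization: return the list of metric descriptors.

    params is assumed to carry the values configured for this module.
    The descriptor keys below are assumptions, see the note above.
    """
    descriptor = {
        "name": "example_random",      # hypothetical metric name
        "call_back": random_number,    # the metric handler
        "time_max": 90,
        "value_type": "uint",
        "units": "N",
        "slope": "both",
        "format": "%u",
        "description": "Example metric (random number)",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Module cleanup: nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Exercise the module by hand, outside of gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
    metric_cleanup()
```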

Frequently Asked Questions (FAQ)
    What metrics does ganglia collect on platform x?
        To see a complete list of the metrics that a particular gmond
        supports, run the command:

          % gmond -m

        and gmond will output all the metrics that it is capable of
        collecting and sending.

        This table describes all the metrics that ganglia collects and shows
        what platforms each metric is supported on. (The following table is
        only partially complete.)

          Metric Name    Description                             Platforms
          -----------------------------------------------------------------------
          boottime      System boot timestamp                    l,f
          bread_sec
          bwrite_sec
          bytes_in      Number of bytes in per second            l,f
          bytes_out     Number of bytes out per second           l,f
          cpu_aidle     Percent of time since boot idle CPU      l
          cpu_arm
          cpu_avm
          cpu_idle      Percent CPU idle                         l,f
          cpu_intr
          cpu_nice      Percent CPU nice                         l,f
          cpu_num       Number of CPUs                           l,f
          cpu_rm
          cpu_speed     Speed in MHz of CPU                      l,f
          cpu_ssys
          cpu_system    Percent CPU system                       l,f
          cpu_user      Percent CPU user                         l,f
          cpu_vm
          cpu_wait
          cpu_wio
          disk_free     Total free disk space                    l,f
          disk_total    Total available disk space               l,f
          load_fifteen  Fifteen minute load average              l,f
          load_five     Five minute load average                 l,f
          load_one      One minute load average                  l,f
          location      GPS coordinates for host                 e
          lread_sec
          lwrite_sec
          machine_type
          mem_buffers   Amount of buffered memory                l,f
          mem_cached    Amount of cached memory                  l,f
          mem_free      Amount of available memory               l,f
          mem_shared    Amount of shared memory                  l,f
          mem_sreclaimable    Amount of slab reclaimable memory  l (kernel >= 2.6.19)
          mem_total     Total amount of memory                   l,f
          mtu           Network maximum transmission unit        l,f
          os_name       Operating system name                    l,f
          os_release    Operating system release (version)       l,f
          part_max_used Maximum percent used for all partitions  l,f
          phread_sec
          phwrite_sec
          pkts_in       Packets in per second                    l,f
          pkts_out      Packets out per second                   l,f
          proc_run      Total number of running processes        l,f
          proc_total    Total number of processes                l,f
          rcache
          swap_free     Amount of available swap memory          l,f
          swap_total    Total amount of swap memory              l,f
          sys_clock     Current time on host                     l,f
          wcache

          Platform key:
          l = Linux, f = FreeBSD, a = AIX, c = Cygwin
          m = MacOS, i = IRIX, h = HPUX,  t = Tru64
          e = Every Platform

        If you are interested in how the metrics are collected, take a
        look in the "./libmetrics" directory of the source distribution.
        There is a directory for each supported platform.

    What does the error "Process XML (x): XML_ParseBuffer() error at line x:
    not well-formed" mean?
        This error occurs when a ganglia component reads data from another
        ganglia component and finds that the XML is not well-formed. It most
        commonly appears when the PHP web frontend tries to read the XML
        stream from gmetad.

        To troubleshoot this problem, capture the XML from the ganglia
        component in question (gmetad/gmond). This is easy to do if you have
        telnet installed. Simply log in to the machine running the component
        and run:

          % telnet localhost 8651

        By default, gmetad exports its XML on port 8651 and gmond exports
        its XML on port 8649. Modify the port number above to suit your
        configuration.

        When you connect to the port you should get an XML stream. If not,
        look in the process table on the machine to ensure that the
        component is actually running.

        Once you are getting an XML stream, capture it to a file by running:

          % telnet localhost 8651 > XML.txt
          Connection closed by foreign host.

        If you open the file "XML.txt", you will see the captured XML
        stream. You will need to remove the first three lines of "XML.txt",
        which will read:

          Trying 127.0.0.1...
          Connected to localhost.
          Escape character is '^]'.

        Those lines are output from "telnet" and not the ganglia component
        (I wish telnet would send those messages to "stderr" but they are
        sent to "stdout").
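
        The clean-up step above can also be scripted. The following is only
        a minimal sketch in Python, not part of ganglia: the function name
        "strip_telnet_banner" is hypothetical, and it assumes that the XML
        stream begins at the first line whose first non-blank character is
        "<".

```python
def strip_telnet_banner(captured: str) -> str:
    """Return the captured text with any leading non-XML lines removed.

    Assumption: the XML stream starts at the first line that begins
    with '<' (ignoring leading whitespace). Everything before that
    line is treated as telnet output and dropped.
    """
    lines = captured.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.lstrip().startswith("<"):
            return "".join(lines[index:])
    return ""  # no line starting with '<' was found
```

        Pass it the contents of "XML.txt" and write the result back to a
        file before validating. Line numbers reported by later tools will
        then refer to the cleaned file, not to the original capture.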

        There are many ways that XML can be malformed. A great tool for
        validating XML is "xmllint". "xmllint" will read the file and find
        the line containing the error.

          % xmllint --valid --noout XML.txt

        will read your captured XML stream, validate it against the ganglia
        DTD and check that it is well-formed XML. "xmllint" will exit
        quietly if there are no errors. If there are errors, they will be
        reported with line numbers. For example:

          /tmp/XML.txt:3393: error: Opening and ending tag mismatch: HOST and CLUSTER
          </CLUSTER>
                 ^
          /tmp/XML.txt:3394: error: Opening and ending tag mismatch: CLUSTER and GANGLIA_XML
          </GANGLIA_XML>
                     ^
          /tmp/XML.txt:3395: error: Premature end of data in tag GANGLIA_XML
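
        If "xmllint" is not installed, a well-formedness check can be
        sketched with Python's standard "xml.parsers.expat" module, which
        wraps the Expat parser (XML_ParseBuffer() in the error message is
        an Expat function). This is an illustration only: the function
        name "find_xml_error" is hypothetical, and unlike
        "xmllint --valid" it does not validate against the ganglia DTD.

```python
import xml.parsers.expat


def find_xml_error(data: bytes):
    """Return None if data is well-formed XML.

    Otherwise return a (line, column, message) tuple describing the
    first error that Expat reports.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this buffer as the final chunk.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

        Read the cleaned capture in binary mode and pass its contents to
        the function; a result of None means that no well-formedness error
        was found.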

        If you get errors, open "XML.txt" and go to the line numbers in
        question. See if you can work out from your configuration how
        these errors could occur. If you cannot fix the problem yourself,
        please email your "XML.txt" and the output from "xmllint" to
        "ganglia-developers@lists.sourceforge.net". Please include the
        version of each component in question along with the operating
        system it is running on. The more details we have about your
        configuration, the more likely it is that we will be able to help
        you. Note that all mail sent to "ganglia-developers" is archived
        and available to read on the web, so you may want to edit "XML.txt"
        to remove any sensitive information first.

    How do I remove a host from the list?
        A common problem that people have is not being able to remove a host
        from the ganglia web frontend.

        Here is a common scenario:

        1. All hosts in a cluster are sending on the ganglia
           udp_send_channels.
        2. One of the hosts fails or is moved for whatever reason.
        3. All the hosts in the cluster report that the host is "dead" or
           "expired".
        4. The sysadmin wants to remove this host from the "dead" list.

        Unfortunately there is currently no nice way to remove a single dead
        host from the list. All data in gmond is soft state so you will need
        to restart all gmond and gmetad processes. It is important to note
        that ALL dead hosts will be flushed from the record by restarting
        the processes (since they have to hear the host at least once to
        know it is expired).

        If you add the following lines to your gmond.conf

          globals {
            host_dmax = 3600
          }

        then hosts will be removed from host tables when they haven't been
        heard from in 3600 seconds. See "man gmond.conf" for details.
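
        The rule that host_dmax expresses can be sketched as follows. This
        is not gmond's code, only an illustration in Python: the function
        name "expired_hosts" and the "last_heard" mapping are hypothetical,
        the strict "greater than" comparison is an assumption, and the
        treatment of 0 as "never remove" follows the gmond.conf man page.

```python
def expired_hosts(last_heard, now, host_dmax):
    """Return the hosts that would be dropped under a host_dmax rule.

    last_heard maps a hostname to the time (in seconds) at which the
    host was last heard from. A host is considered expired when more
    than host_dmax seconds have passed since then.
    """
    if host_dmax == 0:
        # A value of 0 means hosts are never removed.
        return []
    return sorted(
        host for host, heard in last_heard.items()
        if now - heard > host_dmax
    )
```

        With host_dmax set to 3600, a host last heard from at time 1000
        is reported as expired at time 5000, while a host heard from at
        time 5000 is kept.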

    How good is the Solaris, IRIX and Tru64 support?
        Here is an email from Steve Wagner about the state of ganglia on
        Solaris, IRIX and Tru64. Steve is to thank for porting ganglia to
        Solaris and Tru64. He also helped with the IRIX port.

           State of the IRIX port:
   
           *  CPU percentage stuff hasn't improved despite my efforts.  I fear there
              may be a flaw in the way I'm summing counters for all the CPUs.
           *  Auto-detection of network interfaces apparently segfaults.
           *  Memory and load reporting appear to be running properly.
           *  CPU speed is not being reported properly on multi-proc machines.
           *  Total/running processes are not reported.
           *  gmetad untested.
           *  Monitoring core apparently stable in foreground, background being tested
           (had a segfault earlier).
   
           State of the Tru64 port:
   
           *  CPU percentage stuff here works perfectly.
           *  Memory and swap usage stats are suspected to be inaccurate.
           *  Total/running processes are not reported.
           *  gmetad untested.
           *  Monitoring core apparently stable in foreground and background.
   
           State of the Solaris port:
           *  CPU percentages are slightly off, but correct enough for trending
              purposes.
           *  Load, ncpus, CPU speed, breads/writes, lreads/writes, phreads/writes,
              and rcache/wcache are all accurate.
           *  Memory/swap statistics are suspiciously flat, but local stats bear
              this out (and they *are* being updated) so I haven't investigated
              further.
           *  Total processes are counted, but not running ones.
           *  gmetad appears stable
   
           Anyway, all three ports I've been messing with are usable and fairly
           stable.  Although there are areas for improvement I think we really can't
           keep hogging all this good stuff - what I'm looking at is ready for
           release.

    Where are the Debian packages?
        Debian packages for 2.5 are available from the main Debian archive
        for all releases.

        There was never an official Debian package for 3.0, so if you need
        to use that branch you will need to build your own binaries.

        Packages for 3.1 are available from Debian (and therefore derivative
        distributions like Ubuntu) standard repositories.

    How should I configure multihomed machines?
        Various issues arise when a multihomed machine is running the gmond
        agent.

        Sending multicast packets out on the right interface: the mcast_if
        option can be declared in the udp_send_channel to force outgoing
        multicast packets to use a particular interface. The system
        administrator may also be able to make other platform-specific
        configuration settings through the OS to achieve the desired
        behaviour.

        Ensuring that outgoing metric packets are always sent with the same
        source address: the bind and bind_hostname parameters are the
        solution. Either (but not both) of these can be specified in the
        udp_send_channel if required. See the gmond.conf man page for
        details.

        Previous advice given in this document suggested adding a route like
        so:

          route add -host 239.2.11.71 dev eth1

        This method is still valid, but it will be overridden by the
        bind or bind_hostname setting.

    How should I configure my Cisco Catalyst Switches?
        Perhaps information regarding gmond on networks set up through Cisco
        Catalyst switches should be mentioned in the ganglia documentation.
        I think by default multicast traffic on the Catalyst will flood all
        devices unless configured properly. Here is a relevant snippet from
        a message forum, with a link to a Cisco document.

        If what you are trying to do is minimize the impact on your
        network due to a multicast application, this link may describe what
        you want to do: http://www.cisco.com/warp/public/473/38.html

        We set up our switches according to this after a consultant came in
        and installed an application multicasting several hundred packets
        per second. This made the network functional again.

Getting Support
      The tired and thirsty prospector threw himself down at the edge of the 
      watering hole and started to drink. But then he looked around and saw 
      skulls and bones everywhere. "Uh-oh," he thought. "This watering hole 
      is reserved for skeletons." --Jack Handey

    There are three mailing lists available to you: "ganglia-general",
    "ganglia-developers" and "ganglia-announce". You can join these lists or
    read their archives by visiting
    https://sourceforge.net/mail/?group_id=43021

    "All of the ganglia mailing lists are closed". That means that in order
    to post to a list, you must be subscribed to it. We're sorry for the
    inconvenience; however, it is very easy to subscribe to and unsubscribe
    from the lists. We had to close the mailing lists because of spam
    problems.

    When you need help, please follow these steps in order until your
    problem is resolved.

    1.  completely read the documentation

    2.  check the "ganglia-general" archive to see if other people have had
        the same problem

    3.  post your support request to the "ganglia-general" mailing list

    4.  check the "ganglia-developers" archive

    5.  post your question to the "ganglia-developers" list

    Please send all bugs, patches, and feature requests to the
    "ganglia-developers" list after you have checked the
    "ganglia-developers" archive to see if the issue has already been
    raised and answered.

Copyright
      Copyright (C) 2002,2003 University of California, Berkeley

Authors
    The Ganglia Development Team...

     Bas van der Vlies         basv             Developer      basv at users.sourceforge.net
     Neil T. Spring            bluehal          Developer      bluehal at users.sourceforge.net
     Brooks Davis              brooks_en_davis  Developer      brooks_en_davis at users.sourceforge.net
     Eric Fraser               fraze            Developer      fraze at users.sourceforge.net
     greg bruno                gregbruno        Developer      gregbruno at users.sourceforge.net
     Jeff Layton               laytonjb         Developer      laytonjb at users.sourceforge.net
     Doc Schneider             maddocbuddha     Developer      maddocbuddha at users.sourceforge.net
     Mason Katz                masonkatz        Developer      masonkatz at users.sourceforge.net
     Mike Howard               mhoward          Developer      mhoward at users.sourceforge.net
     Matt Massie               massie           Project Admin  massie at users.sourceforge.net
     Oliver Mössinger          olivpass         Developer      olivpass at users.sourceforge.net
     Preston Smith             pmsmith          Developer      pmsmith at users.sourceforge.net
     Federico David Sacerdoti  sacerdoti        Developer      sacerdoti at users.sourceforge.net
     Tim Cera                  timcera          Developer      timcera at users.sourceforge.net
     Mathew Benson             wintermute11     Developer      wintermute11 at users.sourceforge.net
     Brad Nicholes             bnicholes        Developer      bnicholes at users.sourceforge.net
     Carlo Arenas              carenas          Developer      carenas at users.sourceforge.net

Contributors
    There have been dozens of contributors who have provided patches and
    helpful bug reports. We need to list them here later.