File: examples.rst

package info (click to toggle)
clustershell 1.9.3-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 2,228 kB
sloc: python: 20,978; makefile: 149
file content (250 lines) | stat: -rw-r--r-- 8,415 bytes
parent folder | download | duplicates (2)
.. _prog-examples:

Programming Examples
====================

.. highlight:: python

.. _prog-example-seq:

Remote command example (sequential mode)
----------------------------------------

The following example shows how to send a command on some nodes, how to get a
specific buffer and how to get gathered buffers::

    from ClusterShell.Task import task_self
    task = task_self()

    task.run("/bin/uname -r", nodes="green[36-39,133]")

    print task.node_buffer("green37")

    for buf, nodes in task.iter_buffers():
            print nodes, buf

    if task.max_retcode() != 0:
        print "An error occurred (max rc = %s)" % task.max_retcode()


Result::

    2.6.32-431.el6.x86_64
    ['green37', 'green38', 'green36', 'green39'] 2.6.32-431.el6.x86_64
    ['green133'] 3.10.0-123.20.1.el7.x86_64
    Max return code is 0

.. _prog-example-ev:

Remote command example with live output (event-based mode)
----------------------------------------------------------

The following example shows how to use the event-based programmation model by
installing an EventHandler and listening for :meth:`.EventHandler.ev_read`
(we've got a line to read) and :meth:`.EventHandler.ev_hup` (one command has
just completed) events. The goal here is to print standard outputs of ``uname
-a`` commands during their execution and also to notify the user of any
erroneous return codes::

    from ClusterShell.Task import task_self
    from ClusterShell.Event import EventHandler

    class MyHandler(EventHandler):

       def ev_read(self, worker, node, sname, msg):
           print "%s: %s" % (node, msg)

       def ev_hup(self, worker, node, rc):
           if rc != 0:
               print "%s: returned with error code %s" % (node, rc)

    task = task_self()

    # Submit command, install event handler for this command and run task
    task.run("/bin/uname -a", nodes="fortoy[32-159]", handler=MyHandler())

.. _prog-example-script:

*check_nodes.py* example script
-------------------------------

The following script is available as an example in the source repository and
is usually packaged with ClusterShell::

    #!/usr/bin/python
    # check_nodes.py: ClusterShell simple example script.
    #
    # This script runs a simple command on remote nodes and report node
    # availability (basic health check) and also min/max boot dates.
    # It shows an example of use of Task, NodeSet and EventHandler objects.
    # Feel free to copy and modify it to fit your needs.
    #
    # Usage example: ./check_nodes.py -n node[1-99]

    import optparse
    from datetime import date, datetime
    import time

    from ClusterShell.Event import EventHandler
    from ClusterShell.NodeSet import NodeSet
    from ClusterShell.Task import task_self


    class CheckNodesResult(object):
        """Our result class"""
        def __init__(self):
            """Initialize result class"""
            self.nodes_ok = NodeSet()
            self.nodes_ko = NodeSet()
            self.min_boot_date = None
            self.max_boot_date = None

        def show(self):
            """Display results"""
            if self.nodes_ok:
                print "%s: OK (boot date: min %s, max %s)" % \
                    (self.nodes_ok, self.min_boot_date, self.max_boot_date)
            if self.nodes_ko:
                print "%s: FAILED" % self.nodes_ko

    class CheckNodesHandler(EventHandler):
        """Our ClusterShell EventHandler"""

        def __init__(self, result):
            """Initialize our event handler with a ref to our result object."""
            EventHandler.__init__(self)
            self.result = result

        def ev_read(self, worker, node, sname, msg):
            """Read event from remote nodes"""
            # this is an example to demonstrate remote result parsing
            bootime = " ".join(msg.strip().split()[2:])
            date_boot = None
            for fmt in ("%Y-%m-%d %H:%M",): # formats with year
                try:
                    # datetime.strptime() is Python2.5+, use old method instead
                    date_boot = datetime(*(time.strptime(bootime, fmt)[0:6]))
                except ValueError:
                    pass
            for fmt in ("%b %d %H:%M",):    # formats without year
                try:
                    date_boot = datetime(date.today().year, \
                        *(time.strptime(bootime, fmt)[1:6]))
                except ValueError:
                    pass
            if date_boot:
                if not self.result.min_boot_date or \
                    self.result.min_boot_date > date_boot:
                    self.result.min_boot_date = date_boot
                if not self.result.max_boot_date or \
                    self.result.max_boot_date < date_boot:
                    self.result.max_boot_date = date_boot
                self.result.nodes_ok.add(node)
            else:
                self.result.nodes_ko.add(node)

        def ev_close(self, worker, timedout):
            """Worker has finished (command done on all nodes)"""
            if timedout:
                nodeset = NodeSet.fromlist(worker.iter_keys_timeout())
                self.result.nodes_ko.add(nodeset)
            self.result.show()

    def main():
        """ Main script function """
        # Initialize option parser
        parser = optparse.OptionParser()
        parser.add_option("-d", "--debug", action="store_true", dest="debug",
                          default=False, help="Enable debug mode")
        parser.add_option("-n", "--nodes", action="store", dest="nodes",
                          default="@all", help="Target nodes (default @all group)")
        parser.add_option("-f", "--fanout", action="store", dest="fanout",
                          default="128", help="Fanout window size (default 128)",
                          type=int)
        parser.add_option("-t", "--timeout", action="store", dest="timeout",
                          default="5", help="Timeout in seconds (default 5)",
                          type=float)
        options, _ = parser.parse_args()

        # Get current task (associated to main thread)
        task = task_self()
        nodes_target = NodeSet(options.nodes)
        task.set_info("fanout", options.fanout)
        if options.debug:
            print "nodeset : %s" % nodes_target
            task.set_info("debug", True)

        # Create ClusterShell event handler
        handler = CheckNodesHandler(CheckNodesResult())

        # Schedule remote command and run task (blocking call)
        task.run("who -b", nodes=nodes_target, handler=handler, \
            timeout=options.timeout)


    if __name__ == '__main__':
        main()

.. _prog-example-pp-sbatch:

Using NodeSet with Parallel Python Batch script using SLURM
-----------------------------------------------------------

The following example shows how to use the NodeSet class to expand
``$SLURM_NODELIST`` environment variable in a Parallel Python batch script
launched by SLURM. This variable may contain folded node sets. If ClusterShell
is not available system-wide on your compute cluster, you need to follow
:ref:`install-pip-user` first.

.. highlight:: bash

Example of SLURM ``pp.sbatch`` to submit using ``sbatch pp.sbatch``::

    #!/bin/bash

    #SBATCH -N 2
    #SBATCH --ntasks-per-node 1

    # run the servers
    srun ~/.local/bin/ppserver.py -w $SLURM_CPUS_PER_TASK -t 300 &
    sleep 10

    # launch the parallel processing
    python -u ./pp_jobs.py

.. highlight:: python

Example of a ``pp_jobs.py`` script::

    #!/usr/bin/env python

    import os, time
    import pp
    from ClusterShell.NodeSet import NodeSet

    # get the nodelist form Slurm
    nodeset = NodeSet(os.environ['SLURM_NODELIST'])

    # start the servers (ncpus=0 will make sure that none is started locally)
    # casting nodelist to tuple/list will correctly expand $SLURM_NODELIST
    job_server = pp.Server(ncpus=0, ppservers=tuple(nodelist))

    # make sure the servers have enough time to start
    time.sleep(5)

    # test function to execute on the remove nodes
    def test_func():
        print os.uname()

    # start the jobs
    job_1 = job_server.submit(test_func,(),(),("os",))
    job_2 = job_server.submit(test_func,(),(),("os",))

    # retrieve the results
    print job_1()
    print job_2()

    # Cleanup
    job_server.print_stats()
    job_server.destroy()