File: logging.rst

.. include:: ../../global.inc
.. include:: manual_chapter_numbers.inc

.. index::
    pair: logging; Tutorial

.. _new_manual.logging:

######################################################################################################
|new_manual.logging.chapter_num|: Logging progress through a pipeline
######################################################################################################


.. seealso::

   * :ref:`Manual Table of Contents <new_manual.table_of_contents>`

.. note::

    Remember to look at the :ref:`example code <new_manual.logging.code>`

*************************
Overview
*************************

    There are two parts to logging with **Ruffus**:

    * Logging progress through the pipeline

        This produces the sort of output displayed in this manual:

        ::

            >>> pipeline_run([parallel_io_task])
            Task = parallel_io_task
                Job = ["a.1" -> "a.2", "A file"] completed
                Job = ["b.1" -> "b.2", "B file"] unnecessary: already up to date
            Completed Task = parallel_io_task


    * Logging your own messages from within your pipelined functions.

        Because **Ruffus** may run each task function in a separate process on a separate
        CPU (multiprocessing), some care is needed to send and
        synchronise your log messages across process boundaries.


    We shall deal with these in turn.


.. _new_manual.logging.pipeline:

**********************************
Logging task/job completion
**********************************
    By default, *Ruffus* logs each task and each job as it completes to
    `sys.stderr  <http://docs.python.org/2/library/sys.html#sys.stderr>`__,
    i.e. :ref:`pipeline_run(logger = stderr_logger) <pipeline_functions.pipeline_run>`.

    To turn off all tracking messages as the pipeline runs, you can either set ``verbose = 0``
    or use the aptly named Ruffus ``black_hole_logger``:

        .. code-block:: python

            pipeline_run(logger = black_hole_logger)
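
    Setting ``verbose = 0`` suppresses the same tracking messages, so either call quietens
    the pipeline (a minimal sketch):

        .. code-block:: python

            pipeline_run(verbose = 0)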

.. index::
    pair: pipeline_run verbosity; Tutorial

=================================
Controlling logging verbosity
=================================
    :ref:`pipeline_run() <pipeline_functions.pipeline_run>` currently has five levels of verbosity, set by the optional ``verbose``
    parameter, which defaults to 1:

        ::

                verbose = 0: nothing
                verbose = 1: logs completed jobs/tasks
                verbose = 2: logs up-to-date jobs in incomplete tasks
                verbose = 3: logs the reason for running each job
                verbose = 4: logs messages useful only for debugging the ruffus pipeline code


        Settings of ``verbose`` above ``5`` are intended for debugging **Ruffus** by the developers, and the details
        are liable to change from release to release.
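
    For example (a minimal sketch; each call assumes the pipeline tasks are already defined):

        .. code-block:: python

            # also log up-to-date jobs in incomplete tasks
            pipeline_run(verbose = 2)

            # log the reason each job is being run
            pipeline_run(verbose = 3)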


.. index::
    pair: logging with ruffus.cmdline; Tutorial

********************************************************************************
Use :ref:`ruffus.cmdline <new_manual.cmdline>`
********************************************************************************

    As always, it is easiest to use :ref:`ruffus.cmdline <new_manual.cmdline>`.

    Set up your script to

        * write messages to ``STDERR`` with the ``--verbose`` option, and
        * write messages to a log file with the ``--log_file`` option.

        .. code-block:: python
           :emphasize-lines: 8

            from ruffus import *

            #  parse the standard Ruffus command line options, including --verbose and --log_file
            parser = cmdline.get_argparse(description = "Demonstrates logging")
            options = parser.parse_args()

            #  Python logger which can be synchronised across concurrent Ruffus tasks
            logger, logger_mutex = cmdline.setup_logging(__name__, options.log_file, options.verbose)

            @transform(["job1.input"], suffix(".input"), ".output1")
            def first_task(input_file, output_file):
                pass

            pipeline_run(logger = logger)
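
    When the rest of your script uses :ref:`ruffus.cmdline <new_manual.cmdline>`, the companion
    ``cmdline.run(options)`` call can stand in for the final ``pipeline_run()``, taking its
    settings (verbosity, target tasks, number of jobs) from the parsed command line options
    (a sketch of the alternative last line):

        .. code-block:: python

            cmdline.run(options)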


.. index::
    pair: logging customising; Tutorial

****************************************
Customising logging
****************************************

    You can also specify exactly how logging works by providing your own log object
    to :ref:`pipeline_run() <pipeline_functions.pipeline_run>`.
    This log object should have ``debug()`` and ``info()`` methods.

    Instead of writing your own, it is usually more convenient to use the Python
    `logging <http://docs.python.org/library/logging.html>`_
    module, which provides logging classes with rich functionality.

    The :ref:`example code <new_manual.logging.code>` sets up a logger writing to a rotating set of files; a similar setup is sketched below.
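
    A minimal sketch using the standard library's rotating file handler (the log file path and
    size limits here are hypothetical; any object with ``debug()`` and ``info()`` methods would do):

        .. code-block:: python

            import logging
            import logging.handlers
            from ruffus import *

            LOG_FILENAME = "/tmp/ruffus.log"            # hypothetical path

            # rotate through 5 backup files of up to 20,000 bytes each
            logger = logging.getLogger("my_pipeline")
            logger.setLevel(logging.DEBUG)
            handler = logging.handlers.RotatingFileHandler(LOG_FILENAME,
                                                           maxBytes=20000,
                                                           backupCount=5)
            logger.addHandler(handler)

            pipeline_run(logger = logger)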


.. index::
    pair: logging your own message; Tutorial

.. _new_manual.logging.per_job:

****************************************
Log your own messages
****************************************

    You need to take a little care when logging your custom messages *within* your pipeline.

    * If your Ruffus pipeline may run in parallel, make sure that logging is synchronised.
    * If your Ruffus pipeline may run across separate processes, send your logging object across process boundaries.


    `logging <http://docs.python.org/library/logging.html>`_ objects cannot be
    `pickled <http://docs.python.org/library/pickle.html>`_ and shared naively across
    processes. Instead, we need to create proxies which forward the logging to a single
    shared log.

    The :ref:`ruffus.proxy_logger <proxy-logger>` module provides an easy way to share
    `logging <http://docs.python.org/library/logging.html>`_ objects among
    jobs. This requires just two simple steps:




.. note::

    * This is a good template for sharing `non-picklable objects <http://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled>`_
      across processes.


.. _new_manual.sharing_proxy_object:


============================================================
    1. Set up logging
============================================================

    Things are easiest if you are using ``ruffus.cmdline``:

    .. code-block:: python

        #  standard Python logger which can be synchronised across concurrent Ruffus tasks
        logger, logger_mutex = cmdline.setup_logging(__name__, options.log_file, options.verbose)


    Otherwise, manually:

    .. code-block:: python

        from ruffus.proxy_logger import *
        (logger,
         logging_mutex) = make_shared_logger_and_proxy (setup_std_shared_logger,
                                                        "my_logger",
                                                        {"file_name": "/my/lg.log"})

============================================================
    2. Share the proxy
============================================================
        Now, pass:

            * ``logger`` (which forwards logging calls across jobs) and
            * ``logging_mutex`` (which prevents the messages of jobs logging simultaneously
              from being jumbled together)

        to each job:

        .. code-block:: python
            :emphasize-lines: 4,6,9

            @transform( initial_file,
                        suffix(".input"),
                        ".output1",
                        logger, logging_mutex)          # pass log and synchronisation as parameters
            def first_task(input_file, output_file,
                           logger, logging_mutex):      # pass log and synchronisation as parameters

                # synchronise logging
                with logging_mutex:
                    logger.info("Here we go logging...")
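
        Then run the pipeline as usual; the proxy and mutex travel to each job as ordinary
        parameters, even across process boundaries (a sketch, with a hypothetical number of
        parallel processes):

        .. code-block:: python

            pipeline_run([first_task], multiprocess = 5, logger = logger)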