.. include:: ../../global.inc
.. include:: manual_chapter_numbers.inc

.. index::
    pair: command line; Tutorial

.. _new_manual.cmdline:

######################################################################################################
|new_manual.cmdline.chapter_num|: Running *Ruffus* from the command line with ruffus.cmdline
######################################################################################################

.. seealso::

   * :ref:`Manual table of Contents <new_manual.table_of_contents>`


We find that much of our *Ruffus* pipeline code is built on the same template, which is generally
a good place to start when developing a new pipeline.

From version 2.4, *Ruffus* includes an optional ``ruffus.cmdline`` module that provides
support for a set of common command line arguments. This makes writing *Ruffus* pipelines much more pleasant.


.. _new_manual.cmdline.get_argparse:

.. _new_manual.cmdline.run:

.. _new_manual.cmdline.setup_logging:

************************************************************************************************************
Template for `argparse  <http://docs.python.org/2.7/library/argparse.html>`__
************************************************************************************************************
    All you need to do is copy these 6 lines:


    .. code-block:: python
        :emphasize-lines: 5, 13

        import ruffus.cmdline as cmdline

        parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?')

        #   <<<---- add your own command line options like --input_file here
        # parser.add_argument("--input_file")

        options = parser.parse_args()

        #  standard python logger which can be synchronised across concurrent Ruffus tasks
        logger, logger_mutex = cmdline.setup_logging (__name__, options.log_file, options.verbose)

        #   <<<----  pipelined functions go here

        cmdline.run (options)

    We recommend the standard `argparse  <http://docs.python.org/2.7/library/argparse.html>`__ module,
    but the deprecated `optparse  <http://docs.python.org/2.7/library/optparse.html>`__ module works as well. (See :ref:`below <code_template.optparse>` for the template.)


******************************************************
Command Line Arguments
******************************************************

     By default, ``ruffus.cmdline`` provides these predefined options:

        .. code-block:: bash
            :emphasize-lines: 5,12,15,22

            -v, --verbose
                --version
            -L, --log_file

                # tasks
            -T, --target_tasks
                --forced_tasks
            -j, --jobs
                --use_threads


                # printout
            -n, --just_print

                # flow chart
                --flowchart
                --key_legend_in_graph
                --draw_graph_horizontally
                --flowchart_format


                # check sum
                --touch_files_only
                --checksum_file_name
                --recreate_database


******************************************************
1) Logging
******************************************************

    The script provides for logging both to the command line:

        .. code-block:: bash

            myscript -v
            myscript --verbose

    and an optional log file:

        .. code-block:: bash

            # keep tabs on yourself
            myscript --log_file /var/log/secret.logbook

    Logging is ignored if neither ``--verbose`` nor ``--log_file`` is specified on the command line.

    ``ruffus.cmdline`` automatically allows you to write to a shared log file via a proxy from multiple processes.
    However, you do need to hold ``logger_mutex`` (returned by ``setup_logging`` in the template above) while logging,
    so that messages are synchronised properly across different jobs:

        .. code-block:: python

            with logger_mutex:

                logger.info("Look Ma. No hands")
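
    As a minimal sketch of why the lock matters, the following uses only the standard library rather than
    *Ruffus* itself. A ``threading.Lock`` stands in for the mutex, and an in-memory handler stands in for the
    shared log file. The names ``ListHandler``, ``demo_logger``, ``demo_mutex`` and ``worker`` are assumptions
    made up for this illustration.

        .. code-block:: python

            import logging
            import threading

            class ListHandler(logging.Handler):
                """Hypothetical handler that collects formatted records in a list."""
                def __init__(self):
                    super().__init__()
                    self.lines = []

                def emit(self, record):
                    self.lines.append(self.format(record))

            demo_logger = logging.getLogger("mutex_demo")
            demo_logger.setLevel(logging.INFO)
            demo_logger.propagate = False
            handler = ListHandler()
            demo_logger.addHandler(handler)

            # stand-in for the mutex used to synchronise logging across jobs
            demo_mutex = threading.Lock()

            def worker(job_id):
                # both messages of one job are written while holding the lock,
                # so no other job can slip a message in between them
                with demo_mutex:
                    demo_logger.info("job %d started", job_id)
                    demo_logger.info("job %d finished", job_id)

            threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()

    Because each job holds the lock for both of its messages, the "started" and "finished" lines of a job
    always appear next to each other, whatever order the jobs run in.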

    Logging is set up so that you can write:


=================================
        A) Only to the log file:
=================================

        .. code-block:: python

                logger.info("A message")

=================================
        B) Only to the display:
=================================

        .. code-block:: python

                logger.debug("A message")


.. _new_manual.cmdline.MESSAGE:

======================================
        C) To both simultaneously:
======================================

        .. code-block:: python

                from ruffus.cmdline import MESSAGE

                logger.log(MESSAGE, "A message")
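
        One way to obtain this kind of three-way routing with the standard ``logging`` module is sketched
        below. It is an illustration under stated assumptions, not a description of how ``ruffus.cmdline``
        is implemented: the level value ``15``, the ``SkipInfo`` filter and the two in-memory streams are
        all hypothetical.

        .. code-block:: python

            import io
            import logging

            DEMO_MESSAGE = 15  # hypothetical level between DEBUG (10) and INFO (20)
            logging.addLevelName(DEMO_MESSAGE, "MESSAGE")

            class SkipInfo(logging.Filter):
                """Drop INFO records so that they never reach the display."""
                def filter(self, record):
                    return record.levelno != logging.INFO

            display = io.StringIO()    # stands in for the terminal
            log_file = io.StringIO()   # stands in for the log file

            display_handler = logging.StreamHandler(display)
            display_handler.setLevel(logging.DEBUG)
            display_handler.addFilter(SkipInfo())

            file_handler = logging.StreamHandler(log_file)
            file_handler.setLevel(DEMO_MESSAGE)   # too high for DEBUG records

            routing_logger = logging.getLogger("routing_demo")
            routing_logger.setLevel(logging.DEBUG)
            routing_logger.propagate = False
            routing_logger.addHandler(display_handler)
            routing_logger.addHandler(file_handler)

            routing_logger.info("file only")
            routing_logger.debug("display only")
            routing_logger.log(DEMO_MESSAGE, "both")

        In this sketch the file handler's threshold excludes ``debug`` records, the filter keeps ``info``
        records off the display, and a record at the intermediate level passes both.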


******************************************************
2) Tracing pipeline progress
******************************************************

    Tracing is extremely useful for understanding what is happening in your pipeline, for example
    which tasks and jobs are up to date.

    See :ref:`new_manual.pipeline_printout`

    To trace the pipeline, call the script with the following options:

        .. code-block:: bash

            # well-mannered, reserved
            myscript --just_print
            myscript -n

            # or

            # extremely loquacious
            myscript --just_print --verbose 5
            myscript -n -v5

    Increasing levels of verbosity (``--verbose`` to ``--verbose 5``) provide more detailed output.



******************************************************
3) Printing a flowchart
******************************************************

    This is the subject of :ref:`new_manual.pipeline_printout_graph`.

    Request a flowchart with the following option:

        .. code-block:: bash

            myscript --flowchart xxxchart.svg

    The extension of the flowchart file indicates what format the flowchart should take,
    for example, ``svg``, ``jpg`` etc.

    Use ``--flowchart_format`` to override the format implied by the extension.
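
    The rule described above (take the format from the file extension unless one is given explicitly)
    can be sketched as follows. The helper name ``guess_flowchart_format`` is hypothetical and is not part
    of *Ruffus*.

        .. code-block:: python

            import os

            def guess_flowchart_format(flowchart_path, flowchart_format=None):
                """Return the explicit format if given, else the file extension."""
                if flowchart_format:
                    return flowchart_format
                extension = os.path.splitext(flowchart_path)[1]
                # strip the leading dot; None if the file name has no extension
                return extension.lstrip(".") or None

            print(guess_flowchart_format("xxxchart.svg"))
            print(guess_flowchart_format("xxxchart.svg", "jpg"))

    The first call yields ``svg`` from the file name and the second yields ``jpg`` from the override.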

******************************************************
4) Running in parallel on multiple processors
******************************************************


    Optionally specify the number of parallel strands of execution and the last *target* task to run.
    The pipeline starts from any out-of-date tasks which precede the *target* and proceeds no further
    than the *target*.

        .. code-block:: bash

            myscript --jobs 15 --target_tasks "final_task"
            myscript -j 15




******************************************************************************************************
5) Setup checkpointing so that *Ruffus* knows which files are out of date
******************************************************************************************************

    The :ref:`checkpoint file <new_manual.checkpointing>` uses the value set in the
    environment variable ``DEFAULT_RUFFUS_HISTORY_FILE``.

    If this is not set, it defaults to ``.ruffus_history.sqlite`` in the current working directory.

    Either can be overridden on the command line:

        .. code-block:: bash

            myscript --checksum_file_name mychecksum.sqlite
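
    The order of precedence described above (command line first, then the environment, then the default)
    can be sketched as below. The function ``resolve_history_file`` is a hypothetical helper written for this
    illustration. Only the environment variable name and the default file name come from the text above.

        .. code-block:: python

            import os

            def resolve_history_file(command_line_value=None, environ=None):
                """Pick the checkpoint file: command line, then environment, then default."""
                environ = os.environ if environ is None else environ
                if command_line_value:
                    return command_line_value
                from_environment = environ.get("DEFAULT_RUFFUS_HISTORY_FILE")
                if from_environment:
                    return from_environment
                # relative path, i.e. resolved against the current working directory
                return ".ruffus_history.sqlite"

            print(resolve_history_file("mychecksum.sqlite"))

    Passing ``environ`` explicitly keeps the sketch easy to try out without changing the real environment.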


============================================================================================================================================================
Recreating checkpoints
============================================================================================================================================================

    Create or update the checkpoint file so that all existing files in completed jobs appear up to date.

    This will stop sensibly if the current state is incomplete or inconsistent.

        ::

            myscript --recreate_database

============================================================================================================================================================
Touch files
============================================================================================================================================================

    As far as possible, create empty files with the correct timestamp to make the pipeline appear up to date.

    .. code-block:: bash

        myscript --touch_files_only


******************************************************************************************************
6) Skipping specified options
******************************************************************************************************
    Note that particular options can be skipped (not added to the command line), if they conflict with your own options, for example:

        .. code-block:: python
            :emphasize-lines: 3

            # see above for how to use get_argparse
            parser = cmdline.get_argparse(  description='WHAT DOES THIS PIPELINE DO?',
                                            # Exclude the following options: --log_file --key_legend_in_graph
                                            ignored_args = ["log_file", "key_legend_in_graph"])
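
    To see why skipping an option helps, recall that plain ``argparse`` refuses to register the same option
    string twice. The sketch below uses only the standard library. ``STANDARD_OPTIONS`` and ``build_parser``
    are hypothetical stand-ins for the predefined options and for ``get_argparse``, and do not reflect how
    *Ruffus* is implemented.

        .. code-block:: python

            import argparse

            # hypothetical subset of the predefined options, for illustration only
            STANDARD_OPTIONS = [
                ("log_file", ("-L", "--log_file"), {}),
                ("jobs", ("-j", "--jobs"), {"type": int, "default": 1}),
                ("key_legend_in_graph", ("--key_legend_in_graph",), {"action": "store_true"}),
            ]

            def build_parser(ignored_args=()):
                """Create a parser, leaving out every standard option named in ignored_args."""
                parser = argparse.ArgumentParser()
                for name, flags, keywords in STANDARD_OPTIONS:
                    if name not in ignored_args:
                        parser.add_argument(*flags, **keywords)
                return parser

            # --log_file was skipped, so the script is free to define its own version
            custom_parser = build_parser(ignored_args=["log_file"])
            custom_parser.add_argument("--log_file", type=int)
            custom_options = custom_parser.parse_args(["--log_file", "3", "-j", "2"])

    Without ``ignored_args``, the second ``add_argument("--log_file")`` call would raise
    ``argparse.ArgumentError`` because the option string is already taken.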


******************************************************************************************************
7) Specifying verbosity and abbreviating long paths
******************************************************************************************************

    The verbosity can be specified on the command line:

        .. code-block:: bash

            myscript --verbose 5

            # verbosity of 5 + 1 = 6
            myscript --verbose 5 --verbose

            # verbosity reset to 2
            myscript --verbose 5 --verbose --verbose 2
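
    The arithmetic in these examples (a number resets the level, a bare flag adds one) can be reproduced with
    plain ``argparse``. The sketch below is an assumption about one way to get that behaviour, not the
    *Ruffus* implementation. ``fold_verbosity`` and ``demo_parser`` are hypothetical names, and only plain
    integer values are handled.

        .. code-block:: python

            import argparse

            def fold_verbosity(values):
                """Combine repeated --verbose values: a number resets, a bare flag adds 1."""
                level = 0
                for value in values:
                    if value is None:
                        level += 1          # bare --verbose
                    else:
                        level = int(value)  # --verbose N
                return level

            demo_parser = argparse.ArgumentParser()
            # nargs="?" makes the number optional; "append" records every occurrence in order
            demo_parser.add_argument("-v", "--verbose", nargs="?", action="append", default=[])

            demo_options = demo_parser.parse_args(["--verbose", "5", "--verbose"])
            print(fold_verbosity(demo_options.verbose))

    Recording every occurrence and folding them afterwards keeps the left-to-right order of the command
    line, which is what makes a later ``--verbose 2`` reset the level.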

    If the printed paths are too long and need to be abbreviated, or alternatively, if you want to see the full absolute paths of your input and output parameters,
    you can specify an extension to the verbosity. See the manual discussion of :ref:`verbose_abbreviated_path <new_manual.pipeline_printout.verbose_abbreviated_path>` for
    more details. This is specified as ``--verbose VERBOSITY:VERBOSE_ABBREVIATED_PATH``. (No spaces!)

    For example:

        .. code-block:: bash
            :emphasize-lines: 4,7

            # verbosity of 4
            myscript.py --verbose 4

            # display three levels of nested directories
            myscript.py --verbose 4:3

            # restrict input and output parameters to 60 letters
            myscript.py --verbose 4:-60
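
    The ``VERBOSITY:VERBOSE_ABBREVIATED_PATH`` form is simply two integers joined by a colon. A minimal
    sketch of splitting such a value is shown below. ``split_verbose`` is a hypothetical helper and not part
    of the *Ruffus* API.

        .. code-block:: python

            def split_verbose(argument):
                """Split 'VERBOSITY[:VERBOSE_ABBREVIATED_PATH]' into two values.

                Returns (verbosity, abbreviated_path); the second item is None
                when no ':' extension was given.
                """
                level_text, separator, path_text = argument.partition(":")
                verbosity = int(level_text)
                abbreviated_path = int(path_text) if separator else None
                return verbosity, abbreviated_path

            for example in ("4", "4:3", "4:-60"):
                print(example, split_verbose(example))

    The sign of the second number is preserved, so ``4:3`` and ``4:-60`` remain distinguishable to whatever
    code interprets them.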


******************************************************************************************************
8) Displaying the version
******************************************************************************************************
    Note that the version for your script will default to ``"%(prog)s 1.0"`` unless specified:

        .. code-block:: python

            parser = cmdline.get_argparse(  description='WHAT DOES THIS PIPELINE DO?',
                                            version = "my_programme.py v. 2.23")







.. _code_template.optparse:

************************************************************************************************************
Template for `optparse  <http://docs.python.org/2.7/library/optparse.html>`__
************************************************************************************************************

    ``optparse`` has been deprecated since Python 2.7.

        .. code-block:: python
            :emphasize-lines: 8,16

            #
            #   Using optparse (deprecated since python v 2.7)
            #
            from ruffus import *

            parser = cmdline.get_optgparse(version="%prog 1.0", usage = "\n\n    %prog [options]")

            #   <<<---- add your own command line options like --input_file here
            # parser.add_option("-i", "--input_file", dest="input_file", help="Input file")

            (options, remaining_args) = parser.parse_args()

            #  logger which can be passed to ruffus tasks
            logger, logger_mutex = cmdline.setup_logging ("this_program", options.log_file, options.verbose)

            #   <<<----  pipelined functions go here

            cmdline.run (options)