|
.. include:: ../../global.inc
.. include:: manual_chapter_numbers.inc

.. index::
    pair: command line; Tutorial

.. _new_manual.cmdline:

######################################################################################################
|new_manual.cmdline.chapter_num|: Running *Ruffus* from the command line with ruffus.cmdline
######################################################################################################

.. seealso::

    * :ref:`Manual table of Contents <new_manual.table_of_contents>`

We find that much of our *Ruffus* pipeline code is built on the same template, which is generally
a good place to start when developing a new pipeline.

From version 2.4, *Ruffus* includes an optional ``ruffus.cmdline`` module that provides
support for a set of common command line arguments. This makes writing *Ruffus* pipelines much more pleasant.


.. _new_manual.cmdline.get_argparse:
.. _new_manual.cmdline.run:
.. _new_manual.cmdline.setup_logging:

************************************************************************************************************
Template for `argparse <http://docs.python.org/2.7/library/argparse.html>`__
************************************************************************************************************

All you need to do is copy these few lines:


.. code-block:: python
    :emphasize-lines: 5, 13

    import ruffus.cmdline as cmdline

    parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?')

    #   <<<---- add your own command line options like --input_file here
    # parser.add_argument("--input_file")

    options = parser.parse_args()

    #  standard python logger which can be synchronised across concurrent Ruffus tasks
    logger, logger_mutex = cmdline.setup_logging(__name__, options.log_file, options.verbose)

    #   <<<---- pipelined functions go here

    cmdline.run(options)


We recommend the standard `argparse <http://docs.python.org/2.7/library/argparse.html>`__ module,
but the deprecated `optparse <http://docs.python.org/2.7/library/optparse.html>`__ module works as well (see :ref:`below <code_template.optparse>` for the template).


******************************************************
Command Line Arguments
******************************************************

By default, ``ruffus.cmdline`` provides these predefined options:


.. code-block:: bash
    :emphasize-lines: 5,11,14,20

    -v, --verbose
        --version
    -L, --log_file

        # tasks
    -T, --target_tasks
        --forced_tasks
    -j, --jobs
        --use_threads

        # printout
    -n, --just_print

        # flow chart
        --flowchart
        --key_legend_in_graph
        --draw_graph_horizontally
        --flowchart_format

        # checksum
        --touch_files_only
        --checksum_file_name
        --recreate_database


******************************************************
1) Logging
******************************************************

The script can log to the terminal:

.. code-block:: bash

    myscript -v
    myscript --verbose

and to an optional log file:

.. code-block:: bash

    # keep tabs on yourself
    myscript --log_file /var/log/secret.logbook


Logging is ignored if neither ``--verbose`` nor ``--log_file`` is specified on the command line.

``ruffus.cmdline`` automatically allows you to write to a shared log file from multiple processes via a proxy.
However, you do need to hold ``logger_mutex`` for the log file to be synchronised properly across different jobs:

.. code-block:: python

    with logger_mutex:
        logger.info("Look Ma. No hands")

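
The mutex matters most when a job writes several related lines. The following is a stand-in, not *Ruffus* code. It uses threads, a plain ``threading.Lock`` in place of ``logger_mutex``, and an in-memory stream in place of the shared log file. In this single-process sketch, each individual call is already written whole by the ``logging`` handler. Holding the lock for the whole block additionally keeps each job's lines next to each other.

.. code-block:: python

    import io
    import logging
    import threading

    # In-memory stand-in for the shared log file
    log_stream = io.StringIO()
    logger = logging.getLogger("mutex_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(log_stream))

    # Stand-in for the logger_mutex returned by cmdline.setup_logging
    logger_mutex = threading.Lock()


    def job(job_id):
        # Holding the lock keeps this job's three lines together in the log
        with logger_mutex:
            logger.info("job %d: start", job_id)
            logger.info("job %d: working", job_id)
            logger.info("job %d: done", job_id)


    threads = [threading.Thread(target=job, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
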

Logging is set up so that you can write messages:

=================================
A) Only to the log file:
=================================

.. code-block:: python

    logger.info("A message")

=================================
B) Only to the display:
=================================

.. code-block:: python

    logger.debug("A message")

.. _new_manual.cmdline.MESSAGE:

======================================
C) To both simultaneously:
======================================

.. code-block:: python

    from ruffus.cmdline import MESSAGE

    logger.log(MESSAGE, "A message")

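
These three destinations can be modelled with the standard ``logging`` module. One handler stands in for the log file and accepts ``MESSAGE`` and above, so it also receives ``INFO`` but not ``DEBUG``. A second handler stands in for the display and accepts everything except ``INFO``. This is a minimal sketch of that routing, not the ``ruffus.cmdline`` source. The level value ``15`` (between ``DEBUG`` and ``INFO``) and the handler arrangement are assumptions, and in-memory streams replace the real log file and terminal.

.. code-block:: python

    import io
    import logging

    # Assumption: a custom level between DEBUG (10) and INFO (20)
    MESSAGE = 15
    logging.addLevelName(MESSAGE, "MESSAGE")


    class SkipInfo(logging.Filter):
        """Drop INFO records so that they only reach the log file."""

        def filter(self, record):
            return record.levelno != logging.INFO


    def make_logger(file_stream, display_stream):
        logger = logging.getLogger("routing_sketch")
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
        logger.handlers.clear()

        # "log file": MESSAGE and above, so DEBUG never gets here
        to_file = logging.StreamHandler(file_stream)
        to_file.setLevel(MESSAGE)
        logger.addHandler(to_file)

        # "display": everything except INFO
        to_display = logging.StreamHandler(display_stream)
        to_display.setLevel(logging.DEBUG)
        to_display.addFilter(SkipInfo())
        logger.addHandler(to_display)
        return logger


    log_file, display = io.StringIO(), io.StringIO()
    logger = make_logger(log_file, display)
    logger.info("only in the log file")
    logger.debug("only on the display")
    logger.log(MESSAGE, "in both")
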

******************************************************
2) Tracing pipeline progress
******************************************************

This is extremely useful for understanding what is happening in your pipeline: which tasks and
jobs are up to date, and so on.

See :ref:`new_manual.pipeline_printout`.

To trace the pipeline, call your script with the following options:

.. code-block:: bash

    # well-mannered, reserved
    myscript --just_print
    myscript -n

    # or, extremely loquacious
    myscript --just_print --verbose 5
    myscript -n -v5

Increasing levels of verbosity (``--verbose`` to ``--verbose 5``) provide more detailed output.


******************************************************
3) Printing a flowchart
******************************************************

This is the subject of :ref:`new_manual.pipeline_printout_graph`.

A flowchart can be requested with the following option:

.. code-block:: bash

    myscript --flowchart xxxchart.svg

The extension of the flowchart file determines its format,
for example ``svg`` or ``jpg``.
This can be overridden with ``--flowchart_format``.

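
The format rule fits in a few lines. ``choose_flowchart_format`` is a hypothetical helper, not part of *Ruffus*. An explicit ``--flowchart_format`` value wins; otherwise the file extension, without its leading dot, is used.

.. code-block:: python

    import os


    def choose_flowchart_format(file_name, format_override=None):
        """Hypothetical helper: pick the flowchart format for file_name."""
        if format_override:
            # --flowchart_format takes precedence over the extension
            return format_override
        # "xxxchart.svg" -> ".svg" -> "svg"
        return os.path.splitext(file_name)[1].lstrip(".")
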

******************************************************
4) Running in parallel on multiple processors
******************************************************

Optionally, specify the number of parallel strands of execution and the last *target* task to run.
The pipeline will start from any out-of-date tasks that precede the *target* and will proceed no further
than the *target*.

.. code-block:: bash

    myscript --jobs 15 --target_tasks "final_task"
    myscript -j 15

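
The ``--target_tasks`` rule can be illustrated by treating the pipeline as a dependency graph. The sketch below uses hypothetical task names and is not the *Ruffus* scheduler. Only the target and the tasks it depends on are considered. A task runs if it is out of date itself or depends on a task that will run, and nothing downstream of the target is touched. ``--jobs`` only controls how many jobs execute concurrently, so it is not modelled here.

.. code-block:: python

    def tasks_to_run(upstream, target, out_of_date):
        """Simplified model of --target_tasks (not the Ruffus scheduler).

        upstream:    dict mapping each task to the tasks it depends on
        target:      the last task to run
        out_of_date: tasks whose own files are stale
        Returns the tasks that would run, in dependency order.
        """
        order, seen = [], set()

        def visit(task):
            # depth-first walk: dependencies are appended before dependants
            if task in seen:
                return
            seen.add(task)
            for dependency in upstream.get(task, []):
                visit(dependency)
            order.append(task)

        visit(target)  # only the target and its ancestors are considered

        will_run = set()
        for task in order:
            stale_input = any(dep in will_run for dep in upstream.get(task, []))
            if task in out_of_date or stale_input:
                will_run.add(task)
        return [task for task in order if task in will_run]


    # hypothetical pipeline: fetch -> align -> call_variants -> final_task -> report
    pipeline = {
        "align": ["fetch"],
        "call_variants": ["align"],
        "final_task": ["call_variants"],
        "report": ["final_task"],
    }
    print(tasks_to_run(pipeline, "final_task", out_of_date={"align"}))
    # prints: ['align', 'call_variants', 'final_task']

Here ``report`` is never considered because it lies beyond the target.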

******************************************************************************************************
5) Set up checkpointing so that *Ruffus* knows which files are out of date
******************************************************************************************************

The :ref:`checkpoint file <new_manual.checkpointing>` uses the value set in the
environment variable ``DEFAULT_RUFFUS_HISTORY_FILE``.
If this is not set, it defaults to ``.ruffus_history.sqlite`` in the current working directory.
Either can be overridden on the command line:

.. code-block:: bash

    myscript --checksum_file_name mychecksum.sqlite

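
The order of precedence can be written out as a small function. ``checksum_file_name`` is a hypothetical helper, not the *Ruffus* implementation. It mirrors the description above: the command line value comes first, then the ``DEFAULT_RUFFUS_HISTORY_FILE`` environment variable, then ``.ruffus_history.sqlite``.

.. code-block:: python

    import os

    DEFAULT_HISTORY_FILE = ".ruffus_history.sqlite"


    def checksum_file_name(command_line_value=None, environ=None):
        """Hypothetical helper: --checksum_file_name wins, then the
        DEFAULT_RUFFUS_HISTORY_FILE environment variable, then the default
        file name (a relative path, i.e. in the current working directory)."""
        if command_line_value:
            return command_line_value
        if environ is None:
            environ = os.environ
        return environ.get("DEFAULT_RUFFUS_HISTORY_FILE", DEFAULT_HISTORY_FILE)
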

============================================================================================================================================================
Recreating checkpoints
============================================================================================================================================================

Create or update the checkpoint file so that all existing files in completed jobs appear up to date.
This will stop sensibly if the current state is incomplete or inconsistent.

::

    myscript --recreate_database


============================================================================================================================================================
Touch files
============================================================================================================================================================

As far as possible, create empty files with the correct timestamp to make the pipeline appear up to date.

.. code-block:: bash

    myscript --touch_files_only

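
The idea behind ``--touch_files_only`` can be sketched with the standard library. Walk the jobs in pipeline order and create any missing output as an empty file. Then give each job's outputs a modification time strictly later than its newest input. ``touch_outputs`` and its list of ``(inputs, outputs)`` pairs are hypothetical; *Ruffus* itself works from your decorated task functions.

.. code-block:: python

    import os
    import time


    def touch_outputs(jobs):
        """Hypothetical sketch of the idea behind --touch_files_only.

        jobs: (input_files, output_files) pairs, listed in pipeline order.
        """
        stamp = time.time()
        for inputs, outputs in jobs:
            input_times = [os.path.getmtime(p) for p in inputs if os.path.exists(p)]
            # outputs must be strictly newer than the newest input (and than
            # anything touched earlier), so bump the timestamp by one second
            stamp = max([stamp] + input_times) + 1
            for path in outputs:
                if not os.path.exists(path):
                    # create an empty placeholder for a missing output
                    open(path, "a").close()
                os.utime(path, (stamp, stamp))
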

******************************************************************************************************
6) Skipping specified options
******************************************************************************************************

Particular options can be skipped (not added to the command line) if they conflict with your own options, for example:

.. code-block:: python
    :emphasize-lines: 3

    # see above for how to use get_argparse
    parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?',
                                  # Exclude the following options: --log_file --key_legend_in_graph
                                  ignored_args=["log_file", "key_legend_in_graph"])

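
The conflict that ``ignored_args`` avoids comes from ``argparse`` itself: by default an ``ArgumentParser`` refuses to register the same option string twice. The sketch below uses a plain ``argparse.ArgumentParser`` to stand in for the parser returned by ``cmdline.get_argparse``. It assumes that parser follows the standard ``argparse`` rules.

.. code-block:: python

    import argparse

    # A plain ArgumentParser stands in for the parser from cmdline.get_argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_file")  # as if already added for you

    conflict = None
    try:
        # your own --log_file option with a different meaning
        parser.add_argument("--log_file", help="my own log file")
    except argparse.ArgumentError as error:
        conflict = str(error)

With the option listed in ``ignored_args``, the standard option is never registered, so your own ``add_argument`` call has nothing to clash with.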

******************************************************************************************************
7) Specifying verbosity and abbreviating long paths
******************************************************************************************************

The verbosity can be specified on the command line:

.. code-block:: bash

    myscript --verbose 5

    # verbosity of 5 + 1 = 6
    myscript --verbose 5 --verbose

    # verbosity reset to 2
    myscript --verbose 5 --verbose --verbose 2

If the printed paths are too long and need to be abbreviated, or if you want to see the full absolute paths of your input and output parameters,
you can add an extension to the verbosity. See the manual discussion of :ref:`verbose_abbreviated_path <new_manual.pipeline_printout.verbose_abbreviated_path>` for
more details. This is specified as ``--verbose VERBOSITY:VERBOSE_ABBREVIATED_PATH`` (no spaces!).

For example:

.. code-block:: bash
    :emphasize-lines: 4,7

    # verbosity of 4
    myscript.py --verbose 4

    # display three levels of nested directories
    myscript.py --verbose 4:3

    # restrict input and output parameters to 60 characters
    myscript.py --verbose 4:-60

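
The accumulate and reset rules and the ``VERBOSITY:VERBOSE_ABBREVIATED_PATH`` syntax can be reproduced with a custom ``argparse`` action. ``VerbosityAction`` is a hypothetical sketch of the behaviour described above, not the code used by ``ruffus.cmdline``. The ``verbose_abbreviated_path`` attribute name is likewise an assumption.

.. code-block:: python

    import argparse


    class VerbosityAction(argparse.Action):
        """Hypothetical sketch of the --verbose rules described above.

        * a bare --verbose adds 1 to the current level
        * --verbose N resets the level to N
        * --verbose N:M also records M as the path abbreviation setting
        """

        def __call__(self, parser, namespace, value, option_string=None):
            current = getattr(namespace, self.dest) or 0
            if value is None:
                # bare flag: increment the current level
                setattr(namespace, self.dest, current + 1)
                return
            # "4:-60" -> ("4", ":", "-60");  "5" -> ("5", "", "")
            level, _, abbreviation = value.partition(":")
            setattr(namespace, self.dest, int(level))
            if abbreviation:
                setattr(namespace, "verbose_abbreviated_path", int(abbreviation))


    parser = argparse.ArgumentParser(prog="myscript.py")
    parser.add_argument("-v", "--verbose", action=VerbosityAction, nargs="?",
                        default=0, metavar="VERBOSITY:VERBOSE_ABBREVIATED_PATH")
    # assumed attribute name for the part after the colon
    parser.set_defaults(verbose_abbreviated_path=None)

Using ``nargs="?"`` lets the same option act both as a bare counter and as an option that takes a value.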

******************************************************************************************************
8) Displaying the version
******************************************************************************************************

Note that the version for your script will default to ``"%(prog)s 1.0"`` unless specified:

.. code-block:: python

    parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?',
                                  version="my_programme.py v. 2.23")

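
``%(prog)s`` is the ``argparse`` placeholder for the programme name. The sketch below assumes the ``version`` string is passed to a standard ``argparse`` ``version`` action, and shows what the default expands to. ``my_programme.py`` is just an example name. The sketch targets Python 3.4 and later, where the version is printed to standard output.

.. code-block:: python

    import argparse
    import contextlib
    import io

    parser = argparse.ArgumentParser(prog="my_programme.py")
    # argparse replaces %(prog)s with the programme name when printing
    parser.add_argument("--version", action="version", version="%(prog)s 1.0")

    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        try:
            parser.parse_args(["--version"])
        except SystemExit:
            pass  # the version action exits after printing

    print(captured.getvalue().strip())  # prints: my_programme.py 1.0
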

.. _code_template.optparse:

************************************************************************************************************
Template for `optparse <http://docs.python.org/2.7/library/optparse.html>`__
************************************************************************************************************

``optparse`` has been deprecated since Python 2.7.

.. code-block:: python
    :emphasize-lines: 7,15

    # Using the deprecated optparse module
    from ruffus import *
    import ruffus.cmdline as cmdline

    parser = cmdline.get_optgparse(version="%prog 1.0", usage="\n\n %prog [options]")

    #   <<<---- add your own command line options like --input_file here
    # parser.add_option("-i", "--input_file", dest="input_file", help="Input file")

    (options, remaining_args) = parser.parse_args()

    #  logger which can be passed to ruffus tasks
    logger, logger_mutex = cmdline.setup_logging("this_program", options.log_file, options.verbose)

    #   <<<---- pipelined functions go here

    cmdline.run(options)

|