.. include:: ../../global.inc
.. include:: manual_chapter_numbers.inc

.. index::
    pair: pipeline_printout; Tutorial

.. _new_manual.pipeline_printout:

############################################################################################################################################################################################################
|new_manual.pipeline_printout.chapter_num|: Understanding how your pipeline works with :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>`
############################################################################################################################################################################################################


.. seealso::

   * :ref:`Manual Table of Contents <new_manual.table_of_contents>`
   * :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` syntax
   * :ref:`Python Code for this chapter <new_manual.pipeline_printout.code>`


.. note::

    * **Whether you are learning or developing ruffus pipelines, your best friend is** :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>`.
      **This shows the exact parameters and files as they are passed through the pipeline.**

    * **We also** *strongly* **recommend you use the** ``Ruffus.cmdline`` **convenience module which**
      **will take care of all the command line arguments for you. See** :ref:`new_manual.cmdline`.



=======================================
Printing out which jobs will be run
=======================================

    :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` takes the same parameters as ``pipeline_run(...)`` but,
    rather than running anything, just prints which tasks are and are not up-to-date.

    The ``verbose`` parameter controls how much detail is displayed.

    Let us take the pipelined code we previously wrote in
    |new_manual.transform_in_parallel.chapter_num| :ref:`More on @transform-ing data and @originate <new_manual.transform_in_parallel.code>`
    but call :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` instead of
    :ref:`pipeline_run(...) <pipeline_functions.pipeline_run>`.
    This lists the tasks which will be run in the pipeline:

        ::

            >>> import sys
            >>> pipeline_printout(sys.stdout, [second_task])

            ________________________________________
            Tasks which will be run:

            Task = create_initial_file_pairs
            Task = first_task
            Task = second_task
            ________________________________________



    To see the input and output parameters of each job in the pipeline, try increasing the verbosity from the default (``1``) to ``3``
    (see :ref:`code <new_manual.pipeline_printout.code>`).

    This is very useful for checking that the input and output parameters have been specified correctly.

=============================================
Determining which jobs are out-of-date or not
=============================================

    It is often useful to see which tasks are or are not up-to-date. For example, if we
    were to run the pipeline in full, and then modify one of the intermediate files, the
    pipeline would be partially out of date.


    Let us start by running the pipeline in full, and then modify ``job1.a.output.1`` so that the second task appears out-of-date:

        .. code-block:: python
            :emphasize-lines: 3

            pipeline_run([second_task])

            # "touch" job1.a.output.1 (re-creating the file updates its modification time)
            open("job1.a.output.1", "w").close()


    Run :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` with a verbosity of ``5``.

    This will tell you exactly why ``second_task(...)`` needs to be re-run:
    because ``job1.a.output.1`` has a file modification time *after* ``job1.a.output.2`` (highlighted):


        .. code-block:: pycon
            :emphasize-lines: 9

            >>> pipeline_printout(sys.stdout, [second_task], verbose = 5)

            ________________________________________
            Tasks which are up-to-date:

            Task = create_initial_file_pairs
            Task = first_task

            ________________________________________

            ________________________________________
            Tasks which will be run:

            Task = second_task
                   Job  = [job1.a.output.1
                         -> job1.a.output.2]
            >>> # File modification times shown for out of date files
                     Job needs update:
                     Input files:
                      * 22 Jul 2014 15:29:19.33: job1.a.output.1
                     Output files:
                      * 22 Jul 2014 15:29:07.53: job1.a.output.2

                   Job  = [job2.a.output.1
                         -> job2.a.output.2]
                   Job  = [job3.a.output.1
                         -> job3.a.output.2]

            ________________________________________


    N.B. At a verbosity of 5, even jobs which are up-to-date in ``second_task`` are displayed.
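
    The timestamp comparison reported above can be sketched in plain Python. This is only an
    illustration of the idea (an input file newer than an output file marks the job as out-of-date),
    not Ruffus's actual implementation, which may take other information into account.
    The helper name ``job_needs_update`` is hypothetical, and the file times are set explicitly
    with ``os.utime`` so that the demonstration does not depend on the clock.

```python
import os
import tempfile


def job_needs_update(input_files, output_files):
    """Hypothetical helper: True if an output is missing or older than an input."""
    if not all(os.path.exists(name) for name in output_files):
        return True                     # a missing output always needs (re)making
    if not input_files or not output_files:
        return False                    # nothing to compare against
    newest_input = max(os.path.getmtime(name) for name in input_files)
    oldest_output = min(os.path.getmtime(name) for name in output_files)
    return newest_input > oldest_output


with tempfile.TemporaryDirectory() as tmp_dir:
    input_file = os.path.join(tmp_dir, "job1.a.output.1")
    output_file = os.path.join(tmp_dir, "job1.a.output.2")
    for name in (input_file, output_file):
        open(name, "w").close()

    # Input older than output: the job is up-to-date
    os.utime(input_file, (1000, 1000))
    os.utime(output_file, (2000, 2000))
    needs_update_before = job_needs_update([input_file], [output_file])

    # "Touch" the input so it is newer than the output: the job is out-of-date
    os.utime(input_file, (3000, 3000))
    needs_update_after = job_needs_update([input_file], [output_file])

    # A missing output file also means the job must be run
    os.remove(output_file)
    needs_update_missing = job_needs_update([input_file], [output_file])

print(needs_update_before, needs_update_after, needs_update_missing)
```

    In a real pipeline you would not write this check yourself; ``pipeline_printout(...)`` reports
    the same kind of information for every job.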



=============================================
Verbosity levels
=============================================

    The verbosity levels for :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` and :ref:`pipeline_run(...) <pipeline_functions.pipeline_run>`
    can be specified from ``verbose = 0`` (print out nothing) to the extreme verbosity of ``verbose = 6``. Verbosity levels of 10 and above are reserved for the internal
    debugging of Ruffus.

        * level **0** : *nothing*
        * level **1** : *Out-of-date Task names*
        * level **2** : *All Tasks (including any task function docstrings)*
        * level **3** : *Out-of-date Jobs in Out-of-date Tasks, no explanation*
        * level **4** : *Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings*
        * level **5** : *All Jobs in Out-of-date Tasks (up-to-date Tasks are only listed by name)*
        * level **6** : *All jobs in All Tasks whether out of date or not*
        * level **10**: *logs messages useful only for debugging ruffus pipeline code*

.. _new_manual.pipeline_printout.verbose_abbreviated_path:

==========================================================================================
Abbreviating long file paths with ``verbose_abbreviated_path``
==========================================================================================

    Pipelines often produce interminable lists of deeply nested filenames. It would be nice to be able to abbreviate this
    to just enough information to follow the progress.

    The ``verbose_abbreviated_path`` parameter tells :ref:`pipeline_printout(...) <pipeline_functions.pipeline_printout>` and :ref:`pipeline_run(...) <pipeline_functions.pipeline_run>` to either

        1) display only the last ``NNN`` levels of each path, where the file name counts as one level (``verbose_abbreviated_path = NNN``), or
        2) truncate the message to a specified ``MMM`` characters (to fit onto a line, for example). ``MMM`` is specified by setting ``verbose_abbreviated_path = -MMM``, i.e. negative values.

           Note that the number of characters specified is just the separate lengths of the input and output parameters,
           not the entire indented line. You may need to specify a smaller limit than you expect (e.g. ``60`` rather than ``80``).

        .. code-block:: python

            pipeline_printout(verbose_abbreviated_path = NNN)
            pipeline_run(verbose_abbreviated_path = -MMM)



    ``verbose_abbreviated_path`` defaults to ``2``.


    For example:

        Given ``["aa/bb/cc/dddd.txt", "aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt"]``


        .. code-block:: python
            :emphasize-lines: 1,4,8,19

            # Original relative paths
            "[aa/bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"

            # Full abspath
            verbose_abbreviated_path = 0
            "[/test/ruffus/src/aa/bb/cc/dddd.txt, /test/ruffus/src/aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"

            # Specified level of nested directories
            verbose_abbreviated_path = 1
            "[.../dddd.txt, .../gggg.txt]"

            verbose_abbreviated_path = 2
            "[.../cc/dddd.txt, .../ffff/gggg.txt]"

            verbose_abbreviated_path = 3
            "[.../bb/cc/dddd.txt, .../eeee/ffff/gggg.txt]"


            # Truncated to MMM characters
            verbose_abbreviated_path = -60
            "<???> /bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"

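    The positive and zero settings shown above can be imitated with a few lines of plain Python.
    This is a minimal sketch of the abbreviation rule as it appears in the examples, not the code
    Ruffus uses internally. The function name ``abbreviate_path`` is hypothetical, and truncation to
    ``MMM`` characters (negative values) is deliberately left out because its exact output format
    is not specified here.

```python
import os


def abbreviate_path(path, levels):
    """Hypothetical helper: keep only the last `levels` components of `path`.

    levels == 0 returns the absolute path; negative values are not handled.
    """
    if levels < 0:
        raise ValueError("truncation to a character limit is not sketched here")
    if levels == 0:
        return os.path.abspath(path)
    components = path.split("/")
    if len(components) <= levels:
        return path                      # nothing to abbreviate
    return ".../" + "/".join(components[-levels:])


paths = ["aa/bb/cc/dddd.txt", "aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt"]
abbreviated = {
    levels: [abbreviate_path(path, levels) for path in paths]
    for levels in (1, 2, 3)
}
for levels, names in abbreviated.items():
    print(levels, "[" + ", ".join(names) + "]")
```

    With ``levels = 2`` (the documented default for ``verbose_abbreviated_path``) the two example
    paths shrink to ``.../cc/dddd.txt`` and ``.../ffff/gggg.txt``.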

=============================================
Getting a list of all tasks in a pipeline
=============================================

    If you just want a list of all tasks (Ruffus decorated function names), you can
    run :ref:`pipeline_get_task_names(...) <pipeline_functions.pipeline_get_task_names>`.

    This doesn't run any pipeline code or even check whether the pipeline is connected up properly.

    It is nonetheless useful when, for example, you want to allow users at the command line
    to choose a target from a list of possible tasks.
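
    One way to offer such a choice is to pass the task names to ``argparse`` as the permitted values
    of an option. The sketch below is an assumption about how you might wire this up, not part of
    Ruffus: the ``--target`` option and the ``build_parser`` helper are hypothetical, and the task
    names are hard-coded here so that the example runs on its own. In a real script the list would
    come from ``pipeline_get_task_names()``.

```python
import argparse


def build_parser(task_names):
    """Hypothetical helper: restrict --target to the known task names."""
    parser = argparse.ArgumentParser(description="Run part of a pipeline")
    parser.add_argument(
        "--target",
        choices=task_names,
        default=task_names[-1],          # assume the last task is the usual target
        help="task to bring up to date (default: %(default)s)",
    )
    return parser


# Stand-in for the list returned by pipeline_get_task_names() in a real pipeline
task_names = ["create_initial_file_pairs", "first_task", "second_task"]

parser = build_parser(task_names)
chosen = parser.parse_args(["--target", "first_task"])
default_choice = parser.parse_args([])
print(chosen.target, default_choice.target)
```

    The chosen name can then be passed on as the target of ``pipeline_printout(...)`` or
    ``pipeline_run(...)``; ``argparse`` rejects any name that is not in the list.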