.. include:: ../global.inc
.. _decorators.subdivide:
.. index::
    pair: @subdivide; Syntax

.. seealso::

    * :ref:`@subdivide <new_manual.subdivide>` in the **Ruffus** Manual
    * :ref:`Decorators <decorators>` for more decorators

########################
@subdivide
########################

.. |input| replace:: `input`
.. _input: `decorators.subdivide.input`_
.. |extras| replace:: `extras`
.. _extras: `decorators.subdivide.extras`_
.. |output| replace:: `output`
.. _output: `decorators.subdivide.output`_
.. |matching_regex| replace:: `matching_regex`
.. _matching_regex: `decorators.subdivide.matching_regex`_
.. |matching_formatter| replace:: `matching_formatter`
.. _matching_formatter: `decorators.subdivide.matching_formatter`_
.. |input_pattern_or_glob| replace:: `input_pattern_or_glob`
.. _input_pattern_or_glob: `decorators.subdivide.input_pattern_or_glob`_


************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
*@subdivide* ( |input|_, :ref:`regex<decorators.regex>`\ *(*\ |matching_regex|_\ *)* |  :ref:`formatter<decorators.formatter>`\ *(*\ |matching_formatter|_\ *)*\, [ :ref:`inputs<decorators.inputs>` *(*\ |input_pattern_or_glob|_\ *)* | :ref:`add_inputs<decorators.add_inputs>` *(*\ |input_pattern_or_glob|_\ *)* ], |output|_, [|extras|_,...]  )
************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
    **Purpose:**

        * Subdivides each of a set of *Inputs* further into multiple *Outputs*.

        * **Many to Even More** operator

        * The number of files in each *Output* can be decided at run time by using globs.

        * Output file names are generated by applying the :ref:`formatter<decorators.formatter>` or :ref:`regex<decorators.regex>` indicators to |input|_,
          i.e. to the output of specified tasks, a list of file names, or a |glob|_ matching pattern.

        * Additional inputs or dependencies can be added dynamically to the task:
            :ref:`add_inputs<decorators.add_inputs>` nests the original input parameters in a list before adding additional dependencies.

            :ref:`inputs<decorators.inputs>` replaces the original input parameters wholesale.

        * Only out of date tasks (comparing input and output files) will be run.
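
        The out-of-date rule can be pictured as a timestamp comparison. The sketch below is a
        simplified illustration only: ``job_is_out_of_date()`` is a hypothetical helper, not
        part of the Ruffus API, and Ruffus' own check may take more into account than file
        modification times.

        .. code-block:: python

            import os

            def job_is_out_of_date(input_files, output_files):
                """Hypothetical, timestamp-only up-to-date check (illustration only)."""
                # A job with no outputs, or with any output missing, has to run
                if not output_files or not all(os.path.exists(f) for f in output_files):
                    return True
                # Otherwise it runs only if some input is newer than the oldest output
                newest_input = max((os.path.getmtime(f) for f in input_files), default=0)
                oldest_output = min(os.path.getmtime(f) for f in output_files)
                return newest_input > oldest_output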

        .. note::

            The use of **split** as a synonym for subdivide is deprecated.


    **Example**:

        .. code-block:: python
            :emphasize-lines: 12,13,20

            from ruffus import *
            from random import randint
            import os

            @originate(['0.start', '1.start', '2.start'])
            def create_files(output_file):
                with open(output_file, "w"):
                    pass


            #
            #   Subdivide each of 3 start files further into [NNN1, NNN2, NNN3] number of files
            #      where NNN1, NNN2, NNN3 are determined at run time
            #
            @subdivide(create_files, formatter(),
                        "{path[0]}/{basename[0]}.*.step1",  # Output parameter: Glob matches any number of output file names
                        "{path[0]}/{basename[0]}")          # Extra parameter:  Append to this for output file names
            def subdivide_files(input_file, output_files, output_file_name_root):
                #
                #   IMPORTANT: cleanup rubbish from previous run first
                #
                for oo in output_files:
                    os.unlink(oo)
                #   The number of output files is decided at run time
                number_of_output_files = randint(2,4)
                for ii in range(number_of_output_files):
                    output_file_name = "{0}.{1}.step1".format(output_file_name_root, ii)
                    with open(output_file_name, "w"):
                        pass


            #
            #   Each output of subdivide_files results in a separate job for downstream tasks
            #
            @transform(subdivide_files, suffix(".step1"), ".step2")
            def analyse_files(input_file, output_file_name):
                with open(output_file_name, "w"):
                    pass

            pipeline_run()

        .. comment **

            The Ruffus printout shows how each of the jobs in ``subdivide_files()`` spawns
            multiple *Outputs*, leading to more jobs in ``analyse_files()``.


        .. code-block:: pycon

            >>> pipeline_run()
                Job  = [None -> 0.start] completed
                Job  = [None -> 1.start] completed
                Job  = [None -> 2.start] completed
            Completed Task = create_files
                Job  = [0.start -> 0.*.step1, 0] completed
                Job  = [1.start -> 1.*.step1, 1] completed
                Job  = [2.start -> 2.*.step1, 2] completed
            Completed Task = subdivide_files
                Job  = [0.0.step1 -> 0.0.step2] completed
                Job  = [0.1.step1 -> 0.1.step2] completed
                Job  = [0.2.step1 -> 0.2.step2] completed
                Job  = [1.0.step1 -> 1.0.step2] completed
                Job  = [1.1.step1 -> 1.1.step2] completed
                Job  = [1.2.step1 -> 1.2.step2] completed
                Job  = [1.3.step1 -> 1.3.step2] completed
                Job  = [2.0.step1 -> 2.0.step2] completed
                Job  = [2.1.step1 -> 2.1.step2] completed
                Job  = [2.2.step1 -> 2.2.step2] completed
                Job  = [2.3.step1 -> 2.3.step2] completed
            Completed Task = analyse_files
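
        The fan-out in this printout can be mimicked in plain Python: a glob discovers however
        many files a job wrote at run time, and each discovered file then becomes one
        downstream job. This is only a sketch, not Ruffus code: the temporary directory and the
        ``discover_outputs()`` helper are hypothetical, and absolute paths are used so that the
        ``"{path[0]}/{basename[0]}.*.step1"`` pattern is filled in unambiguously.

        .. code-block:: python

            import glob
            import os
            import random
            import tempfile

            def discover_outputs(start_file):
                """Hypothetical helper: fill in the example's output glob, then expand it."""
                path = os.path.dirname(start_file)
                basename = os.path.splitext(os.path.basename(start_file))[0]
                # [0] selects the first (here: only) input, as in the example's pattern
                pattern = "{path[0]}/{basename[0]}.*.step1".format(path=[path],
                                                                   basename=[basename])
                return sorted(glob.glob(pattern))

            work_dir = tempfile.mkdtemp()
            start_file = os.path.join(work_dir, "0.start")
            open(start_file, "w").close()

            # "subdivide": write a number of files that is only known at run time
            n_written = random.randint(2, 4)
            for ii in range(n_written):
                open(os.path.join(work_dir, "0.{0}.step1".format(ii)), "w").close()

            # "transform": one downstream job per discovered file, ".step1" -> ".step2"
            downstream_jobs = [(f, f[:-len(".step1")] + ".step2")
                               for f in discover_outputs(start_file)]
            for input_file, output_file in downstream_jobs:
                print(os.path.basename(input_file), "->", os.path.basename(output_file))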




    **Parameters:**


.. _decorators.subdivide.input:

    * **input** = *tasks_or_file_names*
       can be a:

       #.  Task / list of tasks (as in the example above).
           File names are taken from the output of the specified task(s).
       #.  (Nested) list of file name strings.
           File names containing ``*[]?`` will be expanded as a |glob|_,
           e.g. ``"a.*" => "a.1", "a.2"``.
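
       A glob in an input file name is matched against the files that exist on disk. The
       ``"a.*" => "a.1", "a.2"`` expansion can be reproduced with Python's standard ``glob``
       module; this is a self-contained sketch, and the temporary directory and file names are
       made up for the illustration.

       .. code-block:: python

           import glob
           import os
           import tempfile

           work_dir = tempfile.mkdtemp()
           for name in ("a.1", "a.2", "b.1"):
               open(os.path.join(work_dir, name), "w").close()

           # Only names matching "a.*" are picked up; sorted() gives a stable order
           expanded = sorted(os.path.basename(f)
                             for f in glob.glob(os.path.join(work_dir, "a.*")))
           print(expanded)  # ['a.1', 'a.2']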


.. _decorators.subdivide.matching_regex:

    * *matching_regex*
       is a Python regular expression string, which must be wrapped in
       a :ref:`regex<decorators.regex>` indicator object.
       See the Python `regular expression (re) <http://docs.python.org/library/re.html>`_
       documentation for details of regular expression syntax.
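
       For instance, with a pattern such as ``r"(.+)\.start$"``, a back-reference like ``\1``
       in the output string is replaced by the matched text. The idea can be approximated with
       Python's ``re.sub``; this illustrates the substitution only and is not Ruffus' internal
       code.

       .. code-block:: python

           import re

           matching_regex = r"(.+)\.start$"
           output_pattern = r"\1.*.step1"  # \1 is the first captured group

           for name in ("0.start", "sample.start"):
               print(re.sub(matching_regex, output_pattern, name))
           # 0.*.step1
           # sample.*.step1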

.. _decorators.subdivide.matching_formatter:

    * *matching_formatter*
       is a :ref:`formatter<decorators.formatter>` indicator object, optionally containing
       a Python `regular expression (re) <http://docs.python.org/library/re.html>`_.
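
       With a bare ``formatter()``, each input file name is split into parts that the output
       and extras strings can refer to, such as ``{path[0]}`` and ``{basename[0]}`` in the
       example above (``[0]`` selects the first input). The hypothetical helper below mimics
       that substitution with ``str.format`` for illustration only; it is not the Ruffus
       implementation and covers just ``path``, ``basename`` and ``ext``.

       .. code-block:: python

           import posixpath

           def formatter_like_fields(input_files):
               """Hypothetical helper: split each input name into path / basename / ext."""
               fields = {"path": [], "basename": [], "ext": []}
               for name in input_files:
                   stem, ext = posixpath.splitext(posixpath.basename(name))
                   fields["path"].append(posixpath.dirname(name))
                   fields["basename"].append(stem)
                   fields["ext"].append(ext)
               return fields

           fields = formatter_like_fields(["/data/0.start"])
           print("{path[0]}/{basename[0]}.*.step1".format(**fields))  # /data/0.*.step1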

.. _decorators.subdivide.output:

    * **output** = *output*
        Specifies the resulting output file name(s) after string substitution.

        Can include glob patterns, so that the number of output files can be decided at run time.
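
        Because a glob matches whatever is on disk, files left over from an earlier run are
        matched as well; this is why the example deletes its old outputs first. A plain-Python
        sketch of the problem (the directory, file names and ``write_outputs()`` helper are
        hypothetical):

        .. code-block:: python

            import glob
            import os
            import tempfile

            work_dir = tempfile.mkdtemp()
            pattern = os.path.join(work_dir, "0.*.step1")

            def write_outputs(count, clean_first):
                if clean_first:
                    # Mirror the example: remove stale matches before writing new files
                    for old_file in glob.glob(pattern):
                        os.unlink(old_file)
                for ii in range(count):
                    open(os.path.join(work_dir, "0.{0}.step1".format(ii)), "w").close()
                return len(glob.glob(pattern))

            first = write_outputs(4, clean_first=False)   # 4 files
            second = write_outputs(2, clean_first=False)  # stale 0.2 and 0.3 remain: 4
            third = write_outputs(2, clean_first=True)    # after cleanup: 2
            print(first, second, third)  # 4 4 2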


.. _decorators.subdivide.input_pattern_or_glob:

    * *input_pattern_or_glob*
       Specifies the resulting input(s) to each job.
       Must be wrapped in an :ref:`inputs<decorators.inputs>` or an :ref:`add_inputs<decorators.add_inputs>` indicator object.

       Can be a:

       #.  Task / list of tasks.
           File names are taken from the output of the specified task(s).
       #.  (Nested) list of file name strings.

       Strings are subject to :ref:`regex<decorators.regex>` or :ref:`formatter<decorators.formatter>` substitution.
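
       The two indicators differ in the shape of the resulting input parameter. The sketch
       below models only that shape, using plain lists and two hypothetical helpers (they are
       not Ruffus functions, and the string-substitution step is left out):

       .. code-block:: python

           def with_add_inputs(original_input, *additional):
               # add_inputs(): the original input is kept, nested as the first item
               return [original_input] + list(additional)

           def with_inputs(original_input, replacement):
               # inputs(): the original input is ignored and replaced wholesale
               return replacement

           print(with_add_inputs("0.start", "reference.txt"))
           # ['0.start', 'reference.txt']
           print(with_add_inputs(["a.start", "b.start"], "reference.txt"))
           # [['a.start', 'b.start'], 'reference.txt']
           print(with_inputs("0.start", "reference.txt"))
           # reference.txt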


.. _decorators.subdivide.extras:

    * **extras** = *extras*
       Any extra parameters are passed to the task function. Strings are first subject to
       :ref:`regex<decorators.regex>` or :ref:`formatter<decorators.formatter>` substitution,
       e.g. ``"{path[0]}/{basename[0]}"`` in the example above.

       If you are using named parameters, these can be passed as a list, i.e. ``extras= [...]``.

       Extra parameters are consumed by the task function and not forwarded further down the pipeline.
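
       Since the input, output and extra parameters reach the task function as ordinary
       positional arguments, the body of the example's ``subdivide_files()`` can be exercised
       outside a pipeline by supplying already-substituted values. In the sketch below, the
       temporary directory and the ``glob.glob`` call standing in for Ruffus' glob expansion of
       the output are assumptions made for illustration.

       .. code-block:: python

           import glob
           import os
           import tempfile
           from random import randint

           # Condensed body of subdivide_files() from the example (decorator omitted)
           def subdivide_files(input_file, output_files, output_file_name_root):
               for oo in output_files:
                   os.unlink(oo)
               for ii in range(randint(2, 4)):
                   with open("{0}.{1}.step1".format(output_file_name_root, ii), "w"):
                       pass

           work_dir = tempfile.mkdtemp()
           root = os.path.join(work_dir, "0")  # the substituted extra parameter
           pattern = root + ".*.step1"         # the substituted output glob
           subdivide_files(os.path.join(work_dir, "0.start"), glob.glob(pattern), root)
           print(len(glob.glob(pattern)), "output files")  # 2, 3 or 4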