File: batch_processing_with_foreach.rst

package info (click to toggle)
mrtrix3 3.0.8-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 15,300 kB
  • sloc: cpp: 130,470; python: 9,603; sh: 597; makefile: 62; xml: 47
file content (112 lines) | stat: -rw-r--r-- 6,669 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
.. _batch_processing:

Batch processing with ``for_each``
==================================

Image processing often involves executing the same command on many different subjects or time points within a study. *MRtrix3* includes a Python script called ``for_each`` to simplify this process. The main benefit of using ``for_each`` compared to a bash ``for`` loop is a simpler and less verbose syntax. However other benefits include multi-threaded job execution (to exploit modern multi-core CPUs when the command being run is not already multi-threaded), and automatic identification of path basenames and prefixes. To view the full help page run ``for_each`` on the command line with no arguments.


Example 1 - using IN
--------------------
Many people like to organise their imaging datasets with one directory per subject. For example::

  study/001_patient/dwi.mif
  study/002_patient/dwi.mif
  study/003_patient/dwi.mif
  study/004_control/dwi.mif
  study/005_control/dwi.mif
  study/006_control/dwi.mif

The for_each script can be used to run the same command on each subject, for example::

.. code-block:: console

  $ for_each study/* : dwidenoise IN/dwi.mif IN/dwi_denoised.mif

The first part of the command above is the ``for_each`` script name, followed by the pattern matching string (``study/*``) to identify all the files (which in this case are directories) to be looped over. The colon is used to separate the invocation of ``for_each``, along with its inputs and any command-line options, from the command to be executed. In this example the ``dwidenoise`` command will be run multiple times, by substituting the keyword ``IN`` with each of the directories that match the pattern (``study/001_patient``, ``study/002_patient``, etc.).

Example 2 - using NAME
----------------------
Other people may pefer to organise their imaging datasets with one folder per image type and have all subjects inside. For example:

.. code-block:: text

  study/dwi/001_patient.mif
  study/dwi/002_patient.mif
  study/dwi/003_patient.mif
  study/dwi/004_control.mif
  study/dwi/005_control.mif
  study/dwi/006_control.mif

The ``NAME`` keyword can be used in this situation to obtain the basename of the file path. For example:

.. code-block:: console

  $ mkdir study/dwi_denoised
  $ for_each study/dwi/* : dwidenoise IN study/dwi_denoised/NAME

Here, the IN keyword will be substituted with the full string from the matching pattern (``study/dwi/001_patient.mif``, ``study/dwi/002_patient.mif``, etc), however the NAME keyword will be replaced with the *basename* of the matching pattern (``001_patient.mif``, ``002_patient.mif``, etc).

Alternatively, the same result can be achieved by running ``for_each`` from inside the ``study/dwi`` directory. In this case NAME would not be required. For example:

.. code-block:: console

  $ mkdir study/dwi_denoised
  $ cd study/dwi
  $ for_each * : dwidenoise IN ../dwi_denoised/IN


Example 3 - using PRE
---------------------
For this example let us assume we want to convert all dwi.mif files from example 2 to NIfTI file format (``*.nii``). This can be performed using:

.. code-block:: console

  $ for_each study/dwi/* : mrconvert IN study/dwi/PRE.nii
  $ rm *.mif

There the PRE keyword will be replaced by the file basename, without the file extension.


Example 4 - Sequential Processing
---------------------------------
As an example of a single ``for_each`` command running multiple sequential commands (e.g. with the bash ``;``, ``|``, ``&&``, ``||`` operators), let's assume in the previous example we wanted to remove the ``*.mif`` files as they were converted. We could use the ``&&`` operator, which means "run next command only if current command succeeds without error".

.. code-block:: console

  $ for_each study/dwi/* : mrconvert IN study/dwi/PRE.nii "&&" rm IN

The ``&&`` operator here must be escaped with quotes in order to prevent the shell from interpreting it. Bash operator characters can also be escaped with the "\" character; for example, to :ref:`pipe an image <unix_pipelines>` between two MRtrix commands (assuming the data set directory layout from example 1):

.. code-block:: console

  $ for_each study/* : dwiextract -bzero IN/dwi.mif - \| mrmath - mean -axis 3 IN/mean_b0.mif


Example 5 - Parallel Processing
-------------------------------
To run multiple jobs at once, use the standard *MRtrix3* command-line option ``-nthreads N``, where N is the number of concurrent jobs required. For example:

.. code-block:: console

  $ for_each study/* -nthreads 8 : dwidenoise IN/dwi.mif IN/dwi_denoised.mif

will run up to 8 of the required jobs in parallel. Note that unlike in other *MRtrix3* commands where command-line options can be placed anywhere on the command-line, in this particular context the ``-nthreads`` option must be specified *before* the colon separator. This is necessary in order for the ``for_each`` script to recognise that this command-line option applies to its own operation, as opposed to the command that ``for_each`` is responsible for invoking. To demonstrate this, consider the following usage:

.. code-block:: console

  $ for_each study/* : dwidenoise IN/dwi.mif IN/dwi_denoised.mif -nthreads 8

Here, ``for_each`` would execute the ``dwidenoise`` command entirely *sequentially*, once for each input; but each time it is run, ``dwidenoise`` would be instructed to use 8 threads.

Indeed these two usages can in theory be *combined*. Imagine that a hypothetical *MRtrix3* command, "``dwidostuff``", tends to not be capable in practise of utilising any more than four threads, regardless of how many threads are in fact available on your hardware / explicitly invoked. However you have a system with eight hardware threads, and wish to utilise them all as much as possible. In such a scenario, you could use:

.. code-block:: console

  $ for_each study/* -nthreads 2 : dwidostuff IN/dwi.mif IN/dwi_stuffdone.mif -nthreads 4

This would instruct ``for_each`` to always have *two* jobs running in parallel, each of which will be explicitly instructed to use *four* threads.

Note that most *MRtrix3* commands are multi-threaded, and will generally succeed in individually using all available CPU cores, in which case running multiple jobs in parallel using ``for_each`` is unlikely to provide a benefit in computation time (or it may in fact be detrimental). If however a particular command is known to be single-threaded (or have only limited multi-threading capability), and your system possesses enough RAM to support running multiple instances of that command at once, this usage may yield a considerable reduction in total processing time.