1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355
|
// --------------------------------------------------------------------------
// OpenMS -- Open-Source Mass Spectrometry
// --------------------------------------------------------------------------
// Copyright The OpenMS Team -- Eberhard Karls University Tuebingen,
// ETH Zurich, and Freie Universitaet Berlin 2002-2018.
//
// This software is released under a three-clause BSD license:
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// * Neither the name of any author or any participating institution
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
// For a full list of authors, refer to the file AUTHORS.
// --------------------------------------------------------------------------
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL ANY OF THE AUTHORS OR THE CONTRIBUTING
// INSTITUTIONS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
// OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
// OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// --------------------------------------------------------------------------
// $Maintainer: Johannes Junker $
// $Authors: Johannes Junker, Chris Bielow $
// --------------------------------------------------------------------------
//##############################################################################
/**
@page TOPPAS_general General introduction
@b TOPPAS allows you to create, edit, open, save, and run TOPP workflows. Pipelines
can be created conveniently in a GUI. The
parameters of all involved tools can be edited within TOPPAS
and are also saved as part of the pipeline definition in the @a .toppas file.
Furthermore, @b TOPPAS interactively performs validity checks during the pipeline
editing process and before execution (i.e., a dry run of the entire pipeline),
in order to prevent the creation of invalid workflows.
Once set up and saved, a workflow can also be run without the GUI using
@a ExecutePipeline @a -in @a \<file\>.
The following figure shows a simple example pipeline that has just been created
and executed successfully:
@image html TOPPAS_simple_example.png
@image latex TOPPAS_simple_example.png "" width=14cm
To create a new TOPPAS file, you can either:
- open TOPPAS without providing any existing workflow - an empty workflow will be opened automatically
- in a running TOPPAS program choose: @a File @a > @a New
- create an empty file in your file browser (explorer) with the suffix @a \.toppas and double-click it (on Windows systems all @a \.toppas files are associated with TOPPAS automatically during installation of %OpenMS, on Linux and MacOS you might need to manually associate the extension)
@page TOPPAS_interface User interface
@section TOPPAS_interface_introduction Introduction
The following figure shows the @b TOPPAS main window and a pipeline which is just being created.
The user has added some tools by drag&dropping them from the TOPP tool list on the left
onto the central window. Additionally, the user has added nodes
for input and output files.
Next, the user has drawn some connections between the tools
which determine the data flow of the pipeline. Connections can be drawn by first @em deselecting the
desired source node (by clicking anywhere but not on another node)
and then dragging the mouse from the source to the target node (if
a @em selected node is dragged, it is moved on the canvas instead).
When a connection is created and the source (the target) has more than one output (input) parameter,
an input/output parameter
mapping dialog shows up and lets the user select the output parameter of the source node and the
input parameter of the target node for this data flow - shown here for the connection between PeptideIndexer and FalseDiscoveryRate.
If the file types of the selected input and output parameters are not compatible with each other,
@b TOPPAS will refuse to add the connection. It will also refuse to add a connection if it
would create a cycle in the workflow, or if it just would not make sense, e.g., if
its target is an input file node. The connection between the input file node (#1) and the OMSSAAdapter (#5)
tool is painted yellow which indicates it is not ready yet, because no input files have been
specified.
@image html TOPPAS_edges.png
@image latex TOPPAS_edges.png "" width=14cm
The input/output mapping of connections can be changed at any time during the editing process by double-clicking
an connections or by selecting @a Edit @a I/O @a mapping from the context menu which appears when a connection is right-clicked.
All visible items (i.e. connections and the different kinds of nodes) have such a context menu. For a detailed list
of the different menus and their entries, see @ref TOPPAS_interface_menus .
The following figure shows a possible next step: the user has double-clicked one of the tool nodes in order
to configure its parameters. By default, the standard parameters are used for each tool. Again, this can also
be done by selecting @a Edit @a parameters from the context menu of the tool.
@image html TOPPAS_parameters.png
@image latex TOPPAS_parameters.png "" width=14cm
Once the pipeline has been set up, the input files have to be specified before the pipeline can be executed.
This is done by double-clicking an input node and selecting the desired files in the dialog that appears.
Input nodes have a special mode named "recycling mode", i.e., if the input node has fewer files than the following
node has rounds (as it might have two incoming connections) then the files are recycled until all rounds
are satisfied. This might be useful if one input node specifies a single database file (for a Search-Adapter like Mascot)
and another input node has the actual MS2 experiments (which is usually more than one). Then the database input
node would be set to "recycle" the database file, i.e. use it for every run of the MascotAdapter node. The input
list can be recycled an arbitrary number of times, but the recycling has to be @em complete, i.e. the number of rounds of the
downstream node have to be a multiple of the number of input files. Recycling mode can be activated by right-clicking
the input node and selecting the according entry from the context menu.
Finally, if you have input and output nodes at every end of your pipeline and all connections are green,
you can select @a Pipeline @a > @a Run in the menu bar or just press @a F5.
@image html TOPPAS_run_options.png
@image latex TOPPAS_run_options.png "" width=14cm
You will be asked for an output file directory where
a sub-directory, @a TOPPAS_out, will be created. This directory will contain your output files.
You can also specify the number of jobs TOPPAS is allowed to run in parallel. If a number greater than 1
is selected, TOPPAS will parallelize the pipeline execution in the following scenarios:
- A tool has to process more than one file. In this case, multiple files can be processed in parallel.
- The pipeline contains multiple branches that are independent of each other. In this case, multiple tools can run in parallel.
Be careful with this setting, however, as some of the TOPP tools require large amounts of RAM (depending
on the size of your dataset). Running too many parallel jobs on a machine with not enough memory will cause problems.
Also, do not confuse this setting with the @em threads parameter of the individual TOPP tools: every TOPP tool has this
parameter specifying the maximum number of threads the tool is allowed to use (although only a subset of the TOPP tools make use
of this parameter, since there are tasks that cannot be computed in parallel). Be especially careful with combinations
of both parameters! If you have a pipeline containing the @em FeatureFinderCentroided, for example, and set its @em threads parameter
to 8, and you additionally set the number of parallel jobs in @b TOPPAS to 8, then you end up using 64 threads in parallel, which
might not be what you intended to do.
In addition to @a TOPPAS_out, a @a TOPPAS_tmp directory will be created in the %OpenMS temp path
(call the @em OpenMSInfo tool to see where exactly).
It will contain all temporary files that are passed from tool to tool within the pipeline.
Both folders contain further sub-directories which are named after the number in the top-left corner of the node they
belong to (plus the name of the tool for temporary files). During pipeline execution, the status lights in the top-right corner of the
tools indicate if the tool has finished successfully (green), is currently running (yellow),
has not done anything so far (gray), is scheduled to run next (blue), or has crashed (red).
The numbers in the bottom-right corner of every tool show how many files have already been processed and
the overall number of files to be processed by this tool.
When the execution has finished, you can check the generated output files of every node quickly by selecting
"@a Open @a files @a in @a TOPPView" or "@a Open @a containing @a folder" from the context menu (right click on the node).
@section TOPPAS_interface_mk Mouse and keyboard
Using the mouse, you can
- drag&drop tools from the TOPP tool list onto the workflow window (you can also double-click them instead)
- select items (by clicking)
- select multiple items (by holding down @a CTRL while clicking)
- select multiple items (by holding down @a CTRL and dragging the mouse in order to "catch" items with a selection rectangle)
- move all selected items (by dragging one of them)
- draw a new connection from one node to another (by dragging; source must be deselected first)
- specify input files (by double-clicking an input node)
- configure parameters of tools (by double-clicking a tool node)
- specify the input/output mapping of connections (by double-clicking a connection)
- translate the view (by dragging anywhere but on an item)
- zoom in and out (using the mouse wheel)
- make the context menu of an item appear (by right-clicking it)
@n
Using the keyboard, you can
- delete all selected items (@a DEL or @a BACKSPACE)
- zoom in and out (@a + / @a -)
- run the pipeline (@a F5)
- open this tutorial (@a F1)
@n
Using the mouse+keyboard, you can
- copy a node's parameters to another node (only parameters with identical names will be copied, e.g., 'fixed_modifications') (@a CTRL while creating an edge)
The edge will be colored as dark magenta to indicate parameter copying.
@section TOPPAS_interface_menus Menus
@b Menu @b bar:
@n @n
In the @a File menu, you can
- create a new, empty workflow (@a New)
- open an existing one (@a Open)
- open an example file (@a Open @a example @a file)
- include an existing workflow to the current workflow (@a Include)
- visit the online workflow repository (@a Online @a repository)
- save a workflow (@a Save / @a Save @a as)
- export the workflow as image (@a Export @a as @a image)
- refresh the parameter definitions of all tools contained in the workflow (@a Refresh @a parameters)
- close the current window (@a Close)
- load and save TOPPAS resource files (.trf) (@a Load / @a Save @a TOPPAS @a resource @a file)
@n
In the @a Pipeline menu, you can
- run a pipeline (@a Run)
- abort a currently running pipeline (@a Abort)
@n
In the @a Windows menu, you can
- make the TOPP tool list window on the left, the description window on the right, and the log message at the bottom (in)visible.
@n
In the @a Help menu, you can
- go to the %OpenMS website (@a %OpenMS @a website)
- open this tutorial (@a TOPPAS @a tutorial)
@n @n
@b Context @b menus:
@n @n
In the context menu of an @a input @a node, you can
- specify the input files
- open the specified files in TOPPView
- open the input files' folder in the window manager (explorer)
- toggle the "recycling" mode
- copy, cut, and remove the node
@n
In the context menu of a @a tool, you can
- configure the parameters of the tool
- resume the pipeline at this node
- open its temporary output files in TOPPView
- open the temporary output folder in the file manager (explorer)
- toggle the "recycling" mode
- copy, cut, and remove the node
@n
In the context menu of a @a Merger or @a Collector, you can
- toggle the "recycling" mode
- copy, cut, and remove the node
@n
In the context menu of an @a output @a node, you can
- open the output files in TOPPView
- open the output files' folder in the window manager (explorer)
- copy, cut, and remove the node
@page TOPPAS_examples Examples
The following sections explain the example pipelines TOPPAS comes with. You can
open them by selecting @a File > @a Open @a example @a file. All input files and
parameters are already specified, so you can just hit @a Pipeline > @a Run (or press
@a F5) and see what happens.
@section TOPPAS_peak_picking_example Profile data processing
The file @a peakpicker_tutorial.toppas contains a simple pipeline representing a
common use case: starting with profile data, the noise is eliminated and the baseline
is subtracted. Then, PeakPickerWavelet is used to find all peaks in the noise-filtered
and baseline-reduced profile data. This workflow is also described in the section
@ref TOPP_example_signalprocessing. The individual steps are explained in more
detail in the TOPPView tutorial: @ref TOPPView_smoothing, @ref TOPPView_baseline_reduction,
and @ref TOPPView_peakpicking.
@image html TOPPAS_example_profile_data_processing.png
@image latex TOPPAS_example_profile_data_processing.png "" width=14cm
@section TOPPAS_id_example Identification of E. coli peptides
This section describes an example identification pipeline contained in the
example directory, @a Ecoli_Identification.toppas. It is shipped together
with a reduced example mzML file containing 139 MS2 spectra from an E. coli
run on an Orbitrap instrument as well as an E. coli target-decoy database.
We use the search engine
OMSSA (Geer et al., 2004) for peptide identification. Therefore,
OMSSA must be installed and the path to the OMSSA executable (omssacl) must
be set in the parameters of the OMSSAAdapter node.
- Node #1 accepts mzML files containing MS2 spectra.
- Node #2 provides the database and is set to "recycling mode" to allow the database to be reused when there is more than one input file in node #1.
- OMSSAAdapter calls OMSSA which performs the actual search.
- PeptideIndexer annotates for each search result whether it is a target or a decoy hit.
- FalseDiscoveryRate computes q-values for the IDs.
- Finally, IDFilter selects only those IDs with a q-value of less than 1%.
@image html TOPPAS_Ecoli_Identification.png
@image latex TOPPAS_Ecoli_Identification.png "" width=12cm
Extensions to this pipeline would be to do the annotation of the spectra with
multiple search engines and combine the results afterwards, using the ConsensusID
TOPP tool.
The results may be exported using the TextExporter tool, for further analysis with
different tools.
@section TOPPAS_quant_example Quantitation of BSA runs
The simple pipeline described in this section (@a BSA_Quantitation.toppas) can be used to quantify peptides
that occur on different runs. The example dataset contains three different bovine serum albumin (BSA) runs.
First, FeatureFinderCentroided is called since the dataset is centroided. The
results of the feature finding are then annotated with (existing) identification results.
For convenience, we provide these search results from OMSSA with peptides with an FDR of 5% in the BSA directory.
@image html TOPPAS_BSA_Quantitation.png
@image latex TOPPAS_BSA_Quantitation.png "" width=8cm
Identifications are mapped to features by the IDMapper. The last step
is performed by FeatureLinkerUnlabeled which links corresponding features. The results can be
used to calculate ratios, for example. The data could also be exported to a text based
format using the TextExporter for further processing (e.g., in Microsoft Excel).
The results can be opened in TOPPView. The next figures show the results in 2D
and 3D view, together with the feature intermediate results. One can see
that the intensities and retention times are slightly different between
the runs. To correct for retention times shift, a map alignment could be done,
either on the spectral data or on the feature data.
@image html TOPPAS_BSA_results_2d.png
@image latex TOPPAS_BSA_results_2d.png "" width=10cm
@image html TOPPAS_BSA_results_3d.png
@image latex TOPPAS_BSA_results_3d.png "" width=10cm
@section TOPPAS_merger_example Merger and Collect nodes
The following example is actually not a useful workflow but is supposed
to demonstrate how merger and collector nodes can be used in a pipeline. Have a look at
@a merger_tutorial.toppas:
@image html TOPPAS_example_merger.png
@image latex TOPPAS_example_merger.png "" width=14cm
As its name suggests, a merger merges its incoming file lists, i.e.,
files of all incoming edges are appended into new lists (which
have as many elements as the merger has incoming connections). All tools this merger has outgoing
connections to are called with these merged lists as input files. All incoming connections should
pass the same number of files (unless the corresponding preceding tool is in recycling mode).
A collector node, on the other hand, waits for all rounds to finish before concatenating all files from all
incoming connections into one single list. It then calls the next tool with this list of files as input.
This will happen exactly once during the entire pipeline run.
In order to track what is happening, you can just open the example file and run it. When the
pipeline execution has finished, have a look at all input and output files (e.g., select
@a Open @a in @a TOPPView in the context menu of the input/output nodes).
The input files are named rt_1.mzML, rt_2.mzML, ... and each contains a single
spectrum with RT as indicated by the filename, so you can easily see which files have
been merged together.
*/
|