File: TOPP_ProteinResolver.html

<HTML>
<HEAD>
<TITLE>ProteinResolver</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">ProteinResolver </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>A peptide-centric algorithm for protein inference. </p>
<center></center><center><table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">pot. predecessor tools  </td><td valign="middle" rowspan="3"><img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> ProteinResolver <img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> </td><td align="center" bgcolor="#EBEBEB">pot. successor tools   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_IDFilter.html">IDFilter</a>  </td><td valign="middle" align="center" rowspan="1">(external)   </td></tr>
</table>
</center><dl class="experimental"><dt><b><a class="el" href="experimental.html#_experimental000021">Experimental classes:</a></b></dt><dd>This tool has not been tested thoroughly and might not behave as expected!</dd></dl>
<p>This tool is an implementation of </p>
<p>Meyer-Arendt K, Old WM, et al. (2011)<br/>
 IsoformResolver: A peptide-centric algorithm for protein inference<br/>
 Journal of Proteome Research 10 (7): 3060-75, DOI: 10.1021/pr200039p </p>
<p>The algorithm tries to assign to each protein its experimentally validated peptides (meaning you should supply peptides which have undergone FDR filtering or the like). Proteins are grouped into ISD groups (in-silico derived) and MSD groups (MS/MS derived) if they have in-silico derived or MS/MS derived peptides in common, respectively. Proteins and peptides span a bipartite graph: there is an edge between a protein node and a peptide node if and only if the protein contains the peptide. ISD groups are connected components of the aforementioned bipartite graph. MSD groups are subgraphs of ISD groups. For further information, see the paper cited above.</p>
<p><b>Remark:</b> If parameter <code>in</code> is given, <code>in_path</code> is ignored. Parameter <code>in_path</code> is considered only if <code>in</code> is empty. </p>
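<p>The grouping described above can be illustrated with a minimal sketch. It is not the tool's implementation: the function name <code>connected_groups</code>, the toy protein names and the toy peptide sequences are all hypothetical, and the MSD-style groups are approximated here by simply restricting each protein to its observed peptides.</p>

```python
from collections import defaultdict


def connected_groups(protein_to_peptides):
    """Group proteins that are linked through shared peptides.

    Proteins and peptides form a bipartite graph; each returned group is the
    protein side of one connected component of that graph.
    """
    # Invert the mapping so each peptide knows which proteins contain it.
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)

    seen = set()
    groups = []
    for start in sorted(protein_to_peptides):
        if start in seen:
            continue
        # Depth-first search along protein, peptide, protein edges.
        seen.add(start)
        stack = [start]
        group = set()
        while stack:
            protein = stack.pop()
            group.add(protein)
            for peptide in protein_to_peptides[protein]:
                for neighbour in peptide_to_proteins[peptide]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Toy data (hypothetical): in-silico derived peptides per protein.
in_silico = {
    "P1": {"AAAK", "CCCK"},
    "P2": {"CCCK", "DDDK"},
    "P3": {"DDDK", "EEEK"},
    "P4": {"FFFK"},
}
# Peptides assumed to have been identified by MS/MS and FDR-filtered.
observed = {"AAAK", "CCCK", "EEEK"}

isd_groups = connected_groups(in_silico)

# Keep only proteins with at least one observed peptide (a simplification).
msd_input = {}
for protein, peptides in in_silico.items():
    hits = peptides.intersection(observed)
    if hits:
        msd_input[protein] = hits
msd_groups = connected_groups(msd_input)

print(isd_groups)  # [['P1', 'P2', 'P3'], ['P4']]
print(msd_groups)  # [['P1', 'P2'], ['P3']]
```

<p>In this toy example both MSD-style groups lie inside the first ISD-style group, which matches the statement that MSD groups are subgraphs of ISD groups.</p>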
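<p>The precedence between <code>in</code> and <code>in_path</code> can be sketched as follows. The helper <code>resolve_inputs</code> is hypothetical, and the sketch assumes that <code>in_path</code> names a directory that is scanned non-recursively for files with an idXML or consensusXML extension; the tool itself may select files differently.</p>

```python
from pathlib import Path


def resolve_inputs(in_files, in_path):
    """Return the input files to process, preferring the explicit list."""
    if in_files:
        # 'in' is given, so 'in_path' is ignored.
        return [Path(name) for name in in_files]
    if not in_path:
        return []
    # 'in' is empty: collect idXML and consensusXML files from the directory.
    # Assumption: extensions are compared case-insensitively.
    allowed = {".idxml", ".consensusxml"}
    return sorted(
        entry for entry in Path(in_path).iterdir()
        if entry.is_file() and entry.suffix.lower() in allowed
    )
```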
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
ProteinResolver -- protein inference
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  ProteinResolver &lt;options&gt;

Options (mandatory options marked with '*'):
  -fasta &lt;file&gt;*                       Input database file (valid formats: 'fasta')
  -in &lt;file(s)&gt;                        Input file(s) holding experimental data (valid formats: 'idXML',
                                       'consensusXML')
  -in_path &lt;file&gt;                      Path to idXMLs or consensusXMLs files. Ignored if 'in' is given.
  -design &lt;file&gt;                       Text file containing the experimental design. See documentation for 
                                       specific format requirements (valid formats: 'txt')
  -protein_groups &lt;file&gt;               Output file. Contains all protein groups (valid formats: 'csv')
  -peptide_table &lt;file&gt;                Output file. Contains one peptide per line and all proteins which
                                       contain that peptide (valid formats: 'csv')
  -protein_table &lt;file&gt;                Output file. Contains one protein per line (valid formats: 'csv')

Additional options for algorithm:
  -resolver:missed_cleavages &lt;number&gt;  Number of allowed missed cleavages (default: '2' min: '0')
  -resolver:min_length &lt;number&gt;        Minimum length of peptide (default: '6' min: '1')
  -resolver:enzyme &lt;choice&gt;            Digestion enzyme (default: 'Trypsin' valid: 'Trypsin')

Additional options for quantitative experimental design:
  -designer:experiment &lt;text&gt;          Identifier for the experimental design. (default:
                                       'ExperimentalSetting')
  -designer:file &lt;text&gt;                Identifier for the file name. (default: 'File')
  -designer:separator &lt;choice&gt;         Separator, which should be used to split a row into columns (default: 
                                       'tab' valid: 'tab', 'semi-colon', 'comma', 'whitespace')

                                       
Common TOPP options:
  -ini &lt;file&gt;                          Use the given TOPP INI file
  -threads &lt;n&gt;                         Sets the number of threads allowed to be used by the TOPP tool
                                       (default: '1')
  -write_ini &lt;file&gt;                    Writes the default configuration file
  --help                               Shows options
  --helphelp                           Shows all options (including advanced)

</pre><p><b>Input</b></p>
<p>Because ProteinResolver offers two different input parameters, there are several ways to use this TOPP tool. </p>
<dl>
<dt>One single input file (<code>in</code>) </dt>
<dd><p class="startdd">ProteinResolver simply performs protein inference for that specific file, based on the above-mentioned algorithm of Meyer-Arendt et al. (2011).</p>
<p class="enddd"></p>
</dd>
<dt>Multiple files (<code>in</code> or <code>in_path</code>) </dt>
<dd><ol>
<li>
If no experimental design file is given, each file is processed on its own, as in batch processing. </li>
<li>
If an experimental design file is provided, all files that can be mapped to the same experimental design are treated as one single input file (simply by merging them before the computation). </li>
</ol>
</dd>
</dl>
<p><b>Output</b></p>
<p>Four possible outputs are available:</p>
<dl>
<dt>Protein groups </dt>
<dd>For each MSD group, the ISD group, the protein indices, the peptide indices, the number of peptides in the MSD group and the number of proteins in the ISD group are written to the output file </dd>
<dt>Protein table </dt>
<dd>The resulting text file contains one protein per line </dd>
<dt>Peptide table </dt>
<dd>The output file will contain one peptide per line and all proteins which contain that specific peptide </dd>
<dt>Statistics </dt>
<dd>Number of ISD groups, number of MSD groups, number of target peptides, number of decoy peptides, number of target and decoy peptides, number of peptides in MSD groups and estimated FDR for protein list. </dd>
</dl>
<p>The results for different input files are appended and written into the same output file. In other words, no matter how many input files you have, you will end up with one single output file. </p>
<p><b>Text file format of the quantitative experimental design:</b></p>
<p>The text file must be column-based and must contain exactly one header line in addition to the data rows. The header must name two columns: one that holds the file name and one that holds an identifier for the experimental setup. The names of these two header columns can be set via parameters and must be unique (defaults: "File" and "ExperimentalSetting"). Columns can be separated in one of four ways: tab, comma, semi-colon or whitespace.</p>
<p><em>Example for text file format:</em></p>
<center> <table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">Slice </td><td align="center" bgcolor="#EBEBEB">File </td><td align="center" bgcolor="#EBEBEB">ExperimentalSetting  </td></tr>
<tr>
<td align="center">1 </td><td align="center">SILAC_2_1 </td><td align="center">S1224  </td></tr>
<tr>
<td align="center">4 </td><td align="center">SILAC_3_4 </td><td align="center">D1224  </td></tr>
<tr>
<td align="center">2 </td><td align="center">SILAC_10_2 </td><td align="center">S1224  </td></tr>
<tr>
<td align="center">7 </td><td align="center">SILAC_8_7 </td><td align="center">S1224  </td></tr>
</table>
</center><p>In this case the default values of the parameters "experiment" and "file" ("ExperimentalSetting" and "File", respectively) match the column headers. If you use other column headers, you need to change these parameters.</p>
<p>The separator should be changed if the file is not tab-separated. Every other column (here: the first column) is ignored. Not every file mentioned in the design file has to be given as an input file, and every input file that has no match in the design file is ignored in the computation.<br/>
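<p>As an illustration of the format, the following sketch reads such a design table into a mapping from file identifier to experimental setting. The function <code>read_design</code> and the table <code>SEPARATORS</code> are hypothetical names; the parameter defaults mirror the documented defaults of <code>designer:file</code>, <code>designer:experiment</code> and <code>designer:separator</code>, and treating "whitespace" as a split on any run of whitespace is an assumption.</p>

```python
import csv

# Names accepted by 'designer:separator', mapped to the character they stand for.
SEPARATORS = {"tab": "\t", "semi-colon": ";", "comma": ","}


def read_design(text, file_column="File",
                experiment_column="ExperimentalSetting", separator="tab"):
    """Map each file identifier in a design table to its experimental setting."""
    lines = [line for line in text.splitlines() if line.strip()]
    if separator == "whitespace":
        # Assumption: any run of whitespace separates two columns.
        rows = [line.split() for line in lines]
    else:
        rows = list(csv.reader(lines, delimiter=SEPARATORS[separator]))
    header = [cell.strip() for cell in rows[0]]
    # Both identifiers must occur exactly once in the header.
    for name in (file_column, experiment_column):
        if header.count(name) != 1:
            raise ValueError("header must contain exactly one column named " + repr(name))
    file_idx = header.index(file_column)
    exp_idx = header.index(experiment_column)
    # All other columns (such as 'Slice') are ignored.
    return {row[file_idx].strip(): row[exp_idx].strip() for row in rows[1:]}


# The example table from above, written as tab-separated text.
design_text = (
    "Slice\tFile\tExperimentalSetting\n"
    "1\tSILAC_2_1\tS1224\n"
    "4\tSILAC_3_4\tD1224\n"
    "2\tSILAC_10_2\tS1224\n"
    "7\tSILAC_8_7\tS1224\n"
)
design = read_design(design_text)
```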
 <br/>
 <em>Consider the following scenario:</em><br/>
 <br/>
 <b>Input files:</b> SILAC_2_1.consensusXML, SILAC_3_4.consensusXML, SILAC_10_2.consensusXML and SILAC_8_7_.consensusXML<br/>
 <br/>
 <b>First step:</b> Data from SILAC_2_1.consensusXML and SILAC_10_2.consensusXML is merged, because both files can be mapped to the same setting S1224. SILAC_8_7_.consensusXML is ignored, since SILAC_8_7_ does not match SILAC_8_7.<br/>
 <br/>
 <b>Second step:</b> ProteinResolver computes results for the merged data and for the data from the file SILAC_3_4.consensusXML.<br/>
 <br/>
 <b>Third step:</b> ProteinResolver writes the results for the experimental settings S1224 and D1224 to the same output file.<br/>
 <br/>
</p>
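<p>The first step of this scenario can be reproduced with a short sketch. The function <code>group_by_setting</code> is hypothetical, and the sketch assumes that an input file matches a design entry when its file name without the extension is exactly equal to the value in the "File" column, as the scenario suggests.</p>

```python
from collections import defaultdict
from pathlib import Path


def group_by_setting(input_files, design):
    """Group input files by experimental setting and list unmatched files."""
    groups = defaultdict(list)
    ignored = []
    for name in input_files:
        # Assumption: the design's file identifier is the file name
        # without its extension, compared exactly.
        key = Path(name).stem
        if key in design:
            groups[design[key]].append(name)
        else:
            ignored.append(name)
    return dict(groups), ignored


# Design table from the example above (file identifier to setting).
design = {
    "SILAC_2_1": "S1224",
    "SILAC_3_4": "D1224",
    "SILAC_10_2": "S1224",
    "SILAC_8_7": "S1224",
}
inputs = [
    "SILAC_2_1.consensusXML",
    "SILAC_3_4.consensusXML",
    "SILAC_10_2.consensusXML",
    "SILAC_8_7_.consensusXML",
]
groups, ignored = group_by_setting(inputs, design)
```

<p>With these inputs, <code>groups</code> holds the two files for S1224 and the single file for D1224, and <code>ignored</code> holds SILAC_8_7_.consensusXML.</p>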
<p><b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
 <div class="item item_required">required parameter</div>
 <div class="item item_advanced">advanced parameter</div>
</div>
  <div class="node"><span class="node_name">+ProteinResolver</span><span class="node_description">protein inference</span></div>
    <div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>    <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'ProteinResolver'</span></div>
      <div class="item"><span class="item_name item_required" style="padding-left:24px;">fasta</span><span class="item_value"></span>
<span class="item_description">Input database file</span><span class="item_tags">input file</span><span class="item_restrictions">*.fasta</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in</span><span class="item_value">[]</span>
<span class="item_description">Input file(s) holding experimental data</span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML,*.consensusXML</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in_path</span><span class="item_value"></span>
<span class="item_description">Path to idXMLs or consensusXMLs files. Ignored if 'in' is given.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">design</span><span class="item_value"></span>
<span class="item_description">Text file containing the experimental design. See documentation for specific format requirements</span><span class="item_tags">input file</span><span class="item_restrictions">*.txt</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">protein_groups</span><span class="item_value"></span>
<span class="item_description">output file. Contains all protein groups</span><span class="item_tags">output file</span><span class="item_restrictions">*.csv</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">peptide_table</span><span class="item_value"></span>
<span class="item_description">output file. Contains one peptide per line and all proteins which contain that peptide</span><span class="item_tags">output file</span><span class="item_restrictions">*.csv</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">protein_table</span><span class="item_value"></span>
<span class="item_description">output file. Contains one protein per line</span><span class="item_tags">output file</span><span class="item_restrictions">*.csv</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">additional_info</span><span class="item_value"></span>
<span class="item_description">output file for additional info</span><span class="item_tags">output file</span><span class="item_restrictions">*.csv</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++resolver</span><span class="node_description">Additional options for algorithm</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">missed_cleavages</span><span class="item_value">2</span>
<span class="item_description">Number of allowed missed cleavages</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">min_length</span><span class="item_value">6</span>
<span class="item_description">Minimum length of peptide</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">enzyme</span><span class="item_value">Trypsin</span>
<span class="item_description">Digestion enzyme</span><span class="item_tags"></span><span class="item_restrictions">Trypsin</span></div>      <div class="node"><span class="node_name">+++designer</span><span class="node_description">Additional options for quantitative experimental design</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">experiment</span><span class="item_value">ExperimentalSetting</span>
<span class="item_description">Identifier for the experimental design.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">file</span><span class="item_value">File</span>
<span class="item_description">Identifier for the file name.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">separator</span><span class="item_value">tab</span>
<span class="item_description">Separator, which should be used to split a row into columns</span><span class="item_tags"></span><span class="item_restrictions">tab,semi-colon,comma,whitespace</span></div></div>
 </div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>