<HTML>
<HEAD>
<TITLE>IDFilter</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> ·
<A href="classes.html">Classes</A> ·
<A href="annotated.html">Annotated Classes</A> ·
<A href="modules.html">Modules</A> ·
<A href="functions_func.html">Members</A> ·
<A href="namespaces.html">Namespaces</A> ·
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">IDFilter </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Filters protein identification engine results by different criteria. </p>
<center></center><center><table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">potential predecessor tools </td><td valign="middle" rowspan="5"><img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> IDFilter <img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> </td><td align="center" bgcolor="#EBEBEB">potential successor tools </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_MascotAdapter.html">MascotAdapter</a> (or other ID engines) </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_PeptideIndexer.html">PeptideIndexer</a> </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_IDFileConverter.html">IDFileConverter</a> </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ProteinInference.html">ProteinInference</a> </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_FalseDiscoveryRate.html">FalseDiscoveryRate</a> </td><td valign="middle" align="center" rowspan="2"><a class="el" href="TOPP_IDMapper.html">IDMapper</a> </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ConsensusID.html">ConsensusID</a> </td></tr>
</table>
</center><p>This tool filters the identifications produced by a peptide/protein identification tool such as Mascot.</p>
<p>To enable a filter, change its parameter from the default value. All active filters are applied in order. The following filters are available:</p>
<ul>
<li>
<b>score:pep</b>:<br/>
The score a peptide hit must reach to be kept. </li>
<li>
<b>score:prot</b>:<br/>
The score a protein hit must reach to be kept. </li>
<li>
<b>thresh:pep</b>:<br/>
The fraction of the significance threshold that a peptide hit's score must reach for the hit to be kept. For example, if a peptide has a score of 30 and the significance threshold is 40, the peptide is kept only if this fraction is set to 0.75 or lower. </li>
<li>
<b>thresh:prot</b>:<br/>
Works like <b>thresh:pep</b>, but is applied to protein hits. </li>
<li>
<b>whitelist:proteins</b>:<br/>
If you know which proteins are in the measured sample, you can specify a FASTA file containing their sequences. All peptides that are not a substring of a protein in this file are filtered out. By default, the match is based on the protein identifiers attached to the peptide hits. Protein hits that do not match any FASTA protein are also removed.<br/>
To match peptides by sequence alone, use the flag <em>whitelist:by_seq_only</em>, which also disables protein filtering. </li>
<li>
<b>blacklist:peptides</b>:<br/>
Specify an idXML file of peptides to exclude. Every peptide in the input file whose sequence also occurs in this file is dropped. Protein hits are not affected. </li>
<li>
<b>rt</b>:<br/>
To filter identifications by their predicted retention times, set 'rt:p_value' and/or 'rt:p_value_1st_dim' to a value larger than 0, depending on which RT dimension you want to filter. This filter can only be applied to idXML files produced by <a class="el" href="TOPP_RTPredict.html">RTPredict</a>. </li>
<li>
<b>best:n_peptide_hits</b>:<br/>
Only the best n peptide hits of a spectrum are kept. If two hits have the same score, their order is random. </li>
<li>
<b>best:n_protein_hits</b>:<br/>
Only the best n protein hits are kept. If two hits have the same score, their order is random. </li>
<li>
<b>best:strict</b>:<br/>
Only the best peptide hit of a spectrum is kept. This is similar to n_peptide_hits=1, but if two or more hits share the maximum score, none of them is kept. </li>
</ul>
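<p>The threshold and best-hit rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the OpenMS implementation: the function names and the (sequence, score) tuples are hypothetical, it assumes that a higher score is better, and it assumes an inclusive comparison for the threshold fraction, as the worked example for <b>thresh:pep</b> (score 30, threshold 40, fraction 0.75) suggests.</p>

```python
def passes_threshold(score, significance_threshold, fraction):
    """thresh:pep / thresh:prot: keep a hit whose score reaches the given
    fraction of its significance threshold (inclusive, by assumption)."""
    return score >= fraction * significance_threshold


def best_n(hits, n):
    """best:n_peptide_hits: keep the n highest scoring hits; n == 0 disables the filter."""
    if n == 0:
        return list(hits)
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]


def best_strict(hits):
    """best:strict: keep the single best hit, or nothing if the top score is tied."""
    if not hits:
        return []
    top_score = max(score for _sequence, score in hits)
    top_hits = [hit for hit in hits if hit[1] == top_score]
    return top_hits if len(top_hits) == 1 else []


# Hypothetical peptide hits of one spectrum as (sequence, score) pairs.
hits = [("PEPTIDEK", 30.0), ("SAMPLER", 42.0), ("TESTK", 42.0)]

print(passes_threshold(30, 40, 0.75))  # 30 reaches 0.75 * 40
print(best_n(hits, 2))
print(best_strict(hits))  # the top score 42.0 is tied
```

<p>Python's sort is stable, so tied hits keep their input order in this sketch, whereas the tool describes the order of tied hits as random.</p>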
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976
Usage:
IDFilter <options>
Options (mandatory options marked with '*'):
-in <file>* Input file (valid formats: 'idXML')
-out <file>* Output file (valid formats: 'idXML')
Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All
active filters will be applied in order:
-score:pep <score> The score which should be reached by a peptide hit to be kept. The score
is dependent on the most recent(!) preprocessing - it could be Mascot
scores (if a MascotAdapter was applied before), or an FDR (if
FalseDiscoveryRate was applied before), etc. (default: '0')
-score:prot <score> The score which should be reached by a protein hit to be kept. (default:
'0')
Filtering by significance threshold:
-thresh:pep <fraction> Keep a peptide hit only if its score is above this fraction of the
peptide significance threshold. (default: '0')
-thresh:prot <fraction> Keep a protein hit only if its score is above this fraction of the
protein significance threshold. (default: '0')
Filtering by whitelisting (only instances also present in a whitelist file can pass):
-whitelist:proteins <file> Filename of a FASTA file containing protein sequences.
All peptides that are not a substring of a sequence in this file are
removed.
All proteins whose accession is not present in this file are removed.
(valid formats: 'fasta')
-whitelist:by_seq_only Match peptides with FASTA file by sequence instead of accession and
disable protein filtering.
Filtering by blacklisting (only instances not present in a blacklist file can pass):
-blacklist:peptides <file> Peptides having the same sequence as any peptide in this file will be
filtered out
(valid formats: 'idXML')
Filtering by RT predicted by 'RTPredict':
-rt:p_value <float> Retention time filtering by the p-value predicted by RTPredict. (default:
'0' min: '0' max: '1')
-rt:p_value_1st_dim <float> Retention time filtering by the p-value predicted by RTPredict for first
dimension. (default: '0' min: '0' max: '1')
Filtering by mz:
-mz:error <float> Filtering by deviation to theoretical mass (disabled for negative
values). (default: '-1')
-mz:unit <String> Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')
Filtering best hits per spectrum (for peptides) or from proteins:
-best:n_peptide_hits <integer> Keep only the 'n' highest scoring peptide hits per spectrum (for n>0).
(default: '0' min: '0')
-best:n_protein_hits <integer> Keep only the 'n' highest scoring protein hits (for n>0). (default: '0'
min: '0')
-best:strict Keep only the highest scoring peptide hit.
Similar to n_peptide_hits=1, but if there are two or more highest
scoring hits, none are kept.
-min_length <integer> Keep only peptide hits with a length greater or equal this value. Value
0 will have no filter effect. (default: '0' min: '0')
-max_length <integer> Keep only peptide hits with a length less or equal this value. Value 0
will have no filter effect. Value is overridden by min_length, i.e. if
max_length < min_length, max_length will be ignored. (default: '0' max:
'0')
-min_charge <integer> Keep only peptide hits for tandem spectra with charge greater or equal
this value. (default: '1' min: '1')
-var_mods Keep only peptide hits with variable modifications (fixed modifications
from SearchParameters will be ignored).
-unique If a peptide hit occurs more than once per PSM, only one instance is
kept.
-unique_per_protein Only peptides matching exactly one protein are kept. Remember that
isoforms count as different proteins!
-keep_unreferenced_protein_hits Proteins not referenced by a peptide are retained in the idXML.
Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default:
'1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)
</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
<div class="item item_required">required parameter</div>
<div class="item item_advanced">advanced parameter</div>
</div>
<div class="node"><span class="node_name">+IDFilter</span><span class="node_description">Filters results from protein or peptide identification engines based on different criteria.</span></div>
<div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'IDFilter'</span></div>
<div class="item"><span class="item_name item_required" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">input file </span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div> <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file </span><span class="item_tags">output file</span><span class="item_restrictions">*.idXML</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">min_length</span><span class="item_value">0</span>
<span class="item_description">Keep only peptide hits with a length greater or equal this value. Value 0 will have no filter effect.</span><span class="item_tags"></span><span class="item_restrictions">0:∞</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">max_length</span><span class="item_value">0</span>
<span class="item_description">Keep only peptide hits with a length less or equal this value. Value 0 will have no filter effect. Value is overridden by min_length, i.e. if max_length < min_length, max_length will be ignored.</span><span class="item_tags"></span><span class="item_restrictions">-∞:0</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">min_charge</span><span class="item_value">1</span>
<span class="item_description">Keep only peptide hits for tandem spectra with charge greater or equal this value.</span><span class="item_tags"></span><span class="item_restrictions">1:∞</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">var_mods</span><span class="item_value">false</span>
<span class="item_description">Keep only peptide hits with variable modifications (fixed modifications from SearchParameters will be ignored).</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">unique</span><span class="item_value">false</span>
<span class="item_description">If a peptide hit occurs more than once per PSM, only one instance is kept.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">unique_per_protein</span><span class="item_value">false</span>
<span class="item_description">Only peptides matching exactly one protein are kept. Remember that isoforms count as different proteins!</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">keep_unreferenced_protein_hits</span><span class="item_value">false</span>
<span class="item_description">Proteins not referenced by a peptide are retained in the idXML.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="node"><span class="node_name">+++score</span><span class="node_description">Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All active filters will be applied in order.</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">pep</span><span class="item_value">0</span>
<span class="item_description">The score which should be reached by a peptide hit to be kept. The score is dependent on the most recent(!) preprocessing - it could be Mascot scores (if a MascotAdapter was applied before), or an FDR (if FalseDiscoveryRate was applied before), etc.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:32px;">prot</span><span class="item_value">0</span>
<span class="item_description">The score which should be reached by a protein hit to be kept.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="node"><span class="node_name">+++thresh</span><span class="node_description">Filtering by significance threshold</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">pep</span><span class="item_value">0</span>
<span class="item_description">Keep a peptide hit only if its score is above this fraction of the peptide significance threshold.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:32px;">prot</span><span class="item_value">0</span>
<span class="item_description">Keep a protein hit only if its score is above this fraction of the protein significance threshold.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="node"><span class="node_name">+++whitelist</span><span class="node_description">Filtering by whitelisting (only instances also present in a whitelist file can pass)</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">proteins</span><span class="item_value"></span>
<span class="item_description">filename of a FASTA file containing protein sequences.<br>All peptides that are not a substring of a sequence in this file are removed<br>All proteins whose accession is not present in this file are removed.</span><span class="item_tags">input file</span><span class="item_restrictions">*.fasta</span></div> <div class="item"><span class="item_name" style="padding-left:32px;">by_seq_only</span><span class="item_value">false</span>
<span class="item_description">Match peptides with FASTA file by sequence instead of accession and disable protein filtering.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="node"><span class="node_name">+++blacklist</span><span class="node_description">Filtering by blacklisting (only instances not present in a blacklist file can pass)</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">peptides</span><span class="item_value"></span>
<span class="item_description">Peptides having the same sequence as any peptide in this file will be filtered out<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div> <div class="node"><span class="node_name">+++rt</span><span class="node_description">Filtering by RT predicted by 'RTPredict'</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">p_value</span><span class="item_value">0</span>
<span class="item_description">Retention time filtering by the p-value predicted by RTPredict.</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div> <div class="item"><span class="item_name" style="padding-left:32px;">p_value_1st_dim</span><span class="item_value">0</span>
<span class="item_description">Retention time filtering by the p-value predicted by RTPredict for first dimension.</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div> <div class="node"><span class="node_name">+++mz</span><span class="node_description">Filtering by mz</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">error</span><span class="item_value">-1</span>
<span class="item_description">Filtering by deviation to theoretical mass (disabled for negative values).</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:32px;">unit</span><span class="item_value">ppm</span>
<span class="item_description">Absolute or relative error.</span><span class="item_tags"></span><span class="item_restrictions">Da,ppm</span></div> <div class="node"><span class="node_name">+++best</span><span class="node_description">Filtering best hits per spectrum (for peptides) or from proteins</span></div>
<div class="item"><span class="item_name" style="padding-left:32px;">n_peptide_hits</span><span class="item_value">0</span>
<span class="item_description">Keep only the 'n' highest scoring peptide hits per spectrum (for n>0).</span><span class="item_tags"></span><span class="item_restrictions">0:∞</span></div> <div class="item"><span class="item_name" style="padding-left:32px;">n_protein_hits</span><span class="item_value">0</span>
<span class="item_description">Keep only the 'n' highest scoring protein hits (for n>0).</span><span class="item_tags"></span><span class="item_restrictions">0:∞</span></div> <div class="item"><span class="item_name" style="padding-left:32px;">strict</span><span class="item_value">false</span>
<span class="item_description">Keep only the highest scoring peptide hit.<br>Similar to n_peptide_hits=1, but if there are two or more highest scoring hits, none are kept.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">n_to_m_peptide_hits</span><span class="item_value">:</span>
<span class="item_description">Peptide hit rank range to extract</span><span class="item_tags"></span><span class="item_restrictions"> </span></div></div>
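<p>As a hedged illustration of how the documented options combine on the command line, the following Python sketch assembles an IDFilter call as an argument list. The helper name and the file names are hypothetical, and the sketch assumes that the IDFilter executable is on the PATH; only option names listed in the help text above are used.</p>

```python
import shlex


def build_idfilter_command(in_file, out_file, options=None, flags=()):
    """Assemble an IDFilter argument list from option values and boolean flags."""
    command = ["IDFilter", "-in", in_file, "-out", out_file]
    for name, value in (options or {}).items():
        command += ["-" + name, str(value)]
    for flag in flags:
        command.append("-" + flag)
    return command


command = build_idfilter_command(
    "search_results.idXML",  # hypothetical input file name
    "filtered.idXML",        # hypothetical output file name
    options={"thresh:pep": 0.75, "best:n_peptide_hits": 1},
    flags=["unique"],
)

# Print the call as a shell-quoted string for inspection.
print(shlex.join(command))

# To actually run the tool (requires an OpenMS installation):
#   import subprocess
#   subprocess.run(command, check=True)
```

<p>Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.</p>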
</div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>