<HTML>
<HEAD>
<TITLE>IDFilter</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">IDFilter </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Filters protein identification engine results by different criteria. </p>
<center></center><center><table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">potential predecessor tools  </td><td valign="middle" rowspan="5"><img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> IDFilter <img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> </td><td align="center" bgcolor="#EBEBEB">potential successor tools   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_MascotAdapter.html">MascotAdapter</a> (or other ID engines)  </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_PeptideIndexer.html">PeptideIndexer</a>   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_IDFileConverter.html">IDFileConverter</a>  </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ProteinInference.html">ProteinInference</a>   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_FalseDiscoveryRate.html">FalseDiscoveryRate</a>  </td><td valign="middle" align="center" rowspan="2"><a class="el" href="TOPP_IDMapper.html">IDMapper</a>   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ConsensusID.html">ConsensusID</a>   </td></tr>
</table>
</center><p>This tool filters the identifications reported by a peptide/protein identification tool such as Mascot.</p>
<p>To enable a filter, change its parameter from the default value. All active filters are applied in order. The following filters are available:</p>
<ul>
<li>
<b>score:pep</b>:<br/>
 This parameter specifies the score a peptide hit must reach to be kept.  </li>
<li>
<b>score:prot</b>:<br/>
 This parameter specifies the score a protein hit must reach to be kept.  </li>
<li>
<b>thresh:pep</b>:<br/>
 This parameter specifies the fraction of the significance threshold that a peptide hit's score must reach for the hit to be kept. For example, if a peptide hit has a score of 30 and the significance threshold is 40, the hit is kept only if the fraction is set to 0.75 or lower.  </li>
<li>
<b>thresh:prot</b>:<br/>
 This parameter works in the same way as <b>thresh:pep</b>, except that it filters protein hits.  </li>
<li>
<b>whitelist:proteins</b>:<br/>
 If you know which proteins are in the measured sample, you can specify a FASTA file containing their sequences. All peptides that are not a substring of a protein in this file are filtered out. By default, the matching is based on the protein identifiers attached to the peptide hits. Protein hits that do not match any FASTA protein are also removed.<br/>
 To filter by sequence alone, use the flag <em>whitelist:by_seq_only</em>.  </li>
<li>
<b>blacklist:peptides</b>:<br/>
 For this option you specify an idXML file. All peptides that are present in both the input file and this exclusion file are dropped. Protein hits are not affected.  </li>
<li>
<b>rt</b>:<br/>
 To filter identifications according to their predicted retention times, set 'rt:p_value' and/or 'rt:p_value_1st_dim' to a value larger than 0, depending on which RT dimension you want to filter. This filter can only be applied to idXML files produced by <a class="el" href="TOPP_RTPredict.html">RTPredict</a>.  </li>
<li>
<b>best:n_peptide_hits</b>:<br/>
 Only the best n peptide hits of a spectrum are kept. If two hits have the same score, their order is random.  </li>
<li>
<b>best:n_protein_hits</b>:<br/>
 Only the best n protein hits are kept. If two hits have the same score, their order is random.  </li>
<li>
<b>best:strict</b>:<br/>
 Only the best hit of a spectrum is kept. This is similar to n_peptide_hits=1, but if two or more hits of a spectrum share the maximum score, none of them are kept.  </li>
</ul>
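<p>The significance threshold fraction described for <b>thresh:pep</b> can be illustrated with a minimal Python sketch. It is not OpenMS code: the function name <em>passes_threshold</em> is hypothetical, and the inclusive comparison is an assumption taken from the example above, in which a score of 30 against a threshold of 40 is still kept at a fraction of 0.75.</p>

```python
def passes_threshold(score, significance_threshold, fraction):
    """Return True if a hit would be kept by the threshold-fraction filter.

    Hypothetical helper, not part of OpenMS. Assumes that higher scores are
    better and that the comparison is inclusive, as the example in the text
    suggests (score 30, threshold 40, kept at fraction 0.75).
    """
    required_score = fraction * significance_threshold
    return score >= required_score


if __name__ == "__main__":
    # Values taken from the example in the text.
    for fraction in (0.5, 0.75, 0.8):
        kept = passes_threshold(30, 40, fraction)
        print("fraction", fraction, "kept" if kept else "dropped")
```

<p>With the values from the example, the hit is kept at fractions of 0.5 and 0.75 and dropped at 0.8, because 0.8 of the threshold is 32, which the score of 30 does not reach.</p>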
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDFilter &lt;options&gt;

Options (mandatory options marked with '*'):
  -in &lt;file&gt;*                       Input file  (valid formats: 'idXML')
  -out &lt;file&gt;*                      Output file  (valid formats: 'idXML')

Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All 
active filters will be applied in order.:
  -score:pep &lt;score&gt;                The score which should be reached by a peptide hit to be kept. The score 
                                    is dependent on the most recent(!) preprocessing - it could be Mascot
                                    scores (if a MascotAdapter was applied before), or an FDR (if
                                    FalseDiscoveryRate was applied before), etc. (default: '0')
  -score:prot &lt;score&gt;               The score which should be reached by a protein hit to be kept. (default: 
                                    '0')

Filtering by significance threshold:
  -thresh:pep &lt;fraction&gt;            Keep a peptide hit only if its score is above this fraction of the
                                    peptide significance threshold. (default: '0')
  -thresh:prot &lt;fraction&gt;           Keep a protein hit only if its score is above this fraction of the
                                    protein significance threshold. (default: '0')

Filtering by whitelisting (only instances also present in a whitelist file can pass):
  -whitelist:proteins &lt;file&gt;        Filename of a FASTA file containing protein sequences.
                                    All peptides that are not a substring of a sequence in this file are
                                    removed.
                                    All proteins whose accession is not present in this file are removed.
                                    (valid formats: 'fasta')
  -whitelist:by_seq_only            Match peptides with FASTA file by sequence instead of accession and
                                    disable protein filtering.

Filtering by blacklisting (only instances not present in a blacklist file can pass):
  -blacklist:peptides &lt;file&gt;        Peptides having the same sequence as any peptide in this file will be 
                                    filtered out
                                    (valid formats: 'idXML')

Filtering by RT predicted by 'RTPredict':
  -rt:p_value &lt;float&gt;               Retention time filtering by the p-value predicted by RTPredict. (default:
                                    '0' min: '0' max: '1')
  -rt:p_value_1st_dim &lt;float&gt;       Retention time filtering by the p-value predicted by RTPredict for first 
                                    dimension. (default: '0' min: '0' max: '1')

Filtering by mz:
  -mz:error &lt;float&gt;                 Filtering by deviation to theoretical mass (disabled for negative
                                    values). (default: '-1')
  -mz:unit &lt;String&gt;                 Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')

Filtering best hits per spectrum (for peptides) or from proteins:
  -best:n_peptide_hits &lt;integer&gt;    Keep only the 'n' highest scoring peptide hits per spectrum (for n&gt;0). 
                                    (default: '0' min: '0')
  -best:n_protein_hits &lt;integer&gt;    Keep only the 'n' highest scoring protein hits (for n&gt;0). (default: '0' 
                                    min: '0')
  -best:strict                      Keep only the highest scoring peptide hit.
                                    Similar to n_peptide_hits=1, but if there are two or more highest
                                    scoring hits, none are kept.

  -min_length &lt;integer&gt;             Keep only peptide hits with a length greater or equal this value. Value 
                                    0 will have no filter effect. (default: '0' min: '0')
  -max_length &lt;integer&gt;             Keep only peptide hits with a length less or equal this value. Value 0 
                                    will have no filter effect. Value is overridden by min_length, i.e. if
                                    max_length &lt; min_length, max_length will be ignored. (default: '0' max:
                                    '0')
  -min_charge &lt;integer&gt;             Keep only peptide hits for tandem spectra with charge greater or equal 
                                    this value. (default: '1' min: '1')
  -var_mods                         Keep only peptide hits with variable modifications (fixed modifications 
                                    from SearchParameters will be ignored).
  -unique                           If a peptide hit occurs more than once per PSM, only one instance is
                                    kept.
  -unique_per_protein               Only peptides matching exactly one protein are kept. Remember that
                                    isoforms count as different proteins!
  -keep_unreferenced_protein_hits   Proteins not referenced by a peptide are retained in the idXML.
                                    
Common TOPP options:
  -ini &lt;file&gt;                       Use the given TOPP INI file
  -threads &lt;n&gt;                      Sets the number of threads allowed to be used by the TOPP tool (default: 
                                    '1')
  -write_ini &lt;file&gt;                 Writes the default configuration file
  --help                            Shows options
  --helphelp                        Shows all options (including advanced)

</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
 <div class="item item_required">required parameter</div>
 <div class="item item_advanced">advanced parameter</div>
</div>
  <div class="node"><span class="node_name">+IDFilter</span><span class="node_description">Filters results from protein or peptide identification engines based on different criteria.</span></div>
    <div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>    <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'IDFilter'</span></div>
      <div class="item"><span class="item_name item_required" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">input file </span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file </span><span class="item_tags">output file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">min_length</span><span class="item_value">0</span>
<span class="item_description">Keep only peptide hits with a length greater or equal this value. Value 0 will have no filter effect.</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">max_length</span><span class="item_value">0</span>
<span class="item_description">Keep only peptide hits with a length less or equal this value. Value 0 will have no filter effect. Value is overridden by min_length, i.e. if max_length &lt; min_length, max_length will be ignored.</span><span class="item_tags"></span><span class="item_restrictions">-&#8734;:0</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">min_charge</span><span class="item_value">1</span>
<span class="item_description">Keep only peptide hits for tandem spectra with charge greater or equal this value.</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">var_mods</span><span class="item_value">false</span>
<span class="item_description">Keep only peptide hits with variable modifications (fixed modifications from SearchParameters will be ignored).</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">unique</span><span class="item_value">false</span>
<span class="item_description">If a peptide hit occurs more than once per PSM, only one instance is kept.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">unique_per_protein</span><span class="item_value">false</span>
<span class="item_description">Only peptides matching exactly one protein are kept. Remember that isoforms count as different proteins!</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">keep_unreferenced_protein_hits</span><span class="item_value">false</span>
<span class="item_description">Proteins not referenced by a peptide are retained in the idXML.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++score</span><span class="node_description">Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All active filters will be applied in order.</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">pep</span><span class="item_value">0</span>
<span class="item_description">The score which should be reached by a peptide hit to be kept. The score is dependent on the most recent(!) preprocessing - it could be Mascot scores (if a MascotAdapter was applied before), or an FDR (if FalseDiscoveryRate was applied before), etc.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">prot</span><span class="item_value">0</span>
<span class="item_description">The score which should be reached by a protein hit to be kept.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="node"><span class="node_name">+++thresh</span><span class="node_description">Filtering by significance threshold</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">pep</span><span class="item_value">0</span>
<span class="item_description">Keep a peptide hit only if its score is above this fraction of the peptide significance threshold.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">prot</span><span class="item_value">0</span>
<span class="item_description">Keep a protein hit only if its score is above this fraction of the protein significance threshold.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="node"><span class="node_name">+++whitelist</span><span class="node_description">Filtering by whitelisting (only instances also present in a whitelist file can pass)</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">proteins</span><span class="item_value"></span>
<span class="item_description">filename of a FASTA file containing protein sequences.<br>All peptides that are not a substring of a sequence in this file are removed<br>All proteins whose accession is not present in this file are removed.</span><span class="item_tags">input file</span><span class="item_restrictions">*.fasta</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">by_seq_only</span><span class="item_value">false</span>
<span class="item_description">Match peptides with FASTA file by sequence instead of accession and disable protein filtering.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++blacklist</span><span class="node_description">Filtering by blacklisting (only instances not present in a blacklist file can pass)</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">peptides</span><span class="item_value"></span>
<span class="item_description">Peptides having the same sequence as any peptide in this file will be filtered out<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="node"><span class="node_name">+++rt</span><span class="node_description">Filtering by RT predicted by 'RTPredict'</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">p_value</span><span class="item_value">0</span>
<span class="item_description">Retention time filtering by the p-value predicted by RTPredict.</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_value_1st_dim</span><span class="item_value">0</span>
<span class="item_description">Retention time filtering by the p-value predicted by RTPredict for first dimension.</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>      <div class="node"><span class="node_name">+++mz</span><span class="node_description">Filtering by mz</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">error</span><span class="item_value">-1</span>
<span class="item_description">Filtering by deviation to theoretical mass (disabled for negative values).</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">unit</span><span class="item_value">ppm</span>
<span class="item_description">Absolute or relative error.</span><span class="item_tags"></span><span class="item_restrictions">Da,ppm</span></div>      <div class="node"><span class="node_name">+++best</span><span class="node_description">Filtering best hits per spectrum (for peptides) or from proteins</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">n_peptide_hits</span><span class="item_value">0</span>
<span class="item_description">Keep only the 'n' highest scoring peptide hits per spectrum (for n>0).</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">n_protein_hits</span><span class="item_value">0</span>
<span class="item_description">Keep only the 'n' highest scoring protein hits (for n>0).</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">strict</span><span class="item_value">false</span>
<span class="item_description">Keep only the highest scoring peptide hit.<br>Similar to n_peptide_hits=1, but if there are two or more highest scoring hits, none are kept.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>        <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">n_to_m_peptide_hits</span><span class="item_value">:</span>
<span class="item_description">Peptide hit rank range to extract</span><span class="item_tags"></span><span class="item_restrictions"> </span></div></div>
 </div></div><!-- contents -->
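<p>The difference between best:n_peptide_hits=1 and best:strict can be sketched in Python. This is an illustration only, not the OpenMS implementation: the helper names <em>best_n</em> and <em>best_strict</em>, the representation of hits as (sequence, score) tuples, and the assumption that a higher score is better are all hypothetical. For score types where lower is better, the ordering would have to be reversed.</p>

```python
def best_n(hits, n):
    """Keep the n highest scoring hits of one spectrum; n == 0 disables the filter.

    hits is a list of (sequence, score) tuples (hypothetical representation).
    Python's sort is stable, so tied hits keep their input order here; the
    documentation only says that the order of tied hits is random.
    """
    if n == 0:
        return list(hits)
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return ranked[:n]


def best_strict(hits):
    """Keep the single best hit, or nothing if the top score is shared."""
    if not hits:
        return []
    top_score = max(score for _, score in hits)
    winners = [hit for hit in hits if hit[1] == top_score]
    return winners if len(winners) == 1 else []


if __name__ == "__main__":
    # Two hits share the maximum score of this made-up spectrum.
    spectrum_hits = [("PEPTIDEK", 42.0), ("SAMPLER", 42.0), ("LOWSCORER", 10.0)]
    print(best_n(spectrum_hits, 1))    # one of the tied hits survives
    print(best_strict(spectrum_hits))  # no hit survives the tie
```

<p>In this sketch a tie for the top score leaves one hit with <em>best_n</em> and none with <em>best_strict</em>, which matches the behaviour the parameter descriptions state for best:strict.</p>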
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>