File: TOPP_IDPosteriorErrorProbability.html

<HTML>
<HEAD>
<TITLE>IDPosteriorErrorProbability</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">IDPosteriorErrorProbability </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Tool to estimate the probability that peptide hits are incorrectly assigned.</p>
<center> <table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">potential predecessor tools  </td><td valign="middle" rowspan="2"><img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> IDPosteriorErrorProbability <img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> </td><td align="center" bgcolor="#EBEBEB">potential successor tools   </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_MascotAdapter.html">MascotAdapter</a> (or other ID engines)  </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ConsensusID.html">ConsensusID</a>   </td></tr>
</table>
</center><dl class="experimental"><dt><b><a class="el" href="experimental.html#_experimental000011">Experimental classes:</a></b></dt><dd>This tool has not been tested thoroughly and might not behave as expected!</dd></dl>
<p>By default, the estimation uses the (inverse) Gumbel distribution for incorrectly assigned sequences and a Gaussian distribution for correctly assigned sequences. The probabilities are calculated using Bayes' theorem, similar to PeptideProphet. Alternatively, a second Gaussian distribution can be used for incorrectly assigned sequences. Currently, IDPosteriorErrorProbability can handle X!Tandem, Mascot, MyriMatch and OMSSA scores.</p>
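<p>As a rough illustration of the Bayes step, the sketch below computes a posterior error probability from a two-component mixture: a Gumbel density for incorrect assignments and a Gaussian density for correct ones. It is not the tool's implementation. The function names, the exact Gumbel variant, the prior, and all parameter values are assumptions chosen for illustration; in the tool, the parameters come from the model fit.</p>

```python
import math


def gumbel_pdf(x, loc, scale):
    """Gumbel density (which Gumbel variant the tool uses is an assumption here)."""
    z = math.exp(-(x - loc) / scale)
    return z * math.exp(-z) / scale


def gauss_pdf(x, mean, sd):
    """Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_error_probability(x, incorrect, correct, prior_correct):
    """Bayes' theorem on the mixture: P(incorrect | score x)."""
    weighted_incorrect = (1.0 - prior_correct) * gumbel_pdf(x, *incorrect)
    weighted_correct = prior_correct * gauss_pdf(x, *correct)
    return weighted_incorrect / (weighted_incorrect + weighted_correct)


# Hypothetical fitted parameters: (loc, scale) for Gumbel, (mean, sd) for Gauss.
INCORRECT = (1.0, 0.8)
CORRECT = (5.0, 1.2)
PRIOR_CORRECT = 0.3  # hypothetical mixing weight of the "correct" component

for score in (0.5, 3.0, 6.0):
    pep = posterior_error_probability(score, INCORRECT, CORRECT, PRIOR_CORRECT)
    print(score, pep)
```

<p>With these made-up parameters, low scores receive an error probability close to 1 and high scores one close to 0, which is the behaviour expected when higher scores indicate better matches.</p>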
<p>No target/decoy information needs to be provided, since the model fits are done on the mixed distribution.</p>
<p>To validate the computed probabilities, one can adjust the fit_algorithm subsection so that the fit is written out as plots.</p>
<p>There are three parameters for the plot. The parameter 'output_plots' is false by default; if set to true, the plot is created. The scores are plotted as bins. Each bin represents a set of scores in a range of (highest_score - smallest_score)/number_of_bins (if all scores have positive values). The midpoint of the bin is the mean of the scores it represents. Finally, the parameter 'output_name' should be used to give the plot a unique name. Two files are created: one with the binned scores and one with all steps of the estimation. If 'top_hits_only' is set, only the top hits of each PeptideIdentification are used for the estimation process. Additionally, if 'top_hits_only' is set, target/decoy information is available, and a False Discovery Rate run was performed beforehand, an additional plot with target and decoy bins is created ('output_plots' must be true in the fit_algorithm subsection). A peptide hit is assumed to be a target if its q-value is smaller than 'fdr_for_targets_smaller'.</p>
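<p>The binning rule described above can be sketched as follows. This is only an illustration of the stated bin width, (highest_score - smallest_score)/number_of_bins, and of using the mean of the scores in a bin as its plotted position. The function name and the handling of empty bins and of the top edge are assumptions, not taken from the tool.</p>

```python
def bin_scores(scores, number_of_bins):
    """Group scores into equal-width bins; return (mean score, count) per non-empty bin."""
    lowest, highest = min(scores), max(scores)
    width = (highest - lowest) / number_of_bins
    if width == 0:
        # All scores identical: a single bin holds everything.
        return [(lowest, len(scores))]

    sums = [0.0] * number_of_bins
    counts = [0] * number_of_bins
    for score in scores:
        # The highest score would land one past the end, so clamp it into the last bin.
        index = min(int((score - lowest) / width), number_of_bins - 1)
        sums[index] += score
        counts[index] += 1

    # Empty bins are skipped here (an assumption made for this sketch).
    return [(sums[i] / counts[i], counts[i])
            for i in range(number_of_bins) if counts[i] > 0]


print(bin_scores([0.0, 1.0, 2.0, 9.0, 10.0], 2))
```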
<p>The plots are saved as a gnuplot file. To visualize them, run gnuplot on the file, e.g. gnuplot file_name. This should produce a PostScript file that contains all steps of the estimation.</p>
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
IDPosteriorErrorProbability -- Estimates probabilities for incorrectly assigned peptide sequences and a set 
of search engine scores using a mixture model.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDPosteriorErrorProbability &lt;options&gt;

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in &lt;file&gt;*           Input file  (valid formats: 'idXML')
  -out &lt;file&gt;*          Output file  (valid formats: 'idXML')
  -output_name &lt;file&gt;*  Gnuplot file as txt (valid formats: 'txt')
  -split_charge         The search engine scores are split by charge if this flag is set. Thus, for each
                        charge state a new model will be computed.
  -top_hits_only        If set only the top hits of every PeptideIdentification will be used
  -ignore_bad_data      If set errors will be written but ignored. Useful for pipelines with many datasets 
                        where only a few are bad, but the pipeline should run through.
  -prob_correct         If set scores will be calculated as 1-ErrorProbabilities and can be interpreted as 
                        probabilities for correct identifications.
                        
                        
Common TOPP options:
  -ini &lt;file&gt;           Use the given TOPP INI file
  -threads &lt;n&gt;          Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini &lt;file&gt;     Writes the default configuration file
  --help                Shows options
  --helphelp            Shows all options (including advanced)

The following configuration subsections are valid:
 - fit_algorithm   Algorithm parameter subsection

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.

</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
 <div class="item item_required">required parameter</div>
 <div class="item item_advanced">advanced parameter</div>
</div>
  <div class="node"><span class="node_name">+IDPosteriorErrorProbability</span><span class="node_description">Estimates probabilities for incorrectly assigned peptide sequences and a set of search engine scores using a mixture model.</span></div>
    <div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>    <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'IDPosteriorErrorProbability'</span></div>
      <div class="item"><span class="item_name item_required" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">input file </span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file </span><span class="item_tags">output file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name item_required" style="padding-left:24px;">output_name</span><span class="item_value"></span>
<span class="item_description">gnuplot file as txt</span><span class="item_tags">output file</span><span class="item_restrictions">*.txt</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">smallest_e_value</span><span class="item_value">1e-19</span>
<span class="item_description">This value gives a lower bound for E-values. It should not be 0, because the transformation to a real number (log of E-value) is then not possible for certain values.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">split_charge</span><span class="item_value">false</span>
<span class="item_description">The search engine scores are split by charge if this flag is set. Thus, for each charge state a new model will be computed.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">top_hits_only</span><span class="item_value">false</span>
<span class="item_description">If set only the top hits of every PeptideIdentification will be used</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">fdr_for_targets_smaller</span><span class="item_value">0.05</span>
<span class="item_description">Only used when top_hits_only is set. Additionally, target_decoy information should be available. The score_type must be q-value from a previous False Discovery Rate run.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">ignore_bad_data</span><span class="item_value">false</span>
<span class="item_description">If set errors will be written but ignored. Useful for pipelines with many datasets where only a few are bad, but the pipeline should run through.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">prob_correct</span><span class="item_value">false</span>
<span class="item_description">If set scores will be calculated as 1-ErrorProbabilities and can be interpreted as probabilities for correct identifications.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++fit_algorithm</span><span class="node_description">Algorithm parameter subsection</span></div>
        <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">number_of_bins</span><span class="item_value">100</span>
<span class="item_description">Number of bins used for visualization. Only needed if each iteration step of the EM-Algorithm will be visualized</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">output_plots</span><span class="item_value">false</span>
<span class="item_description">If true every step of the EM-algorithm will be written to a file as a gnuplot formula</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>        <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">output_name</span><span class="item_value"></span>
<span class="item_description">If output_plots is on, the output files will be saved in the following manner: &lt;output_name&gt;_scores.txt for the scores and &lt;output_name&gt;, which contains each step of the EM-algorithm. For example, with output_name = /usr/home/OMSSA123, the files /usr/home/OMSSA123_scores.txt and /usr/home/OMSSA123 will be written. If no directory is specified, e.g. just OMSSA123 instead of '/usr/home/OMSSA123', the files will be written into the working directory.</span><span class="item_tags">output file</span><span class="item_restrictions"> </span></div>        <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">incorrectly_assigned</span><span class="item_value">Gumbel</span>
<span class="item_description">For 'Gumbel', the Gumbel distribution is used to plot incorrectly assigned sequences. For 'Gauss', the Gauss distribution is used.</span><span class="item_tags"></span><span class="item_restrictions">Gumbel,Gauss</span></div></div>
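<p>The role of 'smallest_e_value' can be illustrated with a small sketch. The documentation only states that the value is a lower bound applied before taking the log of the E-value. The function name, the base-10 logarithm, and the negative sign used below are assumptions made so that the example is concrete.</p>

```python
import math


def transform_e_value(e_value, smallest_e_value=1e-19):
    """Clamp an E-value from below, then map it to a real-valued score.

    The clamp mirrors the documented purpose of 'smallest_e_value': an E-value
    of 0 has no logarithm. Using -log10 is an assumption made for illustration.
    """
    bounded = max(e_value, smallest_e_value)
    return -math.log10(bounded)


print(transform_e_value(1e-3))  # an ordinary E-value
print(transform_e_value(0.0))   # would fail without the lower bound
```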
<p>For the parameters of the algorithm section see the algorithm's documentation: <br/>
 <a class="el" href="classOpenMS_1_1Math_1_1PosteriorErrorProbabilityModel.html">fit_algorithm</a> <br/>
</p>
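<p>For scripted pipelines, the command line can be assembled programmatically. The sketch below uses only options listed in the help text above (-in, -out, -output_name, -split_charge, -top_hits_only, -prob_correct). The helper name and all file names are hypothetical, and actually running the command assumes that IDPosteriorErrorProbability is installed and on the PATH.</p>

```python
def build_command(in_file, out_file, output_name,
                  split_charge=False, top_hits_only=False, prob_correct=False):
    """Return the argument list for one IDPosteriorErrorProbability call."""
    command = [
        "IDPosteriorErrorProbability",
        "-in", in_file,              # mandatory, idXML
        "-out", out_file,            # mandatory, idXML
        "-output_name", output_name, # mandatory, gnuplot file as txt
    ]
    # Optional flags are appended only when requested.
    if split_charge:
        command.append("-split_charge")
    if top_hits_only:
        command.append("-top_hits_only")
    if prob_correct:
        command.append("-prob_correct")
    return command


# Hypothetical file names, for illustration only.
command = build_command("search_results.idXML", "with_pep.idXML", "pep_plot.txt",
                        top_hits_only=True)
print(" ".join(command))

# To execute it (requires the tool to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```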
</div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>