<HTML>
<HEAD>
<TITLE>IDPosteriorErrorProbability</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">IDPosteriorErrorProbability </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Tool to estimate the probability that peptide hits are incorrectly assigned.</p>
<center> <table class="doxtable">
<tr>
<td align="center" bgcolor="#EBEBEB">potential predecessor tools </td><td valign="middle" rowspan="2"><img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> IDPosteriorErrorProbability <img class="formulaInl" alt="$ \longrightarrow $" src="form_91.png"/> </td><td align="center" bgcolor="#EBEBEB">potential successor tools </td></tr>
<tr>
<td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_MascotAdapter.html">MascotAdapter</a> (or other ID engines) </td><td valign="middle" align="center" rowspan="1"><a class="el" href="TOPP_ConsensusID.html">ConsensusID</a> </td></tr>
</table>
</center><dl class="experimental"><dt><b><a class="el" href="experimental.html#_experimental000011">Experimental classes:</a></b></dt><dd>This tool has not been tested thoroughly and might not behave as expected!</dd></dl>
<p>By default, the estimation uses the (inverse) Gumbel distribution for incorrectly assigned sequences and a Gaussian distribution for correctly assigned sequences. The probabilities are then calculated using Bayes' law, similar to PeptideProphet. Alternatively, a second Gaussian distribution can be used for incorrectly assigned sequences. Currently, IDPosteriorErrorProbability can handle X!Tandem, Mascot, MyriMatch and OMSSA scores.</p>
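<p>As a rough illustration of the Bayes step, the following Python sketch computes a posterior error probability from an already fitted two-component mixture. It is not the OpenMS implementation: the function names and parameter values are hypothetical, and using the standard (maximum) Gumbel density is an assumption, since this page does not state how the "(inverse) Gumbel" component is parametrized.</p>

```python
import math


def gumbel_pdf(x, mu, beta):
    """Standard (maximum) Gumbel density; the parametrization is an assumption."""
    z = (x - mu) / beta
    if -z > 700.0:
        # exp(-z) would overflow a float; the density is effectively 0 here.
        return 0.0
    return math.exp(-(z + math.exp(-z))) / beta


def gauss_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_error_probability(score, prior_incorrect, gumbel_params, gauss_params):
    """P(incorrect | score) by Bayes' law for a Gumbel/Gauss mixture."""
    p_inc = prior_incorrect * gumbel_pdf(score, *gumbel_params)
    p_cor = (1.0 - prior_incorrect) * gauss_pdf(score, *gauss_params)
    total = p_inc + p_cor
    if total == 0.0:
        # Both densities underflowed; fall back to the prior.
        return prior_incorrect
    return p_inc / total


# Hypothetical fitted parameters, for illustration only.
PRIOR_INCORRECT = 0.7
GUMBEL = (1.0, 0.5)  # location, scale of the incorrect-hit component
GAUSS = (5.0, 1.0)   # mean, standard deviation of the correct-hit component

for s in (1.0, 6.0):
    pep = posterior_error_probability(s, PRIOR_INCORRECT, GUMBEL, GAUSS)
    print(f"score={s}: PEP={pep:.4f}, 1-PEP={1.0 - pep:.4f}")
```

<p>With these made-up parameters, a low score lies in the incorrect-hit component and gets a posterior error probability close to 1, while a high score gets one close to 0. The value 1 - PEP corresponds to what the prob_correct flag reports.</p>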
<p>No target/decoy information needs to be provided, since the models are fitted to the mixed distribution of correct and incorrect scores.</p>
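<p>Fitting a mixture to unlabeled scores is usually done with an expectation-maximization (EM) algorithm, which the fit_algorithm parameters also refer to. The Python sketch below shows the idea for the alternative setting in which both components are Gaussian ('Gauss'). It is illustrative only: the range-based initialization, the fixed iteration count and the function names are assumptions, not the OpenMS code.</p>

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=200):
    """Fit a two-component Gaussian mixture to unlabeled scores with EM.

    Component 0 is meant to capture incorrect hits (lower scores) and
    component 1 correct hits (higher scores). No target/decoy labels are used.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be identical")
    # Hypothetical initialization: means at 25% and 75% of the score range.
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sigma = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of the "correct" component for every score.
        resp = []
        for s in scores:
            a = weight[0] * normal_pdf(s, mu[0], sigma[0])
            b = weight[1] * normal_pdf(s, mu[1], sigma[1])
            resp.append(b / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k, r in ((0, [1.0 - x for x in resp]), (1, resp)):
            n_k = sum(r)
            weight[k] = n_k / len(scores)
            mu[k] = sum(ri * s for ri, s in zip(r, scores)) / n_k
            var = sum(ri * (s - mu[k]) ** 2 for ri, s in zip(r, scores)) / n_k
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor avoids a collapsed component
    return weight, mu, sigma
```

<p>Given the fitted weights, means and standard deviations, the posterior error probability of a score follows from Bayes' law in the same way as the E-step computes its responsibilities.</p>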
<p>To validate the computed probabilities, plots of the fit can be enabled in the fit_algorithm subsection.</p>
<p>Three parameters in the fit_algorithm subsection control the plots. The parameter 'output_plots' is false by default; if it is set to true, the plots are created. The scores are plotted as bins: each bin covers a score range of width (highest_score - smallest_score)/number_of_bins (if all scores are positive), and the midpoint of each bin is the mean of the scores it represents. Finally, the parameter 'output_name' should be used to give the plot a unique name. Two files are created: one with the binned scores and one with all steps of the estimation.</p>
<p>If 'top_hits_only' is set, only the top hits of each PeptideIdentification are used for the estimation. If, in addition, target/decoy information is available and a False Discovery Rate run was performed beforehand, an additional plot with target and decoy bins is created ('output_plots' must be true in the fit_algorithm subsection). A peptide hit is assumed to be a target if its q-value is smaller than 'fdr_for_targets_smaller'.</p>
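<p>The binning of scores for the plots can be sketched in Python as follows. The function name is hypothetical, and placing the highest score in the last bin and skipping empty bins are assumptions about edge cases that this page does not specify.</p>

```python
def bin_scores(scores, number_of_bins=100):
    """Group scores into equal-width bins and return (bin mean, count) pairs.

    Bin width = (highest_score - smallest_score) / number_of_bins.
    Each non-empty bin is represented by the mean of the scores it contains.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / number_of_bins
    members = [[] for _ in range(number_of_bins)]
    for s in scores:
        if width == 0.0:
            idx = 0  # all scores identical: everything goes into one bin
        else:
            # The highest score would index one past the end; clamp to the last bin.
            idx = min(int((s - lo) / width), number_of_bins - 1)
        members[idx].append(s)
    return [(sum(m) / len(m), len(m)) for m in members if m]


print(bin_scores([0.0, 1.0, 2.0, 3.0, 4.0, 10.0], number_of_bins=2))
```

<p>Using the mean of the contained scores, rather than the geometric center of the bin, follows the description above; it places each point where the scores in that bin actually lie.</p>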
<p>The plots are saved as gnuplot files. To visualize them, run gnuplot on the file, e.g. gnuplot file_name. This should produce a PostScript file containing all steps of the estimation.</p>
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
IDPosteriorErrorProbability -- Estimates probabilities for incorrectly assigned peptide sequences and a set
of search engine scores using a mixture model.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDPosteriorErrorProbability &lt;options&gt;

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in &lt;file&gt;*              Input file (valid formats: 'idXML')
  -out &lt;file&gt;*             Output file (valid formats: 'idXML')
  -output_name &lt;file&gt;*     Gnuplot file as txt (valid formats: 'txt')
  -split_charge            The search engine scores are split by charge if this flag is set. Thus, for each
                           charge state a new model will be computed.
  -top_hits_only           If set only the top hits of every PeptideIdentification will be used
  -ignore_bad_data         If set errors will be written but ignored. Useful for pipelines with many datasets
                           where only a few are bad, but the pipeline should run through.
  -prob_correct            If set scores will be calculated as 1-ErrorProbabilities and can be interpreted as
                           probabilities for correct identifications.

Common TOPP options:
  -ini &lt;file&gt;              Use the given TOPP INI file
  -threads &lt;n&gt;             Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini &lt;file&gt;        Writes the default configuration file
  --help                   Shows options
  --helphelp               Shows all options (including advanced)

The following configuration subsections are valid:
  - fit_algorithm   Algorithm parameter subsection

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
<div class="item item_required">required parameter</div>
<div class="item item_advanced">advanced parameter</div>
</div>
<div class="node"><span class="node_name">+IDPosteriorErrorProbability</span><span class="node_description">Estimates probabilities for incorrectly assigned peptide sequences and a set of search engine scores using a mixture model.</span></div>
<div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'IDPosteriorErrorProbability'</span></div>
<div class="item"><span class="item_name item_required" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">input file </span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div> <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file </span><span class="item_tags">output file</span><span class="item_restrictions">*.idXML</span></div> <div class="item"><span class="item_name item_required" style="padding-left:24px;">output_name</span><span class="item_value"></span>
<span class="item_description">gnuplot file as txt</span><span class="item_tags">output file</span><span class="item_restrictions">*.txt</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">smallest_e_value</span><span class="item_value">1e-19</span>
<span class="item_description">This value gives a lower bound for E-values. It should not be 0, because E-values are transformed into real numbers by taking their logarithm, which is undefined for 0.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:24px;">split_charge</span><span class="item_value">false</span>
<span class="item_description">The search engine scores are split by charge if this flag is set. Thus, for each charge state a new model will be computed.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">top_hits_only</span><span class="item_value">false</span>
<span class="item_description">If set, only the top hits of every PeptideIdentification will be used.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">fdr_for_targets_smaller</span><span class="item_value">0.05</span>
<span class="item_description">Peptide hits with a q-value smaller than this value are assumed to be targets. Only used when top_hits_only is set. Additionally, target/decoy information should be available, and the score_type must be the q-value from a previous False Discovery Rate run.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:24px;">ignore_bad_data</span><span class="item_value">false</span>
<span class="item_description">If set, errors will be written but ignored. Useful for pipelines with many datasets where only a few are bad, but the pipeline should run through.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item"><span class="item_name" style="padding-left:24px;">prob_correct</span><span class="item_value">false</span>
<span class="item_description">If set, scores will be calculated as 1 - ErrorProbabilities and can be interpreted as probabilities for correct identifications.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="node"><span class="node_name">+++fit_algorithm</span><span class="node_description">Algorithm parameter subsection</span></div>
<div class="item item_advanced"><span class="item_name" style="padding-left:32px;">number_of_bins</span><span class="item_value">100</span>
<span class="item_description">Number of bins used for visualization. Only needed if each iteration step of the EM algorithm is to be visualized.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">output_plots</span><span class="item_value">false</span>
<span class="item_description">If true, every step of the EM algorithm will be written to a file as a gnuplot formula.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">output_name</span><span class="item_value"></span>
<span class="item_description">If output_plots is on, the output files are saved as follows: &lt;output_name&gt;_scores.txt for the scores and &lt;output_name&gt; for each step of the EM algorithm. For example, with output_name = /usr/home/OMSSA123, the files /usr/home/OMSSA123_scores.txt and /usr/home/OMSSA123 will be written. If no directory is specified, e.g. just OMSSA123 instead of '/usr/home/OMSSA123', the files will be written into the working directory.</span><span class="item_tags">output file</span><span class="item_restrictions"> </span></div> <div class="item item_advanced"><span class="item_name" style="padding-left:32px;">incorrectly_assigned</span><span class="item_value">Gumbel</span>
<span class="item_description">For 'Gumbel', the Gumbel distribution is used to model the scores of incorrectly assigned sequences. For 'Gauss', a Gaussian distribution is used instead.</span><span class="item_tags"></span><span class="item_restrictions">Gumbel,Gauss</span></div></div>
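<p>The purpose of smallest_e_value can be illustrated with a short Python sketch. It assumes that E-values are turned into scores as -log10(E-value), so that better hits get larger scores; this sign convention and the function name are assumptions made for the example, not taken from this page.</p>

```python
import math


def transformed_score(e_value, smallest_e_value=1e-19):
    """Convert an E-value into a real-valued score for the mixture model.

    The E-value is first bounded below by smallest_e_value, because log10(0)
    is undefined; the result is negated (an assumed convention) so that
    smaller, better E-values map to larger scores.
    """
    return -math.log10(max(e_value, smallest_e_value))


for e in (0.0, 1e-3, 1.0):
    print(e, transformed_score(e))
```

<p>With the default lower bound of 1e-19, an E-value reported as 0 maps to a finite score of about 19 instead of causing an error.</p>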
<p>For the parameters of the algorithm section, see the algorithm's documentation: <br/>
<a class="el" href="classOpenMS_1_1Math_1_1PosteriorErrorProbabilityModel.html">fit_algorithm</a> <br/>
</p>
</div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>