File: TOPP_RTModel.html

<HTML>
<HEAD>
<TITLE>RTModel</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">RTModel </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Trains a model for peptide retention time (RT) prediction or peptide separation prediction.</p>
<p>For retention time prediction, a support vector machine (SVM) is trained with peptide sequences and their measured retention times. For peptide separation prediction, two files must be given: one file contains the positive examples (the peptides which are collected) and the other contains the negative examples (the flowthrough peptides).</p>
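<p>As a minimal sketch of the plain-text input for RT prediction, the following writes one sequence and its retention time per line. The single-space separator, the file name, the peptides and the RT values are assumptions made for illustration; the text only states that each line holds a sequence and the corresponding RT.</p>
<pre class="fragment">
```python
import os
import tempfile


def write_rt_training_file(path, examples):
    """Write one 'SEQUENCE RT' pair per line (the space separator is an assumption)."""
    with open(path, "w", encoding="ascii") as handle:
        for sequence, rt in examples:
            handle.write(f"{sequence} {rt}\n")


# Hypothetical peptides and retention times, for illustration only.
examples = [("DFPIANGER", 1263.4), ("LVNELTEFAK", 2110.0), ("AEFVEVTK", 987.5)]

path = os.path.join(tempfile.mkdtemp(), "rt_training.txt")
write_rt_training_file(path, examples)

# Read the file back to show what a training run would be given.
with open(path, encoding="ascii") as handle:
    lines = handle.read().splitlines()
```
</pre>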
<p>The methods and applications of this model are described in the following publications:</p>
<p>Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.</p>
<p>Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.</p>
<p>Several parameters of the SVM can be changed, either in the INI file or on the command line: </p>
<ul>
<li>
svm_type: the type of the svm (can be NU_SVR or EPSILON_SVR for RT prediction and is C_SVC for separation prediction)  </li>
<li>
kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend OLIGO (SVMWrapper::OLIGO) for our paired oligo-border kernel (POBK)  </li>
<li>
border_length: border length for the POBK  </li>
<li>
k_mer_length: length of the signals considered in the POBK  </li>
<li>
sigma: the amount of positional smoothing for the POBK  </li>
<li>
degree: the degree parameter for the polynomial kernel  </li>
<li>
c: the penalty parameter of the svm  </li>
<li>
nu: the nu parameter for nu-SVR  </li>
<li>
p: the epsilon parameter for epsilon-SVR  </li>
</ul>
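<p>The list above can be condensed into a small lookup of which parameters matter for a given svm_type and kernel_type. This is only a reading aid built from the descriptions on this page, not part of OpenMS; the helper name is hypothetical, and pairing c with every SVM type follows the generic "penalty parameter of the svm" description.</p>
<pre class="fragment">
```python
# Parameters that each SVM type brings into play, as described in the list above.
SVM_TYPE_PARAMS = {
    "NU_SVR": {"c", "nu"},
    "EPSILON_SVR": {"c", "p"},
    "C_SVC": {"c"},
}

# Parameters that each kernel brings into play.
KERNEL_PARAMS = {
    "LINEAR": set(),
    "RBF": set(),  # this page lists no RBF-specific parameter
    "POLY": {"degree"},
    "OLIGO": {"border_length", "k_mer_length", "sigma"},
}


def relevant_parameters(svm_type, kernel_type):
    """Return the sorted parameter names that apply to this combination."""
    return sorted(SVM_TYPE_PARAMS[svm_type] | KERNEL_PARAMS[kernel_type])


default_setup = relevant_parameters("NU_SVR", "OLIGO")
separation_setup = relevant_parameters("C_SVC", "OLIGO")
```
</pre>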
<p><br/>
</p>
<p>The last five parameters (sigma, degree, c, nu and p) can be tuned in a cross validation (CV) to find the best values for the training set. For each of them, specify a start value, a step size by which the value is increased, and a final value; the tested value never exceeds the given final value. To perform a cross validation, for example for the parameter c, enable CV (across all 5 parameters) by setting <em>skip_cv</em> to <b>false</b> in the INI file. This is easily done using the INIFileEditor.</p>
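<p>To make the start / step size / final value scheme concrete, the sketch below lists the values that would be tested for one parameter. It assumes the behaviour stated in the option descriptions: the current value is multiplied by the step size unless additive_cv is set, in which case the step size is added. The function name is hypothetical, and RTModel's exact boundary and rounding handling may differ.</p>
<pre class="fragment">
```python
def cv_grid(start, step_size, stop, additive=False):
    """List the values tested for one parameter; none exceeds stop."""
    if additive:
        if not step_size > 0:
            raise ValueError("additive step size must be positive")
    elif not (start > 0 and step_size > 1):
        raise ValueError("multiplicative search needs start > 0 and step size > 1")

    values = []
    value = start
    while stop >= value:
        values.append(value)
        # Additive mode adds the step; otherwise the value is multiplied by it.
        value = value + step_size if additive else value * step_size
    return values


# Default ranges from the option list, interpreted multiplicatively.
c_values = cv_grid(1, 10, 1000)        # [1, 10, 100, 1000]
degree_values = cv_grid(1, 2, 4)       # [1, 2, 4]
nu_values = cv_grid(0.3, 1.2, 0.7)

# The same degree range interpreted additively.
degree_additive = cv_grid(1, 2, 4, additive=True)  # [1, 3]
```
</pre>
<p>With the multiplicative default, a step size of 10 for c scans orders of magnitude, which is why the default step sizes are factors rather than increments.</p>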
<p>Furthermore, you can specify the number of partitions for the CV with <b>number_of_partitions</b> in the ini file and the number of runs with <b>number_of_runs</b>.</p>
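<p>The roles of number_of_partitions and number_of_runs can be illustrated with a generic repeated k-fold split, where every run draws a new random partition of the data. This is a textbook sketch, not the OpenMS implementation; the function name, the seed handling and the round-robin assignment are assumptions.</p>
<pre class="fragment">
```python
import random


def random_partitions(n_examples, number_of_partitions, number_of_runs, seed=0):
    """Return, for each run, a list of folds (lists of example indices)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(number_of_runs):
        indices = list(range(n_examples))
        rng.shuffle(indices)  # each run creates a new random partition
        # Deal the shuffled indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::number_of_partitions] for i in range(number_of_partitions)]
        runs.append(folds)
    return runs


# 23 hypothetical training peptides, 10 partitions, 2 runs.
runs = random_partitions(n_examples=23, number_of_partitions=10, number_of_runs=2)
fold_sizes = [len(fold) for fold in runs[0]]
```
</pre>
<p>In each run, every fold serves once as the held-out set while the remaining folds are used for training, so every tested parameter combination is trained number_of_partitions times per run.</p>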
<p><br/>
 Consequently, there are two ways to use this application:</p>
<ol>
<li>
Set the parameters of the svm: The RTModel application will train the svm with the training data and store the svm model  </li>
<li>
Give a range of parameters for which a CV should be performed: The RTModel application will perform a CV to find the best parameter combination in the given range and afterwards train the svm with the best parameters and the whole training data. Then the model is stored.  </li>
</ol>
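<p>The two modes above could be expressed as command lines roughly as follows. The sketch only assembles argument lists from the option names documented on this page and does not run the tool; the file names are hypothetical, and passing <b>-cv:skip_cv</b> as a bare switch is an assumption based on the option list showing no argument for it.</p>
<pre class="fragment">
```python
def rtmodel_command(in_file, out_file, fixed=None, cv_ranges=None):
    """Build an RTModel argument list for fixed parameters or for a CV search."""
    args = ["RTModel", "-in", in_file, "-out", out_file]
    for name, value in (fixed or {}).items():
        args += [f"-{name}", str(value)]
    if cv_ranges is None:
        # Mode 1: train once with the given parameters, so the CV is skipped.
        args.append("-cv:skip_cv")
    else:
        # Mode 2: pass start, step size and stop for each searched parameter.
        for name, (start, step, stop) in cv_ranges.items():
            args += [
                f"-cv:{name}_start", str(start),
                f"-cv:{name}_step_size", str(step),
                f"-cv:{name}_stop", str(stop),
            ]
    return args


fixed_run = rtmodel_command(
    "train.idXML", "model.txt",
    fixed={"svm_type": "NU_SVR", "kernel_type": "OLIGO", "c": 1, "nu": 0.5},
)
cv_run = rtmodel_command("train.idXML", "model.txt", cv_ranges={"c": (1, 10, 1000)})

print(" ".join(fixed_run))
print(" ".join(cv_run))
```
</pre>
<p>In the CV example only the range for c is given explicitly; the ranges of the other parameters keep their defaults.</p>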
<p><br/>
 The model can be used in <a class="el" href="TOPP_RTPredict.html">RTPredict</a> to predict retention times for peptides or peptide separation, depending on how you trained the model.</p>
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
RTModel -- Trains a model for the retention time prediction of peptides from a training set.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  RTModel &lt;options&gt;

Options (mandatory options marked with '*'):
  -in &lt;file&gt;                      This is the name of the input file (RT prediction). It is assumed that the 
                                  file type is idXML. Alternatively you can provide a .txt file having a
                                  sequence and the corresponding rt per line.
                                  (valid formats: 'idXML', 'txt')
  -in_positive &lt;file&gt;             Input file with positive examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -in_negative &lt;file&gt;             Input file with negative examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -out &lt;file&gt;*                    Output file: the model in libsvm format (valid formats: 'txt')
  -svm_type &lt;type&gt;                The type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically
                                  set to C_SVC for separation prediction)
                                  (default: 'NU_SVR' valid: 'NU_SVR', 'NU_SVC', 'EPSILON_SVR', 'C_SVC')
  -nu &lt;float&gt;                     The nu parameter [0..1] of the svm (for nu-SVR) (default: '0.5' min: '0' 
                                  max: '1')
  -p &lt;float&gt;                      The epsilon parameter of the svm (for epsilon-SVR) (default: '0.1')
  -c &lt;float&gt;                      The penalty parameter of the svm (default: '1')
  -kernel_type &lt;type&gt;             The kernel type of the svm (default: 'OLIGO' valid: 'LINEAR', 'RBF', 'POLY',
                                  'OLIGO')
  -degree &lt;int&gt;                   The degree parameter of the kernel function of the svm (POLY kernel)
                                  (default: '1' min: '1')
  -border_length &lt;int&gt;            Length of the POBK (default: '22' min: '1')
  -max_std &lt;float&gt;                Max standard deviation for a peptide to be included (if there are several 
                                  ones for one peptide string)(median is taken) (default: '10' min: '0')
  -k_mer_length &lt;int&gt;             K_mer length of the POBK (default: '1' min: '1')
  -sigma &lt;float&gt;                  Sigma of the POBK (default: '5')
  -total_gradient_time &lt;time&gt;     The time (in seconds) of the gradient (only for RT prediction) (default: 
                                  '1' min: '1e-05')
  -first_dim_rt                   If set the model will be built for first_dim_rt
  -additive_cv                    If the step sizes should be interpreted additively (otherwise the actual 
                                  value is multiplied with the step size to get the new value)
                                  

Parameters for the grid search / cross validation:
  -cv:skip_cv                     Set to false to enable cross validation, or to true if the model should 
                                  just be trained with 1 set of specified parameters.
  -cv:number_of_runs &lt;int&gt;        Number of runs for the CV (each run creates a new random partition of the 
                                  data) (default: '1' min: '1')
  -cv:number_of_partitions &lt;int&gt;  Number of CV partitions (default: '10' min: '2')
  -cv:degree_start &lt;int&gt;          Starting point of degree (default: '1' min: '1')
  -cv:degree_step_size &lt;int&gt;      Step size of degree (default: '2')
  -cv:degree_stop &lt;int&gt;           Stopping point of degree (default: '4')
  -cv:p_start &lt;float&gt;             Starting point of p (default: '1')
  -cv:p_step_size &lt;float&gt;         Step size of p (default: '10')
  -cv:p_stop &lt;float&gt;              Stopping point of p (default: '1000')
  -cv:c_start &lt;float&gt;             Starting point of c (default: '1')
  -cv:c_step_size &lt;float&gt;         Step size of c (default: '10')
  -cv:c_stop &lt;float&gt;              Stopping point of c (default: '1000')
  -cv:nu_start &lt;float&gt;            Starting point of nu (default: '0.3' min: '0' max: '1')
  -cv:nu_step_size &lt;float&gt;        Step size of nu (default: '1.2')
  -cv:nu_stop &lt;float&gt;             Stopping point of nu (default: '0.7' min: '0' max: '1')
  -cv:sigma_start &lt;float&gt;         Starting point of sigma (default: '1')
  -cv:sigma_step_size &lt;float&gt;     Step size of sigma (default: '1.3')
  -cv:sigma_stop &lt;float&gt;          Stopping point of sigma (default: '15')

                                  
Common TOPP options:
  -ini &lt;file&gt;                     Use the given TOPP INI file
  -threads &lt;n&gt;                    Sets the number of threads allowed to be used by the TOPP tool (default: 
                                  '1')
  -write_ini &lt;file&gt;               Writes the default configuration file
  --help                          Shows options
  --helphelp                      Shows all options (including advanced)

</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
 <div class="item item_required">required parameter</div>
 <div class="item item_advanced">advanced parameter</div>
</div>
  <div class="node"><span class="node_name">+RTModel</span><span class="node_description">Trains a model for the retention time prediction of peptides from a training set.</span></div>
    <div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>    <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'RTModel'</span></div>
      <div class="item"><span class="item_name" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">This is the name of the input file (RT prediction). It is assumed that the file type is idXML. Alternatively you can provide a .txt file having a sequence and the corresponding rt per line.<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML,*.txt</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in_positive</span><span class="item_value"></span>
<span class="item_description">input file with positive examples (peptide separation prediction)<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in_negative</span><span class="item_value"></span>
<span class="item_description">input file with negative examples (peptide separation prediction)<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file: the model in libsvm format</span><span class="item_tags">output file</span><span class="item_restrictions">*.txt</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">svm_type</span><span class="item_value">NU_SVR</span>
<span class="item_description">the type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically set<br>to C_SVC for separation prediction)<br></span><span class="item_tags"></span><span class="item_restrictions">NU_SVR,NU_SVC,EPSILON_SVR,C_SVC</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">nu</span><span class="item_value">0.5</span>
<span class="item_description">the nu parameter [0..1] of the svm (for nu-SVR)</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">p</span><span class="item_value">0.1</span>
<span class="item_description">the epsilon parameter of the svm (for epsilon-SVR)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">c</span><span class="item_value">1</span>
<span class="item_description">the penalty parameter of the svm</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">kernel_type</span><span class="item_value">OLIGO</span>
<span class="item_description">the kernel type of the svm</span><span class="item_tags"></span><span class="item_restrictions">LINEAR,RBF,POLY,OLIGO</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">degree</span><span class="item_value">1</span>
<span class="item_description">the degree parameter of the kernel function of the svm (POLY kernel)<br></span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">border_length</span><span class="item_value">22</span>
<span class="item_description">length of the POBK</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">max_std</span><span class="item_value">10</span>
<span class="item_description">max standard deviation for a peptide to be included (if there are several ones for one peptide string)(median is taken)</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">k_mer_length</span><span class="item_value">1</span>
<span class="item_description">k_mer length of the POBK</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">sigma</span><span class="item_value">5</span>
<span class="item_description">sigma of the POBK</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">total_gradient_time</span><span class="item_value">1</span>
<span class="item_description">the time (in seconds) of the gradient (only for RT prediction)</span><span class="item_tags"></span><span class="item_restrictions">1e-05:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">first_dim_rt</span><span class="item_value">false</span>
<span class="item_description">if set the model will be built for first_dim_rt</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">additive_cv</span><span class="item_value">false</span>
<span class="item_description">if the step sizes should be interpreted additively (otherwise the actual value is multiplied<br>with the step size to get the new value)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++cv</span><span class="node_description">Parameters for the grid search / cross validation:</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">skip_cv</span><span class="item_value">false</span>
<span class="item_description">Set to false to enable cross validation, or to true if the model should just be trained with 1 set of specified parameters.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">number_of_runs</span><span class="item_value">1</span>
<span class="item_description">number of runs for the CV (each run creates a new random partition of the data)</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">number_of_partitions</span><span class="item_value">10</span>
<span class="item_description">number of CV partitions</span><span class="item_tags"></span><span class="item_restrictions">2:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_start</span><span class="item_value">1</span>
<span class="item_description">starting point of degree</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_step_size</span><span class="item_value">2</span>
<span class="item_description">step size of degree</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_stop</span><span class="item_value">4</span>
<span class="item_description">stopping point of degree</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_start</span><span class="item_value">1</span>
<span class="item_description">starting point of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_step_size</span><span class="item_value">10</span>
<span class="item_description">step size of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_stop</span><span class="item_value">1000</span>
<span class="item_description">stopping point of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_start</span><span class="item_value">1</span>
<span class="item_description">starting point of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_step_size</span><span class="item_value">10</span>
<span class="item_description">step size of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_stop</span><span class="item_value">1000</span>
<span class="item_description">stopping point of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_start</span><span class="item_value">0.3</span>
<span class="item_description">starting point of nu</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_step_size</span><span class="item_value">1.2</span>
<span class="item_description">step size of nu</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_stop</span><span class="item_value">0.7</span>
<span class="item_description">stopping point of nu</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_start</span><span class="item_value">1</span>
<span class="item_description">starting point of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_step_size</span><span class="item_value">1.3</span>
<span class="item_description">step size of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_stop</span><span class="item_value">15</span>
<span class="item_description">stopping point of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div></div>
 </div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>