File: TOPP_RTModel.html

<HTML>
<HEAD>
<TITLE>RTModel</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">RTModel </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Used to train a model for peptide retention time prediction or peptide separation prediction.</p>
<p>For retention time prediction, a support vector machine is trained with peptide sequences and their measured retention times. For peptide separation prediction, two files have to be given: one file contains the positive examples (the peptides that are collected) and the other contains the negative examples (the flowthrough peptides).</p>
<p>The methods and applications of this model are described in the following publications:</p>
<p>Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.</p>
<p>Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.</p>
<p>Several parameters of the SVM can be changed, either in the INI file or on the command line: </p>
<ul>
<li>
svm_type: the type of the SVM (NU_SVR or EPSILON_SVR for RT prediction; C_SVC for separation prediction)  </li>
<li>
kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend OLIGO for our paired oligo-border kernel (POBK)  </li>
<li>
border_length: border length for the POBK  </li>
<li>
k_mer_length: length of the signals considered in the POBK  </li>
<li>
sigma: the amount of positional smoothing for the POBK  </li>
<li>
degree: the degree parameter for the polynomial kernel  </li>
<li>
c: the penalty parameter of the SVM  </li>
<li>
nu: the nu parameter for nu-SVR  </li>
<li>
p: the epsilon parameter for epsilon-SVR  </li>
</ul>
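<p>As a rough orientation, the list above can be read as a mapping from the chosen SVM type and kernel to the parameters that matter for it. The following Python sketch encodes that reading; the function name is hypothetical and the mapping is only an interpretation of the descriptions above, not code taken from RTModel.</p>

```python
def relevant_parameters(svm_type, kernel_type):
    """Return the parameters from the list above that apply to a given choice.

    Assumption: this mapping is read off the parameter descriptions;
    it is not taken from the RTModel source code.
    """
    params = ["c"]  # the penalty parameter is listed for the SVM in general
    if svm_type == "NU_SVR":
        params.append("nu")
    elif svm_type == "EPSILON_SVR":
        params.append("p")
    if kernel_type == "POLY":
        params.append("degree")
    elif kernel_type == "OLIGO":
        # the three POBK settings
        params += ["sigma", "border_length", "k_mer_length"]
    return params


print(relevant_parameters("NU_SVR", "OLIGO"))
```

<p>With the default settings (NU_SVR and OLIGO), this suggests that c, nu and the three POBK settings are the values worth looking at first.</p>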
<p><br/>
</p>
<p>The last five parameters (sigma, degree, c, nu and p) can be tuned in a cross validation (CV) to find the best values for the training set. To do so, specify for each parameter a start value, a step size and a final value; the tested value is never bigger than the given final value. By default the current value is multiplied by the step size to obtain the next one; set <em>additive_cv</em> to add the step size instead. To perform a cross validation, for example for the parameter c, set <em>skip_cv</em> to <b>false</b> in the INI file; CV is then enabled across all five parameters. This can easily be done using the INIFileEditor.</p>
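<p>The start / step size / final value scheme can be illustrated with a short Python sketch. It assumes the stepping rule stated in the <em>additive_cv</em> description (multiply by the step size unless additive stepping is requested) and that the final value itself may be tested; the function name is hypothetical and this is not the RTModel implementation.</p>

```python
from itertools import product


def cv_values(start, step, stop, additive=False):
    """List the values tested for one parameter, never exceeding stop."""
    if additive and step <= 0:
        raise ValueError("additive step size must be positive")
    if not additive and (step <= 1 or start <= 0):
        raise ValueError("multiplicative stepping needs step > 1 and start > 0")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        # additive: add the step size; otherwise multiply by it
        value = value + step if additive else value * step
    return values


c_values = cv_values(1, 10, 1000)      # default range for c
nu_values = cv_values(0.3, 1.2, 0.7)   # default range for nu
# Every combination of the tested values forms one grid point.
grid = list(product(c_values, nu_values))
print(c_values, len(nu_values), len(grid))
```

<p>Because the number of grid points is the product of the per-parameter counts, narrowing the ranges of parameters that do not apply to the chosen SVM type or kernel keeps the search small.</p>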
<p>Furthermore, you can specify the number of partitions for the CV with <b>number_of_partitions</b> in the INI file and the number of runs with <b>number_of_runs</b>.</p>
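<p>How partitions and runs interact can be sketched as a repeated k-fold split: each run shuffles the data into a new random partition, and every partition serves once as the held-out set. The Python code below is a generic illustration under that assumption; the function name, the seed argument and the round-robin fold assignment are choices made for this sketch, not details of RTModel.</p>

```python
import random


def cv_partitions(n_items, number_of_partitions=10, number_of_runs=1, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for run in range(number_of_runs):
        indices = list(range(n_items))
        rng.shuffle(indices)  # each run creates a new random partition
        # deal the shuffled indices round-robin into the partitions
        folds = [indices[i::number_of_partitions]
                 for i in range(number_of_partitions)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k
                         for i in fold]
            yield run, k, train_idx, test_idx


splits = list(cv_partitions(25, number_of_partitions=5, number_of_runs=2))
print(len(splits))  # one entry per (run, partition) pair
```

<p>The total number of train/test rounds per parameter combination is therefore number_of_partitions times number_of_runs, which is worth keeping in mind when choosing the parameter ranges.</p>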
<p><br/>
 Consequently, there are two ways to use this application:</p>
<ol>
<li>
Set the parameters of the SVM: the RTModel application will train the SVM with the training data and store the SVM model.  </li>
<li>
Give a range of parameters for which a CV should be performed: the RTModel application will perform a CV to find the best parameter combination in the given range, then train the SVM with the best parameters on the whole training data and store the model.  </li>
</ol>
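<p>The two ways of calling the tool can be sketched as a small Python helper that only assembles the argument list. The option names are those listed in the command line help; the helper itself and the file names in the example are hypothetical, and the sketch assumes that passing <em>-cv:skip_cv</em> as a flag sets it to true.</p>

```python
def rtmodel_command(in_file, out_file, cv_ranges=None, **svm_params):
    """Build an RTModel argument list for fixed-parameter or CV training."""
    cmd = ["RTModel", "-in", in_file, "-out", out_file]
    for name, value in svm_params.items():
        cmd += ["-" + name, str(value)]
    if cv_ranges is None:
        # Choice 1: train once with the given parameters.
        # Assumption: the bare flag switches skip_cv to true.
        cmd.append("-cv:skip_cv")
    else:
        # Choice 2: pass start / step size / stop for each tuned parameter;
        # skip_cv keeps its default (false), so the CV is performed.
        for name, (start, step, stop) in cv_ranges.items():
            cmd += ["-cv:%s_start" % name, str(start),
                    "-cv:%s_step_size" % name, str(step),
                    "-cv:%s_stop" % name, str(stop)]
    return cmd


# Hypothetical file names, for illustration only.
fixed = rtmodel_command("train.idXML", "model.txt", svm_type="NU_SVR", c=10)
tuned = rtmodel_command("train.idXML", "model.txt",
                        cv_ranges={"c": (1, 10, 1000)})
print(" ".join(fixed))
print(" ".join(tuned))
```

<p>The resulting list could be handed to <code>subprocess.run</code> on a system where RTModel is installed; the sketch stops short of that so it can be inspected without the tool.</p>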
<p><br/>
 The model can then be used in <a class="el" href="TOPP_RTPredict.html">RTPredict</a> to predict retention times or peptide separation, depending on how the model was trained.</p>
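<p>The <em>max_std</em> option in the parameter listing of this page says that a peptide with several measured retention times is only included if their standard deviation stays below a limit, and that the median is taken. A minimal Python sketch of that idea follows; the function name and the use of the population standard deviation are assumptions, and the actual RTModel code may differ in both.</p>

```python
from statistics import median, pstdev


def collapse_retention_times(observations, max_std=10.0):
    """Map each peptide to the median of its RTs, dropping inconsistent ones."""
    by_peptide = {}
    for sequence, rt in observations:
        by_peptide.setdefault(sequence, []).append(rt)
    training = {}
    for sequence, rts in by_peptide.items():
        # Assumption: population standard deviation; a single
        # observation is always kept.
        if len(rts) > 1 and pstdev(rts) > max_std:
            continue
        training[sequence] = median(rts)
    return training


# Made-up sequences and retention times, for illustration only.
observations = [("PEPTIDEK", 100.0), ("PEPTIDEK", 104.0), ("PEPTIDEK", 102.0),
                ("LSSPATLNSR", 50.0), ("AAAK", 10.0), ("AAAK", 200.0)]
print(collapse_retention_times(observations))
```

<p>In this made-up input the third peptide is dropped because its two measurements disagree far more than the default limit of 10 allows.</p>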
<p><b>The command line parameters of this tool are:</b> </p>
<pre class="fragment">
RTModel -- Trains a model for the retention time prediction of peptides from a training set.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  RTModel &lt;options&gt;

Options (mandatory options marked with '*'):
  -in &lt;file&gt;                      This is the name of the input file (RT prediction). It is assumed that the
                                  file type is idXML. Alternatively you can provide a .txt file having a
                                  sequence and the corresponding rt per line.
                                  (valid formats: 'idXML', 'txt')
  -in_positive &lt;file&gt;             Input file with positive examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -in_negative &lt;file&gt;             Input file with negative examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -out &lt;file&gt;*                    Output file: the model in libsvm format (valid formats: 'txt')
  -svm_type &lt;type&gt;                The type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically
                                  set to C_SVC for separation prediction)
                                  (default: 'NU_SVR' valid: 'NU_SVR', 'NU_SVC', 'EPSILON_SVR', 'C_SVC')
  -nu &lt;float&gt;                     The nu parameter [0..1] of the svm (for nu-SVR) (default: '0.5' min: '0' 
                                  max: '1')
  -p &lt;float&gt;                      The epsilon parameter of the svm (for epsilon-SVR) (default: '0.1')
  -c &lt;float&gt;                      The penalty parameter of the svm (default: '1')
  -kernel_type &lt;type&gt;             The kernel type of the svm (default: 'OLIGO' valid: 'LINEAR', 'RBF',
                                  'POLY', 'OLIGO')
  -degree &lt;int&gt;                   The degree parameter of the kernel function of the svm (POLY kernel)
                                  (default: '1' min: '1')
  -border_length &lt;int&gt;            Length of the POBK (default: '22' min: '1')
  -max_std &lt;float&gt;                Max standard deviation for a peptide to be included (if there are several 
                                  ones for one peptide string)(median is taken) (default: '10' min: '0')
  -k_mer_length &lt;int&gt;             K_mer length of the POBK (default: '1' min: '1')
  -sigma &lt;float&gt;                  Sigma of the POBK (default: '5')
  -total_gradient_time &lt;time&gt;     The time (in seconds) of the gradient (only for RT prediction) (default: 
                                  '1' min: '1e-05')
  -first_dim_rt                   If set the model will be built for first_dim_rt
  -additive_cv                    If the step sizes should be interpreted additively (otherwise the actual
                                  value is multiplied with the step size to get the new value)
                                  

Parameters for the grid search / cross validation:
  -cv:skip_cv                     If set, cross-validation is skipped and the model is just trained with 1
                                  set of specified parameters (leave unset to enable cross-validation).
  -cv:number_of_runs &lt;int&gt;        Number of runs for the CV (each run creates a new random partition of the 
                                  data) (default: '1' min: '1')
  -cv:number_of_partitions &lt;int&gt;  Number of CV partitions (default: '10' min: '2')
  -cv:degree_start &lt;int&gt;          Starting point of degree (default: '1' min: '1')
  -cv:degree_step_size &lt;int&gt;      Step size of degree (default: '2')
  -cv:degree_stop &lt;int&gt;           Stopping point of degree (default: '4')
  -cv:p_start &lt;float&gt;             Starting point of p (default: '1')
  -cv:p_step_size &lt;float&gt;         Step size of p (default: '10')
  -cv:p_stop &lt;float&gt;              Stopping point of p (default: '1000')
  -cv:c_start &lt;float&gt;             Starting point of c (default: '1')
  -cv:c_step_size &lt;float&gt;         Step size of c (default: '10')
  -cv:c_stop &lt;float&gt;              Stopping point of c (default: '1000')
  -cv:nu_start &lt;float&gt;            Starting point of nu (default: '0.3' min: '0' max: '1')
  -cv:nu_step_size &lt;float&gt;        Step size of nu (default: '1.2')
  -cv:nu_stop &lt;float&gt;             Stopping point of nu (default: '0.7' min: '0' max: '1')
  -cv:sigma_start &lt;float&gt;         Starting point of sigma (default: '1')
  -cv:sigma_step_size &lt;float&gt;     Step size of sigma (default: '1.3')
  -cv:sigma_stop &lt;float&gt;          Stopping point of sigma (default: '15')

                                  
Common TOPP options:
  -ini &lt;file&gt;                     Use the given TOPP INI file
  -threads &lt;n&gt;                    Sets the number of threads allowed to be used by the TOPP tool (default: 
                                  '1')
  -write_ini &lt;file&gt;               Writes the default configuration file
  --help                          Shows options
  --helphelp                      Shows all options (including advanced)

</pre><p> <b>INI file documentation of this tool:</b> <div class="ini_global">
<div class="legend">
<b>Legend:</b><br>
 <div class="item item_required">required parameter</div>
 <div class="item item_advanced">advanced parameter</div>
</div>
  <div class="node"><span class="node_name">+RTModel</span><span class="node_description">Trains a model for the retention time prediction of peptides from a training set.</span></div>
    <div class="item item_advanced"><span class="item_name" style="padding-left:16px;">version</span><span class="item_value">1.11.1</span>
<span class="item_description">Version of the tool that generated this parameters file.</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>    <div class="node"><span class="node_name">++1</span><span class="node_description">Instance '1' section for 'RTModel'</span></div>
      <div class="item"><span class="item_name" style="padding-left:24px;">in</span><span class="item_value"></span>
<span class="item_description">This is the name of the input file (RT prediction). It is assumed that the file type is idXML. Alternatively you can provide a .txt file having a sequence and the corresponding rt per line.<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML,*.txt</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in_positive</span><span class="item_value"></span>
<span class="item_description">input file with positive examples (peptide separation prediction)<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">in_negative</span><span class="item_value"></span>
<span class="item_description">input file with negative examples (peptide separation prediction)<br></span><span class="item_tags">input file</span><span class="item_restrictions">*.idXML</span></div>      <div class="item"><span class="item_name item_required" style="padding-left:24px;">out</span><span class="item_value"></span>
<span class="item_description">output file: the model in libsvm format</span><span class="item_tags">output file</span><span class="item_restrictions">*.txt</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">svm_type</span><span class="item_value">NU_SVR</span>
<span class="item_description">the type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically set<br>to C_SVC for separation prediction)<br></span><span class="item_tags"></span><span class="item_restrictions">NU_SVR,NU_SVC,EPSILON_SVR,C_SVC</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">nu</span><span class="item_value">0.5</span>
<span class="item_description">the nu parameter [0..1] of the svm (for nu-SVR)</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">p</span><span class="item_value">0.1</span>
<span class="item_description">the epsilon parameter of the svm (for epsilon-SVR)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">c</span><span class="item_value">1</span>
<span class="item_description">the penalty parameter of the svm</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">kernel_type</span><span class="item_value">OLIGO</span>
<span class="item_description">the kernel type of the svm</span><span class="item_tags"></span><span class="item_restrictions">LINEAR,RBF,POLY,OLIGO</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">degree</span><span class="item_value">1</span>
<span class="item_description">the degree parameter of the kernel function of the svm (POLY kernel)<br></span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">border_length</span><span class="item_value">22</span>
<span class="item_description">length of the POBK</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">max_std</span><span class="item_value">10</span>
<span class="item_description">max standard deviation for a peptide to be included (if there are several ones for one peptide string)(median is taken)</span><span class="item_tags"></span><span class="item_restrictions">0:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">k_mer_length</span><span class="item_value">1</span>
<span class="item_description">k_mer length of the POBK</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">sigma</span><span class="item_value">5</span>
<span class="item_description">sigma of the POBK</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">total_gradient_time</span><span class="item_value">1</span>
<span class="item_description">the time (in seconds) of the gradient (only for RT prediction)</span><span class="item_tags"></span><span class="item_restrictions">1e-05:&#8734;</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">first_dim_rt</span><span class="item_value">false</span>
<span class="item_description">if set the model will be built for first_dim_rt</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">additive_cv</span><span class="item_value">false</span>
<span class="item_description">if the step sizes should be interpreted additively (otherwise the actual value is multiplied<br>with the step size to get the new value)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">log</span><span class="item_value"></span>
<span class="item_description">Name of log file (created only when specified)</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">debug</span><span class="item_value">0</span>
<span class="item_description">Sets the debug level</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item"><span class="item_name" style="padding-left:24px;">threads</span><span class="item_value">1</span>
<span class="item_description">Sets the number of threads allowed to be used by the TOPP tool</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">no_progress</span><span class="item_value">false</span>
<span class="item_description">Disables progress logging to command line</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="item item_advanced"><span class="item_name" style="padding-left:24px;">test</span><span class="item_value">false</span>
<span class="item_description">Enables the test mode (needed for internal use only)</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>      <div class="node"><span class="node_name">+++cv</span><span class="node_description">Parameters for the grid search / cross validation:</span></div>
        <div class="item"><span class="item_name" style="padding-left:32px;">skip_cv</span><span class="item_value">false</span>
<span class="item_description">Set to false to enable cross-validation, or set to true if the model should just be trained with 1 set of specified parameters.</span><span class="item_tags"></span><span class="item_restrictions">true,false</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">number_of_runs</span><span class="item_value">1</span>
<span class="item_description">number of runs for the CV (each run creates a new random partition of the data)</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">number_of_partitions</span><span class="item_value">10</span>
<span class="item_description">number of CV partitions</span><span class="item_tags"></span><span class="item_restrictions">2:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_start</span><span class="item_value">1</span>
<span class="item_description">starting point of degree</span><span class="item_tags"></span><span class="item_restrictions">1:&#8734;</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_step_size</span><span class="item_value">2</span>
<span class="item_description">step size of degree</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">degree_stop</span><span class="item_value">4</span>
<span class="item_description">stopping point of degree</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_start</span><span class="item_value">1</span>
<span class="item_description">starting point of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_step_size</span><span class="item_value">10</span>
<span class="item_description">step size of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">p_stop</span><span class="item_value">1000</span>
<span class="item_description">stopping point of p</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_start</span><span class="item_value">1</span>
<span class="item_description">starting point of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_step_size</span><span class="item_value">10</span>
<span class="item_description">step size of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">c_stop</span><span class="item_value">1000</span>
<span class="item_description">stopping point of c</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_start</span><span class="item_value">0.3</span>
<span class="item_description">starting point of nu</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_step_size</span><span class="item_value">1.2</span>
<span class="item_description">step size of nu</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">nu_stop</span><span class="item_value">0.7</span>
<span class="item_description">stopping point of nu</span><span class="item_tags"></span><span class="item_restrictions">0:1</span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_start</span><span class="item_value">1</span>
<span class="item_description">starting point of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_step_size</span><span class="item_value">1.3</span>
<span class="item_description">step size of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div>        <div class="item"><span class="item_name" style="padding-left:32px;">sigma_stop</span><span class="item_value">15</span>
<span class="item_description">stopping point of sigma</span><span class="item_tags"></span><span class="item_restrictions"> </span></div></div>
 </div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>