<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>QuEasyViz Documentation: Feature Selection</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>

<table width="700" border="0" align="center" cellpadding="0"> <tr> <td>
	
<h2 align="center">
<a href="models.html"><img src="images/left.png" width="58" height="28" border=0 align="MIDDLE"></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="index.html">Index</a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href="validation.html"><img src="images/right.png" width="58" height="28" border=0 align="MIDDLE"></a>
</h2><hr width=700>

<!-- end of header -->	


	
<H2><div align="center">Feature Selection</div></H2>
	<TABLE width="700">
   		<tr>	
		<TD width="520" colspan="1"><IMG src="images/fs1_scaled.png"></TD>
		<TD width="180" colspan="1" valign="top"><p>To create a new feature selection, drag its item from the feature selection panel onto an existing model item in the pipeline area (a).</p>
		<p>A dialog then appears and asks for the necessary parameters. The parameters of the individual feature selection techniques are explained below.</p>
		<p>If the model to be reduced by this feature selection has optimizable model parameters and optimization has been enabled (see <a href="models.html#create_model">here</a>), you can decide whether to optimize the model's parameters again <em>after</em> this feature selection is done (b).</p> </TD>
		</tr>	
 	</TABLE>
	<p>Similarly, if the model is a kernel-based model and grid-search optimization of kernel parameters has been enabled (see <a href="models.html#kernel_models">here</a>), you can choose to optimize the kernel parameters again <em>after</em> this feature selection is done (c).<br>
	For this, the same grid-search parameters are used, but the grid search is automatically restricted to the neighborhood of the previous kernel parameters. </p>


	<!--You can choose between the following feature selection procedures:<br>-->

	<h3><div align="center" id="rm_collinear">Removal of strongly collinear features</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_cor_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">Each feature f<sub>1</sub> whose absolute correlation with some other feature f<sub>2</sub> reaches or exceeds a user-defined threshold d (a) is removed.
			<p>I.e., f<sub>1</sub> is removed if there is an f<sub>2</sub> such that |cor(f<sub>1</sub>,f<sub>2</sub>)| &gt;= |d|.</p> </TD>
			</tr>	
 		</TABLE>
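	<p>The rule above can be sketched in a few lines of Python. This is only an illustration, not the QuEasyViz implementation: the names <code>pearson</code> and <code>remove_collinear</code> are hypothetical, and the sketch assumes that the first feature of each collinear group is kept, since the text does not say which of the two features is dropped.</p>
<pre>
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return sxy / sqrt(sxx * syy)


def remove_collinear(columns, threshold):
    """Return the names of the features that survive the filter.

    columns maps a feature name to its list of values. A feature is
    dropped if |cor| with an already kept feature is >= |threshold|.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < abs(threshold) for k in kept):
            kept.append(name)
    return kept


columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],    # exact multiple of "a"
    "c": [1, -1, 1, -1],
}
kept = remove_collinear(columns, 0.95)
print(kept)
```
</pre>
	<p>In this toy data set, "b" is perfectly correlated with "a" and is therefore dropped, while "c" is only weakly correlated with "a" and is kept.</p>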

	<h3><div align="center" id="forward">Forward selection</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_forward_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">In each iteration, the best feature to add to the list of already selected features is searched for. This search is done by cross-validation; the number of folds (a) can be set by the user.<br>
			<p>The feature selection terminates if adding the best feature found by cross-validation increases the model's quality by less than a user-defined threshold (b).</p>
			</TD>
			</tr>	
 		</TABLE>
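	<p>A minimal sketch of the greedy loop described above, under stated assumptions: <code>cv_quality</code> is a hypothetical callable that stands in for the cross-validation run and returns the quality of a model built on a given feature list, and a model without features is assumed to have quality 0. The actual implementation may differ in these details.</p>
<pre>
```python
def forward_selection(candidates, cv_quality, threshold):
    """Greedy forward selection.

    cv_quality(features) is a hypothetical callable that returns the
    cross-validated quality of a model built on the given feature list.
    """
    selected = []
    best_quality = 0.0  # assumption: a model without features has quality 0
    remaining = list(candidates)
    while remaining:
        # Score every candidate together with the already selected features.
        scored = [(cv_quality(selected + [f]), f) for f in remaining]
        quality, feature = max(scored, key=lambda pair: pair[0])
        # Stop once the best candidate improves the quality too little.
        if quality - best_quality < threshold:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_quality = quality
    return selected, best_quality


# Toy stand-in for cross-validation: each feature adds a fixed amount.
GAINS = {"x1": 0.5, "x2": 0.3, "x3": 0.01}


def toy_quality(features):
    return sum(GAINS[f] for f in features)


selected, quality = forward_selection(["x1", "x2", "x3"], toy_quality, 0.05)
print(selected, quality)
```
</pre>
	<p>With a threshold of 0.05, the toy run adds "x1" and "x2" and then stops, because "x3" would raise the quality by only 0.01.</p>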

	<h3><div align="center" id="backward">Backward selection</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_backward_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">This feature selection starts with all descriptors (that were selected previously) and in each iteration searches for the feature whose removal leads to the largest increase of the model's prediction quality. The search in each iteration is done by cross-validation; the number of folds (a) can be set by the user.<br>
			<p>The feature selection terminates if removing this feature increases the model's quality by less than a user-defined threshold (b).</p></TD>
			</tr>	
 		</TABLE>
		<p>If you specify a negative threshold, the method will continue to remove features as long as the prediction quality is no smaller than the quality of the original model minus the absolute value of this threshold. This may be helpful when using several feature selections in succession, where a minimal decrease of quality might thus allow the removal of many features.</p>
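		<p>The two stopping rules for backward selection can be sketched as follows. This is an illustration only: <code>cv_quality</code> is a hypothetical callable that stands in for cross-validation, and keeping at least one feature is an assumption of the sketch, not something the text states.</p>
<pre>
```python
def backward_selection(features, cv_quality, threshold):
    """Greedy backward selection with the two stopping rules.

    cv_quality(features) is a hypothetical stand-in for cross-validation.
    """
    selected = list(features)
    original = cv_quality(selected)
    current = original
    while len(selected) > 1:  # assumption: keep at least one feature
        # Quality of the model after removing each feature in turn.
        scored = [(cv_quality([g for g in selected if g != f]), f)
                  for f in selected]
        quality, feature = max(scored, key=lambda pair: pair[0])
        if threshold >= 0:
            # Stop if the best removal gains less than the threshold.
            if quality - current < threshold:
                break
        else:
            # Negative threshold: tolerate a total loss of |threshold|
            # relative to the original model.
            if quality < original - abs(threshold):
                break
        selected.remove(feature)
        current = quality
    return selected, current


# Toy stand-in for cross-validation: "x3" contributes almost nothing.
CONTRIBUTION = {"x1": 0.5, "x2": 0.3, "x3": 0.005}


def toy_quality(features):
    return sum(CONTRIBUTION[f] for f in features)


strict, _ = backward_selection(["x1", "x2", "x3"], toy_quality, 0.0)
relaxed, _ = backward_selection(["x1", "x2", "x3"], toy_quality, -0.01)
print(strict, relaxed)
```
</pre>
		<p>With a threshold of 0, no feature is removed because every removal lowers the toy quality. With a threshold of -0.01, the nearly useless "x3" is removed, since the quality stays within 0.01 of the original model.</p>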

	<h3><div align="center" id="stepwise">Stepwise selection</div></h3>
		
			<!--<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_stepwise_scaled.png"></div></TD>-->
			Stepwise selection alternates between forward and backward selection. It starts with one feature and after each forward selection step a backward selection step is done.<br>
			The parameters are the same as for forward and backward selection.
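			<p>One possible reading of this alternation is sketched below. The callable <code>cv_quality</code> is a hypothetical stand-in for cross-validation, and the sketch requires a positive threshold so that every accepted step strictly improves the quality and the loop is guaranteed to end. Whether the actual implementation handles these points the same way is not stated here.</p>
<pre>
```python
def stepwise_selection(candidates, cv_quality, threshold):
    """Alternate one forward step and one backward step.

    cv_quality(features) is a hypothetical stand-in for cross-validation.
    """
    if threshold <= 0:
        raise ValueError("this sketch needs a positive threshold")
    selected, quality = [], 0.0  # assumption: empty model has quality 0
    while True:
        # Forward step: add the best remaining feature if it helps enough.
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        scored = [(cv_quality(selected + [f]), f) for f in remaining]
        new_quality, feature = max(scored, key=lambda pair: pair[0])
        if new_quality - quality < threshold:
            break
        selected.append(feature)
        quality = new_quality
        # Backward step: drop one feature if its removal helps enough.
        if len(selected) > 1:
            scored = [(cv_quality([g for g in selected if g != f]), f)
                      for f in selected]
            new_quality, feature = max(scored, key=lambda pair: pair[0])
            if new_quality - quality >= threshold:
                selected.remove(feature)
                quality = new_quality
    return selected, quality


def toy_quality(features):
    """Toy scores: x2 and x3 work best together and without x1."""
    s = set(features)
    if {"x2", "x3"} <= s:
        return 0.8 if "x1" in s else 0.9
    if "x1" in s:
        return 0.5 + 0.2 * len(s & {"x2", "x3"})
    return 0.45 * len(s)


selected, quality = stepwise_selection(["x1", "x2", "x3"], toy_quality, 0.05)
print(selected, quality)
```
</pre>
			<p>In the toy run, "x1" is selected first because it is the best single feature, but it is removed again by a backward step once "x2" and "x3" are both in the model.</p>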
			

	<h3><div align="center" id="respCor">Removal of low response correlation</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_respCor_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">Each feature whose correlation with the response variable is smaller than a user-defined threshold (a) is removed. 
			<p>For regression/classification models that consider the features to be independent of each other (like Bayesian approaches), the threshold can be set to a high value without violating the model's assumptions.<br>
			For the remaining types of models, this feature selection might nevertheless be a useful heuristic if a relatively low value is chosen. </p></TD>
			</tr>	
 		</TABLE>
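	<p>A minimal sketch of this filter follows. The names <code>pearson</code> and <code>remove_low_response_correlation</code> are hypothetical, and comparing the absolute value of the correlation with the threshold is an assumption, since the text does not say how negative correlations are treated.</p>
<pre>
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return sxy / sqrt(sxx * syy)


def remove_low_response_correlation(columns, response, threshold):
    """Keep only features with |cor(feature, response)| >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, response)) >= threshold]


columns = {
    "a": [1, 2, 3, 4],      # follows the response closely
    "c": [1, -1, 1, -1],    # only weakly related to the response
}
response = [2, 4, 6, 9]
kept = remove_low_response_correlation(columns, response, 0.6)
print(kept)
```
</pre>
	<p>In this toy data set, "c" has an absolute correlation of roughly 0.48 with the response, so it is removed at a threshold of 0.6 but would be kept at a threshold of 0.4.</p>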

	<h3><div align="center" id="insig">Removal of insignificant features</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_insig_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">This feature selection runs a bootstrap in order to estimate the standard deviation of each linear regression coefficient.
			<p>After a bootstrap with (a) samples, all features whose coefficient's absolute value is smaller than (b) times its standard deviation are removed.</p>
			Note that this feature selection technique is applicable only to <a href="models.html#model_types">linear regression models</a>.</TD>
			</tr>	
 		</TABLE>
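	<p>The bootstrap test can be sketched in plain Python as shown below. Several details are assumptions made for the illustration and are not taken from the text: the model includes an intercept, the coefficient that is tested comes from a fit on the full data set, and the helper names (<code>solve</code>, <code>fit_ols</code>, <code>remove_insignificant</code>) as well as the <code>seed</code> parameter are hypothetical.</p>
<pre>
```python
import random
from statistics import stdev


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via normal equations."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def remove_insignificant(names, rows, y, n_bootstrap, k, seed=0):
    """Keep features whose |coefficient| is at least k standard deviations."""
    rng = random.Random(seed)
    coefs = fit_ols(rows, y)[1:]  # assumption: test the full-data fit
    n = len(rows)
    samples = []
    for _ in range(n_bootstrap):
        # Draw n observations with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(fit_ols([rows[i] for i in idx],
                               [y[i] for i in idx])[1:])
    kept = []
    for j, name in enumerate(names):
        sd = stdev([s[j] for s in samples])
        if abs(coefs[j]) >= k * sd:
            kept.append(name)
    return kept


# Toy data: y depends on x1 only; x2 is an unrelated column.
rows = [(i, (i * 3) % 7) for i in range(20)]
y = [2 * i + 0.1 * ((i * 7) % 5 - 2) for i in range(20)]
kept = remove_insignificant(["x1", "x2"], rows, y, n_bootstrap=50, k=2.0)
print(kept)
```
</pre>
	<p>In the toy data, the response depends strongly on "x1", so that feature is retained. Whether the unrelated "x2" survives depends on the bootstrap draws and on the factor <code>k</code>.</p>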

	<h3><div align="center" id="twinscan">TwinScan</div></h3>
		<TABLE width="700">
   			<tr>	
			<TD width="330" colspan="1"><div align="center"><IMG src="images/fs_twinscan_scaled.png"></div></TD>
			<TD width="370" colspan="1" valign="top">TwinScan works by scanning all features twice.<br>
			In the first scan, the single best feature is searched for, while at the same time the quality achieved by every other feature is stored.<br>
			In the second scan, each remaining feature is added to the list of features if this increases the model's predictive quality, as determined by (a)-fold cross-validation, by more than a user-defined threshold (b). The features are checked in descending order of their single-feature qualities as determined in the first scan. </TD>
			</tr>	
 		</TABLE>
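	<p>The two scans can be sketched as follows. As before, <code>cv_quality</code> is a hypothetical callable that stands in for the (a)-fold cross-validation, and the function name <code>twinscan</code> is chosen only for this illustration.</p>
<pre>
```python
def twinscan(candidates, cv_quality, threshold):
    """Two-pass selection as described above.

    cv_quality(features) is a hypothetical stand-in for cross-validation.
    """
    if not candidates:
        return [], 0.0
    # First scan: quality of every feature on its own.
    single = {f: cv_quality([f]) for f in candidates}
    order = sorted(candidates, key=lambda f: single[f], reverse=True)
    selected = [order[0]]
    quality = single[order[0]]
    # Second scan: visit the rest in descending single-feature quality.
    for feature in order[1:]:
        new_quality = cv_quality(selected + [feature])
        if new_quality - quality > threshold:
            selected.append(feature)
            quality = new_quality
    return selected, quality


def toy_quality(features):
    """Toy scores: x3 is redundant once x2 is present."""
    s = set(features)
    score = 0.0
    if "x2" in s:
        score += 0.5
    elif "x3" in s:
        score += 0.4
    if "x1" in s:
        score += 0.3
    return score


selected, quality = twinscan(["x1", "x2", "x3"], toy_quality, 0.05)
print(selected, quality)
```
</pre>
	<p>In the toy run, "x2" is the best single feature. "x3" is checked next but adds nothing on top of "x2", while "x1" raises the quality by 0.3 and is added.</p>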




	

<!-- begin of footer -->
  

<hr width=700>
<h2 align="center">
<a href="models.html"><img src="images/left.png" width="58" height="28" border=0 align="MIDDLE"></a>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="index.html">Index</a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href="validation.html"><img src="images/right.png" width="58" height="28" border=0 align="MIDDLE"></a>
</h2>

  </td></tr></table>

  
</body>
</html>