File: safs_initial.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/safs.R
\docType{data}
\name{safs_initial}
\alias{safs_initial}
\alias{safs_perturb}
\alias{safs_prob}
\alias{caretSA}
\alias{rfSA}
\alias{treebagSA}
\title{Ancillary simulated annealing functions}
\format{
An object of class \code{list} of length 8.

An object of class \code{list} of length 8.

An object of class \code{list} of length 8.
}
\usage{
safs_initial(vars, prob = 0.2, ...)

safs_perturb(x, vars, number = floor(length(x) * 0.01) + 1)

safs_prob(old, new, iteration = 1)

caretSA

treebagSA

rfSA
}
\arguments{
\item{vars}{the total number of possible predictor variables}

\item{prob}{The probability that an individual predictor is included in the
initial predictor set}

\item{\dots}{not currently used}

\item{x}{the integer index vector for the current subset}

\item{number}{the number of predictor variables to perturb}

\item{old, new}{fitness values associated with the current and new subset}

\item{iteration}{the number of iterations overall or the number of
iterations since restart (if \code{improve} is used in
\code{\link{safsControl}})}
}
\value{
The return value depends on the function. Note that the SA code
encodes each subset as a vector of the integer indices of the predictors
included in the subset (which is different from the encoding used for GAs).

The objects \code{caretSA}, \code{rfSA} and \code{treebagSA} are example
lists that can be used with the \code{functions} argument of
\code{\link{safsControl}}.

In the case of \code{caretSA}, the \code{...} structure of
\code{\link{safs}} passes through to the model fitting routine. As a
consequence, the \code{\link{train}} function can easily be accessed by
passing important arguments belonging to \code{\link{train}} to
\code{\link{safs}}. See the examples below. By default, using \code{caretSA}
will use the resampled performance estimates produced by
\code{\link{train}} as the internal estimate of fitness.

For \code{rfSA} and \code{treebagSA}, the \code{randomForest} and
\code{bagging} functions are used directly (i.e. \code{\link{train}} is not
used). Arguments to either of these functions can also be passed to them
through the \code{\link{safs}} call (see examples below). For these two
functions, the internal fitness is estimated using the out-of-bag estimates
naturally produced by those functions. While faster, this limits the user to
accuracy or Kappa (for classification) and RMSE and R-squared (for
regression).
}
\description{
Built-in functions related to simulated annealing

These functions are used with the \code{functions} argument of the
\code{\link{safsControl}} function. More details on these functions are
available at \url{http://topepo.github.io/caret/feature-selection-using-simulated-annealing.html}.

The \code{initial} function is used to create the first predictor subset.
By default, the function \code{safs_initial} randomly selects roughly 20\% of
the predictors (controlled by the \code{prob} argument).
Note that, instead of a function, \code{\link{safs}} can also accept a
vector of column numbers as the initial subset.

\code{safs_perturb} is an example of the operation that changes the subset
configuration at the start of each new iteration. By default, it will change
roughly 1\% of the variables in the current subset.

The \code{prob} function defines the acceptance probability at each
iteration, given the old and new fitness (i.e. energy values). It assumes
that smaller values are better. The default probability function computes
the percentage difference between the current and new fitness values and
uses an exponential function to compute a probability:
\preformatted{
prob = exp[(current - new) / current * iteration]
}
}
\examples{

selected_vars <- safs_initial(vars = 10 , prob = 0.2)
selected_vars

###

safs_perturb(selected_vars, vars = 10, number = 1)
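
## A small sketch of how to inspect what a perturbation changed, assuming
## (as described above) that subsets are stored as integer index vectors.
## `current` and `candidate` are names used only for this illustration.
current <- c(2, 5, 7)
candidate <- safs_perturb(current, vars = 10, number = 2)
## indices that were added to the subset
setdiff(candidate, current)
## indices that were dropped from the subset
setdiff(current, candidate)

## With the default `number`, a subset of 250 predictors would have
## floor(250 * 0.01) + 1 entries changed
floor(250 * 0.01) + 1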

###

safs_prob(old = .8, new = .9, iteration = 1)
safs_prob(old = .5, new = .6, iteration = 1)
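
## A minimal sketch of the documented formula, written out by hand for
## comparison with the calls above. `manual_prob` is a hypothetical helper
## for illustration only; it is not part of caret and only covers the case
## where the new fitness is worse (larger) than the current one.
manual_prob <- function(old, new, iteration = 1) {
  ## smaller fitness values are treated as better, so a larger `new` value
  ## gives a negative exponent and a probability below one
  exp((old - new) / old * iteration)
}
manual_prob(old = .8, new = .9, iteration = 1)
## the same difference is accepted less readily at later iterations
manual_prob(old = .8, new = .9, iteration = 10)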

grid <- expand.grid(old = c(4, 3.5),
                    new = c(4.5, 4, 3.5) + 1,
                    iter = 1:40)
grid <- subset(grid, old < new)

grid$prob <- apply(grid, 1,
                   function(x)
                     safs_prob(new = x["new"],
                               old = x["old"],
                               iteration = x["iter"]))

grid$Difference <- factor(grid$new - grid$old)
grid$Group <- factor(paste("Current Value", grid$old))

ggplot(grid, aes(x = iter, y = prob, color = Difference)) +
  geom_line() + facet_wrap(~Group) + theme_bw() +
  ylab("Probability") + xlab("Iteration")
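
###

## A hypothetical variation on the built-in lists: copy `caretSA` and swap
## in a user-written acceptance function. This sketch assumes the list keeps
## that function in an element named `prob`, matching the description above.
## `my_prob`, `custom_sa` and `custom_ctrl` are illustrative names only.
my_prob <- function(old, new, iteration = 1) {
  ## always accept an improvement (smaller values are better)
  if (new < old) return(1)
  ## otherwise halve the effect of the iteration count, which makes worse
  ## subsets somewhat more likely to be accepted than with the formula above
  exp((old - new) / old * iteration / 2)
}
custom_sa <- caretSA
custom_sa$prob <- my_prob

## the modified list can then be supplied in place of `caretSA`
custom_ctrl <- safsControl(functions = custom_sa)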

\dontrun{
###
## Hypothetical examples
lda_sa <- safs(x = predictors,
               y = classes,
               safsControl = safsControl(functions = caretSA),
               ## now pass arguments to `train`
               method = "lda",
               metric = "Accuracy",
               trControl = trainControl(method = "cv", classProbs = TRUE))

rf_sa <- safs(x = predictors,
              y = classes,
              safsControl = safsControl(functions = rfSA),
              ## these are arguments to `randomForest`
              ntree = 1000,
              importance = TRUE)
}

}
\references{
\url{http://topepo.github.io/caret/feature-selection-using-simulated-annealing.html}
}
\seealso{
\code{\link{safs}}, \code{\link{safsControl}}
}
\author{
Max Kuhn
}
\keyword{datasets}