% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rfe.R
\docType{data}
\name{pickSizeBest}
\alias{pickSizeBest}
\alias{pickSizeTolerance}
\alias{pickVars}
\alias{caretFuncs}
\alias{lmFuncs}
\alias{rfFuncs}
\alias{gamFuncs}
\alias{treebagFuncs}
\alias{ldaFuncs}
\alias{nbFuncs}
\alias{lrFuncs}
\title{Backwards Feature Selection Helper Functions}
\format{
\code{caretFuncs}, \code{ldaFuncs}, \code{treebagFuncs}, \code{gamFuncs},
\code{rfFuncs}, \code{lmFuncs}, \code{nbFuncs} and \code{lrFuncs} are each
an object of class \code{list} of length 6.
}
\usage{
pickSizeBest(x, metric, maximize)

pickSizeTolerance(x, metric, tol = 1.5, maximize)

pickVars(y, size)

caretFuncs

ldaFuncs

treebagFuncs

gamFuncs

rfFuncs

lmFuncs

nbFuncs

lrFuncs
}
\arguments{
\item{x}{a matrix or data frame with the performance metric of interest}

\item{metric}{a character string with the name of the performance metric
that should be used to choose the appropriate number of variables}

\item{maximize}{a logical; should the metric be maximized?}

\item{tol}{a scalar to denote the acceptable difference in optimal
performance (see Details below)}

\item{y}{a list of data frames with variables \code{Overall} and \code{var}}

\item{size}{an integer for the number of variables to retain}
}
\description{
Ancillary functions for backwards selection
}
\details{
This page describes the functions that are used in backwards selection (aka
recursive feature elimination). The functions described here are passed to
the algorithm via the \code{functions} argument of \code{\link{rfeControl}}.

See \code{\link{rfeControl}} for details on how these functions should be
defined.

The 'pick' functions are used to find the appropriate subset size for
different situations. \code{pickSizeBest} will find the position associated with
the numerically best value (see the \code{maximize} argument to help define
this).

\code{pickSizeTolerance} picks the lowest position (i.e. the smallest subset
size) that loses no more than a given percentage of the best performance.
When the metric is to be maximized, it calculates (O-X)/O*100, where X is
the set of performance values and O is max(X). This is the percent loss.
When the metric is to be minimized, O is min(X) and it uses (X-O)/O*100 (so
that values greater than O have a positive "loss"). The function finds the
smallest subset size that has a percent loss less than \code{tol}.

Both \code{pickSizeBest} and \code{pickSizeTolerance} assume that the data
are sorted from smallest subset size to largest.
}
\examples{

## For picking subset sizes:
## Minimize the RMSE
example <- data.frame(RMSE = c(1.2, 1.1, 1.05, 1.01, 1.01, 1.03, 1.00),
                      Variables = 1:7)
## Percent Loss in performance (positive)
example$PctLoss <- (example$RMSE - min(example$RMSE))/min(example$RMSE)*100

xyplot(RMSE ~ Variables, data= example)
xyplot(PctLoss ~ Variables, data= example)

absoluteBest <- pickSizeBest(example, metric = "RMSE", maximize = FALSE)
within5Pct <- pickSizeTolerance(example, metric = "RMSE", maximize = FALSE)

cat("numerically optimal:",
    example$RMSE[absoluteBest],
    "RMSE in position",
    absoluteBest, "\n")
cat("Accepting a 1.5 pct loss:",
    example$RMSE[within5Pct],
    "RMSE in position",
    within5Pct, "\n")
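
## A hand computation of the tolerance rule from Details, set beside
## pickSizeTolerance() for several tolerances. This is an illustrative
## sketch, not the internal code of the package: the names tolValues,
## pctLoss, bySketch and byCaret are made up for this example, and the
## sketch assumes the strict less-than comparison described in Details.
tolValues <- c(0.5, 1.5, 6)
pctLoss <- (example$RMSE - min(example$RMSE))/min(example$RMSE)*100
## Smallest subset size whose percent loss is below each tolerance
bySketch <- sapply(tolValues,
                   function(tol) min(example$Variables[pctLoss < tol]))
byCaret <- sapply(tolValues,
                  function(tol) pickSizeTolerance(example, metric = "RMSE",
                                                  tol = tol, maximize = FALSE))
## Larger tolerances should admit smaller subsets
data.frame(tol = tolValues, bySketch = bySketch, byCaret = byCaret)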

## Example where we would like to maximize
example2 <- data.frame(Rsquared = c(0.4, 0.6, 0.94, 0.95, 0.95, 0.95, 0.95),
                      Variables = 1:7)
## Percent Loss in performance (positive)
example2$PctLoss <- (max(example2$Rsquared) - example2$Rsquared)/max(example2$Rsquared)*100

xyplot(Rsquared ~ Variables, data= example2)
xyplot(PctLoss ~ Variables, data= example2)

absoluteBest2 <- pickSizeBest(example2, metric = "Rsquared", maximize = TRUE)
within5Pct2 <- pickSizeTolerance(example2, metric = "Rsquared", maximize = TRUE)

cat("numerically optimal:",
    example2$Rsquared[absoluteBest2],
    "R^2 in position",
    absoluteBest2, "\n")
cat("Accepting a 1.5 pct loss:",
    example2$Rsquared[within5Pct2],
    "R^2 in position",
    within5Pct2, "\n")
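
## The objects on this page reach rfe through the functions argument of
## rfeControl. The sketch below copies lmFuncs and swaps its selectSize
## element for a wrapper around pickSizeTolerance with a wider tolerance.
## The names tolFuncs and ctrl, and the values tol = 3 and number = 5, are
## illustrative choices for this example and not package defaults.
tolFuncs <- lmFuncs
tolFuncs$selectSize <- function(x, metric, maximize) {
  pickSizeTolerance(x, metric = metric, tol = 3, maximize = maximize)
}
ctrl <- rfeControl(functions = tolFuncs, method = "cv", number = 5)
## The control object carries the modified function list
names(ctrl$functions)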

}
\seealso{
\code{\link{rfeControl}}, \code{\link{rfe}}
}
\author{
Max Kuhn
}
\keyword{datasets}
\keyword{models}