% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rfe.R
\docType{data}
\name{pickSizeBest}
\alias{pickSizeBest}
\alias{pickSizeTolerance}
\alias{pickVars}
\alias{caretFuncs}
\alias{lmFuncs}
\alias{rfFuncs}
\alias{gamFuncs}
\alias{treebagFuncs}
\alias{ldaFuncs}
\alias{nbFuncs}
\alias{lrFuncs}
\title{Backwards Feature Selection Helper Functions}
\format{
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
An object of class \code{list} of length 6.
}
\usage{
pickSizeBest(x, metric, maximize)
pickSizeTolerance(x, metric, tol = 1.5, maximize)
pickVars(y, size)
caretFuncs
ldaFuncs
treebagFuncs
gamFuncs
rfFuncs
lmFuncs
nbFuncs
lrFuncs
}
\arguments{
\item{x}{a matrix or data frame with the performance metric of interest}
\item{metric}{a character string with the name of the performance metric
that should be used to choose the appropriate number of variables}
\item{maximize}{a logical; should the metric be maximized?}
\item{tol}{a scalar to denote the acceptable difference in optimal
performance (see Details below)}
\item{y}{a list of data frames with variables \code{Overall} and \code{var}}
\item{size}{an integer for the number of variables to retain}
}
\description{
Ancillary functions for backwards selection
}
\details{
This page describes the functions that are used in backwards selection (aka
recursive feature elimination). The functions described here are passed to
the algorithm via the \code{functions} argument of \code{\link{rfeControl}}.
See \code{\link{rfeControl}} for details on how these functions should be
defined.
The 'pick' functions are used to find the appropriate subset size for
different situations. \code{pickSizeBest} will find the position associated with
the numerically best value (see the \code{maximize} argument to help define
this).
\code{pickSizeTolerance} picks the lowest position (i.e. the smallest subset
size) that has no more than an X percent loss in performance. When
maximizing, it calculates (O-X)/O*100, where X is the set of performance
values and O is max(X). This is the percent loss. When X is to be minimized,
it uses (X-O)/O*100 (so that values greater than X have a positive "loss").
The function finds the smallest subset size that has a percent loss less
than \code{tol}.
Both of the 'pick' functions assume that the data are sorted from smallest
subset size to largest.
}
\examples{
## For picking subset sizes:
## Minimize the RMSE
example <- data.frame(RMSE = c(1.2, 1.1, 1.05, 1.01, 1.01, 1.03, 1.00),
Variables = 1:7)
## Percent Loss in performance (positive)
example$PctLoss <- (example$RMSE - min(example$RMSE))/min(example$RMSE)*100
xyplot(RMSE ~ Variables, data= example)
xyplot(PctLoss ~ Variables, data= example)
absoluteBest <- pickSizeBest(example, metric = "RMSE", maximize = FALSE)
within5Pct <- pickSizeTolerance(example, metric = "RMSE", maximize = FALSE)
cat("numerically optimal:",
example$RMSE[absoluteBest],
"RMSE in position",
absoluteBest, "\n")
cat("Accepting a 1.5 pct loss:",
example$RMSE[within5Pct],
"RMSE in position",
within5Pct, "\n")
## Example where we would like to maximize
example2 <- data.frame(Rsquared = c(0.4, 0.6, 0.94, 0.95, 0.95, 0.95, 0.95),
Variables = 1:7)
## Percent Loss in performance (positive)
example2$PctLoss <- (max(example2$Rsquared) - example2$Rsquared)/max(example2$Rsquared)*100
xyplot(Rsquared ~ Variables, data= example2)
xyplot(PctLoss ~ Variables, data= example2)
absoluteBest2 <- pickSizeBest(example2, metric = "Rsquared", maximize = TRUE)
within5Pct2 <- pickSizeTolerance(example2, metric = "Rsquared", maximize = TRUE)
cat("numerically optimal:",
example2$Rsquared[absoluteBest2],
"R^2 in position",
absoluteBest2, "\n")
cat("Accepting a 1.5 pct loss:",
example2$Rsquared[within5Pct2],
"R^2 in position",
within5Pct2, "\n")
}
\seealso{
\code{\link{rfeControl}}, \code{\link{rfe}}
}
\author{
Max Kuhn
}
\keyword{datasets}
\keyword{models}