File: filterVarImp.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filterVarImp.R
\name{filterVarImp}
\alias{filterVarImp}
\title{Calculation of filter-based variable importance}
\usage{
filterVarImp(x, y, nonpara = FALSE, ...)
}
\arguments{
\item{x}{A matrix or data frame of predictor data}

\item{y}{A vector (numeric or factor) of outcomes}

\item{nonpara}{A logical: should nonparametric methods be used to assess the
relationship between the features and response?}

\item{...}{options to pass to either \code{\link[stats]{lm}} or
\code{\link[stats]{loess}}}
}
\value{
A data frame with variable importances. Column names depend on the
problem type. For regression, the data frame contains a single column,
\code{Overall}, holding the importance values.
}
\description{
Computes filter-based variable importance: each predictor is scored on its
own relationship with the outcome, without fitting a model to all predictors.
}
\details{
The importance of each predictor is evaluated individually using a
``filter'' approach.

For classification, ROC curve analysis is conducted on each predictor. For
two class problems, a series of cutoffs is applied to the predictor data to
predict the class. The sensitivity and specificity are computed for each
cutoff and the ROC curve is computed. The trapezoidal rule is used to
compute the area under the ROC curve. This area is used as the measure of
variable importance. For multi-class outcomes, the problem is decomposed
into all pair-wise problems and the area under the curve is calculated for
each class pair (i.e., class 1 vs. class 2, class 2 vs. class 3, etc.). For a
specific class, the maximum area under the curve across the relevant
pair-wise AUCs is used as the variable importance measure.

For regression, the relationship between each predictor and the outcome is
evaluated. An argument, \code{nonpara}, is used to pick the model fitting
technique. When \code{nonpara = FALSE}, a linear model is fit and the
absolute value of the \eqn{t}-value for the slope of the predictor is used.
Otherwise, a loess smoother is fit between the outcome and the predictor.
The \eqn{R^2} statistic is calculated for this model against the
intercept-only null model.
}
\examples{

data(mdrr)
filterVarImp(mdrrDescr[, 1:5], mdrrClass)
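
# A minimal sketch (not caret's internal code) of the ROC filter described
# in Details. The helper names sketchAuc and sketchPairwiseAuc are
# hypothetical. Assumption: the first factor level is treated as the event,
# and the AUC is made direction-free by taking max(auc, 1 - auc).
sketchAuc <- function(x, cls) {
  cls <- factor(cls)
  pos <- cls == levels(cls)[1]
  # Candidate cutoffs: every observed value, padded so the curve spans
  # (0, 0) to (1, 1)
  cutoffs <- c(-Inf, sort(unique(x)), Inf)
  # Predict the event class whenever the predictor exceeds the cutoff
  sens <- sapply(cutoffs, function(ct) mean(x[pos] > ct))
  spec <- sapply(cutoffs, function(ct) mean(x[!pos] <= ct))
  fpr <- 1 - spec
  ord <- order(fpr, sens)
  fpr <- fpr[ord]
  sens <- sens[ord]
  # Trapezoidal rule: segment width times mean height of its two end points
  n <- length(sens)
  auc <- sum(diff(fpr) * (sens[-1] + sens[-n]) / 2)
  max(auc, 1 - auc)
}

# Multi-class case: for each class, the largest AUC over all pair-wise
# problems that involve that class
sketchPairwiseAuc <- function(x, cls) {
  cls <- factor(cls)
  lev <- levels(cls)
  sapply(lev, function(k) {
    max(sapply(setdiff(lev, k), function(j) {
      keep <- cls == k | cls == j
      sketchAuc(x[keep], factor(cls[keep], levels = c(k, j)))
    }))
  })
}

# Toy data (made up for illustration only)
toyX <- c(1, 3, 2, 4, 2.5, 3.5)
toyClass <- c("a", "a", "b", "b", "c", "c")
sketchAuc(toyX[1:4], toyClass[1:4])
sketchPairwiseAuc(toyX, toyClass)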

data(BloodBrain)

filterVarImp(bbbDescr[, 1:5], logBBB, nonpara = FALSE)
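# For comparison: the absolute t-value of the slope from a separate linear
# model per predictor, the quantity described for nonpara = FALSE
apply(bbbDescr[, 1:5],
      2,
      function(x, y) abs(summary(lm(y ~ x))$coefficients[2, 3]),
      y = logBBB)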

filterVarImp(bbbDescr[, 1:5], logBBB, nonpara = TRUE)
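
# A minimal sketch (not caret's internal code) of the nonparametric filter
# described in Details: fit a loess smoother to a single predictor and
# compare its residual sum of squares with that of the intercept-only
# model. The helper name sketchLoessR2 is hypothetical, and the exact
# statistic caret reports may be computed differently.
sketchLoessR2 <- function(x, y, ...) {
  fit <- stats::loess(y ~ x, ...)
  ssModel <- sum(stats::residuals(fit)^2)
  # Intercept-only null model: predict the mean outcome everywhere
  ssNull <- sum((y - mean(y))^2)
  1 - ssModel / ssNull
}

# Illustrated on the built-in cars data so the sketch runs on its own; it
# could be applied column-wise to bbbDescr in the same way as the lm()
# comparison above
sketchLoessR2(cars$speed, cars$dist)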

}
\author{
Max Kuhn
}
\keyword{models}