% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/postResample.R, R/prec_rec.R
\name{postResample}
\alias{R2}
\alias{RMSE}
\alias{defaultSummary}
\alias{getTrainPerf}
\alias{mnLogLoss}
\alias{multiClassSummary}
\alias{postResample}
\alias{prSummary}
\alias{twoClassSummary}
\title{Calculates performance across resamples}
\usage{
postResample(pred, obs)

twoClassSummary(data, lev = NULL, model = NULL)

mnLogLoss(data, lev = NULL, model = NULL)

multiClassSummary(data, lev = NULL, model = NULL)

prSummary(data, lev = NULL, model = NULL)
}
\arguments{
\item{pred}{A numeric vector (or a factor) of predicted outcomes}

\item{obs}{A numeric vector (or a factor) of observed outcomes}

\item{data}{a data frame or matrix with columns \code{obs} and \code{pred}
for the observed and predicted outcomes. For \code{twoClassSummary}, columns
should also include predicted probabilities for each class. See the
\code{classProbs} argument to \code{\link{trainControl}}.}

\item{lev}{a character vector of factor levels for the response. In
regression cases, this would be \code{NULL}.}

\item{model}{a character string for the model name (as taken from the
\code{method} argument of \code{\link{train}}).}
}
\value{
A vector of performance estimates.
}
\description{
Given two numeric vectors of data, the root mean squared error and R-squared
are calculated. For two factors, the overall agreement rate and Kappa are
determined.
}
\details{
\code{postResample} is meant to be used with \code{apply} across a matrix.
For numeric data the code checks to see if the standard deviation of either
vector is zero. If so, the correlation between those samples is assigned a
value of zero. \code{NA} values are ignored everywhere.

Note that many models have more predictors (or parameters) than data points,
so the typical mean squared error denominator (n - p) does not apply. Root
mean squared error is calculated using \code{sqrt(mean((pred - obs)^2))}.
Also, \eqn{R^2} is calculated either as the square of the correlation
between the observed and predicted outcomes when \code{form = "corr"} or,
when \code{form = "traditional"}, as \deqn{ R^2 = 1-\frac{\sum (y_i -
\hat{y}_i)^2}{\sum (y_i - \bar{y})^2} }

\code{defaultSummary} is the default function to compute performance
metrics in \code{\link{train}}. It is a wrapper around \code{postResample}.

\code{twoClassSummary} computes sensitivity, specificity and the area under
the ROC curve. \code{mnLogLoss} computes the negative log-likelihood of the
multinomial distribution (without the constant term): \deqn{ logLoss =
\frac{-1}{n}\sum_{i=1}^n \sum_{j=1}^C y_{ij} \log(p_{ij}) } where the
\code{y} values are binary indicators for the classes and \code{p} are the
predicted class probabilities.

\code{prSummary} (for precision and recall) computes values for the default
0.50 probability cutoff as well as the area under the precision-recall curve
across all cutoffs, which is labelled \code{"AUC"} in the output. It assumes
that the first level of the factor variables corresponds to a relevant
result, but the \code{lev} argument can be used to change this.

\code{multiClassSummary} computes some overall measures of performance
(e.g. overall accuracy and the Kappa statistic) and several averages of
statistics calculated from "one-versus-all" configurations. For example, if
there are three classes, three sets of sensitivity values are determined and
the average is reported with the name "Mean_Sensitivity". The same is true
for a number of statistics generated by \code{\link{confusionMatrix}}. With
two classes, the basic sensitivity is reported with the name "Sensitivity".

To use \code{twoClassSummary} and/or \code{mnLogLoss}, the \code{classProbs}
argument of \code{\link{trainControl}} should be \code{TRUE}.
\code{multiClassSummary} can be used without class probabilities but some
statistics (e.g. overall log loss and the average of per-class area under
the ROC curves) will not be in the result set.

Other functions can be used via the \code{summaryFunction} argument of
\code{\link{trainControl}}. Custom functions must have the same arguments
as \code{defaultSummary}.

The function \code{getTrainPerf} returns a one row data frame with the
resampling results for the chosen model. The statistics will have the prefix
"\code{Train}" (e.g. "\code{TrainROC}"). There is also a column called
"\code{method}" that echoes the argument of the call to
\code{\link{trainControl}} of the same name.
}
\examples{

predicted <-  matrix(rnorm(50), ncol = 5)
observed <- rnorm(10)
apply(predicted, 2, postResample, obs = observed)
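
## A minimal base-R sketch of the regression formulas described in Details.
## It is an illustration under stated assumptions, not the package source:
## the helper names and the toy vectors below are hypothetical.
obs_demo  <- c(1.0, 2.0, 3.0, 4.0)
pred_demo <- c(1.1, 1.9, 3.2, 3.7)

## Root mean squared error with no degrees-of-freedom correction
rmse_sketch <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

## R-squared as the squared correlation (the form = "corr" definition)
r2_corr_sketch <- function(pred, obs) cor(obs, pred, use = "complete.obs")^2

## R-squared as 1 - SSE / SST (the form = "traditional" definition)
r2_trad_sketch <- function(pred, obs) {
  ok <- complete.cases(pred, obs)
  1 - sum((obs[ok] - pred[ok])^2) / sum((obs[ok] - mean(obs[ok]))^2)
}

rmse_sketch(pred_demo, obs_demo)
r2_corr_sketch(pred_demo, obs_demo)
r2_trad_sketch(pred_demo, obs_demo)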

classes <- c("class1", "class2")
set.seed(1)
dat <- data.frame(obs =  factor(sample(classes, 50, replace = TRUE)),
                  pred = factor(sample(classes, 50, replace = TRUE)),
                  class1 = runif(50), class2 = runif(50))

defaultSummary(dat, lev = classes)
twoClassSummary(dat, lev = classes)
prSummary(dat, lev = classes)
mnLogLoss(dat, lev = classes)
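
## A sketch of the log-loss formula from Details, followed by a custom
## summary function that takes the same arguments as defaultSummary.
## The names logloss_sketch, errSummary, lev_demo and toy are hypothetical.
## Unlike the package functions, the sketch performs no input checks and
## no clipping of probabilities, so its results may differ from mnLogLoss.
lev_demo <- c("class1", "class2")
toy <- data.frame(obs    = factor(c("class1", "class2", "class1"), levels = lev_demo),
                  pred   = factor(c("class1", "class1", "class1"), levels = lev_demo),
                  class1 = c(0.8, 0.6, 0.7),
                  class2 = c(0.2, 0.4, 0.3))

logloss_sketch <- function(data, lev) {
  probs <- as.matrix(data[, lev, drop = FALSE])
  ## For each row, pick the probability assigned to the observed class
  idx <- cbind(seq_len(nrow(probs)), match(as.character(data$obs), lev))
  -mean(log(probs[idx]))
}

## Custom summary: returns a named numeric vector, here the error rate
errSummary <- function(data, lev = NULL, model = NULL) {
  c(ErrorRate = mean(as.character(data$pred) != as.character(data$obs)))
}

logloss_sketch(toy, lev_demo)
errSummary(toy, lev = lev_demo)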

}
\author{
Max Kuhn, Zachary Mayer
}
\references{
Kvalseth. Cautionary note about \eqn{R^2}. American Statistician
(1985) vol. 39 (4) pp. 279-285
}
\seealso{
\code{\link{trainControl}}
}
\keyword{utilities}