% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/postResample.R, R/prec_rec.R
\name{postResample}
\alias{R2}
\alias{RMSE}
\alias{defaultSummary}
\alias{getTrainPerf}
\alias{mnLogLoss}
\alias{multiClassSummary}
\alias{postResample}
\alias{prSummary}
\alias{twoClassSummary}
\title{Calculates performance across resamples}
\usage{
postResample(pred, obs)

twoClassSummary(data, lev = NULL, model = NULL)

mnLogLoss(data, lev = NULL, model = NULL)

multiClassSummary(data, lev = NULL, model = NULL)

prSummary(data, lev = NULL, model = NULL)
}
\arguments{
\item{pred}{A vector of numeric data (could be a factor)}

\item{obs}{A vector of numeric data (could be a factor)}

\item{data}{a data frame or matrix with columns \code{obs} and \code{pred}
for the observed and predicted outcomes. For \code{twoClassSummary}, columns
should also include predicted probabilities for each class. See the
\code{classProbs} argument to \code{\link{trainControl}}.}

\item{lev}{a character vector of factor levels for the response. In
regression cases, this would be \code{NULL}.}

\item{model}{a character string for the model name (as taken from the
\code{method} argument of \code{\link{train}}).}
}
\value{
A vector of performance estimates.
}
\description{
Given two numeric vectors of data, the root mean squared error and R-squared
are calculated. For two factors, the overall agreement rate and Kappa are
determined.
}
\details{
\code{postResample} is meant to be used with \code{apply} across a matrix.
For numeric data the code checks to see if the standard deviation of either
vector is zero. If so, the correlation between those samples is assigned a
value of zero. \code{NA} values are ignored everywhere.

Note that many models have more predictors (or parameters) than data points,
so the typical mean squared error denominator (n - p) does not apply. Root
mean squared error is calculated using \code{sqrt(mean((pred - obs)^2))}.
Also, \eqn{R^2} is calculated either as the square of the correlation
between the observed and predicted outcomes when \code{form = "corr"} or,
when \code{form = "traditional"}, as \deqn{ R^2 = 1-\frac{\sum (y_i -
\hat{y}_i)^2}{\sum (y_i - \bar{y})^2} }

\code{defaultSummary} is the default function to compute performance
metrics in \code{\link{train}}. It is a wrapper around \code{postResample}.

\code{twoClassSummary} computes sensitivity, specificity and the area under
the ROC curve. \code{mnLogLoss} computes the minus log-likelihood of the
multinomial distribution (without the constant term): \deqn{ -logLoss =
\frac{-1}{n}\sum_{i=1}^n \sum_{j=1}^C y_{ij} \log(p_{ij}) } where the
\code{y} values are binary indicators for the classes and \code{p} are the
predicted class probabilities.

\code{prSummary} (for precision and recall) computes values for the default
0.50 probability cutoff as well as the area under the precision-recall curve
across all cutoffs, which is labelled \code{"AUC"} in the output. It assumes
that the first level of the factor variables corresponds to a relevant
result, but the \code{lev} argument can be used to change this.

\code{multiClassSummary} computes some overall measures of performance
(e.g. overall accuracy and the Kappa statistic) and several averages of
statistics calculated from "one-versus-all" configurations. For example, if
there are three classes, three sets of sensitivity values are determined and
the average is reported with the name "Mean_Sensitivity". The same is true
for a number of statistics generated by \code{\link{confusionMatrix}}. With
two classes, the basic sensitivity is reported with the name "Sensitivity".

To use \code{twoClassSummary} and/or \code{mnLogLoss}, the \code{classProbs}
argument of \code{\link{trainControl}} should be \code{TRUE}.
\code{multiClassSummary} can be used without class probabilities but some
statistics (e.g. overall log loss and the average of per-class area under
the ROC curves) will not be in the result set.

Other functions can be used via the \code{summaryFunction} argument of
\code{\link{trainControl}}. Custom functions must have the same arguments
as \code{defaultSummary}.

The function \code{getTrainPerf} returns a one row data frame with the
resampling results for the chosen model. The statistics will have the prefix
"\code{Train}" (e.g. "\code{TrainROC}"). There is also a column called
"\code{method}" that echoes the argument of the call to
\code{\link{trainControl}} of the same name.
}
\examples{

predicted <-  matrix(rnorm(50), ncol = 5)
observed <- rnorm(10)
apply(predicted, 2, postResample, obs = observed)
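
# A minimal sketch, not part of the original examples: the regression
# statistics described in Details, computed by hand for the first column.
# The names pred1, rmse_manual and rsq_manual are illustrative only.
pred1 <- predicted[, 1]
rmse_manual <- sqrt(mean((pred1 - observed)^2))
rsq_manual <- cor(pred1, observed)^2
# Both comparisons are expected to agree with postResample
all.equal(rmse_manual, unname(postResample(pred1, observed)["RMSE"]))
all.equal(rsq_manual, unname(postResample(pred1, observed)["Rsquared"]))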

classes <- c("class1", "class2")
set.seed(1)
dat <- data.frame(obs =  factor(sample(classes, 50, replace = TRUE)),
                  pred = factor(sample(classes, 50, replace = TRUE)),
                  class1 = runif(50), class2 = runif(50))

defaultSummary(dat, lev = classes)
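
# A hypothetical custom summary function (accSummary is an illustrative
# name, not part of caret). It takes the same arguments as defaultSummary
# and returns a named numeric vector, so it could be passed to
# trainControl via the summaryFunction argument.
accSummary <- function(data, lev = NULL, model = NULL) {
  c(Accuracy = mean(as.character(data$obs) == as.character(data$pred)))
}
accSummary(dat, lev = classes)
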
twoClassSummary(dat, lev = classes)
prSummary(dat, lev = classes)
mnLogLoss(dat, lev = classes)
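
# A rough sketch of the log loss formula from Details, computed by hand.
# The names probs and p_obs are illustrative. mnLogLoss itself may differ
# slightly, for example in how it treats probabilities of exactly 0 or 1.
probs <- as.matrix(dat[, classes])
# probability assigned to the observed class of each row
p_obs <- probs[cbind(seq_len(nrow(dat)), match(as.character(dat$obs), classes))]
-mean(log(p_obs))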

}
\author{
Max Kuhn, Zachary Mayer
}
\references{
Kvalseth. Cautionary note about \eqn{R^2}. American Statistician
(1985) vol. 39 (4) pp. 279-285
}
\seealso{
\code{\link{trainControl}}
}
\keyword{utilities}