File: postResample.Rd

r-cran-caret 7.0-1+dfsg-1
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aaa.R, R/postResample.R, R/prec_rec.R
\name{defaultSummary}
\alias{defaultSummary}
\alias{postResample}
\alias{twoClassSummary}
\alias{prSummary}
\alias{getTrainPerf}
\alias{mnLogLoss}
\alias{R2}
\alias{RMSE}
\alias{multiClassSummary}
\alias{MAE}
\title{Calculates performance across resamples}
\usage{
defaultSummary(data, lev = NULL, model = NULL)

postResample(pred, obs)

twoClassSummary(data, lev = NULL, model = NULL)

mnLogLoss(data, lev = NULL, model = NULL)

multiClassSummary(data, lev = NULL, model = NULL)

prSummary(data, lev = NULL, model = NULL)
}
\arguments{
\item{data}{a data frame with columns \code{obs} and
\code{pred} for the observed and predicted outcomes. For metrics
that rely on class probabilities, such as
\code{twoClassSummary}, columns should also include predicted
probabilities for each class. See the \code{classProbs} argument
to \code{\link{trainControl}}.}

\item{lev}{a character vector of factor levels for the
response. In regression cases, this would be \code{NULL}.}

\item{model}{a character string for the model name (as taken
from the \code{method} argument of \code{\link{train}}).}

\item{pred}{A vector of predicted values: numeric for regression
or a factor for classification.}

\item{obs}{A vector of observed values, of the same type as
\code{pred}.}
}
\value{
A vector of performance estimates.
}
\description{
Given two numeric vectors of data, the root mean squared error,
 R-squared and mean absolute error are calculated. For two
 factors, the overall agreement rate (accuracy) and Kappa are
 determined.
}
\details{
\code{postResample} is meant to be used with \code{apply}
 across a matrix. For numeric data the code checks to see if the
 standard deviation of either vector is zero. If so, the
 correlation between those samples is assigned a value of zero.
 \code{NA} values are ignored everywhere.
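
The zero-variance rule and the removal of \code{NA} values can be
 sketched in base R. The helper \code{cor_or_zero} is hypothetical:
 it mirrors the rule as stated in this paragraph and is not the code
 used inside \code{postResample}.

\preformatted{
# Hypothetical helper: correlation with the zero-variance rule above.
cor_or_zero <- function(pred, obs) {
  keep <- !is.na(pred) & !is.na(obs)  # ignore NA values in either vector
  pred <- pred[keep]
  obs  <- obs[keep]
  # A vector with fewer than two distinct values has zero standard
  # deviation, so the correlation is undefined; use zero instead.
  if (length(unique(pred)) < 2 || length(unique(obs)) < 2) return(0)
  cor(pred, obs)
}

cor_or_zero(c(1, 2, 3, NA), c(2, 4, 6, 8))  # 1 after dropping the NA pair
cor_or_zero(c(1, 2, 3), c(5, 5, 5))         # 0: obs has zero variance
}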

Note that many models have more predictors (or parameters) than
 data points, so the typical mean squared error denominator
 (n - p) does not apply. Root mean squared error is calculated
 using \code{sqrt(mean((pred - obs)^2))}. \eqn{R^2} is calculated
 either as the square of the correlation between the observed and
 predicted outcomes (when \code{form = "corr"}) or, when
 \code{form = "traditional"}, as
 \deqn{R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}}
 where \eqn{y_i} are the observed values, \eqn{\hat{y}_i} the
 predictions and \eqn{\bar{y}} the mean of the observed values.
 Mean absolute error is calculated using \code{mean(abs(pred - obs))}.
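
A base R sketch of the three regression formulas, on simulated data;
 the object names are illustrative. Both \eqn{R^2} forms are computed
 so they can be compared: they agree for a least squares fit with an
 intercept but generally differ for other predictions, and the
 traditional form can be negative.

\preformatted{
# Illustrative re-implementation of the formulas above (base R only).
set.seed(1)
obs  <- rnorm(20)
pred <- obs + rnorm(20, sd = 0.5)   # noisy predictions

rmse    <- sqrt(mean((pred - obs)^2))
mae     <- mean(abs(pred - obs))
r2_corr <- cor(pred, obs)^2                                   # form = "corr"
r2_trad <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2) # form = "traditional"

c(RMSE = rmse, Rsquared = r2_corr, MAE = mae, R2_traditional = r2_trad)
}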

\code{defaultSummary} is the default function to compute
 performance metrics in \code{\link{train}}. It is a wrapper
 around \code{postResample}. The first argument is \code{data},
 which is a \code{data.frame} with columns named \code{obs} and
 \code{pred} for the observed and predicted outcome values
 (either numeric data for regression or character values for
 classification). The second argument is \code{lev}, a character
 vector of the outcome factor levels or \code{NULL} for a
 regression model. The third argument is \code{model}, which can
 be used if a summary metric is specific to a model function. If
 other columns from the data are required to compute the summary
 statistics, but should not be used in the model, the
 \code{recipe} method for \code{\link{train}} can be used.
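
For factor outcomes, the wrapped \code{postResample} reports the
 overall agreement rate (accuracy) and Kappa. A base R sketch of
 both, assuming the unweighted (Cohen's) Kappa computed from the
 confusion matrix; the helper \code{agreement} is hypothetical and is
 not caret's code.

\preformatted{
# Hypothetical helper: accuracy and Cohen's (unweighted) Kappa for two
# factors that share the same levels.
agreement <- function(pred, obs) {
  tab <- table(pred, obs)                          # confusion matrix
  n   <- sum(tab)
  po  <- sum(diag(tab)) / n                        # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / n^2    # chance agreement
  c(Accuracy = po, Kappa = (po - pe) / (1 - pe))
}

lv   <- c("class1", "class2")
obs  <- factor(c("class1", "class1", "class2", "class2"), levels = lv)
pred <- factor(c("class1", "class2", "class2", "class2"), levels = lv)
agreement(pred, obs)   # Accuracy 0.75, Kappa 0.5
}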

\code{twoClassSummary} computes sensitivity, specificity and
 the area under the ROC curve. \code{mnLogLoss} computes the
 negative log-likelihood of the multinomial distribution (without
 the constant term), averaged over the \eqn{n} samples:
 \deqn{logLoss = -\frac{1}{n}\sum_{i=1}^n \sum_{j=1}^C y_{ij} \log(p_{ij})}
 where \eqn{C} is the number of classes, the \eqn{y_{ij}} values
 are binary indicators for the classes and the \eqn{p_{ij}} are the
 predicted class probabilities.
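
Because each row of the indicators \eqn{y_{ij}} contains a single 1,
 the double sum reduces to the log of the probability assigned to the
 observed class. A base R sketch under that reading; the helper name
 and the clipping constant \code{eps} (which keeps \code{log(0)}
 finite) are assumptions, not taken from caret.

\preformatted{
# Hypothetical helper: multinomial log loss from the formula above.
# obs:   factor of observed classes
# probs: data frame or matrix with one probability column per level,
#        named after the levels (as with classProbs = TRUE)
log_loss <- function(obs, probs, eps = 1e-15) {
  probs <- as.matrix(probs[, levels(obs), drop = FALSE])
  # p_ij for the observed class j of each row i (the y_ij = 1 terms)
  p_obs <- probs[cbind(seq_along(obs), as.integer(obs))]
  p_obs <- pmin(pmax(p_obs, eps), 1 - eps)   # avoid log(0)
  -mean(log(p_obs))
}

obs   <- factor(c("a", "b", "a"), levels = c("a", "b"))
probs <- data.frame(a = c(0.8, 0.3, 0.6), b = c(0.2, 0.7, 0.4))
log_loss(obs, probs)   # -(log(0.8) + log(0.7) + log(0.6)) / 3
}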

\code{prSummary} (for precision and recall) computes values for
 the default 0.50 probability cutoff as well as the area under
 the precision-recall curve across all cutoffs and is labelled as
 \code{"AUC"} in the output. It assumes that the first level of
 the factor variables corresponds to a relevant result but the
 \code{lev} argument can be used to change this.
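
The fixed-cutoff part of \code{prSummary} can be sketched in base R,
 treating the first factor level as the relevant class. The helper
 \code{pr_at_cutoff} is hypothetical; it counts a probability exactly
 equal to the cutoff as a call of the relevant class (an assumption),
 does not reproduce the \code{lev} handling, and omits the
 precision-recall AUC.

\preformatted{
# Hypothetical helper: precision and recall at a fixed cutoff.
pr_at_cutoff <- function(obs, prob_relevant, cutoff = 0.5) {
  relevant <- levels(obs)[1]             # first level = relevant result
  called   <- prob_relevant >= cutoff    # predicted as relevant
  actual   <- obs == relevant            # truly relevant
  tp <- sum(called & actual)             # true positives
  c(Precision = tp / sum(called), Recall = tp / sum(actual))
}

obs  <- factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no"))
prob <- c(0.9, 0.4, 0.6, 0.7, 0.8)   # predicted probability of "yes"
pr_at_cutoff(obs, prob)   # Precision 0.5, Recall 2/3
}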

\code{multiClassSummary} computes some overall measures of
 performance (e.g. overall accuracy and the Kappa statistic) and
 several averages of statistics calculated from "one-versus-all"
 configurations. For example, if there are three classes, three
 sets of sensitivity values are determined and the average is
 reported with the name "Mean_Sensitivity". The same is true
 for a number of statistics generated by
 \code{\link{confusionMatrix}}. With two classes, the basic
 sensitivity is reported with the name "Sensitivity".
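
The one-versus-all averaging can be sketched for sensitivity: each
 class in turn is treated as the positive class, its sensitivity (the
 fraction of its cases predicted correctly) is computed, and the
 values are averaged. The helper name is hypothetical and the sketch
 covers only this one statistic.

\preformatted{
# Hypothetical sketch of one-versus-all averaging (not caret's code).
mean_sensitivity <- function(pred, obs) {
  sens <- sapply(levels(obs), function(lv) {
    # sensitivity of class lv against all other classes
    sum(pred == lv & obs == lv) / sum(obs == lv)
  })
  c(Mean_Sensitivity = mean(sens))
}

lv   <- c("a", "b", "c")
obs  <- factor(c("a", "a", "b", "b", "c", "c"), levels = lv)
pred <- factor(c("a", "b", "b", "b", "c", "a"), levels = lv)
mean_sensitivity(pred, obs)   # mean of 0.5, 1 and 0.5, i.e. 2/3
}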

To use \code{twoClassSummary} and/or \code{mnLogLoss}, the
 \code{classProbs} argument of \code{\link{trainControl}} should
 be \code{TRUE}. \code{multiClassSummary} can be used without
 class probabilities but some statistics (e.g. overall log loss
 and the average of per-class area under the ROC curves) will not
 be in the result set.
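
A minimal \code{trainControl} set-up for \code{twoClassSummary},
 assuming 5-fold cross-validation (an arbitrary choice for
 illustration).

\preformatted{
library(caret)

# Class probabilities are needed for the ROC-based summary.
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)
ctrl$classProbs   # TRUE
}

With this control object, a call such as
 \code{train(Class ~ ., data = training, method = "glm", metric = "ROC", trControl = ctrl)}
 would tune on the area under the ROC curve; \code{Class} and
 \code{training} are placeholder names.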

Other functions can be used via the \code{summaryFunction}
 argument of \code{\link{trainControl}}. Custom functions must
 have the same arguments as \code{defaultSummary} and return a
 named numeric vector of statistics.
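
A hypothetical custom summary function, showing the required
 signature: it takes \code{data}, \code{lev} and \code{model} and
 returns a named numeric vector. The metric (median absolute error)
 and the names \code{medAESummary} and \code{MedAE} are illustrative
 choices, not caret built-ins.

\preformatted{
# Hypothetical custom summary with the same arguments as defaultSummary.
medAESummary <- function(data, lev = NULL, model = NULL) {
  c(MedAE = median(abs(data$pred - data$obs), na.rm = TRUE))
}

# Standalone check on a toy data frame with obs and pred columns.
toy <- data.frame(obs = c(1, 2, 3, 4), pred = c(1.5, 2, 2, 5))
medAESummary(toy)   # median of 0.5, 0, 1, 1, i.e. 0.75
}

It could then be supplied as
 \code{trainControl(summaryFunction = medAESummary)}, with
 \code{metric = "MedAE"} and \code{maximize = FALSE} in the call to
 \code{\link{train}}, since smaller errors are better.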

The function \code{getTrainPerf} returns a one-row data frame
 with the resampling results for the chosen model. The statistics
 have the prefix "\code{Train}" (e.g. "\code{TrainROC}").
 There is also a column called "\code{method}" that echoes the
 argument of the same name in the call to \code{\link{train}}.
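
A small regression example of \code{getTrainPerf}, assuming the
 built-in \code{mtcars} data, a linear model and 5-fold
 cross-validation (all arbitrary choices for illustration).

\preformatted{
library(caret)

set.seed(1)
# Linear regression resampled with 5-fold cross-validation.
fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm",
             trControl = trainControl(method = "cv", number = 5))
getTrainPerf(fit)   # one row: Train-prefixed statistics plus method
}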
}
\examples{

predicted <-  matrix(rnorm(50), ncol = 5)
observed <- rnorm(10)
apply(predicted, 2, postResample, obs = observed)

classes <- c("class1", "class2")
set.seed(1)
dat <- data.frame(obs =  factor(sample(classes, 50, replace = TRUE)),
                  pred = factor(sample(classes, 50, replace = TRUE)),
                  class1 = runif(50))
dat$class2 <- 1 - dat$class1

defaultSummary(dat, lev = classes)
twoClassSummary(dat, lev = classes)
prSummary(dat, lev = classes)
mnLogLoss(dat, lev = classes)

}
\references{
Kvalseth. Cautionary note about \eqn{R^2}. American Statistician
(1985) vol. 39 (4) pp. 279-285
}
\seealso{
\code{\link{trainControl}}
}
\author{
Max Kuhn, Zachary Mayer
}
\keyword{utilities}