% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calibration.R
\name{calibration}
\alias{calibration}
\alias{calibration.formula}
\alias{calibration.default}
\alias{xyplot.calibration}
\alias{ggplot.calibration}
\alias{panel.calibration}
\alias{print.calibration}
\title{Probability Calibration Plot}
\usage{
calibration(x, ...)

\method{calibration}{default}(x, ...)

\method{calibration}{formula}(
  x,
  data = NULL,
  class = NULL,
  cuts = 11,
  subset = TRUE,
  lattice.options = NULL,
  ...
)

\method{print}{calibration}(x, ...)

\method{xyplot}{calibration}(x, data = NULL, ...)

\method{ggplot}{calibration}(data, ..., bwidth = 2, dwidth = 3)
}
\arguments{
\item{x}{a \code{lattice} formula (see \code{\link[lattice:xyplot]{xyplot}} for syntax) where the
left-hand side of the formula is a factor class variable of the observed outcome and the right-hand side
specifies one or more columns, each corresponding to a numeric ranking variable for a model (e.g. class
probabilities). The classification variable should have two levels.}

\item{\dots}{options to pass through to \code{\link[lattice:xyplot]{xyplot}} or the panel function (not
used in \code{calibration.formula}).}

\item{data}{For \code{calibration.formula}, a data frame (or more precisely, anything that is a valid
\code{envir} argument in \code{eval}, e.g., a list or an environment) containing values for any
variables in the formula, as well as \code{groups} and \code{subset} if applicable. If not found in
\code{data}, or if \code{data} is unspecified, the variables are looked for in the environment of the
formula. This argument is not used for \code{xyplot.calibration}. For \code{ggplot.calibration}, \code{data}
should be an object of class "\code{calibration}".}

\item{class}{a character string naming the class of interest (the event class).}

\item{cuts}{If a single number, this is the number of cut points used to split the data when creating
the plot (the default is 11). If a vector, these are the actual cuts that will be used.}

\item{subset}{An expression that evaluates to a logical or integer indexing vector. It is evaluated in
\code{data}. Only the resulting rows of \code{data} are used for the plot.}

\item{lattice.options}{A list that could be supplied to \code{\link[lattice:lattice.options]{lattice.options}}}

\item{bwidth, dwidth}{numeric values for the confidence interval bar width and the dodge width, respectively.
A dodge is only used when multiple models are specified in the formula.}
}
\value{
\code{calibration.formula} returns a list with elements:
\item{data}{the data used for plotting}
\item{cuts}{the number of cuts}
\item{class}{the event class}
\item{probNames}{the names of the model probabilities}

\code{xyplot.calibration} returns a \pkg{lattice} object
}
\description{
For classification models, this function creates a 'calibration plot' that describes
how consistent model probabilities are with observed event rates.
}
\details{
\code{calibration.formula} is used to process the data and \code{xyplot.calibration} is used to create the plot.

To construct the calibration plot, the following steps are used for each model:

\enumerate{
   \item The data are split into \code{cuts - 1} roughly equal groups (bins) by their class probabilities.
   \item The number of samples with true results equal to \code{class} is determined for each bin.
   \item The event rate is determined for each bin.
}
\code{xyplot.calibration} produces a plot of the observed event rate by the mid-point of the bins.

This implementation uses the \pkg{lattice} function \code{\link[lattice:xyplot]{xyplot}}, so plot
elements can be changed via panel functions, \code{\link[lattice:trellis.par.get]{trellis.par.set}} or
other means. \code{calibration} uses the panel function \code{\link{panel.calibration}} by default, but
a different panel function can be supplied through the \code{panel} argument of \code{xyplot.calibration}.

The following elements are set by default in the plot but can be changed by passing new values into
\code{xyplot.calibration}: \code{xlab = "Bin Midpoint"}, \code{ylab = "Observed Event Percentage"},
\code{type = "o"}, \code{ylim = extendrange(c(0, 100))}, \code{xlim = extendrange(c(0, 100))} and
\code{panel = panel.calibration}.

For the \code{ggplot} method, confidence intervals on the estimated proportions (from
\code{\link[stats]{binom.test}}) are also shown.
}
\examples{
\dontrun{
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]


inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[inTrain[[1]], ]
trainY <- mdrrClass[inTrain[[1]]]
testX <- mdrrDescr[-inTrain[[1]], ]
testY <- mdrrClass[-inTrain[[1]]]

library(MASS)

ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)

testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[,1],
                        qda = predict(qdaFit, testX)$posterior[,1])

calibration(obs ~ lda + qda, data = testProbs)

calPlotData <- calibration(obs ~ lda + qda, data = testProbs)
calPlotData

xyplot(calPlotData, auto.key = list(columns = 2))
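
## The calls below illustrate two options described above. They are a
## usage sketch that reuses testProbs and calPlotData from this example;
## the value cuts = 6 and the object name calPlotCoarse are arbitrary
## choices made for illustration.

## Fewer cut points give wider bins (per Details, cuts - 1 groups)
calPlotCoarse <- calibration(obs ~ lda + qda, data = testProbs, cuts = 6)
xyplot(calPlotCoarse, auto.key = list(columns = 2))

## The ggplot method also shows confidence intervals on the estimated
## proportions; bwidth and dwidth are set to their documented defaults
ggplot(calPlotData, bwidth = 2, dwidth = 3)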
}
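
## A rough base-R sketch of the three steps listed in Details, for
## illustration only. It is not the code used inside calibration(), and
## the simulated data and all object names here are hypothetical.
set.seed(1)
simProb <- runif(200)
simObs <- factor(ifelse(runif(200) < simProb, "event", "nonevent"),
                 levels = c("event", "nonevent"))

## Step 1: 11 cut points on the probability scale give 10 bins
simBreaks <- seq(0, 1, length.out = 11)
simBin <- cut(simProb, simBreaks, include.lowest = TRUE)

## Step 2: count the samples whose observed class is the event class
simEvents <- tapply(simObs == "event", simBin, sum)
simCounts <- tapply(simObs == "event", simBin, length)

## Step 3: event rate per bin, as a percentage, against the bin mid-point
calSketch <- data.frame(
  bin = levels(simBin),
  midpoint = 100 * (head(simBreaks, -1) + tail(simBreaks, -1)) / 2,
  percent = 100 * as.vector(simEvents / simCounts)
)
calSketch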

}
\seealso{
\code{\link[lattice:xyplot]{xyplot}}, \code{\link[lattice:trellis.par.get]{trellis.par.set}}
}
\author{
Max Kuhn, some \pkg{lattice} code and documentation by Deepayan Sarkar
}
\keyword{hplot}