% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/thresholder.R
\name{thresholder}
\alias{thresholder}
\title{Generate Data to Choose a Probability Threshold}
\usage{
thresholder(x, threshold, final = TRUE, statistics = "all")
}
\arguments{
\item{x}{A \code{\link{train}} object where the value of
\code{savePredictions} was either \code{TRUE}, \code{"all"},
or \code{"final"} in \code{\link{trainControl}}. Also, the
control argument \code{classProbs} should have been \code{TRUE}.}
\item{threshold}{A numeric vector of candidate probability thresholds
in the interval [0, 1]. If the class probability corresponding to the
first level of the outcome is greater than the threshold, the data point
is classified as that level.}
\item{final}{A logical: should only the final tuning parameters
chosen by \code{\link{train}} be used when
\code{savePredictions = 'all'}?}
\item{statistics}{A character vector indicating which statistics to
calculate. See details below for possible choices; the default value
\code{"all"} computes all of these.}
}
\value{
A data frame with columns for each of the tuning parameters
from the model along with an additional column called
\code{prob_threshold} for the probability threshold. There are
also columns for summary statistics averaged over resamples with
column names corresponding to the input argument \code{statistics}.
}
\description{
This function uses the resampling results from a \code{\link{train}}
object to generate performance statistics over a set of probability
thresholds for two-class problems.
}
\details{
The argument \code{statistics} designates the statistics to compute
for each probability threshold. One or more of the following statistics can
be selected:
\itemize{
\item Sensitivity
\item Specificity
\item Pos Pred Value
\item Neg Pred Value
\item Precision
\item Recall
\item F1
\item Prevalence
\item Detection Rate
\item Detection Prevalence
\item Balanced Accuracy
\item Accuracy
\item Kappa
\item J
\item Dist
}
For a description of these statistics (except the last two), see the
documentation of \code{\link{confusionMatrix}}. The last two statistics
are Youden's J statistic and the distance to the best possible cutoff (i.e.,
perfect sensitivity and specificity).
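
As a sketch of the usual definitions (the Euclidean form of the distance
is an assumption here, not something this page confirms), the two
statistics can be written as
\deqn{J = Sensitivity + Specificity - 1}{J = Sensitivity + Specificity - 1}
\deqn{Dist = \sqrt{(1 - Sensitivity)^2 + (1 - Specificity)^2}}{Dist = sqrt((1 - Sensitivity)^2 + (1 - Specificity)^2)}
Under these definitions, larger values of J and smaller values of Dist
indicate a better threshold.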
}
\examples{
\dontrun{
# Simulate a two-class data set
set.seed(2444)
dat <- twoClassSim(500, intercept = -10)
table(dat$Class)

# thresholder() needs class probabilities and saved predictions
ctrl <- trainControl(method = "cv",
                     classProbs = TRUE,
                     savePredictions = "all",
                     summaryFunction = twoClassSummary)

set.seed(2863)
mod <- train(Class ~ ., data = dat,
             method = "rda",
             tuneLength = 4,
             metric = "ROC",
             trControl = ctrl)

# Resampled statistics for each candidate threshold
resample_stats <- thresholder(mod,
                              threshold = seq(.5, 1, by = 0.05),
                              final = TRUE)

ggplot(resample_stats, aes(x = prob_threshold, y = J)) +
  geom_point()

ggplot(resample_stats, aes(x = prob_threshold, y = Dist)) +
  geom_point()

# Sensitivity in black, specificity in red
ggplot(resample_stats, aes(x = prob_threshold, y = Sensitivity)) +
  geom_point() +
  geom_point(aes(y = Specificity), col = "red")
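
# A possible follow-up, given as a sketch rather than as part of the
# original example: the plots above show how the statistics change with
# the threshold, and one simple way to act on them is to keep the row
# with the largest value of the J column. The name best_J is
# hypothetical; only the prob_threshold and J columns used in the plots
# are assumed to exist in resample_stats.
best_J <- resample_stats[which.max(resample_stats$J), ]
best_J$prob_threshold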
}
}