\encoding{UTF-8}
\name{multiclass.roc}
\alias{multiclass.roc}
\alias{multiclass.roc.default}
\alias{multiclass.roc.formula}
\title{
 Multi-class AUC
}
\description{
  This function builds multiple ROC curves to compute the
  multi-class AUC as defined by Hand and Till.
}
\usage{
multiclass.roc(...)
\S3method{multiclass.roc}{formula}(formula, data, ...)
\S3method{multiclass.roc}{default}(response, predictor,
levels=base::levels(as.factor(response)), 
percent=FALSE, direction = c("auto", "<", ">"), ...)

}

\arguments{
  \item{response}{a factor, numeric or character vector of
    responses (true class), typically encoded with 0 (controls) and 1 (cases), as in
    \code{\link{roc}}.
  }
  \item{predictor}{either a numeric vector containing the value of each
    observation, as in \code{\link{roc}}, or a matrix giving the decision value
    (e.g. probability) for each class.
  }
  \item{formula}{a formula of the type \code{response~predictor}.}
  \item{data}{a matrix or data.frame containing the variables in the
    formula. See \code{\link{model.frame}} for more details.}
  \item{levels}{the value of the response for controls and cases
    respectively. In contrast with the \code{levels} argument of
    \code{\link{roc}}, all the levels are used and
    \link[=combn]{combined} to compute the multiclass AUC.
  }
  \item{percent}{if the sensitivities, specificities and AUC must be
    given in percent (\code{TRUE}) or in fraction (\code{FALSE}, default).
  }
  \item{direction}{in which direction to make the comparison?
    \dQuote{auto} (default for univariate curves):
    automatically define in which group the
    median is higher and take the direction accordingly. 
    Not available for multivariate curves.
    \dQuote{>} (default for multivariate curves):
    if the predictor values for the control group are
    higher than the values of the case group (controls > t >= cases).
    \dQuote{<}: if the predictor values for the control group are lower
    than or equal to the values of the case group (controls < t <= cases).
  }
  \item{...}{further arguments passed to \code{\link{roc}}.
  }
}
\details{
This function computes the multiclass AUC as defined by Hand and Till
(2001). A multiclass AUC is a mean of several \code{\link{auc}} and
cannot be plotted. Only AUCs can be computed for such curves.
Confidence intervals, standard deviation, smoothing and
comparison tests are not implemented.
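
As a reminder of the definition, in the notation of Hand and Till (2001)
and not necessarily that of the internals of this function, the
multiclass AUC for \eqn{c} classes can be written as
\deqn{M = \frac{2}{c(c-1)} \sum_{i<j} \hat{A}(i,j), \qquad \hat{A}(i,j) = \frac{\hat{A}(i|j) + \hat{A}(j|i)}{2},}{M = 2 / (c (c - 1)) * sum_{i<j} A(i, j), with A(i, j) = (A(i|j) + A(j|i)) / 2,}
where \eqn{\hat{A}(i|j)}{A(i|j)} estimates the probability that a randomly
drawn member of class \eqn{j} has a lower estimated probability of
belonging to class \eqn{i} than a randomly drawn member of class \eqn{i}.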

The \code{multiclass.roc} function can handle two types of datasets: uni- and multi-variate.
In the univariate case, a single \code{predictor} vector is passed
and all the combinations of responses are assessed.
In the multivariate case, a \code{\link{matrix}} or \code{\link{data.frame}}
is passed as \code{predictor}. The columns must be named according to the
levels of the \code{response}.

This function has been much less tested than the rest of the package and
is more likely to contain bugs. Please report any that you find.
}

\value{
  A list of class \dQuote{multiclass.roc} (if \code{predictor} is a
  vector, univariate case) or \dQuote{mv.multiclass.roc} (if
  \code{predictor} is a matrix, multivariate case), with the following
  fields:
  \item{auc}{if called with \code{auc=TRUE}, a numeric of class \dQuote{auc} as
    defined in \code{\link{auc}}. Note that this is not the standard AUC
    but the multi-class AUC as defined by Hand and Till.
  }
  \item{ci}{if called with \code{ci=TRUE}, a numeric of class \dQuote{ci} as
    defined in \code{\link{ci}}.
  }
  \item{response}{the response vector as passed in argument. If
    \code{NA} values were removed, a \code{na.action} attribute similar
    to \code{\link{na.omit}} stores the row numbers.
  }
  \item{predictor}{the predictor vector as passed in argument. If
    \code{NA} values were removed, a \code{na.action} attribute similar
    to \code{\link{na.omit}} stores the row numbers.
  }
  \item{levels}{the levels of the response as defined in argument.}
  \item{percent}{if the sensitivities, specificities and AUC are
    reported in percent, as defined in argument.
  }
  \item{call}{how the function was called. See \code{\link{match.call}} for
    more details.
  }
}

\section{Warnings}{
  If \code{response} is an ordered factor and one of the levels
  specified in \code{levels} is missing, a warning is issued and the
  level is ignored.
}

\references{
  David J. Hand and Robert J. Till (2001). A Simple Generalisation of
  the Area Under the ROC Curve for Multiple Class Classification
  Problems. \emph{Machine Learning} \bold{45}(2), p. 171--186. DOI:
  \doi{10.1023/A:1010920819831}.
}

\seealso{
 \code{\link{auc}}
}

\examples{
####
# Examples for a univariate decision value
####
data(aSAH)

# Basic example
multiclass.roc(aSAH$gos6, aSAH$s100b)
# Produces an innocuous warning because one level has no observation

# Select only 3 of the aSAH$gos6 levels:
multiclass.roc(aSAH$gos6, aSAH$s100b, levels=c(3, 4, 5))

# Give the result in percent
multiclass.roc(aSAH$gos6, aSAH$s100b, percent=TRUE)
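
# A minimal sketch, assuming the univariate multiclass AUC is the plain mean
# of the pairwise AUCs (see Details). The names `lv` and `pairwise.auc` are
# illustrative only and are not part of the package.
library(pROC)
data(aSAH)
lv <- c(3, 4, 5)
pairwise.auc <- utils::combn(lv, 2, function(pair) {
  # One ordinary two-class ROC curve per pair of levels
  as.numeric(auc(roc(aSAH$gos6, aSAH$s100b, levels = pair, quiet = TRUE)))
})
pairwise.auc
# Under that assumption, this should be close to
# multiclass.roc(aSAH$gos6, aSAH$s100b, levels = lv)$auc
mean(pairwise.auc)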

####
# Examples for multivariate decision values (e.g. class probabilities)
####

\dontrun{
# Example with a multinomial log-linear model from nnet
# We use the iris dataset and split into a training and test set
requireNamespace("nnet")
data(iris)
iris.sample <- sample(1:150)
iris.train <- iris[iris.sample[1:75],]
iris.test <- iris[iris.sample[76:150],]
mn.net <- nnet::multinom(Species ~ ., iris.train)

# Use predict with type="prob" to get class probabilities
iris.predictions <- predict(mn.net, newdata=iris.test, type="prob")
head(iris.predictions)

# This can be used directly in multiclass.roc:
multiclass.roc(iris.test$Species, iris.predictions)
}


# Let's see another example with an artificial dataset
n <- c(100, 80, 150)
responses <- factor(c(rep("X1", n[1]), rep("X2", n[2]), rep("X3", n[3])))
# construct prediction matrix: one column per class

preds <- lapply(n, function(x) runif(x, 0.4, 0.6))
predictor <- as.matrix(data.frame(
                "X1" = c(preds[[1]], runif(n[2] + n[3], 0, 0.7)),
                "X2" = c(runif(n[1], 0.1, 0.4), preds[[2]], runif(n[3], 0.2, 0.8)),
                "X3" = c(runif(n[1] + n[2], 0.3, 0.7), preds[[3]])
             ))
multiclass.roc(responses, predictor)

# One can change direction, partial.auc, percent, etc.:
multiclass.roc(responses, predictor, direction = ">")
multiclass.roc(responses, predictor, percent = TRUE, 
	partial.auc = c(100, 90), partial.auc.focus = "se")


# Limit set of levels
multiclass.roc(responses, predictor, levels = c("X1", "X2"))
# Use with formula. Here we need a data.frame to store the responses as characters
data <- cbind(as.data.frame(predictor), "response" = responses)
multiclass.roc(response ~ X1+X3, data)

}

\keyword{univar}
\keyword{nonparametric}
\keyword{utilities}
\keyword{roc}