\name{predict.randomForest}
\alias{predict.randomForest}
\title{predict method for random forest objects}
\description{
Prediction of test data using a random forest.
}
\usage{
\method{predict}{randomForest}(object, newdata, type="response",
norm.votes=TRUE, predict.all=FALSE, proximity=FALSE, nodes=FALSE,
cutoff, ...)
}
\arguments{
\item{object}{an object of class \code{randomForest}, as created by
the function \code{randomForest}.}
\item{newdata}{a data frame or matrix containing new data. (Note: if
not given, the out-of-bag prediction in \code{object} is returned.)}
\item{type}{one of \code{response}, \code{prob}, or \code{votes},
indicating the type of output: predicted values, matrix of class
probabilities, or matrix of vote counts. \code{class} is allowed, but
is automatically converted to \code{response}, for backward
compatibility.}
\item{norm.votes}{Should the vote counts be normalized (i.e.,
expressed as fractions)? Ignored if \code{object$type} is
\code{regression}.}
\item{predict.all}{Should the predictions of all trees be kept?}
\item{proximity}{Should proximity measures be computed? An error is
issued if \code{object$type} is \code{regression}.}
\item{nodes}{Should the terminal node indicators (an n by ntree
matrix) be returned? If so, they are in the ``nodes'' attribute of
the returned object.}
\item{cutoff}{(Classification only) A vector of length equal to the
number of classes. The `winning' class for an observation is the
one with the maximum ratio of proportion of votes to cutoff.
Default is taken from the \code{forest$cutoff} component of
\code{object} (i.e., the setting used when running
\code{\link{randomForest}}).}
\item{...}{not used currently.}
}
\value{
If \code{object$type} is \code{regression}, a vector of predicted
values is returned. If \code{predict.all=TRUE}, then the returned
object is a list of two components: \code{aggregate}, which is the
vector of predicted values by the forest, and \code{individual}, which
is a matrix where each column contains the predictions by a tree in
the forest.

If \code{object$type} is \code{classification}, the object returned
depends on the argument \code{type}:
\item{response}{predicted classes (the classes with majority vote).}
\item{prob}{matrix of class probabilities (one column for each class
and one row for each input).}
\item{votes}{matrix of vote counts (one column for each class
and one row for each new input); either in raw counts or in fractions
(if \code{norm.votes=TRUE}).}

If \code{predict.all=TRUE}, then the \code{individual} component of the
returned object is a character matrix where each column contains the
predicted class by a tree in the forest.

If \code{proximity=TRUE}, the returned object is a list with two
components: \code{pred} is the prediction (as described above) and
\code{proximity} is the proximity matrix. An error is issued if
\code{object$type} is \code{regression}.

If \code{nodes=TRUE}, the returned object has a ``nodes'' attribute,
which is an n by ntree matrix, each column containing the node number
that the cases fall in for that tree.

NOTE: If \code{object} inherits from \code{randomForest.formula},
then any data with \code{NA} are silently omitted from the prediction.
The returned value will contain \code{NA} correspondingly in the
aggregated and individual tree predictions (if requested), but not in
the proximity or node matrices.

NOTE2: Any ties are broken at random, so if this is undesirable, avoid
it by using an odd number for \code{ntree} in \code{randomForest()}.
}
\references{
Breiman, L. (2001), \emph{Random Forests}, Machine Learning 45(1),
5-32.
}
\author{ Andy Liaw \email{andy_liaw@merck.com} and Matthew Wiener
\email{matthew_wiener@merck.com}, based on original Fortran code by
Leo Breiman and Adele Cutler.}
\seealso{\code{\link{randomForest}}}
\examples{
data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob=c(0.8, 0.2))
iris.rf <- randomForest(Species ~ ., data=iris[ind == 1,])
iris.pred <- predict(iris.rf, iris[ind == 2,])
table(observed = iris[ind==2, "Species"], predicted = iris.pred)
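## Class probabilities and (normalized) vote fractions for the same
## test set; columns correspond to classes, rows to test cases.
head(predict(iris.rf, iris[ind == 2,], type="prob"))
head(predict(iris.rf, iris[ind == 2,], type="votes", norm.votes=TRUE))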
## Get prediction for all trees.
predict(iris.rf, iris[ind == 2,], predict.all=TRUE)
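## A custom cutoff, e.g. requiring a larger vote share for the first
## class; cutoff values must be positive and sum to at most 1.
predict(iris.rf, iris[ind == 2,], cutoff=c(0.6, 0.2, 0.2))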
## Proximities.
predict(iris.rf, iris[ind == 2,], proximity=TRUE)
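## Per the NOTE above: with the formula interface, rows containing NA
## are omitted and the corresponding predictions returned as NA.
iris.na <- iris[ind == 2,]
iris.na[1, "Sepal.Length"] <- NA
predict(iris.rf, iris.na)[1]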
## Nodes matrix.
str(attr(predict(iris.rf, iris[ind == 2,], nodes=TRUE), "nodes"))
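## Regression: with predict.all=TRUE the result is a list holding the
## aggregated forest prediction and the per-tree predictions
## (illustrated here with the standard 'mtcars' data).
set.seed(222)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars[1:25, ])
str(predict(mtcars.rf, mtcars[26:32, ], predict.all=TRUE))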
}
\keyword{classif}% at least one, from doc/KEYWORDS
\keyword{regression}