File: predict.xgb.Booster.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{predict.xgb.Booster}
\alias{predict.xgb.Booster}
\alias{predict.xgb.Booster.handle}
\title{Predict method for eXtreme Gradient Boosting model}
\usage{
\method{predict}{xgb.Booster}(
  object,
  newdata,
  missing = NA,
  outputmargin = FALSE,
  ntreelimit = NULL,
  predleaf = FALSE,
  predcontrib = FALSE,
  approxcontrib = FALSE,
  predinteraction = FALSE,
  reshape = FALSE,
  training = FALSE,
  ...
)

\method{predict}{xgb.Booster.handle}(object, ...)
}
\arguments{
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}}

\item{newdata}{takes a \code{matrix}, \code{dgCMatrix}, local data file, or \code{xgb.DMatrix}.}

\item{missing}{only used when the input is a dense matrix. Pick a float value that represents
missing values in the data (e.g., sometimes 0 or some other extreme value is used).}

\item{outputmargin}{whether the prediction should be returned in the form of the original untransformed
sum of predictions from boosting iterations' results. E.g., setting \code{outputmargin=TRUE} for
logistic regression would return predictions for log-odds instead of probabilities.}

\item{ntreelimit}{limit the number of the model's trees or boosting iterations used in prediction (see Details).
All trees are used by default (\code{NULL} value).}

\item{predleaf}{whether to predict leaf indices.}

\item{predcontrib}{whether to return feature contributions to individual predictions (see Details).}

\item{approxcontrib}{whether to use a fast approximation for feature contributions (see Details).}

\item{predinteraction}{whether to return contributions of feature interactions to individual predictions (see Details).}

\item{reshape}{whether to reshape the vector of predictions into a matrix when there are several
prediction outputs per case. This option has no effect when any of the \code{predleaf}, \code{predcontrib},
or \code{predinteraction} flags is \code{TRUE}.}

\item{training}{whether the prediction result is used for training. For the dart booster,
predicting in training mode performs dropout.}

\item{...}{Parameters passed to \code{predict.xgb.Booster}}
}
\value{
For regression or binary classification, it returns a vector of length \code{nrow(newdata)}.
For multiclass classification, either a vector of length \code{num_class * nrow(newdata)} or
a matrix of dimension \code{(nrow(newdata), num_class)} is returned, depending on
the \code{reshape} value.
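
For illustration, a minimal sketch of how the two multiclass output shapes relate
(assuming a fitted \code{multi:softprob} booster \code{bst}, a feature matrix \code{x},
and \code{num_class} classes):
\preformatted{
pred_vec <- predict(bst, x)                  # vector of length num_class * nrow(x)
pred_mat <- predict(bst, x, reshape = TRUE)  # nrow(x) x num_class matrix
all.equal(pred_mat, matrix(pred_vec, ncol = num_class, byrow = TRUE),
          check.attributes = FALSE)
}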

When \code{predleaf = TRUE}, the output is a matrix object with the
number of columns corresponding to the number of trees.

When \code{predcontrib = TRUE} and it is not a multiclass setting, the output is a matrix object with
\code{num_features + 1} columns. The last "+ 1" column corresponds to the bias.
For a multiclass case, a list of \code{num_class} elements is returned, where each element is
such a matrix. The contribution values are on the scale of the untransformed margin
(e.g., for binary classification, the contributions are log-odds deviations from the bias).

When \code{predinteraction = TRUE} and it is not a multiclass setting, the output is a 3d array with
dimensions \code{c(nrow, num_features + 1, num_features + 1)}. The off-diagonal elements (in the last two
dimensions) represent the contributions of different feature interactions. The array is symmetric with
respect to the last two dimensions. The "+ 1" column corresponds to the bias. Summing this array along the
last dimension should produce practically the same result as predict with \code{predcontrib = TRUE}.
For a multiclass case, a list of \code{num_class} elements is returned, where each element is
such an array.
}
\description{
Predicted values based on either an xgboost model or a model handle object.
}
\details{
Note that \code{ntreelimit} is not necessarily equal to the number of boosting iterations,
nor to the number of trees in the model.
E.g., in a random forest-like model, \code{ntreelimit} limits the number of trees.
But for multiclass classification, where there are multiple trees per iteration,
\code{ntreelimit} limits the number of boosting iterations.

Also note that \code{ntreelimit} would currently do nothing for predictions from gblinear,
since gblinear doesn't keep its boosting history.
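
As a minimal usage sketch (assuming a fitted \code{multi:softprob} booster \code{bst} for
\code{num_class} classes and a feature matrix \code{x}):
\preformatted{
p_all <- predict(bst, x)                  # uses all boosting iterations
p_it1 <- predict(bst, x, ntreelimit = 1)  # only the 1st iteration, i.e. num_class trees
length(p_it1) == length(p_all)            # same shape, generally different values
}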

One possible practical application of the \code{predleaf} option is to use the model
as a generator of new features that capture non-linearities and interactions,
e.g., as implemented in \code{\link{xgb.create.features}}.
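
A rough, illustrative sketch of this idea (not the \code{\link{xgb.create.features}} implementation),
assuming a fitted binary booster \code{bst} and a feature matrix \code{x}:
\preformatted{
leaves <- predict(bst, x, predleaf = TRUE)  # nrow(x) x ntrees matrix of leaf indices
## one-hot encode each tree's leaf index and combine into a sparse feature matrix
onehot <- lapply(seq_len(ncol(leaves)), function(j)
  Matrix::sparse.model.matrix(~ leaf - 1, data.frame(leaf = factor(leaves[, j]))))
new_feats <- do.call(cbind, onehot)
dim(new_feats)  # one indicator column per (tree, leaf) pair
}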

Setting \code{predcontrib = TRUE} makes it possible to calculate contributions of each feature to
individual predictions. For the "gblinear" booster, feature contributions are simply the linear terms
(feature_beta * feature_value). For the "gbtree" booster, feature contributions are SHAP
values (Lundberg 2017) that sum to the difference between the expected output
of the model and the current prediction (the Hessian weights are used to compute the expectations).
Setting \code{approxcontrib = TRUE} approximates these values following the idea explained
in \url{http://blog.datadive.net/interpreting-random-forests/}.
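
A minimal check of this property (assuming a fitted binary booster \code{bst} and a feature matrix \code{x}):
\preformatted{
contr  <- predict(bst, x, predcontrib = TRUE)   # nrow(x) x (num_features + 1)
margin <- predict(bst, x, outputmargin = TRUE)  # untransformed log-odds
summary(rowSums(contr) - margin)                # ~0 up to float precision
}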

With \code{predinteraction = TRUE}, SHAP contributions of the interaction of each pair of features
are computed. Note that this operation can be rather expensive in terms of compute and memory.
Since its cost grows quadratically with the number of features, it is recommended to first select
the most important features. See the Value section for the format of the returned results.
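
A minimal sketch relating the interaction and contribution outputs (assuming a fitted binary
booster \code{bst} and a feature matrix \code{x} with a modest number of features):
\preformatted{
inter <- predict(bst, x, predinteraction = TRUE)  # nrow x (nfeat + 1) x (nfeat + 1)
contr <- predict(bst, x, predcontrib = TRUE)      # nrow x (nfeat + 1)
## summing interactions over the last dimension recovers per-feature contributions
summary(as.vector(apply(inter, c(1, 2), sum) - contr))
}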
}
\examples{
## binary classification:

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
               eta = 0.5, nthread = 2, nrounds = 5, objective = "binary:logistic")
# use all trees by default
pred <- predict(bst, test$data)
# use only the 1st tree
pred1 <- predict(bst, test$data, ntreelimit = 1)

# Predicting tree leaves:
# the result is an nsamples X ntrees matrix
pred_leaf <- predict(bst, test$data, predleaf = TRUE)
str(pred_leaf)

# Predicting feature contributions to predictions:
# the result is an nsamples X (nfeatures + 1) matrix
pred_contr <- predict(bst, test$data, predcontrib = TRUE)
str(pred_contr)
# verify that contributions' sums are equal to log-odds of predictions (up to float precision):
summary(rowSums(pred_contr) - qlogis(pred))
# for the 1st record, let's inspect its features that had non-zero contribution to prediction:
contr1 <- pred_contr[1,]
contr1 <- contr1[-length(contr1)]    # drop BIAS
contr1 <- contr1[contr1 != 0]        # drop non-contributing features
contr1 <- contr1[order(abs(contr1))] # order by contribution magnitude
old_mar <- par("mar")
par(mar = old_mar + c(0,7,0,0))
barplot(contr1, horiz = TRUE, las = 2, xlab = "contribution to prediction in log-odds")
par(mar = old_mar)


## multiclass classification in iris dataset:

lb <- as.numeric(iris$Species) - 1
num_class <- 3
set.seed(11)
bst <- xgboost(data = as.matrix(iris[, -5]), label = lb,
               max_depth = 4, eta = 0.5, nthread = 2, nrounds = 10, subsample = 0.5,
               objective = "multi:softprob", num_class = num_class)
# predict with softprob returns num_class probability values per case:
pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)
# reshape it to a num_class-columns matrix
pred <- matrix(pred, ncol=num_class, byrow=TRUE)
# convert the probabilities to predicted class labels
pred_labels <- max.col(pred) - 1
# the following should result in the same error as seen in the last iteration
sum(pred_labels != lb)/length(lb)

# compare that to the predictions from softmax:
set.seed(11)
bst <- xgboost(data = as.matrix(iris[, -5]), label = lb,
               max_depth = 4, eta = 0.5, nthread = 2, nrounds = 10, subsample = 0.5,
               objective = "multi:softmax", num_class = num_class)
pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)
all.equal(pred, pred_labels)
# prediction from using only 5 iterations should result
# in the same error as seen in iteration 5:
pred5 <- predict(bst, as.matrix(iris[, -5]), ntreelimit=5)
sum(pred5 != lb)/length(lb)


## random forest-like model of 25 trees for binary classification:

set.seed(11)
bst <- xgboost(data = train$data, label = train$label, max_depth = 5,
               nthread = 2, nrounds = 1, objective = "binary:logistic",
               num_parallel_tree = 25, subsample = 0.6, colsample_bytree = 0.1)
# Inspect the prediction error vs number of trees:
lb <- test$label
dtest <- xgb.DMatrix(test$data, label=lb)
err <- sapply(1:25, function(n) {
  pred <- predict(bst, dtest, ntreelimit=n)
  sum((pred > 0.5) != lb)/length(lb)
})
plot(err, type='l', ylim=c(0,0.1), xlab='#trees')

}
\references{
Scott M. Lundberg, Su-In Lee, "A Unified Approach to Interpreting Model Predictions", NIPS Proceedings 2017, \url{https://arxiv.org/abs/1705.07874}

Scott M. Lundberg, Su-In Lee, "Consistent feature attribution for tree ensembles", \url{https://arxiv.org/abs/1706.06060}
}
\seealso{
\code{\link{xgb.train}}.
}