% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.create.features.R
\name{xgb.create.features}
\alias{xgb.create.features}
\title{Create new features from a previously learned model}
\usage{
xgb.create.features(model, data, ...)
}
\arguments{
\item{model}{decision tree boosting model learned on the original data}
\item{data}{original data (usually provided as a \code{dgCMatrix} matrix)}
\item{...}{currently not used}
}
\value{
\code{dgCMatrix} matrix including both the original data and the new features.
}
\description{
May improve learning by adding new features to the training data based on the decision trees from a previously learned model.
}
\details{
This function is inspired by paragraph 3.1 of the paper:
\strong{Practical Lessons from Predicting Clicks on Ads at Facebook}
\emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
Joaquin Quinonero Candela)}
International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014
\url{https://research.fb.com/publications/practical-lessons-from-predicting-clicks-on-ads-at-facebook/}.
Extract explaining the method:
"We found that boosted decision trees are a powerful and very
convenient way to implement non-linear and tuple transformations
of the kind we just described. We treat each individual
tree as a categorical feature that takes as value the
index of the leaf an instance ends up falling in. We use
1-of-K coding of this type of features.
For example, consider the boosted tree model in Figure 1 with 2 subtrees,
where the first subtree has 3 leafs and the second 2 leafs. If an
instance ends up in leaf 2 in the first subtree and leaf 1 in
second subtree, the overall input to the linear classifier will
be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
correspond to the leaves of the first subtree and last 2 to
those of the second subtree.
[...]
We can understand boosted decision tree
based transformation as a supervised feature encoding that
converts a real-valued vector into a compact binary-valued
vector. A traversal from root node to a leaf node represents
a rule on certain features."
}
\examples{
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(data = agaricus.test$data, label = agaricus.test$label)

param <- list(max_depth = 2, eta = 1, objective = "binary:logistic")
nrounds <- 4

bst <- xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)

# Model accuracy without the new features
accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) /
  length(agaricus.test$label)

# Create the new features: one-hot encoded leaf indices from the learned model,
# appended to the original features
new.features.train <- xgb.create.features(model = bst, agaricus.train$data)
new.features.test <- xgb.create.features(model = bst, agaricus.test$data)

# Learn again with the augmented data
new.dtrain <- xgb.DMatrix(data = new.features.train, label = agaricus.train$label)
new.dtest <- xgb.DMatrix(data = new.features.test, label = agaricus.test$label)
watchlist <- list(train = new.dtrain)
bst <- xgb.train(params = param, data = new.dtrain, nrounds = nrounds, nthread = 2,
                 watchlist = watchlist)

# Model accuracy with the new features
accuracy.after <- sum((predict(bst, new.dtest) >= 0.5) == agaricus.test$label) /
  length(agaricus.test$label)

# Here the accuracy was already good and is now perfect.
cat(paste("The accuracy was", accuracy.before, "before adding leaf features and it is now",
          accuracy.after, "!\n"))
}