1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
|
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.ggplot.R, R/xgb.plot.importance.R
\name{xgb.ggplot.importance}
\alias{xgb.ggplot.importance}
\alias{xgb.plot.importance}
\title{Plot feature importance as a bar graph}
\usage{
xgb.ggplot.importance(
importance_matrix = NULL,
top_n = NULL,
measure = NULL,
rel_to_first = FALSE,
n_clusters = c(1:10),
...
)
xgb.plot.importance(
importance_matrix = NULL,
top_n = NULL,
measure = NULL,
rel_to_first = FALSE,
left_margin = 10,
cex = NULL,
plot = TRUE,
...
)
}
\arguments{
\item{importance_matrix}{a \code{data.table} returned by \code{\link{xgb.importance}}.}
\item{top_n}{maximal number of top features to include into the plot.}
\item{measure}{the name of importance measure to plot.
When \code{NULL}, 'Gain' would be used for trees and 'Weight' would be used for gblinear.}
\item{rel_to_first}{whether importance values should be represented as relative to the highest ranked feature.
See Details.}
\item{n_clusters}{(ggplot only) a \code{numeric} vector containing the min and the max range
of the possible number of clusters of bars.}
\item{...}{other parameters passed to \code{barplot} (except horiz, border, cex.names, names.arg, and las).}
\item{left_margin}{(base R barplot) allows to adjust the left margin size to fit feature names.
When it is NULL, the existing \code{par('mar')} is used.}
\item{cex}{(base R barplot) passed as \code{cex.names} parameter to \code{barplot}.}
\item{plot}{(base R barplot) whether a barplot should be produced.
If FALSE, only a data.table is returned.}
}
\value{
The \code{xgb.plot.importance} function creates a \code{barplot} (when \code{plot=TRUE})
and silently returns a processed data.table with \code{n_top} features sorted by importance.
The \code{xgb.ggplot.importance} function returns a ggplot graph which could be customized afterwards.
E.g., to change the title of the graph, add \code{+ ggtitle("A GRAPH NAME")} to the result.
}
\description{
Represents previously calculated feature importance as a bar graph.
\code{xgb.plot.importance} uses base R graphics, while \code{xgb.ggplot.importance} uses the ggplot backend.
}
\details{
The graph represents each feature as a horizontal bar of length proportional to the importance of a feature.
Features are shown ranked in a decreasing importance order.
It works for importances from both \code{gblinear} and \code{gbtree} models.
When \code{rel_to_first = FALSE}, the values would be plotted as they were in \code{importance_matrix}.
For gbtree model, that would mean being normalized to the total of 1
("what is feature's importance contribution relative to the whole model?").
For linear models, \code{rel_to_first = FALSE} would show actual values of the coefficients.
Setting \code{rel_to_first = TRUE} allows to see the picture from the perspective of
"what is feature's importance contribution relative to the most important feature?"
The ggplot-backend method also performs 1-D clustering of the importance values,
with bar colors corresponding to different clusters that have somewhat similar importance values.
}
\examples{
data(agaricus.train)
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 3,
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
importance_matrix <- xgb.importance(colnames(agaricus.train$data), model = bst)
xgb.plot.importance(importance_matrix, rel_to_first = TRUE, xlab = "Relative importance")
(gg <- xgb.ggplot.importance(importance_matrix, measure = "Frequency", rel_to_first = TRUE))
gg + ggplot2::ylab("Frequency")
}
\seealso{
\code{\link[graphics]{barplot}}.
}
|