% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ppc-discrete.R
\name{PPC-discrete}
\alias{PPC-discrete}
\alias{ppc_bars}
\alias{ppc_bars_grouped}
\alias{ppc_rootogram}
\alias{ppc_bars_data}
\title{PPCs for discrete outcomes}
\usage{
ppc_bars(
  y,
  yrep,
  ...,
  prob = 0.9,
  width = 0.9,
  size = 1,
  fatten = 2.5,
  linewidth = 1,
  freq = TRUE
)

ppc_bars_grouped(
  y,
  yrep,
  group,
  ...,
  facet_args = list(),
  prob = 0.9,
  width = 0.9,
  size = 1,
  fatten = 2.5,
  linewidth = 1,
  freq = TRUE
)

ppc_rootogram(
  y,
  yrep,
  style = c("standing", "hanging", "suspended"),
  ...,
  prob = 0.9,
  size = 1
)

ppc_bars_data(y, yrep, group = NULL, prob = 0.9, freq = TRUE)
}
\arguments{
\item{y}{A vector of observations. See \strong{Details}.}

\item{yrep}{An \code{S} by \code{N} matrix of draws from the posterior (or prior)
predictive distribution. The number of rows, \code{S}, is the size of the
posterior (or prior) sample used to generate \code{yrep}. The number of columns,
\code{N}, is the number of predicted observations (\code{length(y)}). The columns of
\code{yrep} should be in the same order as the data points in \code{y} for the plots
to make sense. See the \strong{Details} and \strong{Plot Descriptions} sections for
additional advice specific to particular plots.}

\item{...}{Currently unused.}

\item{prob}{A value between \code{0} and \code{1} indicating the desired probability
mass to include in the \code{yrep} intervals. Set \code{prob=0} to remove the
intervals. (Note: for rootograms these are intervals of the \emph{square roots}
of the expected counts.)}

\item{width}{For bar plots only, passed to \code{\link[ggplot2:geom_bar]{ggplot2::geom_bar()}} to control
the bar width.}

\item{size, fatten, linewidth}{For bar plots, \code{size}, \code{fatten}, and \code{linewidth}
are passed to \code{\link[ggplot2:geom_linerange]{ggplot2::geom_pointrange()}} to control the appearance of the
\code{yrep} points and intervals. For rootograms \code{size} is passed to
\code{\link[ggplot2:geom_path]{ggplot2::geom_line()}}.}

\item{freq}{For bar plots only, if \code{TRUE} (the default) the y-axis will
display counts. Setting \code{freq=FALSE} will put proportions on the y-axis.}

\item{group}{A grouping variable of the same length as \code{y}.
Will be coerced to \link[base:factor]{factor} if not already a factor.
Each value in \code{group} is interpreted as the group level pertaining
to the corresponding observation.}

\item{facet_args}{An optional list of arguments (other than \code{facets})
passed to \code{\link[ggplot2:facet_wrap]{ggplot2::facet_wrap()}} to control faceting.}

\item{style}{For \code{ppc_rootogram}, a string specifying the rootogram
style. The options are \code{"standing"}, \code{"hanging"}, and
\code{"suspended"}. See the \strong{Plot Descriptions} section, below, for
details on the different styles.}
}
\value{
The plotting functions return a ggplot object that can be further
customized using the \strong{ggplot2} package. The functions with suffix
\verb{_data()} return the data that would have been drawn by the plotting
function.
}
\description{
Many of the \link[=PPC-overview]{PPC} functions in \strong{bayesplot} can
be used with discrete data. The small subset of these functions that can
\emph{only} be used if \code{y} and \code{yrep} are discrete are documented
on this page. Currently these include rootograms for count outcomes and bar
plots for ordinal, categorical, and multinomial outcomes. See the
\strong{Plot Descriptions} section below.
}
\details{
For all of these plots \code{y} and \code{yrep} must be integers, although
they need not be integers in the strict sense of \R's
\link[base:integer]{integer} type. For rootogram plots \code{y} and \code{yrep} must also
be non-negative.
}
\section{Plot Descriptions}{

\describe{
\item{\code{ppc_bars()}}{
Bar plot of \code{y} with \code{yrep} medians and uncertainty intervals
superimposed on the bars.
}
\item{\code{ppc_bars_grouped()}}{
Same as \code{ppc_bars()} but a separate plot (facet) is generated for each
level of a grouping variable.
}
\item{\code{ppc_rootogram()}}{
Rootograms allow for diagnosing problems in count data models such as
overdispersion or excess zeros. They consist of a histogram of \code{y} with the
expected counts based on \code{yrep} overlaid as a line along with uncertainty
intervals. The y-axis represents the square roots of the counts to
approximately adjust for scale differences and thus ease comparison between
observed and expected counts. Using the \code{style} argument, the histogram
style can be adjusted to focus on different aspects of the data:
\itemize{
\item \emph{Standing}: basic histogram of observed counts with curve
showing expected counts.
\item \emph{Hanging}: observed counts hanging from the curve
representing expected counts.
\item \emph{Suspended}: histogram of the differences between expected and
observed counts.
}

\strong{All of the rootograms are plotted on the square root scale}. See Kleiber
and Zeileis (2016) for advice on interpreting rootograms and selecting
among the different styles.
}
}
}

\examples{
set.seed(9222017)

# bar plots
f <- function(N) {
  sample(1:4, size = N, replace = TRUE, prob = c(0.25, 0.4, 0.1, 0.25))
}
y <- f(100)
yrep <- t(replicate(500, f(100)))
dim(yrep)
group <- gl(2, 50, length = 100, labels = c("GroupA", "GroupB"))

color_scheme_set("mix-pink-blue")
ppc_bars(y, yrep)
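
# A small sketch of the point made in Details: y and yrep only need to hold
# whole-number values, not R's integer type. Here y is converted to double
# before plotting (simulated data from above, for illustration only).
y_dbl <- as.numeric(y)
is.integer(y_dbl)
ppc_bars(y_dbl, yrep)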

# split by group, change interval width, and display proportion
# instead of count on y-axis
color_scheme_set("mix-blue-pink")
ppc_bars_grouped(y, yrep, group, prob = 0.5, freq = FALSE)
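
# A minimal sketch of ppc_bars_data(), which returns the data behind the
# bar plots instead of drawing them. The arguments mirror the grouped plot
# above; the object name bars_df is only illustrative.
bars_df <- ppc_bars_data(y, yrep, group = group, prob = 0.5, freq = FALSE)
head(bars_df)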

\dontrun{
# example for ordinal regression using rstanarm
library(rstanarm)
fit <- stan_polr(
  tobgp ~ agegp,
  data = esoph,
  method = "probit",
  prior = R2(0.2, "mean"),
  init_r = 0.1,
  seed = 12345,
  # cores = 4,
  refresh = 0
)

# coded as character, so convert to integer
yrep_char <- posterior_predict(fit)
print(yrep_char[1, 1:4])

yrep_int <- sapply(data.frame(yrep_char, stringsAsFactors = TRUE), as.integer)
y_int <- as.integer(esoph$tobgp)

ppc_bars(y_int, yrep_int)

ppc_bars_grouped(
  y = y_int,
  yrep = yrep_int,
  group = esoph$agegp,
  freq = FALSE,
  prob = 0.5,
  fatten = 1,
  size = 1.5
)
}

# rootograms for counts
y <- rpois(100, 20)
yrep <- matrix(rpois(10000, 20), ncol = 100)

color_scheme_set("brightblue")
ppc_rootogram(y, yrep)
ppc_rootogram(y, yrep, prob = 0)

ppc_rootogram(y, yrep, style = "hanging", prob = 0.8)
ppc_rootogram(y, yrep, style = "suspended")
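
# A hedged sketch of the overdispersion diagnosis mentioned in
# Plot Descriptions. The data are simulated purely for illustration:
# y_od comes from a negative binomial, while yrep_pois comes from a Poisson
# with the same mean, so the replications understate the spread of y_od.
# The names y_od and yrep_pois are not part of the package.
y_od <- rnbinom(100, mu = 20, size = 2)
yrep_pois <- matrix(rpois(10000, 20), ncol = 100)
ppc_rootogram(y_od, yrep_pois, style = "hanging")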

}
\references{
Kleiber, C. and Zeileis, A. (2016).
Visualizing count data regressions using rootograms.
\emph{The American Statistician}. 70(3): 296--303.
\url{https://arxiv.org/abs/1605.01311}.
}
\seealso{
Other PPCs: 
\code{\link{PPC-censoring}},
\code{\link{PPC-distributions}},
\code{\link{PPC-errors}},
\code{\link{PPC-intervals}},
\code{\link{PPC-loo}},
\code{\link{PPC-overview}},
\code{\link{PPC-scatterplots}},
\code{\link{PPC-test-statistics}}
}
\concept{PPCs}