% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stripplot.mids.R
\name{stripplot.mids}
\alias{stripplot.mids}
\alias{stripplot}
\title{Stripplot of observed and imputed data}
\usage{
\method{stripplot}{mids}(
x,
data,
na.groups = NULL,
groups = NULL,
as.table = TRUE,
theme = mice.theme(),
allow.multiple = TRUE,
outer = TRUE,
drop.unused.levels = lattice::lattice.getOption("drop.unused.levels"),
panel = lattice::lattice.getOption("panel.stripplot"),
default.prepanel = lattice::lattice.getOption("prepanel.default.stripplot"),
jitter.data = TRUE,
horizontal = FALSE,
...,
subscripts = TRUE,
subset = TRUE
)
}
\arguments{
\item{x}{A \code{mids} object, typically created by \code{mice()} or
\code{mice.mids()}.}
\item{data}{Formula that selects the data to be plotted. This argument
follows the \pkg{lattice} rules for \emph{formulas}, describing the primary
variables (used for the per-panel display) and the optional conditioning
variables (which define the subsets plotted in different panels) to be used
in the plot.
The formula is evaluated on the complete data set in the \code{long} form.
Legal variable names for the formula include \code{names(x$data)} plus the
two administrative factors \code{.imp} and \code{.id}.
\bold{Extended formula interface:} The primary variable terms (both the LHS
\code{y} and RHS \code{x}) may consist of multiple terms separated by a
\sQuote{+} sign, e.g., \code{y1 + y2 ~ x | a * b}. This formula would be
taken to mean that the user wants to plot both \code{y1 ~ x | a * b} and
\code{y2 ~ x | a * b}, but with the \code{y1 ~ x} and \code{y2 ~ x} in
\emph{separate panels}. This behavior differs from standard \pkg{lattice}.
\emph{Only combine terms of the same type}, i.e. only factors or only
numerical variables. Mixing numerical and categorical data occasionally
produces odd labeling of the vertical axis.
For convenience, in \code{stripplot()} and \code{bwplot()} the formula
\code{y~.imp} may be abbreviated as \code{y}. This applies only to a single
\code{y}, and does not (yet) work for \code{y1+y2~.imp}.}
\item{na.groups}{An expression evaluating to a logical vector indicating
which two groups are distinguished (e.g. using different colors) in the
display. The environment in which this expression is evaluated is the
response indicator \code{is.na(x$data)}.
The default \code{na.groups = NULL} contrasts the observed and missing data
in the LHS \code{y} variable of the display, i.e. groups created by
\code{is.na(y)}. The expression \code{y} creates the groups according to
\code{is.na(y)}. The expression \code{y1 & y2} creates groups by
\code{is.na(y1) & is.na(y2)}, and \code{y1 | y2} creates groups as
\code{is.na(y1) | is.na(y2)}, and so on.}
\item{groups}{This is the usual \code{groups} argument in \pkg{lattice}. It
differs from \code{na.groups} because it evaluates in the completed data
\code{data.frame(complete(x, "long", inc=TRUE))} (as usual), whereas
\code{na.groups} evaluates in the response indicator. See
\code{\link[lattice]{xyplot}} for more details. When both \code{na.groups} and
\code{groups} are specified, \code{na.groups} takes precedence, and
\code{groups} is ignored.}
\item{as.table}{See \code{\link[lattice]{xyplot}}.}
\item{theme}{A named list containing the graphical parameters. The default
function \code{mice.theme} produces a short list of default colors, line
width, and so on. The extensive list may be obtained from
\code{trellis.par.get()}. Global graphical parameters like \code{col} or
\code{cex} in high-level calls are still honored, so first experiment with
the global parameters. Many settings consist of a pair. For example,
\code{mice.theme} defines two symbol colors. The first is for the observed
data, the second for the imputed data. The theme settings only exist during
the call, and do not affect the trellis graphical parameters.}
\item{allow.multiple}{See \code{\link[lattice]{xyplot}}.}
\item{outer}{See \code{\link[lattice]{xyplot}}.}
\item{drop.unused.levels}{See \code{\link[lattice]{xyplot}}.}
\item{panel}{See \code{\link[lattice]{xyplot}}.}
\item{default.prepanel}{See \code{\link[lattice]{xyplot}}.}
\item{jitter.data}{See \code{\link[lattice]{panel.xyplot}}.}
\item{horizontal}{See \code{\link[lattice]{xyplot}}.}
\item{\dots}{Further arguments, usually not directly processed by the
high-level functions documented here, but instead passed on to other
functions.}
\item{subscripts}{See \code{\link[lattice]{xyplot}}.}
\item{subset}{See \code{\link[lattice]{xyplot}}.}
}
\value{
The high-level functions documented here, as well as other high-level
Lattice functions, return an object of class \code{"trellis"}. The
\code{\link[lattice]{update.trellis}} method can be used to
subsequently update components of the object, and the
\code{\link[lattice]{print.trellis}} method (usually called by default)
will plot it on an appropriate plotting device.
}
\description{
Plotting methods for imputed data using \pkg{lattice}.
\code{stripplot} produces one-dimensional
scatterplots. The function
automatically separates the observed and imputed data. The
functions extend the usual features of \pkg{lattice}.
}
\details{
The argument \code{na.groups} may be used to specify (combinations of)
missingness in any of the variables. The argument \code{groups} can be used
to specify groups based on the variable values themselves. Only one of the
two may be active at the same time. When both are specified, \code{na.groups}
takes precedence over \code{groups}.
Use \code{subset} and \code{na.groups} together to plot parts of the
data. For example, select the first imputed data set by
\code{subset=.imp==1}.
Graphical parameters like \code{col}, \code{pch} and \code{cex} can be
specified in the arguments list to alter the plotting symbols. If
\code{length(col)==2}, the color specification defines the observed and
missing groups: \code{col[1]} is the color of the 'observed' data,
\code{col[2]} is the color of the missing or imputed data. A convenient color
choice is \code{col=mdc(1:2)}, a transparent blue color for the observed
data, and a transparent red color for the imputed data. A good choice is
\code{col=mdc(1:2), pch=20, cex=1.5}. These choices can be set for the
duration of the session by running \code{mice.theme()}.
}
\note{
The first two arguments (\code{x} and \code{data}) are reversed
compared to the standard Trellis syntax implemented in \pkg{lattice}. This
reversal was necessary in order to benefit from automatic method dispatch.
In \pkg{mice} the argument \code{x} is always a \code{mids} object, whereas
in \pkg{lattice} the argument \code{x} is always a formula.
In \pkg{mice} the argument \code{data} is always a formula object, whereas in
\pkg{lattice} the argument \code{data} is usually a data frame.
All other arguments have identical interpretation.
}
\examples{
imp <- mice(boys, maxit = 1)
### stripplot, all numerical variables
\dontrun{
stripplot(imp)
}
### same, but with improved display
\dontrun{
stripplot(imp, col = c("grey", mdc(2)), pch = c(1, 20))
}
### distribution per imputation of height, weight and bmi
### labeled by their own missingness
\dontrun{
stripplot(imp, hgt + wgt + bmi ~ .imp,
cex = c(2, 4), pch = c(1, 20), jitter = FALSE,
layout = c(3, 1)
)
}
### same, but labeled with the missingness of wgt (just four cases)
\dontrun{
stripplot(imp, hgt + wgt + bmi ~ .imp,
na = wgt, cex = c(2, 4), pch = c(1, 20), jitter = FALSE,
layout = c(3, 1)
)
}
### distribution of age and height, labeled by missingness in height
### most height values are missing for those around
### the age of two years
### some additional missings occur in region WEST
\dontrun{
stripplot(imp, age + hgt ~ .imp | reg, hgt,
col = c(grDevices::hcl(0, 0, 40, 0.2), mdc(2)), pch = c(1, 20)
)
}
### heavily jittered relation between two categorical variables
### labeled by missingness of gen
### aggregated over all imputed data sets
\dontrun{
stripplot(imp, gen ~ phb, factor = 2, cex = c(8, 1), hor = TRUE)
}
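### a minimal sketch of the subset mechanism mentioned in the Details:
### show height for the first imputed data set only
### (assumes subset is evaluated where .imp is available, as described there;
### the pch and cex values are only a suggestion)
\dontrun{
stripplot(imp, hgt ~ .imp, subset = .imp == 1, pch = 20, cex = 1.5)
}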
### circle fun
stripplot(imp, gen ~ .imp,
na = wgt, factor = 2, cex = c(8.6),
hor = FALSE, outer = TRUE, scales = "free", pch = c(1, 19)
)
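### the result is a "trellis" object, so it can be stored and updated later
### a minimal sketch; the title and axis label are illustrative only
\dontrun{
p <- stripplot(imp, hgt ~ .imp, col = mdc(1:2), pch = 20, cex = 1.5)
p <- update(p, main = "Height: observed and imputed", xlab = "Imputation number")
print(p)
}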
}
\references{
Sarkar, Deepayan (2008) \emph{Lattice: Multivariate Data
Visualization with R}, Springer.
van Buuren S and Groothuis-Oudshoorn K (2011). \code{mice}: Multivariate
Imputation by Chained Equations in \code{R}. \emph{Journal of Statistical
Software}, \bold{45}(3), 1-67. \doi{10.18637/jss.v045.i03}
}
\author{
Stef van Buuren
}
\keyword{hplot}