\name{stat_summary}
\alias{stat_summary}
\alias{StatSummary}
\title{stat\_summary}
\description{Summarise y values at every unique x}
\details{
stat\_summary allows for tremendous flexibility in the specification of summary functions. The summary function can operate either on a data frame (argument \code{fun.data}) or on a vector (arguments \code{fun.y}, \code{fun.ymin} and \code{fun.ymax}). A vector function is the easiest to work with, because it only has to return a single number, but it is somewhat less flexible. A summary function that operates on a data frame should return a data frame with variables that the geom can use.
This page describes stat\_summary; see \code{\link{layer}} and \code{\link{qplot}} for how to create a complete plot from individual components.
}
\section{Aesthetics}{
The following aesthetics can be used with stat\_summary. Aesthetics are mapped to variables in the data with the aes function: \code{stat\_summary(aes(x = var))}
\itemize{
\item \code{x}: x position (\strong{required})
\item \code{y}: y position (\strong{required})
}
}
\usage{stat_summary(mapping = NULL, data = NULL, geom = "pointrange",
position = "identity", ...)}
\arguments{
\item{mapping}{mapping between variables and aesthetics generated by aes}
\item{data}{dataset used in this layer, if not specified uses plot dataset}
\item{geom}{geometric object used by this layer}
\item{position}{position adjustment used by this layer}
\item{...}{other arguments}
}
\seealso{\itemize{
\item \code{\link{geom_errorbar}}: error bars
\item \code{\link{geom_pointrange}}: range indicated by straight line, with point in the middle
\item \code{\link{geom_linerange}}: range indicated by straight line
\item \code{\link{geom_crossbar}}: hollow bar with middle indicated by horizontal line
\item \code{\link{stat_smooth}}: the continuous analogue
\item \url{http://had.co.nz/ggplot2/stat_summary.html}
}}
\value{A \code{\link{layer}}}
\examples{\dontrun{
# Basic operation on a small dataset
c <- qplot(cyl, mpg, data=mtcars)
c + stat_summary(fun.data = "mean_cl_boot", colour = "red")
p <- qplot(cyl, mpg, data = mtcars, stat="summary", fun.y = "mean")
p
# Don't use ylim to zoom into a summary plot - this throws the
# data away
p + ylim(15, 30)
# Instead use coord_cartesian
p + coord_cartesian(ylim = c(15, 30))
# You can supply individual functions to summarise the value at
# each x:
stat_sum_single <- function(fun, geom="point", ...) {
  stat_summary(fun.y=fun, colour="red", geom=geom, size = 3, ...)
}
c + stat_sum_single(mean)
c + stat_sum_single(mean, geom="line")
c + stat_sum_single(median)
c + stat_sum_single(sd)
c + stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max,
  colour = "red")
c + aes(colour = factor(vs)) + stat_summary(fun.y = mean, geom="line")
# Alternatively, you can supply a function that operates on a data.frame.
# A set of useful summary functions is provided from the Hmisc package:
stat_sum_df <- function(fun, geom="crossbar", ...) {
  stat_summary(fun.data=fun, colour="red", geom=geom, width=0.2, ...)
}
c + stat_sum_df("mean_cl_boot")
c + stat_sum_df("mean_sdl")
c + stat_sum_df("mean_sdl", mult=1)
c + stat_sum_df("median_hilow")
# There are lots of different geoms you can use to display the summaries
c + stat_sum_df("mean_cl_normal")
c + stat_sum_df("mean_cl_normal", geom = "errorbar")
c + stat_sum_df("mean_cl_normal", geom = "pointrange")
c + stat_sum_df("mean_cl_normal", geom = "smooth")
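# A hand-written data frame summary, as a minimal sketch. mean_range is a
# hypothetical helper, not part of ggplot2 or Hmisc. It assumes, following
# the details section, that the summary function receives a data frame with
# a y column and returns a one-row data frame. y, ymin and ymax are the
# variables that range geoms such as pointrange and crossbar can draw.
mean_range <- function(df, ...) {
  data.frame(y = mean(df$y), ymin = min(df$y), ymax = max(df$y))
}
# Check the result in plain R first: one row of y, ymin and ymax per cyl
mean_range_by_cyl <- do.call(rbind,
  lapply(split(data.frame(y = mtcars$mpg), mtcars$cyl), mean_range))
mean_range_by_cyl
# Under the same assumption, the function could then be supplied as
# fun.data = "mean_range" in the same way as the Hmisc summaries above.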
# Summaries are much more useful with a bigger data set:
m <- ggplot(movies, aes(x=round(rating), y=votes)) + geom_point()
m2 <- m +
  stat_summary(fun.data = "mean_cl_boot", geom = "crossbar",
    colour = "red", width = 0.3)
m2
# Notice how the overplotting skews our visual perception of the mean:
# supplementing the raw data with summary statistics is _very_ important
# Next, we'll look at votes on a log scale.
# Transforming the scale applies the transformation before the statistic
# is computed, so the summary is calculated on the logged data
m2 + scale_y_log10()
# Transforming the coordinate system applies the transformation after the
# statistic is computed, so the summary is calculated on the raw data and
# the geoms are then stretched onto the log scale. Compare the widths of
# the standard errors.
m2 + coord_trans(y="log10")
}}
\author{Hadley Wickham, \url{http://had.co.nz/}}
\keyword{hplot}