% File: layer_stats.Rd, from Debian package r-cran-ggplot2 3.5.1+dfsg-1
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docs_layer.R
\name{layer_stats}
\alias{layer_stats}
\title{Layer statistical transformations}
\description{
In ggplot2, a plot is constructed by adding layers to it. A layer consists
of two important parts: the geometry (geom) and the statistical transformation
(stat). The 'stat' part of a layer matters because it performs a
computation on the data before it is displayed. Stats determine \emph{what} is
displayed, not \emph{how} it is displayed.

For example, if you add \code{\link[=stat_density]{stat_density()}} to a plot, a kernel density
estimate is computed, which is then displayed by the 'geom' part of the
layer. Many \verb{geom_*()} functions use \code{\link[=stat_identity]{stat_identity()}} by default,
which leaves the data unchanged.
}
\section{Specifying stats}{
There are five ways in which the 'stat' part of a layer can be specified.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# 1. The stat can have a layer constructor
stat_density()

# 2. A geom can default to a particular stat
geom_density() # has `stat = "density"` as default

# 3. It can be given to a geom as a string
geom_line(stat = "density")

# 4. The ggproto object of a stat can be given
geom_area(stat = StatDensity)

# 5. It can be given to `layer()` directly:
layer(
  geom = "line",
  stat = "density",
  position = "identity"
)
}\if{html}{\out{</div>}}

Many of these ways are interchangeable. For example, using
\code{stat_density(geom = "line")} is equivalent to using
\code{geom_line(stat = "density")}. Note that for \code{\link[=layer]{layer()}}, you need to
provide the \code{position} argument as well. To give a stat as a string, take
the constructor's name and drop the \code{stat_} prefix, so that \code{stat_bin}
becomes \code{"bin"}.
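
A sketch of that equivalence, assuming ggplot2 is attached and using the
\code{faithful} dataset from base R (an illustrative choice, not part of the
text above): the same density is requested through a constructor, a string,
and a ggproto object, and \code{layer_data()} exposes the computed values so
they can be compared.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

# Three routes to the same kernel density estimate of eruption durations
# (`faithful` ships with base R)
p_constructor <- ggplot(faithful, aes(eruptions)) +
  stat_density(geom = "line")
p_string <- ggplot(faithful, aes(eruptions)) +
  geom_line(stat = "density")
p_ggproto <- ggplot(faithful, aes(eruptions)) +
  geom_line(stat = StatDensity)

# layer_data() returns the layer's data after the stat has been computed
d_constructor <- layer_data(p_constructor)
d_string <- layer_data(p_string)
d_ggproto <- layer_data(p_ggproto)

all.equal(d_constructor$density, d_string$density)
all.equal(d_string$density, d_ggproto$density)
}\if{html}{\out{</div>}}

Only the stat's own \code{density} column is compared here, because
\code{stat_density()} defaults to \code{position = "stack"} while
\code{geom_line()} defaults to \code{position = "identity"}.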

Some of the better-known stats that can be used for the \code{stat} argument
are: \code{\link[=stat_density]{"density"}}, \code{\link[=stat_bin]{"bin"}},
\code{\link[=stat_count]{"count"}}, \code{\link[=stat_function]{"function"}} and
\code{\link[=stat_smooth]{"smooth"}}.
}

\section{Paired geoms and stats}{
Some geoms have paired stats. In some cases, such as \code{\link[=geom_density]{geom_density()}}, the geom is
just a variant of another geom, \code{\link[=geom_area]{geom_area()}}, with slightly different
defaults.

In other cases, the relationship is more complex. In the case of boxplots, for
example, the stat and the geom have distinct roles. The role of the stat is
to compute the five-number summary of the data. Besides displaying the box
of the five-number summary, the geom also provides display options for the
outliers and the widths of the boxes. In such cases, you cannot freely
exchange geoms and stats: both \code{stat_boxplot(geom = "line")} and
\code{geom_area(stat = "boxplot")} result in errors.
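
To see this division of labour, a sketch (assuming ggplot2 is attached and
using its bundled \code{mpg} data) can inspect what \code{stat_boxplot()} hands
to the geom: one row per box, holding the summary columns that
\code{geom_boxplot()} then draws.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
boxes <- layer_data(p)

# One row per box; the summary columns are computed by stat_boxplot()
boxes[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
}\if{html}{\out{</div>}}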

Some paired geoms and stats are:
\itemize{
\item \code{\link[=geom_violin]{geom_violin()}} and \code{\link[=stat_ydensity]{stat_ydensity()}}
\item \code{\link[=geom_histogram]{geom_histogram()}} and \code{\link[=stat_bin]{stat_bin()}}
\item \code{\link[=geom_contour]{geom_contour()}} and \code{\link[=stat_contour]{stat_contour()}}
\item \code{\link[=geom_function]{geom_function()}} and \code{\link[=stat_function]{stat_function()}}
\item \code{\link[=geom_bin_2d]{geom_bin_2d()}} and \code{\link[=stat_bin_2d]{stat_bin_2d()}}
\item \code{\link[=geom_boxplot]{geom_boxplot()}} and \code{\link[=stat_boxplot]{stat_boxplot()}}
\item \code{\link[=geom_count]{geom_count()}} and \code{\link[=stat_sum]{stat_sum()}}
\item \code{\link[=geom_density]{geom_density()}} and \code{\link[=stat_density]{stat_density()}}
\item \code{\link[=geom_density_2d]{geom_density_2d()}} and \code{\link[=stat_density_2d]{stat_density_2d()}}
\item \code{\link[=geom_hex]{geom_hex()}} and \code{\link[=stat_binhex]{stat_binhex()}}
\item \code{\link[=geom_quantile]{geom_quantile()}} and \code{\link[=stat_quantile]{stat_quantile()}}
\item \code{\link[=geom_smooth]{geom_smooth()}} and \code{\link[=stat_smooth]{stat_smooth()}}
}
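
As a quick check of one pairing from this list, assuming ggplot2 is attached
and using the base R \code{faithful} data (an illustrative choice):
\code{geom_histogram()} and \code{stat_bin()} given the same \code{bins} value
should produce the same computed counts.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

p_geom <- ggplot(faithful, aes(waiting)) + geom_histogram(bins = 20)
p_stat <- ggplot(faithful, aes(waiting)) + stat_bin(bins = 20)

# Both layers use stat_bin() to compute the counts and draw them as bars
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count
identical(counts_geom, counts_stat)
}\if{html}{\out{</div>}}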
}

\section{Using computed variables}{
As mentioned above, the role of stats is to perform computation on the data.
As a result, stats have 'computed variables' that determine compatibility
with geoms. These computed variables are documented in the
\strong{Computed variables} sections of the documentation, for example in
\code{\link[=stat_bin]{?stat_bin}}. Although they are documented more thoroughly
in \code{\link[=after_stat]{after_stat()}}, it is worth briefly mentioning that these computed variables
can be accessed in \code{\link[=aes]{aes()}}.
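
One minimal way to see which computed variables a stat provides, assuming
ggplot2 is attached (the \code{faithful} data comes from base R), is to build a
layer and look at the column names returned by \code{layer_data()}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

p <- ggplot(faithful, aes(eruptions)) + stat_density()

# Columns such as density, count, scaled and ndensity are computed by the stat
computed <- layer_data(p)
names(computed)
}\if{html}{\out{</div>}}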

For example, the \code{\link[=stat_density]{?stat_density}} documentation states that,
in addition to a variable called \code{density}, the stat computes a variable
named \code{count}. Whereas \code{density} is scaled so that the area under
the curve integrates to 1, \code{count} is the density multiplied by the
number of observations, so that the values can be interpreted as counts.
Using \code{stat_density(aes(y = after_stat(count)))} displays these
count-scaled densities instead of the regular densities.
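
The sketch below, assuming ggplot2 is attached and using base R's
\code{faithful} data, maps \code{y} to \code{after_stat(count)} and checks that
the plotted values equal the density multiplied by the number of observations:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

p <- ggplot(faithful, aes(eruptions)) +
  stat_density(aes(y = after_stat(count)))
d <- layer_data(p)

# y now holds count-scaled densities: density * number of observations
all.equal(d$y, d$density * nrow(faithful))
}\if{html}{\out{</div>}}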

Computed variables make arbitrary geom-stat pairings possible, provided the
geom is told which computed variables to use. While not necessarily
recommended, \code{\link[=geom_line]{geom_line()}} \emph{can} be paired
with \code{stat = "boxplot"} if the line is instructed on how to use the boxplot
computed variables:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{ggplot(mpg, aes(factor(cyl))) +
  geom_line(
    # Stage gives 'displ' to the stat, and afterwards chooses 'middle' as
    # the y-variable to display
    aes(y = stage(displ, after_stat = middle),
        # Regroup after computing the stats to display a single line
        group = after_stat(1)),
    stat = "boxplot"
  )
}\if{html}{\out{</div>}}
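
Building that plot and inspecting its layer data, a sketch assuming ggplot2 is
attached, shows that the line ends up with one point per cylinder count, whose
\code{y} is the boxplot's computed \code{middle} (the median of \code{displ}):

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

p <- ggplot(mpg, aes(factor(cyl))) +
  geom_line(
    aes(y = stage(displ, after_stat = middle), group = after_stat(1)),
    stat = "boxplot"
  )
d <- layer_data(p)

# One row per level of factor(cyl); y was taken from the computed `middle`
d[, c("x", "y", "middle", "group")]
}\if{html}{\out{</div>}}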
}

\section{Under the hood}{
Internally, stats are represented as \code{\link[=ggproto]{ggproto}} classes that
occupy a slot in a layer. All these classes inherit from the parent
\code{\link{Stat}} ggproto object that orchestrates how stats work. Briefly, stats
are given the opportunity to perform computation on the layer as a
whole, on each facet panel, or on individual groups. For more information on
extending stats, see the \strong{Creating a new stat} section of the vignette
opened by \code{vignette("extending-ggplot2")}. Additionally, see the \strong{New stats}
section of the
\href{https://ggplot2-book.org/extensions.html#new-stats}{online book}.
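
As a minimal sketch of a stat that works on individual groups, modelled on the
convex-hull example in that vignette (the name \code{StatChull} is
illustrative and not part of ggplot2), a new \code{Stat} subclass only needs a
\code{compute_group()} method and its required aesthetics:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(ggplot2)

# Hypothetical stat: keeps only the points on each group's convex hull.
# chull() comes from base R's grDevices package.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# The ggproto object can be passed straight to a geom's `stat` argument
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_polygon(stat = StatChull, fill = NA, colour = "black")

# Data of the second layer (the hull polygon) after the stat has run
hull <- layer_data(p, i = 2)
nrow(hull)
}\if{html}{\out{</div>}}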
}

\seealso{
For an overview of all stat layers, see the
\href{https://ggplot2.tidyverse.org/reference/index.html#stats}{online reference}.

How \link[=after_stat]{computed aesthetics} work.

Other layer documentation: 
\code{\link{layer}()},
\code{\link{layer_geoms}},
\code{\link{layer_positions}}
}
\concept{layer documentation}