% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregating.R
\name{aggre}
\alias{aggre}
\title{Aggregation of split \code{Lexis} data}
\usage{
aggre(
lex,
by = NULL,
type = c("unique", "full"),
sum.values = NULL,
subset = NULL,
verbose = FALSE
)
}
\arguments{
\item{lex}{a \code{Lexis} object split with e.g.
\code{\link[Epi]{splitLexis}} or \code{\link{splitMulti}}}
\item{by}{variables to tabulate (aggregate) by.
\link[=flexible_argument]{Flexible input}, typically e.g.
\code{by = c("V1", "V2")}. See Details and Examples.}
\item{type}{determines the levels to which the data is aggregated, ranging
from returning only rows with \code{pyrs > 0} (\code{"unique"}) to
returning all possible combinations of the variables given in \code{by}, even
if those combinations are not represented in the data (\code{"full"});
see Details}
\item{sum.values}{optional: additional variables to sum within the levels
defined by \code{by}. \link[=flexible_argument]{Flexible input}, typically e.g.
\code{sum.values = c("V1", "V2")}}
\item{subset}{a logical condition to subset by before computations;
e.g. \code{subset = area \%in\% c("A", "B")}}
\item{verbose}{\code{logical}; if \code{TRUE}, the function prints timings
and some information useful for debugging during the aggregation process}
}
\value{
A long \code{data.frame} or \code{data.table} of aggregated person-years
(\code{pyrs}), numbers of subjects at risk (\code{at.risk}), and events
formatted \code{fromXtoY}, where \code{X} and \code{Y} are the states
transitioned from and to, or the state at the end of each \code{lex.id}'s
follow-up (implying \code{X} = \code{Y}). Subjects at risk are computed
at the beginning of each interval defined by the \code{Lexis} time scales
mentioned in \code{by}, but events may occur at any point within an interval.
When the data has been split along multiple time scales, the last
time scale mentioned in \code{by} is considered to be the survival time
scale for the purpose of computing events. Time lines cut short by the
extrema of the other (non-survival) time scales are considered to be censored
("transitions" from the current state to the current state).
}
\description{
Aggregates a split \code{Lexis} object by given variables
and / or expressions into a long-format table of person-years and
transitions / end-points. If time scales along which the data has been
split are mentioned in the aggregation argument, the data is automatically
aggregated to intervals of those time scales, e.g. intervals of calendar
time, follow-up time and/or age.
}
\details{
\strong{Basics}

\code{aggre} is intended for aggregation of split \code{Lexis} data only.
See \code{\link[Epi]{Lexis}} for forming \code{Lexis} objects by hand
and e.g. \code{\link[Epi]{splitLexis}}, \code{\link{splitLexisDT}}, and
\code{\link{splitMulti}} for splitting the data. \code{\link{lexpand}}
may be used for simple data sets to do both steps as well as aggregation
in the same function call.

Here aggregation refers to computing person-years and the appropriate events
(state transitions and end points in status) for the subjects in the data.
Hence, it computes e.g. deaths (end-point and state transition) and
censorings (end-point) as well as events in a multi-state setting
(state transitions).

The result is a long-format \code{data.frame} or \code{data.table}
(depending on \code{options("popEpi.datatable")}; see \code{?popEpi})
with the columns \code{pyrs}, \code{at.risk}, and the appropriate transitions
named as \code{fromXtoY}, e.g. \code{from0to0} and \code{from0to1} depending
on the values of \code{lex.Cst} and \code{lex.Xst}.
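
As a rough illustration of what is being counted, the base R sketch below
computes person-years and \code{fromXtoY} counts per level of \code{sex} from a
small hypothetical data frame that mimics a few columns of a split \code{Lexis}
object. It is a simplified sketch under stated assumptions, not the
implementation of \code{aggre}: it treats a row as contributing an event when
\code{lex.Cst} differs from \code{lex.Xst} or when the row is the last one of its
\code{lex.id}, and it ignores subjects at risk and censoring at the limits of
other time scales.

```r
## Hypothetical toy data mimicking a few columns of a split Lexis object.
## Simplified illustration only; this is not how aggre() is implemented.
toy <- data.frame(
  lex.id  = c(1, 1, 2, 3, 3),
  sex     = c(0, 0, 1, 1, 1),
  lex.dur = c(5, 2.5, 4, 5, 1),
  lex.Cst = c(0, 0, 0, 0, 0),
  lex.Xst = c(0, 1, 0, 0, 0)
)

## person-years: sum of interval lengths per level of sex
pyrs <- aggregate(lex.dur ~ sex, data = toy, FUN = sum)
names(pyrs)[names(pyrs) == "lex.dur"] <- "pyrs"

## a row counts as an event if the state changes within it, or if it is
## the last row of its subject (end of follow-up, e.g. from0to0 = censoring)
is_last <- !duplicated(toy$lex.id, fromLast = TRUE)
ev <- toy[toy$lex.Cst != toy$lex.Xst | is_last, ]
ev$trans <- paste0("from", ev$lex.Cst, "to", ev$lex.Xst)

## one column per observed transition type, one row per level of sex
trans <- as.data.frame.matrix(table(ev$sex, ev$trans))
trans$sex <- as.numeric(rownames(trans))

res <- merge(pyrs, trans, by = "sex")
print(res)
```

With this toy data, \code{res} has \code{pyrs} 7.5 and 10 for the two levels of
\code{sex}; the first level has one \code{from0to1} transition and the second
has two \code{from0to0} end-points (censorings).
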

\strong{The by argument}

The \code{by} argument determines the number of rows of the output table, i.e.
the combinations of variables to which the data is aggregated.
\code{by} is relatively flexible, as it can be supplied as
\itemize{
\item{a character string vector, e.g. \code{c("sex", "area")},
naming variables existing in \code{lex}}
\item{an expression, e.g. \code{factor(sex, 0:1, c("m", "f"))}
using any variable found in \code{lex}}
\item{a list (fully or partially named) of expressions, e.g.
\verb{list(gender = factor(sex, 0:1, c("m", "f")), area)}}
}

Note that expressions effectively allow a variable to be supplied simply as
e.g. \code{by = sex} (as a symbol/name in R lingo).

The data is then aggregated to the levels of the given variables
or expression(s). Variables defined to be time scales in the supplied
\code{Lexis} object are processed in a special way: if any are mentioned in the
\code{by} argument, intervals of them are formed based on the breaks
used to split the data. E.g. if \code{age} was split using the breaks
\code{c(0, 50, Inf)}, mentioning \code{age} in \code{by} leads to
creating the \code{age} intervals \verb{[0, 50)} and \verb{[50, Inf)}
and aggregating to them. The intervals are identified in the output
by their lower bounds.
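
A minimal base R sketch of the interval labelling described above, assuming
hypothetical breaks \code{c(0, 50, Inf)} and hypothetical start values of split
rows on the age scale. Because each row of split data lies within a single
interval, its start value is enough to locate the interval, and
\code{findInterval} returns the index of the interval's lower bound. This only
illustrates the idea and is not taken from the \code{aggre} source.

```r
## Hypothetical breaks used to split the data along the age scale
breaks <- c(0, 50, Inf)

## Hypothetical start values (age at the beginning of each split row)
## and the person-years each row contributes
age_start <- c(12.3, 49.9, 50, 77.1)
row_pyrs  <- c(2, 0.1, 10, 3)

## label each row by the lower bound of its interval: [0, 50) -> 0, [50, Inf) -> 50
age_group <- breaks[findInterval(age_start, breaks)]

## aggregate person-years to the intervals
pyrs_by_age <- tapply(row_pyrs, age_group, sum)
print(pyrs_by_age)
```
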

The order of multiple time scales mentioned in \code{by} matters,
as the last mentioned time scale is assumed to be the survival time scale
when computing event counts. E.g. when the data is split by the breaks
\code{list(FUT = 0:5, CAL = c(2008, 2010))} and aggregated with
\code{by = c("CAL", "FUT")}, time lines cut short at
\code{CAL = 2010} are considered to be censored, but time lines cut short at
\code{FUT = 5} are not. See the Value section.

\strong{Aggregation types (styles)}

It is almost always enough to aggregate the data to variable levels
that are actually represented in the data
(default \code{type = "unique"}; alias \code{"non-empty"}).
For certain uses it may be useful
to also have "empty" levels represented (resulting in some rows in the output
with zero person-years and events); in these cases supplying
\code{type = "full"} (alias \code{"cartesian"}) causes \code{aggre}
to determine the Cartesian product of all the levels of the supplied
\code{by} variables or expressions and aggregate to them. As an example
of a Cartesian product, try
\code{merge(1:2, 1:5)}.
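
The difference between the two types can be sketched in base R, assuming a
small hypothetical table of already aggregated rows: \code{expand.grid} forms the
Cartesian product of the levels, and merging the observed rows into it leaves
the empty combinations to be filled with zeros. This mimics the idea behind
\code{type = "full"} and is not the \code{aggre} implementation.

```r
## Hypothetical "unique"-style result: only combinations present in the data
obs <- data.frame(sex = c(0, 1, 1), AGE = c(0, 0, 50), pyrs = c(12.5, 8, 3))

## "full"-style result: Cartesian product of the levels of sex and AGE
grid <- expand.grid(sex = c(0, 1), AGE = c(0, 50))
full <- merge(grid, obs, by = c("sex", "AGE"), all.x = TRUE)

## combinations absent from the data get zero person-years
full$pyrs[is.na(full$pyrs)] <- 0
print(full)
```

Here \code{obs} has three rows while \code{full} has four, the extra row being
the empty combination \code{sex = 0}, \code{AGE = 50}.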
}
\examples{
## form a Lexis object
library(Epi)
data(sibr)
x <- sibr[1:10,]
x[1:5,]$sex <- 0 ## pretend some are male
x <- Lexis(data = x,
           entry = list(AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           entry.status = 0, exit.status = status)
x <- splitMulti(x, breaks = list(CAL = seq(1993, 2013, 5),
                                 AGE = seq(0, 100, 50)))
## these produce the same results (with differing ways of specifying 'by')
a1 <- aggre(x, by = list(gender = factor(sex, 0:1, c("m", "f")),
                         agegroup = AGE, period = CAL))
a2 <- aggre(x, by = c("sex", "AGE", "CAL"))
a3 <- aggre(x, by = list(sex, agegroup = AGE, CAL))
## returning also empty levels
a4 <- aggre(x, by = c("sex", "AGE", "CAL"), type = "full")
## computing also expected numbers of cases
x <- lexpand(sibr[1:10,], birth = bi_date, entry = dg_date,
             exit = ex_date, status = status \%in\% 1:2,
             pophaz = popmort, fot = 0:5, age = c(0, 50, 100))
x$d.exp <- with(x, lex.dur*pop.haz)
## these produce the same result
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = list(d.exp))
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = "d.exp")
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = d.exp)
## same result here with custom name
a5 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = list(expCases = d.exp))
## computing Pohar Perme weighted figures
x$d.exp.pp <- with(x, lex.dur*pop.haz*pp)
a6 <- aggre(x, by = c("sex", "age", "fot"),
            sum.values = c("d.exp", "d.exp.pp"))
## or equivalently e.g. sum.values = list(expCases = d.exp, expCases.p = d.exp.pp).
}
\seealso{
\code{\link[stats]{aggregate}} for a similar base R solution,
and \code{\link{ltable}} for a \code{data.table} based aggregator. Neither
is directly applicable to split \code{Lexis} data.
Other aggregation functions:
\code{\link{as.aggre}()},
\code{\link{lexpand}()},
\code{\link{setaggre}()},
\code{\link{summary.aggre}()}
}
\author{
Joonas Miettinen
}
\concept{aggregation functions}