File: aggre.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregating.R
\name{aggre}
\alias{aggre}
\title{Aggregation of split \code{Lexis} data}
\usage{
aggre(
  lex,
  by = NULL,
  type = c("unique", "full"),
  sum.values = NULL,
  subset = NULL,
  verbose = FALSE
)
}
\arguments{
\item{lex}{a \code{Lexis} object split with e.g.
\verb{[Epi::splitLexis]} or \verb{[splitMulti]}}

\item{by}{variables to tabulate (aggregate) by.
\link[=flexible_argument]{Flexible input}, typically e.g.
\code{by = c("V1", "V2")}. See Details and Examples.}

\item{type}{determines the levels to which data is aggregated:
\code{"unique"} returns only rows with \code{pyrs > 0}, whereas
\code{"full"} returns all possible combinations of the variables given in
\code{by}, even if those combinations are not represented in the data;
see Details}

\item{sum.values}{optional: additional variables to sum within the levels
defined by \code{by}. \link[=flexible_argument]{Flexible input}, typically e.g.
\code{sum.values = c("V1", "V2")}}

\item{subset}{a logical condition to subset by before computations;
e.g. \code{subset = area \%in\% c("A", "B")}}

\item{verbose}{\code{logical}; if \code{TRUE}, the function returns timings
and some information useful for debugging along the aggregation process}
}
\value{
A long \code{data.frame} or \code{data.table} of aggregated person-years
(\code{pyrs}), numbers of subjects at risk (\code{at.risk}), and events
formatted \code{fromXtoY}, where \code{X} and \code{Y} are the states
transitioned from and to, or the state at the end of each \code{lex.id}'s
follow-up (implying \code{X} = \code{Y}). Subjects at risk are counted
at the beginning of each interval defined by the Lexis time scales
mentioned in \code{by}, whereas events may occur at any point within an interval.

When the data has been split along multiple time scales, the last
time scale mentioned in \code{by} is considered to be the survival time
scale with regard to computing events. Time lines cut short by the
extrema of non-survival-time-scales are considered to be censored
("transitions" from the current state to the current state).
}
\description{
Aggregates a split \code{Lexis} object by given variables
and / or expressions into a long-format table of person-years and
transitions / end-points. If time scales by which the data has been split
are mentioned in the aggregation argument, the data is automatically
aggregated to the corresponding intervals of e.g. calendar time,
follow-up time and/or age.
}
\details{
\strong{Basics}

\code{aggre} is intended for aggregation of split \code{Lexis} data only.
See \verb{[Epi::Lexis]} for forming \code{Lexis} objects by hand
and e.g. \verb{[Epi::splitLexis]}, \verb{[splitLexisDT]}, and
\verb{[splitMulti]} for splitting the data. \verb{[lexpand]}
may be used for simple data sets to do both steps as well as aggregation
in the same function call.

Here aggregation refers to computing person-years and the appropriate events
(state transitions and end points in status) for the subjects in the data.
Hence, it computes e.g. deaths (end-point and state transition) and
censorings (end-point) as well as events in a multi-state setting
(state transitions).

The result is a long-format \code{data.frame} or \code{data.table}
(depending on \code{options("popEpi.datatable")}; see \code{?popEpi})
with the columns \code{pyrs} and the appropriate transitions named as
\code{fromXtoY}, e.g. \code{from0to0} and \code{from0to1} depending
on the values of \code{lex.Cst} and \code{lex.Xst}.

\strong{The by argument}

The \code{by} argument determines the length of the table, i.e.
the combinations of variables to which data is aggregated.
\code{by} is relatively flexible, as it can be supplied as

\itemize{
\item{a character string vector, e.g. \code{c("sex", "area")},
naming variables existing in \code{lex}}
\item{an expression, e.g. \code{factor(sex, 0:1, c("m", "f"))}
using any variable found in \code{lex}}
\item{a list (fully or partially named) of expressions, e.g.
\verb{list(gender = factor(sex, 0:1, c("m", "f")), area)}}
}

Note that expressions effectively allow a variable to be supplied simply as
e.g. \code{by = sex} (as a symbol/name in R lingo).

The data is then aggregated to the levels of the given variables
or expression(s). Variables defined to be time scales in the supplied
\code{Lexis} are processed in a special way: If any are mentioned in the
\code{by} argument, intervals of them are formed based on the breaks
used to split the data: e.g. if \code{age} was split using the breaks
\code{c(0, 50, Inf)}, mentioning \code{age} in \code{by} leads to
creating the \code{age} intervals \verb{[0, 50)} and \verb{[50, Inf)}
and aggregating to them. The intervals are identified in the output
as the lower bounds of the appropriate intervals.

The order of multiple time scales mentioned in \code{by} matters,
as the last mentioned time scale is assumed to be the survival time scale
when computing event counts. E.g. when the data is split by the breaks
\code{list(FUT = 0:5, CAL = c(2008,2010))}, time lines cut short at
\code{CAL = 2010} are considered to be censored, but time lines cut short at
\code{FUT = 5} are not. See Value.

\strong{Aggregation types (styles)}

It is almost always enough to aggregate the data to variable levels
that are actually represented in the data
(default \code{type = "unique"}; alias \code{"non-empty"}).
For certain uses it may be useful
to also have "empty" levels represented (resulting in some rows in output
with zero person-years and events); in these cases supplying
\code{type = "full"} (alias \code{"cartesian"}) causes \code{aggre}
to determine the Cartesian product of all the levels of the supplied
\code{by} variables or expressions and aggregate to them. As an example
of a Cartesian product, try

\code{merge(1:2, 1:5)}.
}
\examples{

## form a Lexis object
library(Epi)
data(sibr)
x <- sibr[1:10,]
x[1:5,]$sex <- 0 ## pretend some are male
x <- Lexis(data = x,
           entry = list(AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           entry.status=0, exit.status = status)
x <- splitMulti(x, breaks = list(CAL = seq(1993, 2013, 5),
                                 AGE = seq(0, 100, 50)))

## these produce the same results (with differing ways of specifying by)
a1 <- aggre(x, by = list(gender = factor(sex, 0:1, c("m", "f")),
             agegroup = AGE, period = CAL))

a2 <- aggre(x, by = c("sex", "AGE", "CAL"))

a3 <- aggre(x, by = list(sex, agegroup = AGE, CAL))
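
## a base R sketch (an illustration only, not popEpi's internal code) of how
## values on a time scale map to the lower bounds of the intervals formed by
## the breaks; age_breaks and age_values are hypothetical names and values
age_breaks <- c(0, 50, 100)
age_values <- c(12.5, 49.9, 50, 73)
## findInterval() uses left-closed intervals, i.e. [0, 50) and [50, 100)
age_lower <- age_breaks[findInterval(age_values, age_breaks)]
age_lower ## 0 0 50 50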

## returning also empty levels
a4 <- aggre(x, by = c("sex", "AGE", "CAL"), type = "full")
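
## a base R sketch (an illustration only, not popEpi's internal code) of the
## Cartesian product that type = "full" aggregates to; the level values below
## are assumed for illustration and full_levels is a hypothetical name
full_levels <- expand.grid(sex = 0:1,
                           AGE = c(0, 50),
                           CAL = c(1993, 1998, 2003, 2008))
nrow(full_levels) ## 2 * 2 * 4 = 16 combinations, whether or not observed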

## computing also expected numbers of cases
x <- lexpand(sibr[1:10,], birth = bi_date, entry = dg_date,
             exit = ex_date, status = status \%in\% 1:2,
             pophaz = popmort, fot = 0:5, age = c(0, 50, 100))
x$d.exp <- with(x, lex.dur*pop.haz)
## these produce the same result
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = list(d.exp))
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = "d.exp")
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = d.exp)
## same result here with custom name
a5 <- aggre(x, by = c("sex", "age", "fot"),
             sum.values = list(expCases = d.exp))

## computing Pohar-Perme weighted figures
x$d.exp.pp <- with(x, lex.dur*pop.haz*pp)
a6 <- aggre(x, by = c("sex", "age", "fot"),
             sum.values = c("d.exp", "d.exp.pp"))
## or equivalently e.g. sum.values = list(expCases = d.exp, expCases.p = d.exp.pp).
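
## a base R sketch of the fromXtoY naming only (not of how aggre counts
## events); cst and xst are hypothetical stand-ins for the states in
## lex.Cst and lex.Xst at the end of four made-up time lines
cst <- c(0, 0, 0, 1)
xst <- c(0, 1, 0, 1)
trans_names <- paste0("from", cst, "to", xst)
trans_counts <- table(trans_names)
trans_counts ## from0to0: 2, from0to1: 1, from1to1: 1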
}
\seealso{
\verb{[aggregate]} for a similar base R solution,
and \verb{[ltable]} for a \code{data.table} based aggregator. Neither
is directly applicable to split \code{Lexis} data.

Other aggregation functions: 
\code{\link{as.aggre}()},
\code{\link{lexpand}()},
\code{\link{setaggre}()},
\code{\link{summary.aggre}()}
}
\author{
Joonas Miettinen
}
\concept{aggregation functions}