% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregating.R
\name{aggre}
\alias{aggre}
\title{Aggregation of split \code{Lexis} data}
\usage{
aggre(
  lex,
  by = NULL,
  type = c("unique", "full"),
  sum.values = NULL,
  subset = NULL,
  verbose = FALSE
)
}
\arguments{
\item{lex}{a \code{Lexis} object split with e.g.
\verb{[Epi::splitLexis]} or \verb{[splitMulti]}}

\item{by}{variables to tabulate (aggregate) by.
\link[=flexible_argument]{Flexible input}, typically e.g.
\code{by = c("V1", "V2")}. See Details and Examples.}

\item{type}{determines the output levels to which data is aggregated, varying
from returning only rows with \code{pyrs > 0} (\code{"unique"}) to
returning all possible combinations of the variables given in \code{by}, even
if those combinations are not represented in the data (\code{"full"});
see Details}

\item{sum.values}{optional: additional variables to sum by argument
\code{by}. \link[=flexible_argument]{Flexible input}, typically e.g.
\code{sum.values = c("V1", "V2")}}

\item{subset}{a logical condition to subset by before computations;
e.g. \code{subset = area \%in\% c("A", "B")}}

\item{verbose}{\code{logical}; if \code{TRUE}, the function returns timings
and some information useful for debugging along the aggregation process}
}
\value{
A long \code{data.frame} or \code{data.table} of aggregated person-years
(\code{pyrs}), numbers of subjects at risk (\code{at.risk}), and events
formatted \code{fromXtoY}, where \code{X} and \code{Y} are the states
transitioned from and to, or the state at the end of each \code{lex.id}'s
follow-up (implying \code{X} = \code{Y}). Subjects at risk are counted
at the beginning of each interval defined by any Lexis time scales
mentioned in \code{by}, whereas events may occur at any point within an interval.

When the data has been split along multiple time scales, the last
time scale mentioned in \code{by} is considered to be the survival time
scale with regard to computing events. Time lines cut short by the
extrema of non-survival-time-scales are considered to be censored
("transitions" from the current state to the current state).
}
\description{
Aggregates a split \code{Lexis} object by given variables
and / or expressions into a long-format table of person-years and
transitions / end-points. Time scales by which the data has been split are
aggregated automatically to intervals of e.g. calendar time, follow-up time
and/or age, provided the respective time scales are mentioned in the
aggregation argument (\code{by}).
}
\details{
\strong{Basics}

\code{aggre} is intended for aggregation of split \code{Lexis} data only.
See \verb{[Epi::Lexis]} for forming \code{Lexis} objects by hand
and e.g. \verb{[Epi::splitLexis]}, \verb{[splitLexisDT]}, and
\verb{[splitMulti]} for splitting the data. \verb{[lexpand]}
may be used for simple data sets to do both steps as well as aggregation
in the same function call.

Here aggregation refers to computing person-years and the appropriate events
(state transitions and end points in status) for the subjects in the data.
Hence, it computes e.g. deaths (end-point and state transition) and
censorings (end-point) as well as events in a multi-state setting
(state transitions).

The result is a long-format \code{data.frame} or \code{data.table}
(depending on \code{options("popEpi.datatable")}; see \code{?popEpi})
with the columns \code{pyrs} and the appropriate transitions named as
\code{fromXtoY}, e.g. \code{from0to0} and \code{from0to1} depending
on the values of \code{lex.Cst} and \code{lex.Xst}.
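
The naming convention can be sketched in base R as follows. This is an
illustration with made-up state vectors only: it shows how the column names
relate to \code{lex.Cst} and \code{lex.Xst}, and does not reproduce the rules
by which \code{aggre} decides which rows count as events.

\preformatted{
## illustration only: hypothetical states for three rows of split data
lex.Cst <- c(0, 0, 0)
lex.Xst <- c(0, 1, 0)
## transition labels in the fromXtoY style
tr <- paste0("from", lex.Cst, "to", lex.Xst)
table(tr)
}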

\strong{The by argument}

The \code{by} argument determines the length of the table, i.e.
the combinations of variables to which data is aggregated.
\code{by} is relatively flexible, as it can be supplied as

\itemize{
\item{a character string vector, e.g. \code{c("sex", "area")},
naming variables existing in \code{lex}}
\item{an expression, e.g. \code{factor(sex, 0:1, c("m", "f"))}
using any variable found in \code{lex}}
\item{a list (fully or partially named) of expressions, e.g.
\verb{list(gender = factor(sex, 0:1, c("m", "f")), area)}}
}

Note that expressions effectively allow a variable to be supplied simply as
e.g. \code{by = sex} (as a symbol/name in R lingo).

The data is then aggregated to the levels of the given variables
or expression(s). Variables defined to be time scales in the supplied
\code{Lexis} are processed in a special way: If any are mentioned in the
\code{by} argument, intervals of them are formed based on the breaks
used to split the data: e.g. if \code{age} was split using the breaks
\code{c(0, 50, Inf)}, mentioning \code{age} in \code{by} leads to
creating the \code{age} intervals \verb{[0, 50)} and \verb{[50, Inf)}
and aggregating to them. The intervals are identified in the output
as the lower bounds of the appropriate intervals.
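
The labelling of intervals by their lower bounds can be sketched in base R
as follows. This is an illustration of the idea only, using made-up ages;
it is not a description of how \code{aggre} forms the intervals internally.

\preformatted{
## illustration only: assign each age to its left-closed interval
## and label the interval by its lower bound
breaks <- c(0, 50, Inf)
age <- c(10, 49.9, 50, 75)
lower <- breaks[findInterval(age, breaks)]
lower  # 0 0 50 50
}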

The order of multiple time scales mentioned in \code{by} matters,
as the last mentioned time scale is assumed to be the survival time scale
when computing event counts. E.g. when the data is split by the breaks
\code{list(FUT = 0:5, CAL = c(2008,2010))}, time lines cut short at
\code{CAL = 2010} are considered to be censored, but time lines cut short at
\code{FUT = 5} are not. See Value.

\strong{Aggregation types (styles)}

It is almost always enough to aggregate the data to variable levels
that are actually represented in the data
(default \code{type = "unique"}; alias \code{"non-empty"}).
For certain uses it may be useful
to have also "empty" levels represented (resulting in some rows in output
with zero person-years and events); in these cases supplying
\code{type = "full"} (alias \code{"cartesian"}) causes \code{aggre}
to determine the Cartesian product of all the levels of the supplied
\code{by} variables or expressions and aggregate to them. As an example
of a Cartesian product, try

\code{merge(1:2, 1:5)}.
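
A minimal base R sketch of the above: every pairing of the two inputs
appears in the result, whether or not such a pairing would occur in real
data. The object name \code{cp} is chosen here for illustration only.

\preformatted{
## Cartesian product of two small vectors using base R only
cp <- merge(1:2, 1:5)
nrow(cp)  # 10, i.e. 2 * 5 combinations
}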
}
\examples{

## form a Lexis object
library(Epi)
data(sibr)
x <- sibr[1:10,]
x[1:5,]$sex <- 0 ## pretend some are male
x <- Lexis(data = x,
           entry = list(AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           entry.status=0, exit.status = status)
x <- splitMulti(x, breaks = list(CAL = seq(1993, 2013, 5),
                                 AGE = seq(0, 100, 50)))

## these produce the same results (with differing ways of specifying by)
a1 <- aggre(x, by = list(gender = factor(sex, 0:1, c("m", "f")),
             agegroup = AGE, period = CAL))

a2 <- aggre(x, by = c("sex", "AGE", "CAL"))

a3 <- aggre(x, by = list(sex, agegroup = AGE, CAL))

## returning also empty levels
a4 <- aggre(x, by = c("sex", "AGE", "CAL"), type = "full")

## computing also expected numbers of cases
x <- lexpand(sibr[1:10,], birth = bi_date, entry = dg_date,
             exit = ex_date, status = status \%in\% 1:2,
             pophaz = popmort, fot = 0:5, age = c(0, 50, 100))
x$d.exp <- with(x, lex.dur*pop.haz)
## these produce the same result
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = list(d.exp))
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = "d.exp")
a5 <- aggre(x, by = c("sex", "age", "fot"), sum.values = d.exp)
## same result here with custom name
a5 <- aggre(x, by = c("sex", "age", "fot"),
             sum.values = list(expCases = d.exp))

## computing Pohar-Perme weighted figures
x$d.exp.pp <- with(x, lex.dur*pop.haz*pp)
a6 <- aggre(x, by = c("sex", "age", "fot"),
             sum.values = c("d.exp", "d.exp.pp"))
## or equivalently e.g. sum.values = list(expCases = d.exp, expCases.p = d.exp.pp).
}
\seealso{
\verb{[aggregate]} for a similar base R solution,
and \verb{[ltable]} for a \code{data.table} based aggregator. Neither
is directly applicable to split \code{Lexis} data.

Other aggregation functions: 
\code{\link{as.aggre}()},
\code{\link{lexpand}()},
\code{\link{setaggre}()},
\code{\link{summary.aggre}()}
}
\author{
Joonas Miettinen
}
\concept{aggregation functions}