File: Follow-up.rnw

package info (click to toggle)
r-cran-epi 2.7-1
links: PTS, VCS
area: main
in suites: stretch
size: 2,440 kB
ctags: 9
sloc: ansic: 376; sh: 16; makefile: 3
file content (399 lines) | stat: -rw-r--r-- 15,521 bytes
parent folder | download | duplicates (3)
\SweaveOpts{results=verbatim,keep.source=TRUE,include=FALSE}
%\VignetteIndexEntry{Follow-up data with the Epi package}
\documentclass[a4paper,twoside,12pt]{article}

\usepackage[english]{babel}
\usepackage{booktabs,rotating,graphicx,amsmath,verbatim,fancyhdr,Sweave}
\usepackage[colorlinks,linkcolor=red,urlcolor=blue]{hyperref}
\newcommand{\R}{\textsf{\bf R}}
\renewcommand{\topfraction}{0.95}
\renewcommand{\bottomfraction}{0.95}
\renewcommand{\textfraction}{0.1}
\renewcommand{\floatpagefraction}{0.9}
\DeclareGraphicsExtensions{.pdf,.jpg}
\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}

\oddsidemargin 1mm
\evensidemargin 1mm
\textwidth 160mm
\textheight 230mm
\topmargin -5mm
\headheight 8mm
\headsep 5mm
\footskip 15mm

\begin{document}

\raggedleft
\pagestyle{empty}
\vspace*{0.1\textheight}
\Huge
{\bf Follow-up data with the\\ \texttt{Epi} package}
\noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex]
\Large
Summer 2014
\vfill
\normalsize
\begin{tabular}{rl}
 Michael Hills & Retired \\
               & Highgate, London \\[1em]
Martyn Plummer & International Agency for Research on Cancer, Lyon\\
               & \texttt{plummer@iarc.fr} \\[1em]
Bendix Carstensen & Steno Diabetes Center, Gentofte, Denmark\\
                  & \small \& Department of Biostatistics,
                    University of Copenhagen\\
                  & \normalsize \texttt{bxc@steno.dk} \\
                  & \url{www.pubhealth.ku.dk/~bxc}
\end{tabular}
\normalsize
\newpage
\raggedright
\parindent 3ex
\parskip 0ex
\tableofcontents
\cleardoublepage
\setcounter{page}{1}
\pagestyle{fancy}
\renewcommand{\sectionmark}[1]{\markboth{\thesection #1}{\thesection \ #1}}
\fancyhead[OL]{\sl Follow-up data with the \texttt{Epi} package.}
\fancyhead[ER]{\sl \rightmark}
\fancyhead[EL,OR]{\bf \thepage}
\fancyfoot{}
\renewcommand{\headrulewidth}{0.1pt}

<<>>=
library(Epi)
print( sessionInfo(), l=F )
@

\section{Follow-up data in the \texttt{Epi} package}

In the \texttt{Epi}-package, follow-up data is represented by adding
some extra variables to a dataframe. Such a dataframe is called a
\texttt{Lexis} object. The tools for handling follow-up data then use
the structure of this for special plots, tabulations etc.

Follow-up data basically consists of a time of entry, a time of exit
and an indication of the status at exit (normally either ``alive'' or
``dead''). Implicitly is also assumed a status \emph{during} the
follow-up (usually ``alive'').

\begin{figure}[htbp]
  \centering
\setlength{\unitlength}{1pt}
\begin{picture}(210,70)(0,75)
%\scriptsize
\thicklines
 \put(  0,80){\makebox(0,0)[r]{Age-scale}}
 \put( 50,80){\line(1,0){150}}
 \put( 50,80){\line(0,1){5}}
 \put(100,80){\line(0,1){5}}
 \put(150,80){\line(0,1){5}}
 \put(200,80){\line(0,1){5}}
 \put( 50,77){\makebox(0,0)[t]{35}}
 \put(100,77){\makebox(0,0)[t]{40}}
 \put(150,77){\makebox(0,0)[t]{45}}
 \put(200,77){\makebox(0,0)[t]{50}}

 \put(  0,115){\makebox(0,0)[r]{Follow-up}}

 \put( 80,105){\makebox(0,0)[r]{\small Two}}
 \put( 90,105){\line(1,0){87}}
 \put( 90,100){\line(0,1){10}}
 \put(100,100){\line(0,1){10}}
 \put(150,100){\line(0,1){10}}
 \put(180,105){\circle{6}}
 \put( 95,110){\makebox(0,0)[b]{1}}
 \put(125,110){\makebox(0,0)[b]{5}}
 \put(165,110){\makebox(0,0)[b]{3}}

 \put( 50,130){\makebox(0,0)[r]{\small One}}
 \put( 60,130){\line(1,0){70}}
 \put( 60,125){\line(0,1){10}}
 \put(100,125){\line(0,1){10}}
 \put(130,130){\circle*{6}}
 \put( 80,135){\makebox(0,0)[b]{4}}
 \put(115,135){\makebox(0,0)[b]{3}}
\end{picture}
  \caption{\it Follow-up of two persons}
  \label{fig:fu2}
\end{figure}

\section{Timescales}

A timescale is a variable that varies deterministically \emph{within}
each person during follow-up, \textit{e.g.}:
\begin{itemize}
  \item Age
  \item Calendar time
  \item Time since treatment
  \item Time since relapse
\end{itemize}
All timescales advance at the same pace, so the time followed is the
same on all timescales. Therefore, it suffices to use only the entry
point on each of the time scale, for example:
\begin{itemize}
  \item Age at entry.
  \item Date of entry.
  \item Time since treatment (\emph{at} treatment this is 0).
  \item Time since relapse (\emph{at} relapse this is 0)..
\end{itemize}
In the \texttt{Epi} package, follow-up in a cohort is represented in a
\texttt{Lexis} object.  A \texttt{Lexis} object is a dataframe with a
bit of extra structure representing the follow-up. For the
\texttt{nickel} data we would construct a \texttt{Lexis} object by:
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd %in% c(162,163) )*1,
                data = nickel )
@
The \texttt{entry} argument is a \emph{named} list with the entry
points on each of the timescales we want to use. It defines the names
of the timescales and the entry points. The \texttt{exit} argument
gives the exit time on \emph{one} of the timescales, so the name of
the element in this list must match one of the neames of the
\texttt{entry} list. This is sufficient, because the follow-up time on
all time scales is the same, in this case \texttt{ageout - agein}. Now
take a look at the result:
<<>>=
str( nickel )
str( nicL )
head( nicL )
@
The \texttt{Lexis} object \texttt{nicL} has a variable for each
timescale which is the entry point on this timescale. The follow-up
time is in the variable \texttt{lex.dur} (\textbf{dur}ation).

There is a \texttt{summary} function for \texttt{Lexis} objects that
list the numer of transitions and records as well as the total
follow-up time:
<<>>=
summary( nicL )
@
We defined the exit status to be death from lung cancer (ICD7
162,163), i.e. this variable is 1 if follow-up ended with a death from
this cause. If follow-up ended alive or by death from another cause,
the exit status is coded 0, i.e. as a censoring.

Note that the exit status is in the variable \texttt{lex.Xst}
(e\textbf{X}it \textbf{st}atus. The variable \texttt{lex.Cst} is the
state where the follow-up takes place (\textbf{C}urrent
\textbf{st}atus), in this case 0 (alive).

It is possible to get a visualization of the follow-up along the
timescales chosen by using the \texttt{plot} method for \texttt{Lexis}
objects. \texttt{nicL} is an object of \emph{class} \texttt{Lexis}, so
using the function \texttt{plot()} on it means that \R\ will look for
the function \texttt{plot.Lexis} and use this function.
<<nicL1,fig=TRUE>>=
plot( nicL )
@
The function allows a lot of control over the output, and a
\texttt{points.Lexis} function allows plotting of the endpoints of
follow-up:
<<nicL2,fig=TRUE>>=
par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6 )
plot( nicL, 1:2, lwd=1, col=c("blue","red")[(nicL$exp>0)+1],
      grid=TRUE, lty.grid=1, col.grid=gray(0.7),
      xlim=1900+c(0,90), xaxs="i",
      ylim=  10+c(0,90), yaxs="i", las=1 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
        col="lightgray", lwd=3, cex=1.5 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
        col=c("blue","red")[(nicL$exp>0)+1], lwd=1, cex=1.5 )
@
The results of these two plotting commands are in figure \ref{fig:Lexis-diagram}.
\begin{figure}[tb]
\centering
\label{fig:Lexis-diagram}
\includegraphics[width=0.39\textwidth]{Follow-up-nicL1}
\includegraphics[width=0.59\textwidth]{Follow-up-nicL2}
\caption{\it Lexis diagram of the \texttt{nickel} dataset, left panel
  the default version, the right one with bells
  and whistles. The red lines are for persons with exposure$>0$, so it
  is pretty evident that the oldest ones are the exposed part of the
  cohort.}
\end{figure}

\section{Splitting the follow-up time along a timescale}

The follow-up time in a cohort can be subdivided by for example
current age. This is achieved by the \texttt{splitLexis} (note that it
is \emph{not} called \texttt{split.Lexis}). This requires that the
timescale and the breakpoints on this timescale are supplied. Try:
<<>>=
nicS1 <- splitLexis( nicL, "age", breaks=seq(0,100,10) )
summary( nicL )
summary( nicS1 )
@
So we see that the number of events and the amount of follow-up is the
same in the two datasets; only the number of records differ.

To see how records are split for each individual, it is useful to list
the results for a few individuals:
<<>>=
round( subset( nicS1, id %in% 8:10 ), 2 )
@
The resulting object, \texttt{nicS1}, is again a \texttt{Lexis}
object, and so follow-up may be split further along another
timescale. Try this and list the results for individuals 8, 9 and 10 again:
<<>>=
nicS2 <- splitLexis( nicS1, "tfh", breaks=c(0,1,5,10,20,30,100) )
round( subset( nicS2, id %in% 8:10 ), 2 )
@
If we want to model the effect of these timescales we will for each
interval use either the value of the left endpoint in each interval or
the middle. There is a function \texttt{timeBand} which returns these.
Try:
<<>>=
timeBand( nicS2, "age", "middle" )[1:20]
# For nice printing and column labelling use the data.frame() function:
data.frame( nicS2[,c("id","lex.id","per","age","tfh","lex.dur")],
            mid.age=timeBand( nicS2, "age", "middle" ),
            mid.tfh=timeBand( nicS2, "tfh", "middle" ) )[1:20,]
@
Note that these are the midpoints of the intervals defined by
\texttt{breaks=}, \emph{not} the midpoints of the actual follow-up
intervals. This is because the variable to be used in modelling must
be independent of the consoring and mortality pattern --- it should
only depend on the chosen grouping of the timescale.

\section{Splitting time at a specific date}

If we have a recording of the date of a specific event as for example
recovery or relapse, we may classify follow-up time as being before of
after this intermediate event. This is achieved with the function
\texttt{cutLexis}, which takes three arguments: the time point, the
timescale, and the value of the (new) state following the date.

Now we define the age for the nickel vorkers where the cumulative
exposure exceeds 50 exposure years:
<<>>=
subset( nicL, id %in% 8:10 )
agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data=nicL, cut=agehi, timescale="age",
                  new.state=2, precursor.states=0 )
subset( nicC, id %in% 8:10 )
@
(The \texttt{precursor.states=} argument is explained below).
Note that individual 6 has had his follow-up split at age 25 where 50
exposure-years were attained. This could also have been achieved in
the split dataset \texttt{nicS2} instead of \texttt{nicL}, try:
<<>>=
subset( nicS2, id %in% 8:10 )
agehi <- nicS2$age1st + 50 / nicS2$exposure
nicS2C <- cutLexis( data=nicS2, cut=agehi, timescale="age",
                    new.state=2, precursor.states=0 )
subset( nicS2C, id %in% 8:10 )
@
Note that follow-up subsequent to the event is classified as being
in state 2, but that the final transition to state 1 (death from lung
cancer) is preserved. This is the point of the \texttt{precursor.states=}
argument. It names the states (in this case 0, ``Alive'') that will be
over-witten by \texttt{new.state} (in this case state 2, ``High
exposure''). Clearly, state 1 (``Dead'') should not be updated even if
it is after the time where the persons moves to state 2. In other
words, only state 0 is a precursor to state 2, state 1 is always
subsequent to state 2.

Note if the intermediate event is to be used as a time-dependent
variable in a Cox-model, then \texttt{lex.Cst} should be used as the
time-dependent variable, and \texttt{lex.Xst==1} as the event.

\section{Competing risks --- multiple types of events}

If we want to consider death from lung cancer and death from other
causes as separate events we can code these as for example 1 and 2.
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
                data = nickel )
summary( nicL )
subset( nicL, id %in% 8:10 )
@
If we want to label the states, we can enter the names of these in the
\texttt{states} parameter, try for example:
<<>>=
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
                data = nickel,
              states = c("Alive","D.oth","D.lung") )
summary( nicL )
@

Note that the \texttt{Lexis} function automatically assumes that all
persons enter in the first level (given in the \texttt{states=}
argument)

When we cut at a date as in this case, the date where cumulative
exposure exceeds 50 exposure-years, we get the follow-up \emph{after}
the date classified as being in the new state if the exit
(\texttt{lex.Xst}) was to a state we defined as one of the
\texttt{precursor.states}:
<<>>=
nicL$agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data = nicL,
                   cut = nicL$agehi,
             timescale = "age",
             new.state = "HiExp",
      precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@
Note that the persons-years is the same, but that the number of events
has changed. This is because events are now defined as any transition
from alive, including the transitions to \texttt{HiExp}.

Also note that (so far) it is necessary to specify the variable with
the cutpoints in full, using only \texttt{cut=agehi} would give an error.

\subsection{Subdivision of existing states}
It may be of interest to subdivide the states following the
intermediate event according to wheter the event has occurred or
not. That is done by the argument \texttt{split.states=TRUE}.

Moreover, it will also often be of interest to introduce a new
timescale indicating the time since intermediate event. This can be
done by the argument \texttt{new.scale=TRUE}, alternatively
\texttt{new.scale="tfevent"}, as illustrated here:
<<>>=
nicC <- cutLexis( data = nicL,
                   cut = nicL$agehi,
             timescale = "age",
             new.state = "Hi",
           split.states=TRUE, new.scale=TRUE,
      precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@

\section{Multiple events of the same type (recurrent events)}
Sometimes more events of the same type are recorded for each person and
one would then like to count these and put follow-up time in states accordingly.
Essentially, each set of cutpoints represents progressions from one
state to the next. Therefore the states should be numbered, and the
numbering of states subsequently occupied be increased accordingly.

This is a behaviour different from the one outlined above, and it is
achieved by the argument \texttt{count=TRUE} to
\texttt{cutLexis}. When \texttt{count} is set to \texttt{TRUE}, the
value of the arguments \texttt{new.state} and
\texttt{precursor.states} are ignored.  Actually, when using the
argument \texttt{count=TRUE}, the function \texttt{countLexis} is
called, so an alternative is to use this directly.

\end{document}