1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399
|
\SweaveOpts{results=verbatim,keep.source=TRUE,include=FALSE}
%\VignetteIndexEntry{Follow-up data with the Epi package}
\documentclass[a4paper,twoside,12pt]{article}
\usepackage[english]{babel}
\usepackage{booktabs,rotating,graphicx,amsmath,verbatim,fancyhdr,Sweave}
\usepackage[colorlinks,linkcolor=red,urlcolor=blue]{hyperref}
\newcommand{\R}{\textsf{\bf R}}
\renewcommand{\topfraction}{0.95}
\renewcommand{\bottomfraction}{0.95}
\renewcommand{\textfraction}{0.1}
\renewcommand{\floatpagefraction}{0.9}
\DeclareGraphicsExtensions{.pdf,.jpg}
\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}
\oddsidemargin 1mm
\evensidemargin 1mm
\textwidth 160mm
\textheight 230mm
\topmargin -5mm
\headheight 8mm
\headsep 5mm
\footskip 15mm
\begin{document}
\raggedleft
\pagestyle{empty}
\vspace*{0.1\textheight}
\Huge
{\bf Follow-up data with the\\ \texttt{Epi} package}
\noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex]
\Large
Summer 2014
\vfill
\normalsize
\begin{tabular}{rl}
Michael Hills & Retired \\
& Highgate, London \\[1em]
Martyn Plummer & International Agency for Research on Cancer, Lyon\\
& \texttt{plummer@iarc.fr} \\[1em]
Bendix Carstensen & Steno Diabetes Center, Gentofte, Denmark\\
& \small \& Department of Biostatistics,
University of Copenhagen\\
& \normalsize \texttt{bxc@steno.dk} \\
& \url{www.pubhealth.ku.dk/~bxc}
\end{tabular}
\normalsize
\newpage
\raggedright
\parindent 3ex
\parskip 0ex
\tableofcontents
\cleardoublepage
\setcounter{page}{1}
\pagestyle{fancy}
\renewcommand{\sectionmark}[1]{\markboth{\thesection #1}{\thesection \ #1}}
\fancyhead[OL]{\sl Follow-up data with the \texttt{Epi} package.}
\fancyhead[ER]{\sl \rightmark}
\fancyhead[EL,OR]{\bf \thepage}
\fancyfoot{}
\renewcommand{\headrulewidth}{0.1pt}
<<>>=
library(Epi)
print( sessionInfo(), l=F )
@
\section{Follow-up data in the \texttt{Epi} package}
In the \texttt{Epi}-package, follow-up data is represented by adding
some extra variables to a dataframe. Such a dataframe is called a
\texttt{Lexis} object. The tools for handling follow-up data then use
the structure of this for special plots, tabulations etc.
Follow-up data basically consists of a time of entry, a time of exit
and an indication of the status at exit (normally either ``alive'' or
``dead''). Implicitly is also assumed a status \emph{during} the
follow-up (usually ``alive'').
\begin{figure}[htbp]
\centering
\setlength{\unitlength}{1pt}
\begin{picture}(210,70)(0,75)
%\scriptsize
\thicklines
\put( 0,80){\makebox(0,0)[r]{Age-scale}}
\put( 50,80){\line(1,0){150}}
\put( 50,80){\line(0,1){5}}
\put(100,80){\line(0,1){5}}
\put(150,80){\line(0,1){5}}
\put(200,80){\line(0,1){5}}
\put( 50,77){\makebox(0,0)[t]{35}}
\put(100,77){\makebox(0,0)[t]{40}}
\put(150,77){\makebox(0,0)[t]{45}}
\put(200,77){\makebox(0,0)[t]{50}}
\put( 0,115){\makebox(0,0)[r]{Follow-up}}
\put( 80,105){\makebox(0,0)[r]{\small Two}}
\put( 90,105){\line(1,0){87}}
\put( 90,100){\line(0,1){10}}
\put(100,100){\line(0,1){10}}
\put(150,100){\line(0,1){10}}
\put(180,105){\circle{6}}
\put( 95,110){\makebox(0,0)[b]{1}}
\put(125,110){\makebox(0,0)[b]{5}}
\put(165,110){\makebox(0,0)[b]{3}}
\put( 50,130){\makebox(0,0)[r]{\small One}}
\put( 60,130){\line(1,0){70}}
\put( 60,125){\line(0,1){10}}
\put(100,125){\line(0,1){10}}
\put(130,130){\circle*{6}}
\put( 80,135){\makebox(0,0)[b]{4}}
\put(115,135){\makebox(0,0)[b]{3}}
\end{picture}
\caption{\it Follow-up of two persons}
\label{fig:fu2}
\end{figure}
\section{Timescales}
A timescale is a variable that varies deterministically \emph{within}
each person during follow-up, \textit{e.g.}:
\begin{itemize}
\item Age
\item Calendar time
\item Time since treatment
\item Time since relapse
\end{itemize}
All timescales advance at the same pace, so the time followed is the
same on all timescales. Therefore, it suffices to use only the entry
point on each of the time scale, for example:
\begin{itemize}
\item Age at entry.
\item Date of entry.
\item Time since treatment (\emph{at} treatment this is 0).
\item Time since relapse (\emph{at} relapse this is 0)..
\end{itemize}
In the \texttt{Epi} package, follow-up in a cohort is represented in a
\texttt{Lexis} object. A \texttt{Lexis} object is a dataframe with a
bit of extra structure representing the follow-up. For the
\texttt{nickel} data we would construct a \texttt{Lexis} object by:
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
age=agein,
tfh=agein-age1st ),
exit = list( age=ageout ),
exit.status = ( icd %in% c(162,163) )*1,
data = nickel )
@
The \texttt{entry} argument is a \emph{named} list with the entry
points on each of the timescales we want to use. It defines the names
of the timescales and the entry points. The \texttt{exit} argument
gives the exit time on \emph{one} of the timescales, so the name of
the element in this list must match one of the neames of the
\texttt{entry} list. This is sufficient, because the follow-up time on
all time scales is the same, in this case \texttt{ageout - agein}. Now
take a look at the result:
<<>>=
str( nickel )
str( nicL )
head( nicL )
@
The \texttt{Lexis} object \texttt{nicL} has a variable for each
timescale which is the entry point on this timescale. The follow-up
time is in the variable \texttt{lex.dur} (\textbf{dur}ation).
There is a \texttt{summary} function for \texttt{Lexis} objects that
list the numer of transitions and records as well as the total
follow-up time:
<<>>=
summary( nicL )
@
We defined the exit status to be death from lung cancer (ICD7
162,163), i.e. this variable is 1 if follow-up ended with a death from
this cause. If follow-up ended alive or by death from another cause,
the exit status is coded 0, i.e. as a censoring.
Note that the exit status is in the variable \texttt{lex.Xst}
(e\textbf{X}it \textbf{st}atus. The variable \texttt{lex.Cst} is the
state where the follow-up takes place (\textbf{C}urrent
\textbf{st}atus), in this case 0 (alive).
It is possible to get a visualization of the follow-up along the
timescales chosen by using the \texttt{plot} method for \texttt{Lexis}
objects. \texttt{nicL} is an object of \emph{class} \texttt{Lexis}, so
using the function \texttt{plot()} on it means that \R\ will look for
the function \texttt{plot.Lexis} and use this function.
<<nicL1,fig=TRUE>>=
plot( nicL )
@
The function allows a lot of control over the output, and a
\texttt{points.Lexis} function allows plotting of the endpoints of
follow-up:
<<nicL2,fig=TRUE>>=
par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6 )
plot( nicL, 1:2, lwd=1, col=c("blue","red")[(nicL$exp>0)+1],
grid=TRUE, lty.grid=1, col.grid=gray(0.7),
xlim=1900+c(0,90), xaxs="i",
ylim= 10+c(0,90), yaxs="i", las=1 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
col="lightgray", lwd=3, cex=1.5 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
col=c("blue","red")[(nicL$exp>0)+1], lwd=1, cex=1.5 )
@
The results of these two plotting commands are in figure \ref{fig:Lexis-diagram}.
\begin{figure}[tb]
\centering
\label{fig:Lexis-diagram}
\includegraphics[width=0.39\textwidth]{Follow-up-nicL1}
\includegraphics[width=0.59\textwidth]{Follow-up-nicL2}
\caption{\it Lexis diagram of the \texttt{nickel} dataset, left panel
the default version, the right one with bells
and whistles. The red lines are for persons with exposure$>0$, so it
is pretty evident that the oldest ones are the exposed part of the
cohort.}
\end{figure}
\section{Splitting the follow-up time along a timescale}
The follow-up time in a cohort can be subdivided by for example
current age. This is achieved by the \texttt{splitLexis} (note that it
is \emph{not} called \texttt{split.Lexis}). This requires that the
timescale and the breakpoints on this timescale are supplied. Try:
<<>>=
nicS1 <- splitLexis( nicL, "age", breaks=seq(0,100,10) )
summary( nicL )
summary( nicS1 )
@
So we see that the number of events and the amount of follow-up is the
same in the two datasets; only the number of records differ.
To see how records are split for each individual, it is useful to list
the results for a few individuals:
<<>>=
round( subset( nicS1, id %in% 8:10 ), 2 )
@
The resulting object, \texttt{nicS1}, is again a \texttt{Lexis}
object, and so follow-up may be split further along another
timescale. Try this and list the results for individuals 8, 9 and 10 again:
<<>>=
nicS2 <- splitLexis( nicS1, "tfh", breaks=c(0,1,5,10,20,30,100) )
round( subset( nicS2, id %in% 8:10 ), 2 )
@
If we want to model the effect of these timescales we will for each
interval use either the value of the left endpoint in each interval or
the middle. There is a function \texttt{timeBand} which returns these.
Try:
<<>>=
timeBand( nicS2, "age", "middle" )[1:20]
# For nice printing and column labelling use the data.frame() function:
data.frame( nicS2[,c("id","lex.id","per","age","tfh","lex.dur")],
mid.age=timeBand( nicS2, "age", "middle" ),
mid.tfh=timeBand( nicS2, "tfh", "middle" ) )[1:20,]
@
Note that these are the midpoints of the intervals defined by
\texttt{breaks=}, \emph{not} the midpoints of the actual follow-up
intervals. This is because the variable to be used in modelling must
be independent of the consoring and mortality pattern --- it should
only depend on the chosen grouping of the timescale.
\section{Splitting time at a specific date}
If we have a recording of the date of a specific event as for example
recovery or relapse, we may classify follow-up time as being before of
after this intermediate event. This is achieved with the function
\texttt{cutLexis}, which takes three arguments: the time point, the
timescale, and the value of the (new) state following the date.
Now we define the age for the nickel vorkers where the cumulative
exposure exceeds 50 exposure years:
<<>>=
subset( nicL, id %in% 8:10 )
agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data=nicL, cut=agehi, timescale="age",
new.state=2, precursor.states=0 )
subset( nicC, id %in% 8:10 )
@
(The \texttt{precursor.states=} argument is explained below).
Note that individual 6 has had his follow-up split at age 25 where 50
exposure-years were attained. This could also have been achieved in
the split dataset \texttt{nicS2} instead of \texttt{nicL}, try:
<<>>=
subset( nicS2, id %in% 8:10 )
agehi <- nicS2$age1st + 50 / nicS2$exposure
nicS2C <- cutLexis( data=nicS2, cut=agehi, timescale="age",
new.state=2, precursor.states=0 )
subset( nicS2C, id %in% 8:10 )
@
Note that follow-up subsequent to the event is classified as being
in state 2, but that the final transition to state 1 (death from lung
cancer) is preserved. This is the point of the \texttt{precursor.states=}
argument. It names the states (in this case 0, ``Alive'') that will be
over-witten by \texttt{new.state} (in this case state 2, ``High
exposure''). Clearly, state 1 (``Dead'') should not be updated even if
it is after the time where the persons moves to state 2. In other
words, only state 0 is a precursor to state 2, state 1 is always
subsequent to state 2.
Note if the intermediate event is to be used as a time-dependent
variable in a Cox-model, then \texttt{lex.Cst} should be used as the
time-dependent variable, and \texttt{lex.Xst==1} as the event.
\section{Competing risks --- multiple types of events}
If we want to consider death from lung cancer and death from other
causes as separate events we can code these as for example 1 and 2.
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
age=agein,
tfh=agein-age1st ),
exit = list( age=ageout ),
exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
data = nickel )
summary( nicL )
subset( nicL, id %in% 8:10 )
@
If we want to label the states, we can enter the names of these in the
\texttt{states} parameter, try for example:
<<>>=
nicL <- Lexis( entry = list( per=agein+dob,
age=agein,
tfh=agein-age1st ),
exit = list( age=ageout ),
exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
data = nickel,
states = c("Alive","D.oth","D.lung") )
summary( nicL )
@
Note that the \texttt{Lexis} function automatically assumes that all
persons enter in the first level (given in the \texttt{states=}
argument)
When we cut at a date as in this case, the date where cumulative
exposure exceeds 50 exposure-years, we get the follow-up \emph{after}
the date classified as being in the new state if the exit
(\texttt{lex.Xst}) was to a state we defined as one of the
\texttt{precursor.states}:
<<>>=
nicL$agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data = nicL,
cut = nicL$agehi,
timescale = "age",
new.state = "HiExp",
precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@
Note that the persons-years is the same, but that the number of events
has changed. This is because events are now defined as any transition
from alive, including the transitions to \texttt{HiExp}.
Also note that (so far) it is necessary to specify the variable with
the cutpoints in full, using only \texttt{cut=agehi} would give an error.
\subsection{Subdivision of existing states}
It may be of interest to subdivide the states following the
intermediate event according to wheter the event has occurred or
not. That is done by the argument \texttt{split.states=TRUE}.
Moreover, it will also often be of interest to introduce a new
timescale indicating the time since intermediate event. This can be
done by the argument \texttt{new.scale=TRUE}, alternatively
\texttt{new.scale="tfevent"}, as illustrated here:
<<>>=
nicC <- cutLexis( data = nicL,
cut = nicL$agehi,
timescale = "age",
new.state = "Hi",
split.states=TRUE, new.scale=TRUE,
precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@
\section{Multiple events of the same type (recurrent events)}
Sometimes more events of the same type are recorded for each person and
one would then like to count these and put follow-up time in states accordingly.
Essentially, each set of cutpoints represents progressions from one
state to the next. Therefore the states should be numbered, and the
numbering of states subsequently occupied be increased accordingly.
This is a behaviour different from the one outlined above, and it is
achieved by the argument \texttt{count=TRUE} to
\texttt{cutLexis}. When \texttt{count} is set to \texttt{TRUE}, the
value of the arguments \texttt{new.state} and
\texttt{precursor.states} are ignored. Actually, when using the
argument \texttt{count=TRUE}, the function \texttt{countLexis} is
called, so an alternative is to use this directly.
\end{document}
|