File: Follow-up.rnw

package info (click to toggle)
r-cran-epi 2.7-1
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 2,440 kB
  • ctags: 9
  • sloc: ansic: 376; sh: 16; makefile: 3
file content (399 lines) | stat: -rw-r--r-- 15,521 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
\SweaveOpts{results=verbatim,keep.source=TRUE,include=FALSE}
%\VignetteIndexEntry{Follow-up data with the Epi package}
\documentclass[a4paper,twoside,12pt]{article}

\usepackage[english]{babel}
\usepackage{booktabs,rotating,graphicx,amsmath,verbatim,fancyhdr,Sweave}
\usepackage[colorlinks,linkcolor=red,urlcolor=blue]{hyperref}
\newcommand{\R}{\textsf{\bf R}}
\renewcommand{\topfraction}{0.95}
\renewcommand{\bottomfraction}{0.95}
\renewcommand{\textfraction}{0.1}
\renewcommand{\floatpagefraction}{0.9}
\DeclareGraphicsExtensions{.pdf,.jpg}
\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}

\oddsidemargin 1mm
\evensidemargin 1mm
\textwidth 160mm
\textheight 230mm
\topmargin -5mm
\headheight 8mm
\headsep 5mm
\footskip 15mm

\begin{document}

\raggedleft
\pagestyle{empty}
\vspace*{0.1\textheight}
\Huge
{\bf Follow-up data with the\\ \texttt{Epi} package}
\noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex]
\Large
Summer 2014
\vfill
\normalsize
\begin{tabular}{rl}
 Michael Hills & Retired \\
               & Highgate, London \\[1em]
Martyn Plummer & International Agency for Research on Cancer, Lyon\\
               & \texttt{plummer@iarc.fr} \\[1em]
Bendix Carstensen & Steno Diabetes Center, Gentofte, Denmark\\
                  & \small \& Department of Biostatistics,
                    University of Copenhagen\\
                  & \normalsize \texttt{bxc@steno.dk} \\
                  & \url{www.pubhealth.ku.dk/~bxc}
\end{tabular}
\normalsize
\newpage
\raggedright
\parindent 3ex
\parskip 0ex
\tableofcontents
\cleardoublepage
\setcounter{page}{1}
\pagestyle{fancy}
\renewcommand{\sectionmark}[1]{\markboth{\thesection #1}{\thesection \ #1}}
\fancyhead[OL]{\sl Follow-up data with the \texttt{Epi} package.}
\fancyhead[ER]{\sl \rightmark}
\fancyhead[EL,OR]{\bf \thepage}
\fancyfoot{}
\renewcommand{\headrulewidth}{0.1pt}

<<>>=
library(Epi)
print( sessionInfo(), l=F )
@

\section{Follow-up data in the \texttt{Epi} package}

In the \texttt{Epi}-package, follow-up data is represented by adding
some extra variables to a dataframe. Such a dataframe is called a
\texttt{Lexis} object. The tools for handling follow-up data then use
the structure of this for special plots, tabulations etc.

Follow-up data basically consists of a time of entry, a time of exit
and an indication of the status at exit (normally either ``alive'' or
``dead''). Implicitly is also assumed a status \emph{during} the
follow-up (usually ``alive'').

\begin{figure}[htbp]
  \centering
\setlength{\unitlength}{1pt}
\begin{picture}(210,70)(0,75)
%\scriptsize
\thicklines
 \put(  0,80){\makebox(0,0)[r]{Age-scale}}
 \put( 50,80){\line(1,0){150}}
 \put( 50,80){\line(0,1){5}}
 \put(100,80){\line(0,1){5}}
 \put(150,80){\line(0,1){5}}
 \put(200,80){\line(0,1){5}}
 \put( 50,77){\makebox(0,0)[t]{35}}
 \put(100,77){\makebox(0,0)[t]{40}}
 \put(150,77){\makebox(0,0)[t]{45}}
 \put(200,77){\makebox(0,0)[t]{50}}

 \put(  0,115){\makebox(0,0)[r]{Follow-up}}

 \put( 80,105){\makebox(0,0)[r]{\small Two}}
 \put( 90,105){\line(1,0){87}}
 \put( 90,100){\line(0,1){10}}
 \put(100,100){\line(0,1){10}}
 \put(150,100){\line(0,1){10}}
 \put(180,105){\circle{6}}
 \put( 95,110){\makebox(0,0)[b]{1}}
 \put(125,110){\makebox(0,0)[b]{5}}
 \put(165,110){\makebox(0,0)[b]{3}}

 \put( 50,130){\makebox(0,0)[r]{\small One}}
 \put( 60,130){\line(1,0){70}}
 \put( 60,125){\line(0,1){10}}
 \put(100,125){\line(0,1){10}}
 \put(130,130){\circle*{6}}
 \put( 80,135){\makebox(0,0)[b]{4}}
 \put(115,135){\makebox(0,0)[b]{3}}
\end{picture}
  \caption{\it Follow-up of two persons}
  \label{fig:fu2}
\end{figure}

\section{Timescales}

A timescale is a variable that varies deterministically \emph{within}
each person during follow-up, \textit{e.g.}:
\begin{itemize}
  \item Age
  \item Calendar time
  \item Time since treatment
  \item Time since relapse
\end{itemize}
All timescales advance at the same pace, so the time followed is the
same on all timescales. Therefore, it suffices to use only the entry
point on each of the time scale, for example:
\begin{itemize}
  \item Age at entry.
  \item Date of entry.
  \item Time since treatment (\emph{at} treatment this is 0).
  \item Time since relapse (\emph{at} relapse this is 0)..
\end{itemize}
In the \texttt{Epi} package, follow-up in a cohort is represented in a
\texttt{Lexis} object.  A \texttt{Lexis} object is a dataframe with a
bit of extra structure representing the follow-up. For the
\texttt{nickel} data we would construct a \texttt{Lexis} object by:
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd %in% c(162,163) )*1,
                data = nickel )
@
The \texttt{entry} argument is a \emph{named} list with the entry
points on each of the timescales we want to use. It defines the names
of the timescales and the entry points. The \texttt{exit} argument
gives the exit time on \emph{one} of the timescales, so the name of
the element in this list must match one of the neames of the
\texttt{entry} list. This is sufficient, because the follow-up time on
all time scales is the same, in this case \texttt{ageout - agein}. Now
take a look at the result:
<<>>=
str( nickel )
str( nicL )
head( nicL )
@
The \texttt{Lexis} object \texttt{nicL} has a variable for each
timescale which is the entry point on this timescale. The follow-up
time is in the variable \texttt{lex.dur} (\textbf{dur}ation).

There is a \texttt{summary} function for \texttt{Lexis} objects that
list the numer of transitions and records as well as the total
follow-up time:
<<>>=
summary( nicL )
@
We defined the exit status to be death from lung cancer (ICD7
162,163), i.e. this variable is 1 if follow-up ended with a death from
this cause. If follow-up ended alive or by death from another cause,
the exit status is coded 0, i.e. as a censoring.

Note that the exit status is in the variable \texttt{lex.Xst}
(e\textbf{X}it \textbf{st}atus. The variable \texttt{lex.Cst} is the
state where the follow-up takes place (\textbf{C}urrent
\textbf{st}atus), in this case 0 (alive).

It is possible to get a visualization of the follow-up along the
timescales chosen by using the \texttt{plot} method for \texttt{Lexis}
objects. \texttt{nicL} is an object of \emph{class} \texttt{Lexis}, so
using the function \texttt{plot()} on it means that \R\ will look for
the function \texttt{plot.Lexis} and use this function.
<<nicL1,fig=TRUE>>=
plot( nicL )
@
The function allows a lot of control over the output, and a
\texttt{points.Lexis} function allows plotting of the endpoints of
follow-up:
<<nicL2,fig=TRUE>>=
par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6 )
plot( nicL, 1:2, lwd=1, col=c("blue","red")[(nicL$exp>0)+1],
      grid=TRUE, lty.grid=1, col.grid=gray(0.7),
      xlim=1900+c(0,90), xaxs="i",
      ylim=  10+c(0,90), yaxs="i", las=1 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
        col="lightgray", lwd=3, cex=1.5 )
points( nicL, 1:2, pch=c(NA,3)[nicL$lex.Xst+1],
        col=c("blue","red")[(nicL$exp>0)+1], lwd=1, cex=1.5 )
@
The results of these two plotting commands are in figure \ref{fig:Lexis-diagram}.
\begin{figure}[tb]
\centering
\label{fig:Lexis-diagram}
\includegraphics[width=0.39\textwidth]{Follow-up-nicL1}
\includegraphics[width=0.59\textwidth]{Follow-up-nicL2}
\caption{\it Lexis diagram of the \texttt{nickel} dataset, left panel
  the default version, the right one with bells
  and whistles. The red lines are for persons with exposure$>0$, so it
  is pretty evident that the oldest ones are the exposed part of the
  cohort.}
\end{figure}

\section{Splitting the follow-up time along a timescale}

The follow-up time in a cohort can be subdivided by for example
current age. This is achieved by the \texttt{splitLexis} (note that it
is \emph{not} called \texttt{split.Lexis}). This requires that the
timescale and the breakpoints on this timescale are supplied. Try:
<<>>=
nicS1 <- splitLexis( nicL, "age", breaks=seq(0,100,10) )
summary( nicL )
summary( nicS1 )
@
So we see that the number of events and the amount of follow-up is the
same in the two datasets; only the number of records differ.

To see how records are split for each individual, it is useful to list
the results for a few individuals:
<<>>=
round( subset( nicS1, id %in% 8:10 ), 2 )
@
The resulting object, \texttt{nicS1}, is again a \texttt{Lexis}
object, and so follow-up may be split further along another
timescale. Try this and list the results for individuals 8, 9 and 10 again:
<<>>=
nicS2 <- splitLexis( nicS1, "tfh", breaks=c(0,1,5,10,20,30,100) )
round( subset( nicS2, id %in% 8:10 ), 2 )
@
If we want to model the effect of these timescales we will for each
interval use either the value of the left endpoint in each interval or
the middle. There is a function \texttt{timeBand} which returns these.
Try:
<<>>=
timeBand( nicS2, "age", "middle" )[1:20]
# For nice printing and column labelling use the data.frame() function:
data.frame( nicS2[,c("id","lex.id","per","age","tfh","lex.dur")],
            mid.age=timeBand( nicS2, "age", "middle" ),
            mid.tfh=timeBand( nicS2, "tfh", "middle" ) )[1:20,]
@
Note that these are the midpoints of the intervals defined by
\texttt{breaks=}, \emph{not} the midpoints of the actual follow-up
intervals. This is because the variable to be used in modelling must
be independent of the consoring and mortality pattern --- it should
only depend on the chosen grouping of the timescale.

\section{Splitting time at a specific date}

If we have a recording of the date of a specific event as for example
recovery or relapse, we may classify follow-up time as being before of
after this intermediate event. This is achieved with the function
\texttt{cutLexis}, which takes three arguments: the time point, the
timescale, and the value of the (new) state following the date.

Now we define the age for the nickel vorkers where the cumulative
exposure exceeds 50 exposure years:
<<>>=
subset( nicL, id %in% 8:10 )
agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data=nicL, cut=agehi, timescale="age",
                  new.state=2, precursor.states=0 )
subset( nicC, id %in% 8:10 )
@
(The \texttt{precursor.states=} argument is explained below).
Note that individual 6 has had his follow-up split at age 25 where 50
exposure-years were attained. This could also have been achieved in
the split dataset \texttt{nicS2} instead of \texttt{nicL}, try:
<<>>=
subset( nicS2, id %in% 8:10 )
agehi <- nicS2$age1st + 50 / nicS2$exposure
nicS2C <- cutLexis( data=nicS2, cut=agehi, timescale="age",
                    new.state=2, precursor.states=0 )
subset( nicS2C, id %in% 8:10 )
@
Note that follow-up subsequent to the event is classified as being
in state 2, but that the final transition to state 1 (death from lung
cancer) is preserved. This is the point of the \texttt{precursor.states=}
argument. It names the states (in this case 0, ``Alive'') that will be
over-witten by \texttt{new.state} (in this case state 2, ``High
exposure''). Clearly, state 1 (``Dead'') should not be updated even if
it is after the time where the persons moves to state 2. In other
words, only state 0 is a precursor to state 2, state 1 is always
subsequent to state 2.

Note if the intermediate event is to be used as a time-dependent
variable in a Cox-model, then \texttt{lex.Cst} should be used as the
time-dependent variable, and \texttt{lex.Xst==1} as the event.

\section{Competing risks --- multiple types of events}

If we want to consider death from lung cancer and death from other
causes as separate events we can code these as for example 1 and 2.
<<>>=
data( nickel )
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
                data = nickel )
summary( nicL )
subset( nicL, id %in% 8:10 )
@
If we want to label the states, we can enter the names of these in the
\texttt{states} parameter, try for example:
<<>>=
nicL <- Lexis( entry = list( per=agein+dob,
                             age=agein,
                             tfh=agein-age1st ),
                exit = list( age=ageout ),
         exit.status = ( icd > 0 ) + ( icd %in% c(162,163) ),
                data = nickel,
              states = c("Alive","D.oth","D.lung") )
summary( nicL )
@

Note that the \texttt{Lexis} function automatically assumes that all
persons enter in the first level (given in the \texttt{states=}
argument)

When we cut at a date as in this case, the date where cumulative
exposure exceeds 50 exposure-years, we get the follow-up \emph{after}
the date classified as being in the new state if the exit
(\texttt{lex.Xst}) was to a state we defined as one of the
\texttt{precursor.states}:
<<>>=
nicL$agehi <- nicL$age1st + 50 / nicL$exposure
nicC <- cutLexis( data = nicL,
                   cut = nicL$agehi,
             timescale = "age",
             new.state = "HiExp",
      precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@
Note that the persons-years is the same, but that the number of events
has changed. This is because events are now defined as any transition
from alive, including the transitions to \texttt{HiExp}.

Also note that (so far) it is necessary to specify the variable with
the cutpoints in full, using only \texttt{cut=agehi} would give an error.

\subsection{Subdivision of existing states}
It may be of interest to subdivide the states following the
intermediate event according to wheter the event has occurred or
not. That is done by the argument \texttt{split.states=TRUE}.

Moreover, it will also often be of interest to introduce a new
timescale indicating the time since intermediate event. This can be
done by the argument \texttt{new.scale=TRUE}, alternatively
\texttt{new.scale="tfevent"}, as illustrated here:
<<>>=
nicC <- cutLexis( data = nicL,
                   cut = nicL$agehi,
             timescale = "age",
             new.state = "Hi",
           split.states=TRUE, new.scale=TRUE,
      precursor.states = "Alive" )
subset( nicC, id %in% 8:10 )
summary( nicC, scale=1000 )
@

\section{Multiple events of the same type (recurrent events)}
Sometimes more events of the same type are recorded for each person and
one would then like to count these and put follow-up time in states accordingly.
Essentially, each set of cutpoints represents progressions from one
state to the next. Therefore the states should be numbered, and the
numbering of states subsequently occupied be increased accordingly.

This is a behaviour different from the one outlined above, and it is
achieved by the argument \texttt{count=TRUE} to
\texttt{cutLexis}. When \texttt{count} is set to \texttt{TRUE}, the
value of the arguments \texttt{new.state} and
\texttt{precursor.states} are ignored.  Actually, when using the
argument \texttt{count=TRUE}, the function \texttt{countLexis} is
called, so an alternative is to use this directly.

\end{document}