.TH DATAFILE 5 "April 22, 2003"
.SH NAME
DATAFILE \- locus description file for the LINKAGE programs
.SH DESCRIPTION
Descriptions of loci and other information are contained in
.BR DATAFILE (5).
The information in this file is divided into four parts:
.IP 1. 5
general information on loci and locus order;
.IP 2. 5
description of loci;
.IP 3. 5
information on recombination;
.IP 4. 5
program-specific information. 
.PP
In explaining the structure of
.BR DATAFILE (5)
we will use two concepts of
locus order. The first is the input order, or the order in which the
phenotypes corresponding to the loci appear in
.BR PEDFILE (5).
The second is chromosome order, or the physical order assumed
for the loci. The input order is fixed once 
.BR PEDFILE (5)
is created, but the chromosome order can be changed to test various
hypotheses.

Various parameters such as recombination rates, gene frequencies,
penetrances, etc., are specified in the
.BR DATAFILE (5).
These refer to the initial values of these parameters. The analysis
programs can modify some of these values for specific purposes,
e.g. maximum likelihood estimation.

.SH EXAMPLE

Before we attempt to explain the format of various parts of the
DATAFILE, it is useful to consider a complete file as an example. The
following is the DATAFILE for three sex-linked loci, one of which is
Duchenne muscular dystrophy; creatine kinase measurements are
available for heterozygote testing in women:

 3 0 1 5           << number of loci, risk locus, sexlinked (if 1), program code
 3 0.001  0.001 0  << mut locus, mut male, mut fem, hap freq (if 1)

 1 3 2             << order of loci

 2 2               <<< binary factors, # alleles
 5.00000E-01  5.00000E-01   << gene freqs
 2                 << number of binary factors
 1 0
 0 1               << allelic codes

 2 2               <<< binary factors, # alleles
 5.00000E-01   5.00000E-01   << gene freqs
 2                 << number of binary factors
 1 0
 0 1               << allelic codes

 0 2               <<< quantitative, # alleles
 9.99800E-01  2.00000E-04   << gene freqs
 1                 << number of traits
 1.57000E+00  2.10000E+00  2.10000E+00  << genotype means
 5.90000E-02       << variance
 2.90000E+00       << multiplier for variance in heterozygotes
 0 0               << sex difference (if 1) and interference (if 1)
 0.1  0.1          << recombination values
 1  0.5  0.5

The last line contains information for the
.BR mlink (1)
program; this is indicated by the program code 5 on the first
line. Other parameters are specified as indicated in the comments
following certain lines (indicated by << ). Comments are allowed on
some lines for easy interpretation of the file.

.SS "Loci and Locus Order"
The first two lines of DATAFILE contain information on a variety of
parameters, including the number of loci (nlocus), a risk locus
(risklocus), sex-linked or autosomal data (sexlink), a mutation locus
(mutsys) and mutation rates (mutmale and mutfem), linkage
disequilibrium (disequil), and a program code (nprogram). The first
two lines are followed by a third line giving the chromosome order for
the loci. The format is:

     nlocus    risklocus sexlink   nprogram
     mutsys    mutmale   mutfem    disequil
     (chromosome order)

Mutsys and the chromosome order of the loci must each begin on a new
line; a comment can follow at the end of each line. Nprogram is not
used by the LINKAGE programs themselves, but is required for
interfacing with the shell program LCP, where it identifies the program
for which the file was constructed. Because LCP can use a file
constructed for one program as input for a different program, the
datafile need not be changed when switching programs under LCP.

.TS
box center ;
c|l.
Variable Name	Valid Values
_
nlocus	T{
1 to maxlocus (as specified by a constant in the programs)
T}
_
risklocus	T{
0 if risk is not to be calculated
T}
	_
	T{
disease locus number (input order) if risk is to be calculated
T}
_
sexlink	T{
0 for autosomal data
T}
	_
	T{
1 for sex-linked data
T}
_
nprogram	1 CILINK
	_
	2 CMAP
	_
	3 ILINK
	_
	4 LINKMAP
	_
	5 MLINK
	_
	6 LODSCORE
	_
	7 CLODSCORE
_
mutsys	T{
0 if mutation rates are zero
T}
	_
	T{
mutation locus number (input order) for non-zero mutation rates
T}
_
mutmale	male mutation rate
_
mutfem	female mutation rate
_
disequil	T{
0 if loci are assumed to be in linkage equilibrium
T}
	_
	T{
1 if loci are in linkage disequilibrium
T}
.TE

When loci are in linkage equilibrium, allele frequencies must be given
under each locus description; otherwise, haplotype frequencies are
provided. When risk is calculated, a disease allele is provided in the
locus description for the "risklocus." As an example, consider the
analysis of 3 autosomal loci in the chromosome order 1 3 2. The first
three lines of the DATAFILE could be:

 3 0 0 3     << number of loci, risk locus, sexlinked (if 1), program code
 3 0.1 0.1 0 << mut locus, mut male, mut fem, haplotype freq (if 1)
 1 3 2       << order of loci

The data are autosomal with mutation at the third locus.
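.PP
As a rough illustration of this layout, the following Python sketch
reads the three header lines. The function names (strip_comment,
parse_header) and the returned dictionary are hypothetical and are not
part of LINKAGE; the check that the order line lists every locus
exactly once is an assumption based on the examples in this page.
.nf
```python
def strip_comment(line):
    """Return the fields of a line, ignoring a trailing '<<' comment."""
    return line.split("<<", 1)[0].split()


def parse_header(lines):
    """Parse the first three non-blank lines of a DATAFILE (sketch only)."""
    rows = [fields for fields in map(strip_comment, lines) if fields]
    nlocus, risklocus, sexlink, nprogram = (int(v) for v in rows[0][:4])
    mutsys = int(rows[1][0])
    mutmale, mutfem = float(rows[1][1]), float(rows[1][2])
    disequil = int(rows[1][3])
    order = [int(v) for v in rows[2]]
    # Assumption: the order line names each of the nlocus loci exactly once.
    if sorted(order) != list(range(1, nlocus + 1)):
        raise ValueError("chromosome order must list each locus once")
    return {
        "nlocus": nlocus, "risklocus": risklocus, "sexlink": sexlink,
        "nprogram": nprogram, "mutsys": mutsys, "mutmale": mutmale,
        "mutfem": mutfem, "disequil": disequil, "order": order,
    }


header = parse_header([
    " 3 0 0 3     << number of loci, risk locus, sexlinked, program code",
    " 3 0.1 0.1 0 << mut locus, mut male, mut fem, haplotype freq",
    " 1 3 2       << order of loci",
])
print(header["order"])
```
.fi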
.SS "Description of Loci"
The loci are described in the order in which they appear in the
.BR PEDFILE (5).
Assuming linkage equilibrium, the gene frequencies are specified as
part of the locus description (linkage disequilibrium will be
documented in a later version). The descriptions differ according to
the type of locus. A numeric code distinguishes each of the types:

.TS
box center;
n|l.
0	Quantitative variable
_
1	Affection status
_
2	Binary factors
_
3	Numbered alleles
.TE

The format for each locus type, assuming linkage equilibrium, is as follows:
.SS "Numbered alleles"
The locus description consists of two lines. The first gives the code
for numbered alleles and the total number of alleles. The second gives
the gene frequencies. For example:

     3 2       << numbered alleles code, total number of alleles
     0.5  0.5  << gene frequencies

specifies two alleles with equal gene frequencies.

.SS "Binary factors"
The first two lines are similar to those in the previous
example. After this the number of factors is specified on a separate
line, followed by one line for each allele specification. As an
example, consider the case of a recessive trait:

     2 2                 << binary factor code, number of alleles
     0.999  0.001        << gene frequencies
     2                   << number of factors
     1 1
     0 1                 << alleles

.SS "Affection status"

The number of liability classes replaces the number of factors, and
penetrances are given for each genotype in each class:

     1 2                 << affection status code, number of alleles
     0.999  0.001        << gene frequencies
     1                   << number of liability classes
     0.0  1.0  1.0       << penetrances

describes a fully penetrant, dominant disease locus. The genotypes are
in the order 11, 12, 22 where 1 is the first allele and 2 is the
second allele specified in the gene frequency list. For three alleles,
the genotype order is 11, 12, 13, 22, 23, 33. The same pattern is
followed for more alleles. To describe a similar locus, but with
reduced penetrance and two liability classes, use the following:

     1 2                 << affection status code, number of alleles
     0.999  0.001        << gene frequencies
     2                   << number of liability classes
     0.0  0.5  0.5
     0.0  0.9  0.9       << penetrances
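.PP
The genotype order described above (11, 12, 13, 22, 23, 33 for three
alleles) can be generated mechanically. The following Python sketch is
only an illustration; the helper names genotype_order and
label_penetrances are hypothetical and do not come from LINKAGE.
.nf
```python
def genotype_order(n_alleles):
    """List genotypes in the order 11, 12, ..., 1n, 22, ..., nn."""
    return [(i, j)
            for i in range(1, n_alleles + 1)
            for j in range(i, n_alleles + 1)]


def label_penetrances(n_alleles, penetrances):
    """Pair the penetrances of one liability class with their genotypes."""
    genotypes = genotype_order(n_alleles)
    if len(penetrances) != len(genotypes):
        raise ValueError("expected %d penetrances" % len(genotypes))
    return dict(zip(genotypes, penetrances))


# First liability class of the reduced-penetrance example above.
print(label_penetrances(2, [0.0, 0.5, 0.5]))
```
.fi
For an autosomal locus with n alleles, each liability class therefore
needs n(n+1)/2 penetrance values.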

With sex-linked data, male penetrances must also be defined for each
allele. The following describes a sex-linked disease with 50%
penetrance in males:

     1 2                 << affection status code, number of alleles
     0.999  0.001        << gene frequencies
     1                   << number of liability classes
     0.0  0.0  1.0
     0.0  0.5            << female followed by male penetrances

.SS "Quantitative trait"

Quantitative traits are described by a first line containing the
quantitative code (0) and the number of alleles, and a second line
with gene frequencies, as in the previous examples. These are followed
by lines indicating the number of quantitative variables, genotypic
means for each variable, a variance-covariance matrix, and a constant
that gives the ratio of variance-covariance in heterozygotes to
homozygotes.

For a single quantitative variable, the format is:

     0  2                << quantitative variable code, number of alleles
     0.999  0.001        << gene frequencies
     1                   << number of quantitative variables
     10.0  12.0  14.0    << genotypic means
     1.5                 << variance
     1.0                 << multiplier for heterozygote variance

The genotypes are 1/1, 1/2 and 2/2, respectively, where allele 1 has
the frequency 0.999. For two quantitative variables, the description
is:

     0  2                << quantitative variable code, number of alleles
     0.999  0.001        << gene frequencies
     2                   << number of quantitative variables
     10.0   12.0   14.0
    -10.0    0.0   10.0  << genotypic means
     1.5  10.0  100.0    << variance-covariance
     1.0                 << multiplier for heterozyg. variance-covariance

Only the upper triangle of the variance-covariance matrix is given;
the order is V11, V12, V13 ... V22, V23 ... etc. Here, the variance of
the first variable is 1.5, the covariance is 10.0, and the variance of
the second variable is 100.0.
.PP
When describing the "risk locus," the disease allele (risk allele) must
be designated at the end of the locus description. For example:

     1  2                << affection status code, number of alleles
     0.999  0.001        << gene frequencies
     1                   << number of liability classes
     0.0  1.0  1.0       << penetrances
     2                   << risk allele
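.PP
Returning to the two-variable quantitative example, the upper-triangle
ordering V11, V12, ... V22, ... can be expanded into a full symmetric
matrix. The following Python sketch shows one way to do this; the
function name expand_upper_triangle is hypothetical and is not taken
from the LINKAGE sources.
.nf
```python
def expand_upper_triangle(n_vars, values):
    """Rebuild a symmetric matrix from V11, V12, ..., V22, V23, ... order."""
    expected = n_vars * (n_vars + 1) // 2
    if len(values) != expected:
        raise ValueError("expected %d values" % expected)
    matrix = [[0.0] * n_vars for _ in range(n_vars)]
    remaining = iter(values)
    for i in range(n_vars):
        for j in range(i, n_vars):
            # Each listed value fills both the (i, j) and (j, i) cells.
            matrix[i][j] = matrix[j][i] = next(remaining)
    return matrix


# Variance-covariance line of the two-variable example above.
print(expand_upper_triangle(2, [1.5, 10.0, 100.0]))
```
.fi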

.SS "Recombination Information"

In addition to recombination rates, sex-differences and interference
must be specified in this section. Sex-difference options are
indicated by an integer variable that takes the following values:

.TS
box center;
n|l.
0	T{
no sex-difference
T}
_
1	T{
constant sex-difference (the ratio of female/male genetic distance is
the same in all intervals)
T}
_
2	T{
variable sex-difference (the female/male distance ratio can be
different in each interval)
T}
.TE

The interference option can take the following values:

.TS
box center;
n|l.
0	T{
no interference
T}
_
1	T{
interference without a mapping function
T}
_
2	T{
user-specified mapping function
T}
.TE

Interference (i.e. options 1 or 2) is allowed only in some analysis
programs with three loci. The programs, as distributed, contain
Kosambi interference as the user-specified mapping function.

First, consider a case without interference. When the sex-difference
is "0," one recombination rate is given for each of the nlocus-1
segments (see the complete example above). If the sex-difference
option is "1," the male recombination rates are given on one line, and
the female/male ratio of genetic distance is specified on the next
line, e.g.:

     1  0                << sex difference, interference
     0.1  0.2  0.1       << male recombination
     2.0                 << female/male ratio of genetic distance

When the sex-difference option is "2", the male recombination rates
are followed on the next line by female recombination rates:

     2  0                << sex difference, interference
     0.1  0.2  0.1       << male recombination
     0.2  0.1  0.2       << female recombination
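.PP
To give a feel for what a constant female/male ratio of genetic
distance (option 1) implies, the following Python sketch converts male
recombination rates to female ones. It assumes Haldane's mapping
function (no interference) for the conversion between recombination
fraction and map distance; that choice, and all function names here,
are assumptions made for illustration and are not stated in this page.
.nf
```python
import math


def haldane_distance(theta):
    """Map distance in Morgans for a recombination fraction below 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(distance):
    """Recombination fraction for a map distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance))


def female_thetas(male_thetas, ratio):
    """Scale the distance of each male interval by the female/male ratio."""
    return [haldane_theta(ratio * haldane_distance(t)) for t in male_thetas]


# Male rates and ratio from the option 1 example above.
print(female_thetas([0.1, 0.2, 0.1], 2.0))
```
.fi
Under this assumption a ratio of 2.0 turns a male rate of 0.1 into a
female rate of about 0.18, not 0.2, because the ratio applies to map
distance and not to the recombination fraction itself.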

Interference can be specified for three loci. With the interference
option 1, three recombination rates are given. These are the
recombination rates between adjacent loci in the two segments and the
recombination rate between the flanking loci. An example is:

     1  1                << sex difference, interference
     0.1  0.1  0.18      << male recombination
     2.0                 << female/male ratio of genetic distance

With the interference option 2, only the rates between the adjacent
loci are provided:

     1  2                << sex difference, interference
     0.1  0.1            << male recombination
     2.0                 << female/male ratio of genetic distance
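.PP
The relation between the two adjacent-segment rates and the flanking
rate can be sketched in Python as follows. The first function assumes
no interference (independent crossovers in the two segments); the
second uses Kosambi's addition formula. Both function names are
hypothetical, and whether the distributed programs compute the flanking
rate in exactly this way is an assumption.
.nf
```python
def flanking_no_interference(theta12, theta23):
    """Flanking rate when crossovers in the two segments are independent."""
    return theta12 + theta23 - 2.0 * theta12 * theta23


def flanking_kosambi(theta12, theta23):
    """Flanking rate from Kosambi's addition formula."""
    return (theta12 + theta23) / (1.0 + 4.0 * theta12 * theta23)


# Adjacent-segment rates from the examples above.
print(flanking_no_interference(0.1, 0.1))
print(flanking_kosambi(0.1, 0.1))
```
.fi
With adjacent rates of 0.1 and 0.1, the no-interference value is 0.18,
which is the flanking rate used in the option 1 example above; the
Kosambi value is slightly larger, about 0.192.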

.SS "Program-specific information"
The program-specific information consists of a series of lines at the
end of the
.BR DATAFILE (5)
describing which parameters should be varied iteratively by the
analysis programs.
.SH NOTES
The information contained herein was gleaned, often verbatim,
from
.UR http://linkage.rockefeller.edu/soft/linkage/
the LINKAGE User's Guide
.UE
on the web by kind permission of Jurg Ott, Ph.D.

.SH AUTHORS
Mark Lathrop and Jurg Ott.
.PP
This manual page was written by Elizabeth Barham
<lizzy@soggytrousers.net> for the Debian GNU/Linux distribution.

.SH WORLD-WIDE-WEB
.UR http://linkage.rockefeller.edu/soft/linkage/
http://linkage.rockefeller.edu/soft/linkage/
.UE

.SH SEE ALSO
.BR LINKAGE (5),
.BR PEDFILE (5),
.BR ilink (1),
.BR linkmap (1),
.BR lodscore (1),
.BR mlink (1),
and
.BR unknown (1).