.TH DATAFILE 5 "April 22, 2003"
.SH NAME
DATAFILE \- LINKAGE's DATAFILE
.SH DESCRIPTION
Descriptions of loci and other information are contained in
.BR DATAFILE (5).
The information in this file is divided into four parts:
.IP 1. 5
general information on loci and locus order;
.IP 2. 5
description of loci;
.IP 3. 5
information on recombination;
.IP 4. 5
program-specific information.
.PP
In explaining the structure of
.BR DATAFILE (5)
we will use two concepts of
locus order. The first is the input order, or the order in which the
phenotypes corresponding to the loci appear in
.BR PEDFILE (5).
The second is chromosome order, or the physical order assumed
for the loci. The input order is fixed once
.BR PEDFILE (5)
is created, but the chromosome order can be changed to test various
hypotheses.
Various parameters such as recombination rates, gene frequencies,
penetrances, etc., are specified in the
.BR DATAFILE (5).
These refer to the initial values of these parameters. The analysis
programs can modify some of these values for specific purposes,
e.g. maximum likelihood estimation.
.SH EXAMPLE
Before we attempt to explain the format of various parts of the
DATAFILE, it is useful to consider a complete file as an example. The
following is the DATAFILE for three sex-linked loci, one of which is
Duchenne muscular dystrophy; creatine kinase measurements are
available for heterozygote testing in women:
3 0 1 5 << no. of loci, risk locus, sexlinked (if 1), program code
3 0.001 0.001 0 << mut locus, mut mal, mut fem, hap freq (if 1)
1 3 2 << order of loci
2 2 <<< binary factors, # alleles
5.00000E-01 5.00000E-01 << gene freqs
2 << number of binary factors
1 0
0 1 << allelic codes
2 2 <<< binary factors, # alleles
5.00000E-01 5.00000E-01 << gene freqs
2 << number of binary factors
1 0
0 1 << allelic codes
0 2 <<< quan, # alleles
9.99800E-01 2.00000E-04 << gene freqs
1 << number of traits
1.57000E+00 2.10000E+00 2.10000E+00 << genotype means
5.90000E-02 << variance
2.90000E+00 << multiplier for variance in heterozygotes
0 0 << sex difference (if 1) and interference (if 1)
0.1 0.1 << recombination values
1 0.5 0.5
The last line contains information for the
.BR mlink (1)
program; this is indicated by the program code 5 on the first
line. Other parameters are specified as indicated in the comments
following certain lines (indicated by << ). Comments are allowed on
some lines for easy interpretation of the file.
.SS "Loci and Locus Order"
The first two lines of DATAFILE contain information on a variety of
parameters, including the number of loci (nlocus), a risk locus
(risklocus), sex-linked or autosomal data (sexlink), a mutation locus
(mutsys) and mutation rates (mutmale and mutfem), linkage
disequilibrium (disequil), and a program code (nprogram). The first
two lines are followed by a third line giving the chromosome order for
the loci. The format is:
nlocus risklocus sexlink nprogram
mutsys mutmale mutfem disequil
(chromosome order)
Mutsys and the chromosome order of the loci must begin on new lines;
comments can follow at the end of each line. Nprogram is not used by
the LINKAGE programs themselves, but it is required by the shell
program LCP, where it identifies the program for which the file was
constructed. Because LCP can use a file constructed for one program as
input for a different program, the datafile need not be changed when
switching programs under LCP.
.TS
box center ;
c|l.
Variable Name	Valid Values
_
nlocus	T{
1 to maxlocus (as specified by a constant in the programs)
T}
_
risklocus	T{
0 if risk is not to be calculated
T}
_
	T{
disease locus number (input order) if risk is to be calculated
T}
_
sexlink	T{
0 for autosomal data
T}
_
	T{
1 for sex-linked data
T}
_
nprogram	1 CILINK
_
	2 CMAP
_
	3 ILINK
_
	4 LINKMAP
_
	5 MLINK
_
	6 LODSCORE
_
	7 CLODSCORE
_
mutsys	T{
0 if mutation rates are zero
T}
_
	T{
mutation locus number (input order) for non-zero mutation rates
T}
_
mutmale	male mutation rate
_
mutfem	female mutation rate
_
disequil	T{
0 if loci are assumed to be in linkage equilibrium
T}
_
	T{
1 if loci are in linkage disequilibrium
T}
.TE
When loci are in linkage equilibrium, allele frequencies must be given
under each locus description; otherwise, haplotype frequencies are
provided. When risk is calculated, a disease allele is provided in the
locus description for the "risklocus." As an example, consider the
analysis of 3 autosomal loci in the chromosome order 1 3 2. The first
three lines of the DATAFILE could be:
3 0 0 3 << no. of loci, risk locus, sexlinked (if 1), program code
3 0.1 0.1 0 << mut locus, mut mal, mut fem, haplotype freq (if 1)
1 3 2 << order of loci
The data are autosomal with mutation at the third locus.
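.PP
As an illustration of the three-line layout described above, the
following is a minimal Python sketch that reads these lines into named
fields. The function names strip_comment and parse_header are
hypothetical and are not part of LINKAGE, and the check that the
chromosome order lists every locus exactly once is an assumption of
this sketch rather than a documented rule.
.PP
.nf
```python
def strip_comment(line):
    """Drop a trailing '<<' comment and return the remaining fields."""
    return line.split("<<", 1)[0].split()


def parse_header(lines):
    """Parse the first three DATAFILE lines into a dictionary."""
    first, second, third = (strip_comment(line) for line in lines[:3])
    header = {
        "nlocus": int(first[0]),
        "risklocus": int(first[1]),
        "sexlink": int(first[2]),
        "nprogram": int(first[3]),
        "mutsys": int(second[0]),
        "mutmale": float(second[1]),
        "mutfem": float(second[2]),
        "disequil": int(second[3]),
        "order": [int(field) for field in third],
    }
    # Assumption of this sketch: the order names each locus exactly once.
    if sorted(header["order"]) != list(range(1, header["nlocus"] + 1)):
        raise ValueError("chromosome order must be a permutation of 1..nlocus")
    return header


example = [
    "3 0 0 3 << no loci, risk locus, sexlinked (if 1), program code",
    "3 0.1 0.1 0 << mut locus, mut mal, mut fem, haplotype freq (if 1)",
    "1 3 2 << order of loci",
]
header = parse_header(example)
print(header["nlocus"], header["nprogram"], header["order"])  # 3 3 [1, 3, 2]
```
.fi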
.SS "Description of Loci"
The loci are described in the order in which they appear in the
.BR PEDFILE (5).
Assuming linkage equilibrium, the gene frequencies are specified as
part of the locus description (linkage disequilibrium will be
documented in a later version). The descriptions differ according to
the type of locus. A numeric code distinguishes each of the types:
.TS
box center;
n|l.
0	Quantitative variable
_
1	Affection status
_
2	Binary factors
_
3	Numbered alleles
.TE
The format for each locus type, assuming linkage equilibrium, is as follows:
.SS "Numbered alleles"
The locus description consists of two lines. The first gives the code
for numbered alleles and the total number of alleles. The second gives
the gene frequencies. For example:
3 2 << numbered alleles code, total number of alleles
0.5 0.5 << gene frequencies
specifies two alleles with equal gene frequencies.
.SS "Binary factors"
The first two lines are similar to those in the previous
example. After this the number of factors is specified on a separate
line, followed by one line for each allele specification. As an
example, consider the case of a recessive trait:
2 2 << binary factor code, number of alleles
0.999 0.001 << gene frequencies
2 << number of factors
1 1
0 1 << alleles
.SS "Affection status"
The number of liability classes replaces the number of factors, and
penetrances are given for each genotype in each class:
1 2 << affection status code, number of alleles
0.999 0.001 << gene frequencies
1 << number of liability classes
0.0 1.0 1.0 << penetrances
describes a fully penetrant, dominant disease locus. The genotypes are
in the order 11, 12, 22 where 1 is the first allele and 2 is the
second allele specified in the gene frequency list. For three alleles,
the genotype order is 11, 12, 13, 22, 23, 33. The same pattern is
followed for more alleles. To describe a similar locus, but with
reduced penetrance and two liability classes, use the following:
1 2 << affection status code, number of alleles
0.999 0.001 << gene frequencies
2 << number of liability classes
0.0 0.5 0.5
0.0 0.9 0.9 << penetrances
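.PP
The genotype ordering described above (11, 12, 22 for two alleles and
11, 12, 13, 22, 23, 33 for three) can be generated for any number of
alleles. The following Python sketch is for illustration only; the
function names are hypothetical, and the count applies to the autosomal
case with one penetrance per genotype in each liability class.
.PP
.nf
```python
from itertools import combinations_with_replacement


def genotype_order(n_alleles):
    """List unordered genotypes in the order 11, 12, ..., 1n, 22, ..., nn."""
    return list(combinations_with_replacement(range(1, n_alleles + 1), 2))


def penetrances_per_class(n_alleles):
    """Number of genotypes, hence penetrances per liability class."""
    return n_alleles * (n_alleles + 1) // 2


print(genotype_order(3))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
print(penetrances_per_class(2))  # 3
```
.fi
.PP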
With sex-linked data, male penetrances must also be defined for each
allele. The following describes a sex-linked disease with 50%
penetrance in males:
1 2 << affection status code, number of alleles
0.999 0.001 << gene frequencies
1 << number of liability classes
0.0 0.0 1.0
0.0 0.5 << female followed by male penetrances
.SS "Quantitative trait"
Quantitative traits are described by a first line containing the
quantitative code (0) and the number of alleles, and a second line
with gene frequencies, as in the previous examples. These are followed
by lines indicating the number of quantitative variables, genotypic
means for each variable, a variance-covariance matrix, and a constant
that gives the ratio of variance-covariance in heterozygotes to
homozygotes.
For a single quantitative variable, the format is:
0 2 << quantitative variable code, number of alleles
0.999 0.001 << gene frequencies
1 << number of quantitative variables
10.0 12.0 14.0 << genotypic means
1.5 << variance
1.0 << multiplier for heterozygote variance
The genotypes are 1/1, 1/2 and 2/2, respectively, where allele 1 has
the frequency 0.999. For two quantitative variables, the description
is:
0 2 << quantitative variable code, number of alleles
0.999 0.001 << gene frequencies
2 << number of quantitative variables
10.0 12.0 14.0
-10.0 0.0 10.0 << genotypic means
1.5 10.0 100.0 << variance-covariance
1.0 << multiplier for heterozyg. variance-covariance
Only the upper triangle of the variance-covariance matrix is given;
the order is V11, V12, V13 ... V22, V23 ... etc. Here, the variance of
the first variable is 1.5, the covariance is 10.0, and the variance of
the second variable is 100.0.
.PP
When describing the "risk locus," the disease allele (risk allele)
must be designated at the end of the locus description. For example:
1 2 << affection status code, number of alleles
0.999 0.001 << gene frequencies
1 << number of liability classes
0.0 1.0 1.0 << penetrances
2 << risk allele
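.PP
The upper-triangle ordering of the variance-covariance matrix used for
quantitative traits (V11, V12, V13 ... V22, V23 ...) can be expanded
into a full symmetric matrix. The following Python sketch illustrates
this with the two-variable example values 1.5, 10.0 and 100.0; the
function name expand_upper_triangle is hypothetical.
.PP
.nf
```python
def expand_upper_triangle(values, n_traits):
    """Rebuild a symmetric n x n matrix from its row-wise upper triangle."""
    expected = n_traits * (n_traits + 1) // 2
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    matrix = [[0.0] * n_traits for _ in range(n_traits)]
    remaining = iter(values)
    for i in range(n_traits):
        for j in range(i, n_traits):
            # Each value fills both the (i, j) and the mirrored (j, i) cell.
            matrix[i][j] = matrix[j][i] = next(remaining)
    return matrix


print(expand_upper_triangle([1.5, 10.0, 100.0], 2))
# [[1.5, 10.0], [10.0, 100.0]]
```
.fi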
.SS "Recombination Information"
In addition to recombination rates, sex-differences and interference
must be specified in this section. Sex-difference options are
indicated by an integer variable that takes the following values:
.TS
box center;
n|l.
0	T{
no sex-difference
T}
_
1	T{
constant sex-difference (the ratio of female/male genetic distance is
the same in all intervals)
T}
_
2	T{
variable sex-difference (the female/male distance ratio can be
different in each interval)
T}
.TE
The interference option can take the following values:
.TS
box center;
n|l.
0	T{
no interference
T}
_
1	T{
interference without a mapping function
T}
_
2	T{
user-specified mapping function
T}
.TE
Interference (i.e. options 1 or 2) is allowed only in some analysis
programs with three loci. The programs, as distributed, contain
Kosambi interference as the user-specified mapping function.
First, consider a case without interference. When the sex-difference
is "0," one recombination rate is given for each of the nlocus-1
segments (see the complete example above). If the sex-difference
option is "1," the male recombination rates are given on one line, and
the female/male ratio of genetic distance is specified on the next line, e.g.:
1 0 << sex difference, interference
0.1 0.2 0.1 << male recombination
2.0 << female/male ratio of genetic distance
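.PP
Because the ratio applies to genetic distance rather than to the
recombination rates themselves, obtaining female rates requires a
mapping function. The following Python sketch assumes Haldane's mapping
function (no interference); this page does not state which function the
programs use, so treat both that choice and the hypothetical function
names as assumptions.
.PP
.nf
```python
import math


def haldane_distance(theta):
    """Map distance in Morgans for recombination rate theta (Haldane)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(distance):
    """Recombination rate for a map distance in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance))


def female_rates(male_rates, ratio):
    """Scale each male interval by the female/male distance ratio."""
    # Assumption: Haldane's function converts between rate and distance.
    return [haldane_theta(ratio * haldane_distance(t)) for t in male_rates]


# Male rates 0.1 0.2 0.1 with ratio 2.0 give roughly 0.18, 0.32, 0.18.
print([round(t, 4) for t in female_rates([0.1, 0.2, 0.1], 2.0)])
```
.fi
.PP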
When the sex-difference option is "2", the male recombination rates
are followed on the next line by female recombination rates:
2 0 << sex difference, interference
0.1 0.2 0.1 << male recombination
0.2 0.1 0.2 << female recombination
Interference can be specified for three loci. With the interference
option 1, three recombination rates are given. These are the
recombination rates between adjacent loci in the two segments and the
recombination rate between the flanking loci. An example is:
1 1 << sex difference, interference
0.1 0.1 0.18 << male recombination
2.0 << female/male ratio of genetic distance
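.PP
The three rates given with interference option 1 determine how much
interference is implied. The following Python sketch computes the
standard coefficient of coincidence from them; the function name is
hypothetical and the calculation is offered as an illustration, not as
what the LINKAGE programs do internally.
.PP
.nf
```python
def coincidence(theta_12, theta_23, theta_13):
    """Observed double recombinants divided by the number expected
    if the two segments recombined independently."""
    double_recombinants = (theta_12 + theta_23 - theta_13) / 2.0
    return double_recombinants / (theta_12 * theta_23)


# Rates from the example line: adjacent 0.1 and 0.1, flanking 0.18.
print(round(coincidence(0.1, 0.1, 0.18), 6))
```
.fi
.PP
A value near 1 means the flanking rate is what independent segments
would give, so the example rates 0.1, 0.1 and 0.18 imply essentially no
interference; values below 1 indicate positive interference.
.PP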
With the interference option 2, only the rates between the adjacent
loci are provided:
1 2 << sex difference, interference
0.1 0.1 << male recombination
2.0 << female/male ratio of genetic distance
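.PP
With a mapping function, the flanking rate follows from the two
adjacent rates. The following Python sketch uses the standard Kosambi
addition formula, since the programs as distributed are described
above as using Kosambi interference; the function name is hypothetical
and the actual program code may differ.
.PP
.nf
```python
def kosambi_flanking(theta_12, theta_23):
    """Flanking recombination rate under the Kosambi addition formula."""
    return (theta_12 + theta_23) / (1.0 + 4.0 * theta_12 * theta_23)


# Adjacent rates 0.1 and 0.1 give a flanking rate of about 0.1923,
# higher than the 0.18 expected when there is no interference.
print(round(kosambi_flanking(0.1, 0.1), 4))
```
.fi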
.SS "Program-specific information"
The program-specific information consists of a series of lines at the
end of the
.BR DATAFILE (5)
describing which parameters should be varied iteratively by the
analysis programs.
.SH NOTES
The information contained herein was gleaned, oftentimes verbatim,
from
.UR http://linkage.rockefeller.edu/soft/linkage/
the LINKAGE User's Guide
.UE
on the web by kind permission of Jurg Ott, Ph.D.
.SH AUTHORS
Mark Lathrop and Jurg Ott.
.PP
This manual page was written by Elizabeth Barham
<lizzy@soggytrousers.net> for the Debian GNU/Linux distribution.
.SH WORLD-WIDE-WEB
.UR http://linkage.rockefeller.edu/soft/linkage/
http://linkage.rockefeller.edu/soft/linkage/
.UE
.SH SEE ALSO
.BR LINKAGE (5),
.BR PEDFILE (5),
.BR ilink (1),
.BR linkmap (1),
.BR lodscore (1),
.BR mlink (1),
and
.BR unknown (1).