File: loki_doc.tex

\documentclass[10pt,a4paper]{article}
\usepackage{setspace,epsfig,harvard,xspace}
\usepackage{amsmath}
\citationstyle{dcu}
\citationmode{abbr}
\pretolerance400
\tolerance800
\emergencystretch2em
\doublehyphendemerits5000
\hfuzz0pt
\leftskip0pt minus 1pt
\rightskip0pt minus 1pt
\parskip=6pt
\newcommand{\IE}{\textit{i.e.},\xspace}
\newcommand{\EG}{\textit{e.g.},\xspace}
\newcommand{\GT}{\ensuremath{\mathcal{G}}\xspace}
\newcommand{\Th}[1]{#1\textit{th}\xspace}
\newcommand{\MCMC}{Markov chain Monte Carlo}
\newcommand{\Prep}{\texttt{prep}\xspace}
\newcommand{\Loki}{\texttt{loki}\xspace}
\newcommand{\Lokiext}{\texttt{loki\_ext}\xspace}

\addtolength{\oddsidemargin}{-.5in}
\addtolength{\textwidth}{1in}
\addtolength{\topmargin}{-.5in}
\addtolength{\textheight}{1in}
\renewcommand{\refname}{References}
\renewcommand{\figurename}{Figure}
\parindent=0pt

\begin{document}
\thispagestyle{empty}
\begin{center}
\begin{Large}
\textbf{Loki 2.4.6}\\[.15in]
A package for multipoint linkage analysis on large pedigrees using
Reversible jump Markov chain Monte Carlo\\[.25in]
\end{Large}
\begin{large}
April 2003\\[4.5in]
\end{large}
\end{center}
Simon C Heath\\ 
Centre National de G\'enotypage\\
2 rue Gaston Cr\'emieux\\
CP 5721, 91057 Evry Cedex\\
France\\[.1in]
email: \verb+heath@cng.fr+\\
\newpage
\section{Overview}
The programs described here are an implementation of the methods described in
\citeasnoun{heath_97a} for Markov chain Monte Carlo (MCMC) segregation and linkage
analysis of quantitative traits using oligogenic models.  The programs are
intended for performing multipoint linkage analyses on large, possibly
complex, pedigrees.  In contrast to `normal' linkage packages, the exact
penetrance model and even the number of QTL affecting the trait do not have
to be specified \textit{a priori}.  Joint estimation of QTL number,
position, and effects is possible, avoiding the problems that can arise from
mis--specification of these parameters.  It should be noted that although
the aims are the same as with traditional linkage packages, the operation of
the programs and (more importantly) interpretation of the output are very
different.  This program does \emph{not} produce LOD scores - if you only
want LOD scores then you should look elsewhere.  One reason for this is that
the LOD score approach is less flexible than the one used here, particularly
when models with multiple trait loci are considered.  The other reason is
that LOD scores are computationally costly (and tricky) to reliably estimate
using MCMC.  The approach used here is to estimate, for any given chromosome
region, the posterior probability that at least 1 trait locus lies in that
region (and possibly exceeds a given size threshold).

It is also possible to perform affected only IBD sharing analyses.  Although
the same programs are used as for the parametric analyses, the way the
output is interpreted is quite different.  In addition, it is also possible
for Loki to estimate pairwise realized kinship coefficients, that is the
kinship coefficients for all pairs in the pedigree at particular genomic
positions conditional on linked markers. In fact, the values estimated are
twice the kinship coefficients, as this is normally what is required.  These
values are currently not used by Loki, though in the future it is planned to
use this mechanism for implementing variance component approaches for
linkage mapping. Loki has been used to produce kinship coefficient matrices
for variance component analyses using external programs \cite{visscher_99}.

There are two main programs, \Prep and \Loki, which respectively handle data
input and the MCMC analysis.  The data preparation program, \Prep, reads in
data from a wide variety of input formats, tests for consistency of pedigree
and marker data, recodes the data, and outputs standardized binary files for
the analysis program, \Loki.  Also included in the package are several small
programs and scripts which help in the interpretation of the output from
\Loki.  Throughout this document, I will refer to the individual programs as
\Prep and \Loki, and the package as a whole as Loki.

Also included with the loki\_2.4 distribution is a small collection of short
programs and perl scripts which I use to help analyze the output from Loki.
These are included in the hope that they might be of use to other people.
Feedback on the scripts (and on Loki as a whole) is welcome.  If the scripts
prove useful then they might be expanded to fit with people's requirements.
\section{Installation}
It is assumed that you have the latest Loki archive file, it has been
untarred, and you are in the top (loki/) directory.  The simplest way to
compile the package is to type:
\begin{verbatim}  
./configure
make
make tests
\end{verbatim}
The configure script attempts to guess settings that should work with your
system.  If the compilation does not work you could edit by hand the file
./config.h or the Makefiles (./Makefile, libsrc/Makefile, lokisrc/Makefile).
Alternatively, to specify a C compiler to use, and flags for the compiler, set
the environment variables CC and CFLAGS before calling configure.  For
example, to tell configure to use the gcc compiler with optimization use
\begin{verbatim}
CC=gcc CFLAGS='-O3 -g' ./configure
\end{verbatim}
If the gcc compiler is used then, by default, optimization is turned on.  If
another compiler is used then CFLAGS should be set with the appropriate
flags to turn on optimization with your compiler (normally -O). It is highly
recommended that some form of optimization is used as the calculations
performed by Loki are very computationally intensive. After successful
compilation, \verb+make install+ can be run to move the programs to a bin
directory.  By default, programs are installed to the bin subdirectory of
the loki/ directory.  This behaviour can be altered by giving the --prefix option
to configure.  For example, to install the programs in /usr/local/bin,
invoke configure as: \verb+./configure --prefix=/usr/local+
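
Putting these pieces together, a typical build and install into an
alternative location might look like the following (the prefix shown is just
an example):
\begin{verbatim}
CC=gcc CFLAGS='-O3 -g' ./configure --prefix=/usr/local
make
make tests
make install
\end{verbatim}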

If \verb+make tests+\ fails then this may or may not be serious, because
differences can arise from differences in floating point implementations
between machines.  As far as possible this has been avoided,
but some problems still remain.  In particular, certain machine
architectures (such as Intel or Alpha) compute floating point with too much
accuracy, i.e., more accurately than specified by the IEEE standards.  To
avoid this, by default, when loki is compiled with gcc, the compiler flag
-ffloat-store is used which instructs the compiler to not use the extra
precision.  If you use a non-gcc compiler, then it may be possible to find
an equivalent switch for your compiler.  Of course, if you are not
interested in comparing runs on different machines then this will not apply.
Note that if you regularly use machines with different architectures, and
intend to start analyses with one machine and continue with a different
machine, it is imperative that the two machines give the same answers as the
test set (more about this later).

If the test fails, try running the tests by hand (cd into the test directory
and read the README file there) to see if the tests ran but gave different
output (possibly due to floating point differences between machines), or if
they failed to run at all.  In either case, please send me any error outputs
you receive.  I would like to hear about any problems that you have
compiling and running the programs on UNIX machines.  The only extensive
testing has been on Solaris 5.8 and FreeBSD 4.1, but the programs have been
compiled and run on a range of different UNIX boxes including OSF1 V4.0,
Irix 6.5, OpenBSD 2.7-3.2 and Linux 2.0.30/2.2.6/2.3.6 (see the file
\verb+README.known_systems+ for an up to date list).  My ability to help with
machine types I do not have access to is obviously limited. If people have
tried to get Loki compiled and running on other machines, I would be very
interested to hear about any problems and, of course, successes!

There are two stages to running an analysis: first \Prep is run on the original
datafiles.  The appropriate information is read from the datafiles and encoded
in a form suitable for the analysis program, \Loki.  A control file is required
for \Prep, and a parameter file for \Loki.  Information on how to write these
files is given in the following sections.

If control and parameter files for an analysis have been set up then running
the analysis can be accomplished by typing:
\begin{verbatim}
prep control_file
loki param_file
\end{verbatim}
Output from the run is written to the file \verb+loki.out+; interpretation of
this output is discussed in the section `Analysis of output'.
\section{Data preparation: \Prep}
Data preparation is performed by \Prep, which takes as input a control file
specifying the data files required and their format (Section
\ref{prep_control}).  \Prep reads in the appropriate data files, sorts and
recodes the data where required, and then outputs the data as a series of
files in a format suitable for \Loki.  Note that if different machine types
are in use, the data files output by \Prep can be freely exchanged between
machines.  It is, for example, possible to run \Prep on a Sun Sparc
workstation, and then run \Loki on an Intel based Linux workstation. There
are some restrictions on this: (1) use of a machine with non--standard
floating--point routines (either too inaccurate or too accurate); (2) use of
machines with different sized `longs'.  This last situation, which could
arise with a mix of Intel and Alpha based machines, will only cause a
problem if the dataset, for reasons of size and complexity, can only be run
on the machine with the larger longs.  For example, a complex
pedigree may run on an Alpha machine but not on an Intel machine due to the
smaller size of longs on the Intel machine.  Note that in general time and
memory requirements prevent this becoming an issue.

Genotype data are checked by \Prep for consistency - a logical peeling
operation is performed on each locus to determine if the likelihood of the
data for that locus is zero.  By default, the program stops if an error is
detected; if there is no error message produced then the likelihood of the
data given the pedigree structure is positive, and the data are therefore consistent.
Note that this is true whether or not the pedigree contains loops.  If an
error occurs then information about the family where the error was found is
normally displayed.  It is possible, although rare, for a genotype
inconsistency to be detected only at a late stage in the process.  In this
case it unfortunately may not be possible for the program to pinpoint where
the error occurs.  This situation can, however, only arise in complex
(looped) pedigrees.  It is also possible to tell \Prep to try and identify
and delete genotypes which appear to be wrong, rather than stopping at the
first inconsistency. For more details on this, see the \verb+correct_errors+
flag.

When the problem can be localized to a family, \Prep prints out the members
of the family with their observed genotypes and inferences made by \Prep
about unobserved genotypes in previous stages of the process.  This normally
makes it straightforward to spot which genotype(s) are causing the problem.
An example error message might look like this (note that `*' is shorthand
for any allele):
\begin{verbatim}
DoFamily(): Inconsistent family data for locus D1GS32
Father: 220 [*,*]
(*,*)
Mother: 218 [*,*]
(7,*) (*,7)
 -> 114 [7,7]
    (7,7) 
 -> 330 [7,7]
    (7,7)
 -> 378 [7,8]
    (8,7) (7,8)
 -> 1854 [1,10]
    (10,1) (1,10)
 -> 1988 [7,10]
    (10,7) (7,10)
\end{verbatim}
The parents have no genotype information, but from an earlier operation it
is known that 1 of the mother's alleles has to be a `7'.  The five offspring then
follow, with their observed genotypes (in square brackets), and their
inferred ordered genotypes (in round brackets).

Very complex pedigrees (with many interlocking loops) may cause the program
to take so long that it appears to have `hung', or to terminate with an
error message.  Although the peeling calculations could in theory be
bypassed at this stage, they are required for the operation of \Loki, so a
pedigree too large to be processed by \Prep is too large to be analyzed with
\Loki, and must be split up.

Datafiles in a wide variety of formats can be read in, the main requirement
being that files should contain nothing apart from data, \IE datafiles should
not also contain commands.  The datafile should contain a number of
\emph{records} with the format of a record, and the splitting of the file
into records, being determined by the control file (described below).
Multiple input files can be used (\IE separate pedigree, phenotype data, and
genotype data files), with the one restriction being that each record must
contain an ID, allowing records to be linked across files.

Recoding of the datafiles is \emph{not} necessary, \Prep can
handle alpha-numeric codes of (almost) any length for pedigree IDs, allele
codes, and discrete environmental factors.  This has 2 main benefits - it
simplifies the use of \Prep, and allows errors to be reported using the
original codes, which makes tracking down, for example, genotype errors,
much simpler.

The number of input records, pedigree size etc. are determined by \Prep and
do not need to be specified.  Limits for the pedigree size, no. of markers
etc.\ are determined by available memory, the one exception to this being
the maximum number of alleles per marker segregating within a component,
which is limited by the number of bits in a long int (typically 32).  In
many modern systems this can often be extended to 64, but with a possible
performance (and portability) penalty.

\Prep can handle pedigrees containing loops, as long as the pedigree is not
too complex (what constitutes too complex depends on the size of computer
and the amount of patience you have).  The user does not have to do anything
to tell \Prep about loops, it will attempt to deal with them itself.  The
sampling algorithms used by \Loki require that a single locus peeling
(likelihood) calculation is carried out at each locus (marker + trait).
Multiple loci are not peeled simultaneously, so \Prep is not affected by the
number of loci in the way that traditional linkage packages are.

Sometimes a pedigree is too large to be processed by \Prep, not because of
time or memory constraints, but because the indexing system used by \Prep to
keep track of possible genotypes has to fit into a long int.  A pedigree
with small loops can cause \Prep to fail if a marker with many segregating
alleles is analyzed. It is possible to avoid this, in the same way as for
the restriction on maximum number of segregating alleles per component, by
increasing the size of the index.  Several compilers (including gcc) support
long long int variables, which are commonly 64 bit.  Compiling Loki with the
flag \verb+-DUSE_LONGLONG+\ will do this. Note that this can cause portability
problems with the output files from \Loki and \Prep.  It can also have a
speed penalty, so only use it if strictly necessary.
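
For example, using the CFLAGS mechanism described in the Installation
section, the flag could be passed at configure time (a sketch; adjust the
optimization flags for your compiler):
\begin{verbatim}
CC=gcc CFLAGS='-O3 -DUSE_LONGLONG' ./configure
make
\end{verbatim}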
\section{Markov chain Monte Carlo analysis: \Loki}
The actual analysis is performed by \Loki, using the output files from \Prep.
An MCMC analysis consists of a (normally large) number of \emph{sampling
iterations}; each iteration involves updating (or trying to update) every
parameter in the model.  In the context of the analyses performed by \Loki,
parameters are the ordered genotypes (of markers and trait loci), allele
frequencies, trait loci positions and effects, number of trait loci,
covariate effects, and variance parameters.  Inference about any or all of
these parameters is possible, but Loki is intended for use as a linkage
analysis package so attention is focused on inferring the position and size
of trait loci.

The output files from \Prep, which are used by \Loki, are a series of
(mostly) text files called \verb+loki.opt+, \verb+loki.dat+, \verb+loki.gen+
and \verb+loki.nrm+, the last only if a polygenic effect is being fitted.
These files contain information on basic program options (\verb+loki.opt+);
phenotype data, model information, and factor recoding data
(\verb+loki.dat+); genotype data, allele recoding information, peeling
sequences (\verb+loki.gen+); and the non-zero elements of the inverse of the
Numerator Relationship Matrix (NRM) (\verb+loki.nrm+). Note that if the FILTER command is used
in the control file for \Prep, \verb+loki.dat+, \verb+loki.gen+ and
\verb+loki.nrm+ will have been passed through the filter (normally for
compression purposes). There is additional information that \Loki needs to
perform its analysis such as the map positions of markers, how long to run
the analysis for etc., and that information is provided by a parameter file.
In fact \emph{only} the marker positions need to be provided in the
parameter file, but if further information about, for example, allele
frequencies is available, then this can be given as well.  The format of
the parameter file for \Loki is given in Section \ref{loki_param}, and
analysis of the output files generated by \Loki is discussed in Section
\ref{loki_files}.

\section{Control file for \Prep}
\label{prep_control}
The order of commands in the control file and the case of the commands are
not normally important.  New lines are mostly ignored, so long commands can
be split over several lines.  It should be noted that the entire control file is
read in and processed \emph{before} any of the datafiles are read.  Commands
affecting how files are read in can therefore have an effect no matter where
they occur in the control file. Comments can be included anywhere in a
control file.  If a `\#' occurs (outside of quotes) then the rest of the line
is ignored.  In addition, C style comments can be used for multiline
commenting, for example:
\begin{verbatim}
/* 
 * Control file for Dataset Ch1z - November 1997
 *
 */

File "phenotypes",id,sx,age,y # read in phenotype data
\end{verbatim}
\subsection{FILE command}
From each input file, \Prep reads in all the records it can find, each
record consisting of a (possibly variable) number of fields.  To be able
to do this, the control file must have information on (a) where to find the
files, (b) how to split the files into records, and (c) how to split the
records into fields.  This information is supplied by the FILE command
(although it can be modified by other commands).  The format of the FILE
command is FILE [optional format clause] "filename",var$_1$,var$_2$,\ldots
,var$_n$.  If we ignore the format clause for the moment, the number of
variables following the filename determines how many fields \Prep expects to
find in a record.
\begin{figure}[hb]
\caption{\label{fig:1}Test dataset}
\begin{verbatim}
1 0 0
2 0 0
3 1 2
4 1 2
\end{verbatim}
\end{figure}
If our datafile was a triplet file with pedigree information (\IE
individual, father, mother) as in Figure \ref{fig:1}, then a FILE statement
for this file would be:
\begin{verbatim}
FILE "filename",id,father,mother
\end{verbatim}
\IE each record has 3 data items (id, father, mother).  If the file contains
unwanted fields then these can be skipped by either specifying dummy
variable names in the FILE statement, or simply by missing out 1 or more variable
names corresponding to the unwanted positions.  For example, the statement:
\begin{verbatim}
FILE "filename",id,father,,mother
\end{verbatim}
would read fields 1,2, and 4 into id, father, and mother respectively, with
field 3 being skipped.

By default, \Prep assumes that there is 1 record per line.  If there are
extra fields on a line then they are ignored; if there are too few fields
then the variables which could not be read in are marked as missing for that
record.  This behaviour may be changed by setting the RS (record separator)
variable as in the following 3 examples:
\begin{verbatim}
RS = "\n"
RS = ":"
FILE [RS = ""] "filename",id,father,mother
\end{verbatim}
The first example sets RS to be equal to the newline character; this is
actually the default setting for RS.  The second example is for the case
where the input file has records separated by colons.  The last example has
RS within the format clause for a FILE statement; this sets
RS for that FILE statement to be equal to the null string, which causes
\Prep to ignore newlines when reading in the data.  Note that the first 2
examples change the default value of RS for all files, whereas the last
example only affects the particular FILE statement the RS command occurs in.

There is an analogous variable, FS, for controlling how \Prep splits records
into fields.  By default \Prep interprets whitespace (space, tab, and
newline characters) as field separators.  As with RS, setting FS to a single
character allows the use of that character as a field separator (a single
space has a special meaning - it gives the default behaviour of using
whitespace as a separator).  If FS is set to a longer string then this is
interpreted as a regular expression.  Note that support for regular
expressions is not currently available on all platforms.  Note that for tab
separated fields use \verb+FS="\t"+.
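
As a hedged illustration, assuming regular expression support is available
on your platform, a field separator consisting of a comma with optional
surrounding spaces could be written as:
\begin{verbatim}
FS = " *, *"
\end{verbatim}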

Another field separator variable, GS, is used for controlling how genotypes
are interpreted.  Normally, an individual's alleles are located in separate
fields, and so this variable is not used.  This is not always the case,
however, and sometimes the two alleles occur in the same field separated by
a space or a slash, \EG 183/185.  In this case, GS should be set to `/',
in the same way as RS and FS are set.

To read in the comma separated data file in Figure \ref{fig:2a} where alleles are
separated by a slash, the following file command would be used:
\verb+FILE [FS=",",GS="/"] "filename",id,mark1,mark2+
\begin{figure}
\caption{\label{fig:2a}Test dataset}
\begin{verbatim}
1,181/183,223/223
2,181/183,200/210
3,,205/210
4,185/187,
\end{verbatim}
\end{figure}
\subsubsection{Fixed format}
Occasionally data files come in a form where the starting and ending columns
of each field are known, but there might not be any separator between
fields.  For this type of data the format of the datafile must be specified
more precisely.  For example, consider the datafile in Figure \ref{fig:2}.
\begin{figure}
\caption{\label{fig:2}Test dataset}
\begin{verbatim}
    1    0    0 1 80
    2    0    0 2 77
    3    0    0 1 64
    4    0    0 2 63
    6    0    0 1 67
    8    0    0 1 56
  461    1    2 1 80
  462    1    2 2 80
  463    3    4 1 72
  464    3    4 2 63
  678    8  462 2 64 26.23 75.79143.66 42.77267.60
  679    6  464 1 53
  680    6  464 2 64 40.67 97.68157.29 45.94304.39
  681    6  464 1 58 31.98 77.29115.52 42.60197.15
\end{verbatim}
\end{figure}
This datafile has 10 fields: individual, father, mother, sx, age, and 5
quantitative measurements.  The last 5 fields have no intervening spaces, so
they have to be read in fixed format.  This could be done with the
statement:
\begin{verbatim}
FILE [3(5),1x,1,3,5(6)] "filename",
id,father,mother,sx,age,q1,q2,q3,q4,q5
\end{verbatim}
Note that long commands can be split - newlines are (mostly) ignored by
\Prep.  The format clause tells \Prep to read in 3 fields of 5 characters
each, skip 1 character, read in 1 field of 1 character, 1 of 3 characters,
and 5 of 6 characters each.  An empty field is treated as missing data.
\subsubsection{Skipping lines}
Normally, \Prep will read in every non-blank line in the data file.
Sometimes, however, it is desirable that some lines are skipped at the top
of the data file (for example, if a header is present).  This can be easily
done using the FILE command:
\begin{verbatim}
FILE [skip=1] "filename",id,father,mother,sx,q1,q2
\end{verbatim}
\subsubsection{Further processing}
To facilitate further processing of the datafile before it is read into
\Prep, it is possible to pipe the datafile through external programs (such
as sed, awk, or perl).  The following example uses zcat to uncompress the
datafile `on the fly':
\begin{verbatim}
FILE SHELL("zcat filename.gz"),x1,x2,x3,x4
\end{verbatim}
\subsubsection{Reading in genotype data}
\label{genotype_data} Only codominant markers are supported by \Prep.
Genotype data for an individual normally consists of a field for each allele
(\IE 2 per marker); however, \Prep can also read in data formats where both
alleles are in a single field. In a given record, 1 or both alleles may be
missing. Allele codes, as with all identifiers in Loki, are alpha-numeric
and can be any length.  Data for different markers (and even for the 2
alleles at a single marker) may be in separate datafiles.  Genotype data for
an individual can be duplicated; such duplicated records are checked for
consistency.
\subsection{PEDIGREE command}
The PEDIGREE command is used to specify which variables contain the pedigree
information.  There are two forms for the command, depending on whether id
codes are unique or if different families may use the same id codes (\IE the
format used by the LINKAGE package).  Example PEDIGREE commands are given below:
\begin{verbatim}
PEDIGREE id,father,mother  
PEDIGREE family,id,father,mother  
\end{verbatim}
There \emph{must} be exactly 1 PEDIGREE command in a control file, and a
PEDIGREE command \emph{must} have 3 or 4 variables as arguments.  A data record
may have missing father and mother values, but must have a valid id record
(and a family record if used).
An individual's pedigree record may occur multiple times; as with genotype
data, if this happens the duplicated records are checked for consistency.
When a family record is present, whenever the original ids are output from
\Prep or \Loki, the family name is also given.  How this is done differs
depending on the context.  For example, consider the following pedigree file.
\begin{verbatim}
1 1 0 0
1 2 0 0
1 3 1 2
1 4 1 2
2 1 0 0
2 2 0 0
2 3 0 0
2 4 2 3
2 5 2 1
\end{verbatim}
If there is an occasion to output an isolated id (\EG there is a data
record error for individual 3 in family 2), it will be output as \verb+[2]:3+
with the family name in square brackets before the id code.  If, however,
triplet information is output then this will be as 4 columns - family,
id, father, mother.
\subsection{SEX command}\label{sex1}
The SEX command is used to specify a variable that indicates the sex of
individuals.  The command is optional, but if included allows for more
extensive pedigree checking.  If this command is used then information on
sex must be available for all pedigree members (note that the sex of parents
can be inferred from the pedigree).  Examples of this command appear below:
\begin{verbatim}
SEX sx 1,2
SEX sx "M","F"
\end{verbatim}
The last two entries on the line give the factor values for males and
females respectively.  Only 1 SEX command can appear in a control file.
\subsection{GROUP command}\label{group1}
The GROUP command is used to specify a variable that indicates which genetic
group each founder belongs to, see also Section \ref{group2}.  Different
genetic groups can have different allele frequencies.  This is therefore
intended for situations where it is known that certain founders come from
different populations.  If this option is specified then every founder must
have a record for the group factor.  If non-founders are given a group
designation then this is ignored.  Only 1 GROUP command can appear in a
control file. The format of the command is simply:
\begin{verbatim}
GROUP gen_group
\end{verbatim}
\subsection{MISSING command}
In the example datafiles above, `0' has been used (as is common) to mean a
missing ID (\IE founder individuals have no parents in the datafile, so
their parents' IDs are given as 0).  This may not always be true, or different
missing codes may be used for different variables.  Previous versions of
\Prep used no default missing code, and only treated blank fields as
missing.  Versions after 2.4.5 have a default where `0' is the missing code
for all categorical variables (pedigree, genotype and other discrete
variables).  This will be overridden if any Missing command is found.  To
revert to the previous behaviour with no default Missing codes, see
\ref{set_command}.

Examples of valid Missing commands are shown below:
\begin{verbatim}
MISSING "0"  
MISSING "*",x,y
MISSING ["PG"] "0"
MISSING ["R"] -99
\end{verbatim}
The first example tells \Prep that `0' should be treated as a missing value
for all variables.  The second example states that, in addition, `*' should
be treated as a missing value for variables x and y.  If the missing value
is not in quotes then a numeric comparison will be made between the missing
value and the input data, otherwise a string comparison will be made.  An
exception to this is that a numeric comparison will always be used for variables 
which have been explicitly declared to be numeric (using the INTEGER or REAL
commands).  Note that when comparing two real valued variables, a match is
declared if the absolute difference is <1.0e-12, to avoid rounding problems.

The third and fourth examples state that `0' should be used as the missing
value for pedigree and genotype variables, and `-99' should be used for
real valued variables.  For these latter forms of the MISSING command, the
possible letter codes are:
\begin{itemize}
\item P - Pedigree variables
\item G - Genotype variables
\item F - Factors (discrete variables - includes P and G)
\item R - Real variables
\item C - Continuous variables (same as R)
\item I - Integer variables
\end{itemize}
Each letter code can be preceded by a `!' character to indicate a negation
(\IE "!P" would mean everything \emph{apart from} pedigree variables).  The
default Missing rule (which will not be used if any explicit Missing rule is
found) is:
\begin{verbatim}
MISSING ["F"] "0"
\end{verbatim}
Note that multiple Missing commands can be given, in which case the results
are cumulative.
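
As a further illustration of the letter codes and the negation syntax (the
code NA is just an example):
\begin{verbatim}
MISSING ["!P"] "NA"  # "NA" is missing for everything except pedigree variables
\end{verbatim}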
\subsection{MARKER LOCUS command}
If the two alleles for a marker occur in separate fields (as is the
default) then
each marker locus needs to be linked with the 2 variables containing the allele
information for that locus.  This is done using the MARKER LOCUS command, as
in the examples below:
\begin{verbatim}
MARKER LOCUS D1G1[all_1a,all_1b]  
MARKER LOCUS D1G2[all_2a,all_2b], D1G3[all_3a,all_3b]  
\end{verbatim}
A given marker can not occur multiple times in MARKER LOCUS statements; in
addition, a given allele variable can only be used once.

If both alleles occur in a single data field then this linking to the allele
variables is not required, but it is still necessary to use the Marker Locus
command to identify the variables as being markers:
\begin{verbatim}
MARKER LOCUS D1G1
MARKER LOCUS D1G2,D1G3
\end{verbatim}

\subsection{LINK command}
This command describes the grouping of markers into linkage groups.  A marker
\emph{must} occur in a LINK statement to be included in the analysis.
Conversely, selectively commenting out LINK statements is an effective way
of selecting which markers to run in an analysis.  Each linkage group must
be given a unique name which allows it to be identified.  A marker can not
occur in multiple linkage groups.  An example LINK command is given below:
\begin{verbatim}
LINK "chromosome 1",D1G1,D1G2,D1G3,D1G4,D1G5
\end{verbatim}
\subsection{TRAIT LOCUS command}
This command is used to assign a name to the trait locus so that it can be
specified in the MODEL command.  Note that you should only specify 1 trait
locus; the number of trait loci (or the range of numbers of trait loci) is
specified in the parameter file for \Loki.  The command for declaring a trait
locus called QTL would be:
\begin{verbatim}
TRAIT LOCUS QTL
\end{verbatim}
\subsection{MODEL command}
This is where the trait model is specified (note a linear regression model is
assumed, and traits have to be continuous).  Examples of MODEL commands follow:
\begin{verbatim}
MODEL y=QTL
MODEL y=age+sx+QTL
MODEL q=sx+QTL
\end{verbatim}
In these examples, QTL refers to a trait locus and is declared as such elsewhere
in a TRAIT LOCUS statement.  It is not necessary to include a trait locus in the
model.
\subsubsection{Fitting genetic covariates/candidate loci}
It is possible to fit individual marker loci as genetic covariates.  In this
case, an effect of each genotype at the marker locus is estimated.  For this
reason this option is not recommended with loci that have more than 3-4
alleles.  Missing genotype data is sampled as for other loci, conditional on
the pedigree structure, neighbouring loci, and the trait data.
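
The syntax is not shown explicitly above; a plausible sketch, assuming that
a marker declared in a MARKER LOCUS command can be named directly in the
MODEL statement as a covariate, might be:
\begin{verbatim}
MARKER LOCUS D1G2[all_2a,all_2b]
MODEL y=age+D1G2+QTL   # D1G2 fitted as a genotype covariate (assumed syntax)
\end{verbatim}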
\subsubsection{Fitting a polygenic random effect}
\label{lab_model_poly}
An individual animal polygenic effect can be fitted by including the
individual id variable in the model.  For example:
\begin{verbatim}
PEDIGREE id,father,mother
MODEL y=id+age
MODEL y=id+age+QTL
\end{verbatim}
Note that mixed inheritance models with both polygenic and QTL effects can be fitted
(2nd example above).  In many cases there is not a lot of information
distinguishing a major gene from a polygenic effect, and there can be strong
negative correlation between the polygenic additive genetic variance and the
variance due to a QTL.  Using the other pedigree variables in the model
statement instead of the id has undefined effects.
\subsection{LOG command}
A report from \Prep is written out to the logfile, if specified in a LOG
command, or to standard output otherwise.  The format for this command is
simply LOG "filename".  A new log file is started with each run of \Prep, so
if the old logfile exists it is renamed to the original filename with a \verb+~+ appended.
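
For example (the filename is arbitrary):
\begin{verbatim}
LOG "prep.log"
\end{verbatim}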
\subsection{DISCRETE command}
This command specifies that certain variables are to be treated as discrete
levels rather than continuous data.  Note that pedigree variables (those
declared in a PEDIGREE command) and marker allele variables are
automatically declared as discrete, as is any alpha-numeric data.  The
format for the command is DISCRETE var$_1$,var$_2$,\ldots ,var$_n$.
\subsection{REAL and INTEGER commands}
Numeric data is assumed to be real unless it is declared integer with an INTEGER
command.  These commands allow \Prep to check the type of the data, and give
errors if the wrong type of data is found.  The format is the same as for the
DISCRETE command.
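
For example, assuming hypothetical variables status, age and y:
\begin{verbatim}
DISCRETE status
INTEGER age
REAL y
\end{verbatim}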
\subsection{MULTIPLE and CONSTANT commands}
Pedigree variables and genotype data are automatically declared to be
constant, \IE they can not change for a given individual.  By default, all
other variables are also declared constant, but this can be over-ridden for
sets of variables using the MULTIPLE command.  To set the default for all
non-pedigree and genotype data to be MULTIPLE, set the system variable
multiple\_records to 1 (with the command SET multiple\_records 1; see
Section \ref{set_command}).  In this
case, the default can be over-ridden using the CONSTANT command.  The format
for both commands is as for the DISCRETE command.

If multiple records are present for an individual, then constant variables
are checked for consistency across records.  More importantly, if there are
multiple observations on an individual, a covariate declared constant will
apply to all observations on that individual, without having to appear in
every record.  This allows a logical record on an individual to be created
from multiple physical records in different files.

For variables declared as MULTIPLE, however, values have to occur on the
same input line in the same datafile to be regarded as occurring in the same
logical record.  This means that if there are 2 datafiles, each with a
different sub-set of variables, and the variables are declared as MULTIPLE,
then \emph{every record will appear to be incomplete, and may be discarded
by Loki}.  The MULTIPLE command should therefore be used with care.
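
For example, with repeated measurements per individual (the variable names
are hypothetical):
\begin{verbatim}
# Either mark individual variables as repeatable:
MULTIPLE y,age
# or make MULTIPLE the default and mark the exceptions:
SET multiple_records 1
CONSTANT treatment
\end{verbatim}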

(n.b., the default behaviour of Loki prior to version 2.4.2\_3 was that all
non-pedigree and non-genotype data was declared MULTIPLE.  This led to some
confusion so the default was changed).

\subsection{FILTER command}
A number of files are created by \Prep for \Loki to read in.  Some of these
files can be large, particularly when many marker loci are being analyzed.  It
is possible to instruct \Prep to filter the files through an external
command before saving them.  The main (only?) use of this would be to filter
the files through a compression program such as \verb+gzip+\ or
\verb+compress+.  An example of this is given below:
\begin{verbatim}
FILTER "gzip"
\end{verbatim}
Use of this command with programs other than \verb+gzip+,
\verb+compress+\ or \verb+bzip2+\ is not supported.
\subsection{USE WHERE command}
This command allows for the selection of a subset of data where a particular
condition holds.  Two examples are given below:
\begin{verbatim}
USE all_1a,all_1b WHERE(y);
USE all_1a,all_1b WHERE(y<40 and sex="M" and ((x-y)>20 or (z-y)<0));
\end{verbatim}
The first example restricts all\_1a and all\_1b to where y is not missing. The
second example shows that more complex expressions are possible.  Note that a
subexpression containing a missing variable evaluates to false.  Multiple USE
WHERE commands can occur, with each new command acting on the subset of data
left after the previous command.
\subsection{CENSORED command}
Loki currently has limited support for analysis of censored data (such as age of
onset data).  The format is similar to the USE WHERE command, with the WHERE
expression determining when a variable has been censored.  For example:
\begin{verbatim}
CENSORED age WHERE (affected="1")
\end{verbatim}
would mark age as a censored variable when the affected code was equal to "1".
Note that if affected has been declared as a numeric type (INTEGER or REAL),
then the quotes around the 1 should be omitted.
\subsection{AFFECTED command}\label{aff1}
Loki also has limited support for affected only analysis using an IBD
sharing approach. This requires identification of affected individuals,
which is done in a similar way to the CENSORED command, for example: 
\begin{verbatim}
AFFECTED WHERE (affected="1")
\end{verbatim}
Note that currently this only has an effect with IBD sharing analyses, not
the parametric analyses.
\subsection{RANDOM command}
Loki\_2.4 has support for fitting uncorrelated random effects to the model.  The effect
should be a discrete factor, and is declared using the RANDOM command:
\begin{verbatim}
RANDOM litter
MODEL y=litter+age
\end{verbatim}
It is not necessary to apply the DISCRETE command to tell Loki the effect is
discrete - this is done automatically.  This command should \emph{not} be
used for polygenic random effects.  These are fitted simply by adding the id
variable to the model statement (Section \ref{lab_model_poly}).
\subsection{OUTPUT command}
\Prep has the option of outputting the final pedigree it is working with
after pruning and recoding.  This can occasionally be useful, though the
output file is not directly used by \Loki.  This is done by, for example:
\begin{verbatim}
OUTPUT "output.dat"
\end{verbatim}
The format of the output file is currently:
\begin{itemize}
\item col 1: Component number
\item col 2,3,4: original id codes for individual, father, mother
\item col 5: sex
\item col 7,8,9: internal id codes for individual, father, mother
\item col 10: affected status (if AFFECTED command used).
\end{itemize}
Note that this is subject to change in later versions.
\subsection{SET command}
\label{set_command}
There are a number of options for \Prep which can be set in the control file
that modify the way in which it operates.  In many cases, the default
values are sensible and can be left as they are. Each option is set as
in the following examples:
\begin{verbatim}
SET peel_trace 1
SET prune_option 0
\end{verbatim}
Note that most of these options are for debugging purposes only, the notable
exception being correct\_errors. The various possible
options are:
\begin{itemize}
  \item correct\_errors\\
	 On encountering a genotype error, \Prep will normally print a message
	 and stop.  However, by setting this option to 1, \Prep will attempt to
	 identify and remove the offending genotypes.  The way this is done is
	 effective, but without any statistical justification.  It is very useful
	 with large datasets, but the results should be treated with caution. The
	 procedure does, however, give information on which genotypes were
	 deleted for each marker, which provides valuable information on bad
	 markers or possible pedigree errors. The procedure operates by first
	 identifying families where inconsistencies occur, and deleting genotypes
	 for all family members.  After all inconsistent families have been
	 deleted, they are then re--examined in the reverse order to which they
	 were deleted, and genotypes of family members are returned to the
	 analysis where they do not cause an inconsistency.  This is a reasonably
	 quick way of identifying a small (but not necessarily the smallest)
	 subset of genotypes which, if removed, would result in no
	 inconsistencies. After processing a marker, \Prep will write out a file
	 named [marker\_name].err, which will have a line for each deleted
	 genotype showing the individual id, and the father and mother of the
	 nuclear family in which the problem arose.  If \Prep is re--run with the
	 same marker, it will read in this file if it exists, and try this first.
	 This file therefore provides a way both of seeing which genotypes are
	 causing problems, and of imposing a list of genotypes to be deleted on
	 \Prep if this is desired.  Note that the original data files are not
	 changed.
  \item prune\_option\\
	 By default, pedigrees are pruned before analysis to remove pedigree sections
	 that contribute no information.  This option determines how much pedigree
	 pruning is performed.  Allowable values are:
	 \begin{itemize}
		\item 0 - no pruning at all.
		\item 1 - prune pedigree based on data on all markers and trait
		  information (\IE an individual is considered observed if they have data on
		  any marker or any phenotype data).
		\item 2 (default) - prune pedigree individually for each marker
		  and trait locus.
	 \end{itemize}
  \item recode\_option\\
	 Alleles are normally recoded both within pedigree components and within
	 families (`fuzzy recoding').  This can be controlled using this option.
	 Note that this should only be done for testing purposes - the default
	 recoding level which uses fuzzy recoding is generally much faster than
	 the other levels.  Allowable values are:
	 \begin{itemize}
		\item 0 - no recoding of alleles.
		\item 1 - recode alleles within pedigree components.
		\item 2 (default) - recode alleles within families (fuzzy recoding).
	 \end{itemize}
  \item no\_extra\_allele\\
	 Normally, \Prep allows for the presence of a non--observed allele when
	 setting up the peeling calculations.  This is generally correct, however
	 if it is known  that all possible alleles have been observed, memory and
	 computational requirements can \emph{sometimes} be reduced using this option.
	 \begin{itemize}
		\item 0 (default) - allow extra allele.
		\item 1 - no extra allele allowed.
	 \end{itemize}
	\item peel\_option\\
	  Loki distinguishes between three classes of peeling operations: (1)
	  nuclear family based, peeling onto a single individual; (2) nuclear
	  family based, peeling onto two individuals; (3) non-nuclear family
	  based (peeling some sub--set of individuals onto another sub--set of
	  individuals).  The last class could be used for all peeling operations.
	  However, operations in the other classes, when possible, are much
	  faster.  For this reason, by default all possible class 1 operations
	  are performed, then all class 2, and then class 3.  For debugging
	  purposes it is possible to turn off the family based peeling using this
	  option.  Values for the option are determined by setting the
	  appropriate bits; if you don't understand this then you shouldn't be
	  trying to debug anything{\ldots}
	  \begin{itemize}
		 \item bit 0 - Turn off class 1 and class 2 peeling for marker loci
		 \item bit 1 - Turn off class 1 and class 2 peeling for trait loci
		 \item bit 2 - Turn off class 2 peeling only for all loci
	  \end{itemize}
	\item trace\_restrict, trace\_censored, trace\_affected, trace\_peel\\
	  These options turn on different levels of debugging information about
	  the restrict (WHERE,USE), CENSORED and AFFECTED commands, and the
	  peeling operations.  Different levels of information are turned on by
	  setting different bits - the higher numbered bits give more
	  information. The levels are not precisely defined as these options are
	  mainly for my benefit, and may change without warning.
	\item no\_default\_missing\\
	  Turn off the default missing code (\IE revert to the behaviour of \Prep
	  prior to version 2.4.5).
	 \begin{itemize}
		\item 0 (default) - use default missing code.
		\item 1 - no default missing code.
	 \end{itemize}
\end{itemize}
\subsection{Temporary variables}
As well as variables which are read in from datafiles, \Prep also allows the
use of temporary variables which exist only while the control file is being
scanned.  Unlike standard variables, the values of temporary variables are
set as \Prep processes the control file, so the order of statements using
such variables is important.  One use for temporary variables is given below:
\begin{verbatim}
Directory="../../data/"
FILE Directory+"phenotypes",x,y,z
FILE Directory+"genotypes",x,a1,a2,a3,a4,a5,a6
\end{verbatim}
Note how strings can be concatenated using the `+' operator. Temporary
variables can also hold numeric values and can be assigned values using
complex expressions, for example:
\begin{verbatim}
n_markers=45
s=4*(n_markers-5)+n_markers/2.3
\end{verbatim}
\subsection{Arrays}
Multipoint analysis can involve hundreds of markers, and specifying each of
these individually in the control file can become tedious.  To reduce this a
little, arrays can be used.  Arrays must be declared, using the ARRAY
command, before they are used:
\begin{verbatim}
ARRAY id(3),q(4),allele_a(30),allele_b(30)
\end{verbatim}
The $i$th array element is referenced as id($i$).  If the array name alone
is given then this is treated as a list of variables whose length is the size of
the array.  This means that, given the ARRAY statement above, the 2 examples
below are equivalent.
\begin{verbatim}
FILE "filename",id(1),id(2),id(3),q(1),q(2),q(3),q(4)
FILE "filename",id,q
\end{verbatim}
Reference to a range of array elements is also allowed (spot the FORTRAN
influence):
\begin{verbatim}
FILE "filename",id,(q(i),i=1,3)
FILE "filename",id(1),(allele_a(i),allele_b(i),i=1,30)
\end{verbatim}
The first example reads the first 3 fields into id and the next 3 into the
first 3 elements of q.  The second example is a little more complicated: it
reads in a single id variable into id(1), and then 30 pairs of
allele\_a(i),allele\_b(i); this last form can be very useful for reading in
genotype data.  Multidimensional arrays are not supported at present.
\subsection{DO WHILE commands}
Very limited looping capability is provided; this is intended to help handle
large arrays of genotype data.  After reading in genotype data on large
numbers of markers, it is necessary to link the allele variables to the
appropriate markers using the MARKER LOCUS command, as described earlier.
This can be done as follows:
\begin{verbatim}
ARRAY id(3),allele_a(30),allele_b(30),D1G(30)
FILE "filename",id(1),(allele_a(i),allele_b(i),i=1,30)
i=1
DO
  MARKER LOCUS D1G(i)[allele_a(i),allele_b(i)]
  i=i+1
WHILE(i<=30)
\end{verbatim}
This example reads in 30 pairs of allele data into allele\_a and allele\_b,
and then assigns each pair to the corresponding marker in D1G.  Note that
the indentation for the commands is totally arbitrary.

\subsection{Environment variables}
It is possible to use environment variables within a control file.  The
following example shows how this could be used:
\begin{verbatim}
DataDir="$HOME"+"/data/gaw9/"
\end{verbatim}
%$
This is intended to simplify having multiple loki runs controlled by a shell
script.  If this does not make any sense to you, don't worry - you probably
won't miss it.
\subsection{IBD sharing analyses}
To perform an IBD sharing analysis or to estimate kinship coefficient
matrices, as discussed earlier, follow the instructions for \Prep as for a
parametric analysis, except that it is not necessary to specify a model.
For the IBD sharing analysis, it will be necessary to use the AFFECTED
command to indicate which individuals are affected.  In the parameter file
for \Loki, it will also be necessary to use the ESTIMATE IBD command
(Section \ref{estibd}), or the  ANALYSIS command (Section \ref{analysis1})
to instruct \Loki which analysis to perform. 
\subsection{Full example}
To finish this section, here is a somewhat complex control file I use to
read in the simulated datafiles produced for the GAW9 workshop.  The
datafiles are the originals, except that they have been compressed to save
disk space.  For this example, FILE statements for all of the haplotype data
files are included, but the LINK statements are commented out for all
chromosomes apart from chromosome 1.  This has the effect that only
markers on chromosome 1 will be analyzed and, therefore, only the files
which contain genotype data for chromosome 1 markers will be read in.
\begin{verbatim}
/******************************************************************
 *                                                                *
 *                Control file for the GAW9 dataset               *
 *                                                                *
 ******************************************************************
 *
 *
 * Define arrays for haplotypes, markers, traits and ids
 *
 * Each file has 15x2 columns of allele data
 * and there are 30 markers/chromosome
 * 
 */

Array hap1_left(30), hap1_right(30), hap2_left(30), hap2_right(30),
      hap3_left(30), hap3_right(30), hap4_left(30), hap4_right(30),
      hap5_left(30), hap5_right(30), hap6_left(30), hap6_right(30),
      D1G(30), D2G(30), D3G(30), D4G(30), D5G(30), D6G(30), 
      q(4), id(3)

Filter "gzip"            # Compress binary output files
DataDir = "../data/gaw9" # Where the data files are located
Missing "0",id           # Define "0" as a missing value for id 

/* Phenotype datafiles */
file [5x,3(5),15x,2,3,6x,6,4(7)] 
   shell("zcat "+DataDir+"/phenotyp.gz"),id,sx,age,ef,q
 
/* Genotype data files */
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d1a1-15.gz"),
   id(1),(hap1_left(i),hap1_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d1a16-30.gz"),
   id(1),(hap1_left(i),hap1_right(i),i=16,30)

file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d2a1-15.gz"),
   id(1),(hap2_left(i),hap2_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d2a16-30.gz"),
   id(1),(hap2_left(i),hap2_right(i),i=16,30)

file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d3a1-15.gz"),
   id(1),(hap3_left(i),hap3_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d3a16-30.gz"),
   id(1),(hap3_left(i),hap3_right(i),i=16,30)

file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d4a1-15.gz"),
   id(1),(hap4_left(i),hap4_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d4a16-30.gz"),
   id(1),(hap4_left(i),hap4_right(i),i=16,30)

file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d5a1-15.gz"),
   id(1),(hap5_left(i),hap5_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d5a16-30.gz"),
   id(1),(hap5_left(i),hap5_right(i),i=16,30)

file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d6a1-15.gz"),
   id(1),(hap6_left(i),hap6_right(i),i=1,15)
file [5x,5,15(1x,1,1)] shell("zcat "+DataDir+"/d6a16-30.gz"),
   id(1),(hap6_left(i),hap6_right(i),i=16,30)

/* Link haplotypes to the appropriate markers */
i=1
do
  Marker Locus 
    D1G(i) [ hap1_left(i), hap1_right(i) ], 
    D2G(i) [ hap2_left(i), hap2_right(i) ],
    D3G(i) [ hap3_left(i), hap3_right(i) ], 
    D4G(i) [ hap4_left(i), hap4_right(i) ], 
    D5G(i) [ hap5_left(i), hap5_right(i) ], 
    D6G(i) [ hap6_left(i), hap6_right(i) ]
  i=i+1
while(i<=30)

/* Only use genotype data where phenotype data is available */
use
  hap1_left,hap1_right,hap2_left,hap2_right,hap3_left,hap3_right,
  hap4_left,hap4_right,hap5_left,hap5_right,hap6_left,hap6_right
where (q(1))

Trait Locus QTL     # Define QTL as 1 or more trait loci
pedigree id         # Declare pedigree variables
log "gaw9.log"      # Name for log file
sex sx 1,2          # sx determines sex: 1=Male, 2=Female

/* Set up model */
model q(1)=age+ef+sx+QTL

/* Define linkage groups - just analyze chromosome 1 this time */
Link "Chromosome 1",D1G

/*
Link "Chromosome 2",D2G
Link "Chromosome 3",D3G
Link "Chromosome 4",D4G
Link "Chromosome 5",D5G
Link "Chromosome 6",D6G
*/
\end{verbatim}
\section{Parameter file for \Loki}
\label{loki_param}
The parameter file for \Loki has many similarities to the control file for
\Prep in that the order of commands is generally arbitrary, and the case of
commands is ignored.  Comments can be added in the same way as for the
control file.
\subsection{Controlling the sampling run}\label{control_run}
The commands described here control how many iterations to run, how often to
output data to standard output (normally the screen) and the output file
(which is normally the file \verb+loki.out+, unless altered with OUTPUT
FILE), and when to start producing the output.  The example below sets the
number of iterations to 10000, with output starting at iteration 50, and
being produced at every iteration after that, and sets the output file name
to \verb+loki.out1+.
\begin{verbatim}
ITERATIONS 10000
START OUTPUT 50
OUTPUT FREQUENCY 1
OUTPUT FILE "loki.out1"
\end{verbatim}
Separate starting times and output frequencies can be set for
standard output and the output file by specifying a second argument to
the relevant commands as in the following example:
\begin{verbatim}
ITERATIONS 10000
START OUTPUT 50,1
OUTPUT FREQUENCY 2,1
\end{verbatim}
In this example, output to file starts at iteration 50 and continues at
every second iteration.  Printing to standard output, however, starts at the
first iteration and carries on at every iteration after that. Specifying a 0
for an output frequency turns off the respective output stream.  The default
behaviour is to carry on forever, printing output at every iteration
starting from iteration 1.
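As a further illustration (the numbers here are arbitrary, not taken from the
original examples), the following would write every 5th iteration to the
output file while suppressing printing to standard output entirely:
\begin{verbatim}
START OUTPUT 50        # start output at iteration 50
OUTPUT FREQUENCY 5,0   # file output every 5th iteration, no screen output
\end{verbatim}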

The OUTPUT command can also be used to determine which model effects are
written out.  By default only the overall mean and the trait loci effects
are output.  If the effects of other covariates or candidate loci are
required, then this can be done by:
\begin{verbatim}
OUTPUT sx,age
OUTPUT D1G2
\end{verbatim}
Interpretation of which column of output refers to which covariate effect is
explained in the Analysis of Output section.  The format for the output file
has changed from version to version of Loki.  It is understood that these
changes can cause difficulties for users of earlier versions who have
written tools to interpret the output.  For this reason, it is possible to
instruct Loki to use the output format of previous versions, as in the
following example:
\begin{verbatim}
OUTPUT TYPE 1
\end{verbatim}
Output type 0 gives the original output format, as described in the first
version of this document for Loki version 2.0. Output type 1 gives the
default output for version 2.1, output type 2 gives the default output for
version 2.2, and type 3 gives the output for version 2.3/2.4 (this is the system
default).  Note that the current correspondence between version minor number
and output type may not continue.  The currently defined types (0,1,2,3)
however, should not change in future versions (instead a new type 4 will be
defined).  This should make switching between versions easier.  Note that
use of the older output types is deprecated.  Efforts should be made to move
to the newer output types which contain more information and are, generally,
more compact.  The scripts distributed with Loki will read all output types,
though they can't compensate for information missing in the earlier formats.
It is likely that the output format for Loki will continue to change in the
future as new features are added.  It is suggested, therefore, that use is
made of the supplied scripts which will be kept up to date.  More details
about the scripts are given in Section \ref{scripts}.
\subsection{Dumping and restarting}
When \Loki exits from a non--error condition (\IE it has finished the
required number of operations, or is interrupted with CTRL-C), it tries to
dump its internal state to a file (by default called \verb+loki.dump+).  It
can then be restarted at a later date by passing the -r flag to \Loki
\IE\verb+loki -r parameter_file+.  \Loki can also be set to periodically
dump its internal state; this is very useful if there is a possibility of
the machine crashing or being switched off during a run, or for debugging
purposes if \Loki is crashing partway though a long run.  The dump frequency
and dump file are set as in the following example, which sets \Loki to dump
state every 50 iterations to the file \verb+loki.dump1+.
\begin{verbatim}
DUMP FREQUENCY 50
DUMP FILE "loki.dump1"
\end{verbatim}
If backups have been enabled then before writing a dumpfile, \Loki checks to
see whether one already exists. If so, it renames the old file according to
the backup schedule.  See \emph{Backup schedules} (Section \ref{bk1}) for
details.  To restart an old run, either the -r flag can be used, as
described before, or the -d filename option, which allows an alternative
dumpfile to be specified.  By default, the restarted run will create a new
output file.  To make \Loki append to an existing output file, pass the -a
flag to \Loki. For example, to get \Loki to restart using the dumpfile
\verb+loki.dump1+, and to append output to the existing outfile, use
\verb+loki -adloki.dump1 param_file+.

To restart a run, it is necessary to have the files \verb+loki.opt+,
\verb+loki.gen+ and \verb+loki.dat+ (and \verb+loki.nrm+ if a polygenic
effect is being fitted) in addition to the dumpfile.  It is not
necessary to keep the seedfile, as the seed state is stored in the dumpfile.
Normally all of these files can be exchanged between machines of different
types, so a run started on a Sun Sparc machine can be restarted on a Linux
Intel based machine, with the same results as if it had been restarted on
the same machine.  If this feature is to be used, it should be checked first,
because certain machines do not follow the established standards for
floating point arithmetic.  This behaviour can often be changed using
appropriate compiler flags - check your compiler manual.
\subsection{Controlling the number of trait loci}
By default, the prior on the number of trait loci is uniform between 0 and
16, with a starting value of 0.  The starting number of loci and the prior can
be set with the TRAIT LOCI command as below:
\begin{verbatim}
TRAIT LOCI 0,10
TRAIT LOCI MEAN 2
START TRAIT LOCI 3
\end{verbatim}
This example sets the range for trait loci to be between 0 and 10, fits a
(truncated) Poisson prior with mean 2 (rather than a uniform prior), and sets
the starting number for trait loci to be 3.  These numbers are checked by \Loki
for consistency.
\subsection{Setting the seedfile}
Initialisation of the random number generator used by \Loki requires a
seedfile.  By default this is called \verb+seedfile+, if this is not found
then the generator is initialized using a fixed value.  If the program
finishes or is interrupted then the current state of the generator is
written out to the seedfile (a backup of the old seedfile is made first).
If \Loki crashes with an error then the seedfile is not updated, allowing
the run to be repeated (and hopefully determine what caused the crash).
This default behaviour can be changed using the SEEDFILE command, as in the
examples below:
\begin{verbatim}
SEEDFILE "test.seedfile"
SEEDFILE "my_seedfile",1
\end{verbatim}
The second argument of 1 to the SEEDFILE command in the last example prevents
the seedfile from being overwritten when \Loki finishes.  Using this form of
the command allows a series of runs to be repeated with the same seedfile.
\subsection{Setting the random number generator}
If multiple runs of \Loki are executed in parallel (on multiple machines for
example, or on different processors on the same machine), it is generally
desirable to use non-overlapping streams of random numbers for the different
jobs.  This is difficult to achieve when using the standard random number
generator.  It is therefore possible to select from a number (currently 60)
of independent random number generators using the SET RNG command, for
example:
\begin{verbatim}
SET RNG 0 # The default setting
SET RNG 2 # Switch to a different RNG 
\end{verbatim}
If two different \Loki runs use different values for the SET RNG command then
the random number streams supplied to the runs should be non-overlapping.
The RNG used is stored in the seedfile, and in the absence of a SET RNG
command, \Loki will continue to use the RNG that was used during the
creation of the seedfile.
\subsection{Backup schedules}\label{bk1}
By default, before writing to a dumpfile or outputfile, \Loki checks whether
a file of the same name exists and, if it does, renames the old file by
adding a `\verb+~+' character onto the end of the filename.  This behaviour can be
changed using the command \verb+SET backups+\ $n$, where $n$ is a non-negative
integer (for more information about the SET command in \Loki, see Section
\ref{set1}).  If $n=0$ then backups are turned off, if $n=1$ then the
default behaviour, described above, is followed, and if $n>1$ then numbered
backups are produced.  In this case, a series of $n$ backup files are kept.
If the original file was called \verb+loki.dump+\ and $n\geq 3$, for example,
then the three most recent backup versions of this file would be
\verb+loki.dump~1~+, \verb+loki.dump~2~+,
\verb+loki.dump~3~+, with \verb+loki.dump~1~+ being the most recent.  Note
that for the seedfile, the default behaviour (when $n=1$) is always
followed, irrespective of any \verb+SET backups+ command.
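For instance, the first command below keeps the three most recent numbered
backups (named as described above), while the second turns backups off
altogether (the choice of 3 is, of course, arbitrary):
\begin{verbatim}
SET backups 3   # keep the three most recent numbered backups
SET backups 0   # alternatively, turn backups off
\end{verbatim}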
\subsection{POSITION command}\label{pos1}
The map position of markers is set using the Position command.  The format
is very simple, as is shown by the examples below (map distances are given
in cM):
\begin{verbatim}
POSITION D1G1 21.3
POSITION D4G(3) 3.3,5.6
\end{verbatim}
Marker names are the ones used in the \Prep control file.  If a marker
specified in a POSITION command is not found then the statement will be
silently ignored. If two positions are given \emph{for any marker} (as in
the second example above) then it is assumed that a sex specific map is
being used, with the male map position coming first.  For more details, see
Section \ref{sex_spec}.  By default, the input map is assumed to be a
Haldane map.  Kosambi maps can be input using the MAP command
(Section \ref{mapfunctions}).

\label{loki_mpos1}
The START keyword can be added at the beginning of a POSITION command to
indicate that the position should be taken as a starting value, and that
Loki should sample the marker position at each iteration, for example:
\begin{verbatim}
START POSITION D1G1 21.3
START POSITION D4G(3) 3.3,5.6
\end{verbatim}
The order of markers is assumed to be fixed and is \emph{not} sampled.  If
all markers in a linkage group have positions which can vary, then the
position of the first marker is arbitrarily fixed to provide a reference
point.  If any marker positions are sampled, then Loki will generate an
additional output file \verb+loki.mpos+ (Section \ref{loki_mpos}) which contains the sampled
positions.  The schedule for outputting to this file is the same as for
\verb+loki.out+ (Section \ref{control_run}).

\subsection{MAP command}\label{map1}
The map length of each linkage group can be set
using the MAP command; in addition, the total genome length can also be set.
These quantities are used to determine the prior probabilities of linkage to
a region, which is intended to be proportional to the length of that region.
Interpretation of the output from \Loki depends on how these are set.
\textbf{\textsc{It is, therefore, very important that these map lengths are
set to biologically realistic values}}.  For human linkage analysis, I
typically use values for the total map length of around 3000-4000 cM.
Examples of the use of MAP commands follow:
\begin{verbatim}
MAP "chromosome 1" 0.0,54.0
MAP "chromosome 2" 0.0,76.2
TOTAL MAP 3600.0
\end{verbatim}
If a MAP command for a linkage group is not present, then the MAP range is
set to the range of marker positions within that group.  If a TOTAL MAP
command is not present (or is less than the sum of the map lengths of
linkage groups in the analysis) then the total genome length is set to the
sum of the map lengths of analyzed linkage groups.  This effectively means
that in this case trait loci have to be located in one of the analyzed
linkage groups, instead of there being some probability (as is normal) of a
trait locus being unlinked to any group.  If sex specific maps are required
then the following syntax should be used:
\begin{verbatim}
MALE MAP "chromosome 1" 0.0,54.0
FEMALE MAP "chromosome 1" 0.0,76.2
TOTAL MALE MAP 360.0
TOTAL FEMALE MAP 500.0
\end{verbatim}
For more details on sex specific maps, see Section \ref{sex_spec}.

\label{mapfunctions} The MAP command can also be used to set the input map
function to either Haldane or Kosambi.  By default, \Loki assumes that the
input map uses the Haldane map function.  In either case, the output map
always uses the Haldane map function.  Note that many maps which are
available over the web are Kosambi maps; a good example of this would be the
Marshfield genetic maps.  To use a Kosambi input map, the following command
should be used:
\begin{verbatim}
MAP FUNCTION KOSAMBI
\end{verbatim}

\subsection{FREQUENCY command}
This command allows the frequency of marker alleles to be set. Frequencies
of trait loci cannot be set. A Dirichlet$(1,1,\ldots ,1)$ prior (which is
`uniform' in some sense) is put on the allele frequencies of alleles with
unset frequencies, and the frequencies are integrated over.  This default
option is often preferable unless very good frequency information is
available.  Some example FREQUENCY commands are given below:
\begin{verbatim}
FREQUENCY D1G1 A,0.2 B,0.25 C,0.4 D,0.15
FREQUENCY MK1 113,.6 117,.1 118,.1 119,.1 111,.05 109,.05
\end{verbatim}
Note that if there are several genetic groups (Section \ref{group1}) then
multiple frequencies will have to be specified for each allele.  To indicate
which group a particular frequency corresponds to, a GROUP ORDER command
(Section \ref{group2}) must appear before the first FREQUENCY command.

Not all alleles have to be set, in which case the frequencies of the
remaining alleles are integrated over.  If frequencies are given for alleles
\emph{not} segregating in the dataset being analyzed, then these alleles
will be lumped together.  If no `extra alleles' are specified then it is
assumed that the only alleles segregating in the population are those found
in the datafile (but see below).  If a frequency is given as \verb+*+ then
that frequency is regarded as being unset.
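For example (the frequencies here are made up), the following fixes the
frequencies of alleles A and C while leaving the frequency of B unset, to be
integrated over:
\begin{verbatim}
FREQUENCY D1G1 A,0.2 B,* C,0.4   # B is left unset (integrated over)
\end{verbatim}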

Starting values for allele frequencies can be set using the START FREQUENCY
command, with the same format as the basic FREQUENCY command.  This can be used
to allow for `extra alleles' in the analysis without having to know their
frequencies.
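For instance, assuming (hypothetically) that an extra allele E is thought to
be present in the population but was not observed in the data, starting
frequencies could be supplied without fixing any of them:
\begin{verbatim}
START FREQUENCY D1G1 A,0.2 B,0.25 C,0.35 D,0.15 E,0.05  # E is an unobserved 'extra' allele
\end{verbatim}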
\subsection{Estimating marker allele frequencies}
\label{lab.freq1}
Normally the estimates of marker allele frequencies are not output.
Sometimes, however, it may be useful to have this information.  Loki can be
told to output the marker frequency information using a new form of the
OUTPUT FREQUENCY command as shown below:
\begin{verbatim}
OUTPUT FREQUENCY "freqfile"	 
\end{verbatim}
This will instruct Loki to output all marker allele frequencies to the file
freqfile.  Estimates will be output using the same schedule as for the main
Loki output file.  The structure of freqfile, and how to process the results
is discussed in Section \ref{lab.freq}
\subsection{GROUP ORDER command}\label{group2}
If the founder individuals come from multiple genetic groups, as specified
by a GROUP command in the control file (Section \ref{group1}), then separate
allele frequencies should be specified for each group.  This is done by
including multiple frequency values for each allele in the FREQUENCY command
(see above). Before the FREQUENCY command appears in the parameter file it
is necessary to specify the group order for the frequencies.  For example,
if there are 2 genetic groups, A and B, then each allele should have 2
frequencies specified for it, and the statement:
\begin{verbatim}
GROUP ORDER A,B
\end{verbatim}
indicates that the first frequency is for group A and the second for group
B.  A GROUP ORDER statement will affect all FREQUENCY commands that follow
it (until another GROUP ORDER statement appears).
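A sketch of how this fits together, assuming (as I read the description
above) that the per-group frequencies are simply listed in group order after
each allele name:
\begin{verbatim}
GROUP ORDER A,B
FREQUENCY D1G1 A,0.2,0.3 B,0.25,0.2 C,0.4,0.35 D,0.15,0.15
\end{verbatim}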
\subsection{MEAN and VARIANCE commands}
It is possible to fix the values of the overall mean and residual variance, or
to set starting values for these parameters.  The format of the commands is
demonstrated below:
\begin{verbatim}
MEAN 10.0
RESIDUAL VARIANCE 100.0
\end{verbatim}
As with the FREQUENCY command, these can be preceded by the START keyword
to set a starting value for the parameter.
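For example, starting (rather than fixed) values could be supplied as follows
(the values themselves are arbitrary):
\begin{verbatim}
START MEAN 10.0
START RESIDUAL VARIANCE 100.0
\end{verbatim}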
\subsection{ESTIMATE IBD command}\label{estibd}
If \Loki is being used to estimate IBD relationships (realized kinship
coefficients), then this command is used to specify at which map positions
estimation should be made.  The output format and interpretation is
discussed in Section \ref{ibd_estimation}. The format of the command is
given below:
\begin{verbatim}
ESTIMATE IBD 0,10.0,13.2
ESTIMATE IBD Chrom1,0,10.0,13.2
ESTIMATE IBD MARKERS
ESTIMATE IBD MARKERS Chrom2
ESTIMATE IBD GRID 0,100,1
ESTIMATE IBD GRID Chrom7 0,80,.5
\end{verbatim}
Note that specification of a linkage group is required when there is more
than one linkage group, and optional otherwise.  There is no benefit in
estimating IBD relationships for multiple chromosomes simultaneously.
However, specification of the linkage group is recommended because it
provides an easy way to have linkage group specific positions.  The MARKERS
and GRID keywords provide a shortcut for specifying positions either at each
marker's position, or at regular grid positions.  For the GRID keyword, it is
necessary to supply the start and end points of the grid, as well as the
grid size.  Note that it is strongly recommended that lm\_ratio is set to a high
value when multipoint IBD estimation is being performed (Section \ref{set1}).

\subsection{ANALYSIS command}\label{analysis1}
The ANALYSIS command is used to instruct \Loki to use a non--standard
analysis (normally an IBD sharing analysis).  By default \Loki will perform
a quantitative trait locus analysis unless the ANALYSIS or ESTIMATE IBD
(Section \ref{estibd}) commands are used. Currently, the only usable forms
for this command are:
\begin{verbatim}
ANALYZE AFFECTED,IBD
ANALYZE AFFECTED,IBD,NULL
\end{verbatim}
Note that the order of the parameters is not significant.  The first example
above will estimate IBD sharing amongst affecteds (as determined by the
AFFECTED command, Section \ref{aff1}).  The second example will estimate
the null distribution of the sharing statistic.  The use of these is
explained more in Section \ref{ibd_analysis}.
\subsection{SET command}\label{set1}
As with \Prep, \Loki has a number of internal variables which can be set to
influence the way it runs.  Some of these variables are mainly for
debugging, but several are for more general operations (for example, see
Sections \ref{bk1}, \ref{prior1}).  The more useful options are listed below:
\begin{itemize}
  \item backups\\
	 Set backup schedule - see Section \ref{bk1}.
  \item tau\_mode, tau\_beta\\
	 Set prior on trait loci effects - see Section \ref{prior1}.
  \item no\_overdominant 1\\
	 \label{lab_nooverdom}
	 Forces the heterozygote effect of all trait loci to be between the
	 homozygote effects.
  \item lm\_ratio\\
	 Sets the frequency of M steps \emph{vs.} L steps for the LM sampler.
	 This can greatly improve the mixing of the sampler and the use of high
values of the ratio ($>.5$) is recommended.  Note that the L sampler is
required to guarantee irreducibility, so setting the ratio too high can be
counterproductive.  Use of this option with the quantitative
	 trait analysis will often significantly slow down the sampler, so it may
	 be better to use low values for the ratio ($<.2$) in this case.
  \item debug\_level, peel\_trace\\
	 Provide debugging information on the internal workings of \Loki.  The
higher the setting, the more information given (up to a point{\ldots}). These
generally have no effect unless \Loki was compiled with the \verb+-DDEBUG+\
and/or \verb+-DTRACE_PEEL+ compiler flags set.  See \verb+include/config.h+\ 
for details.
\end{itemize}
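To give a concrete (if arbitrary) illustration of the syntax, the commands
below force non-overdominant trait loci and set a fairly high M \emph{vs.} L
ratio:
\begin{verbatim}
SET no_overdominant 1   # heterozygote effects kept between the homozygote effects
SET lm_ratio 0.8        # a fairly high M:L ratio (value chosen arbitrarily)
\end{verbatim}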
\subsection{Sex specific maps}\label{sex_spec}
If the SEX command (Section \ref{sex1}) was specified in the control file for
\Prep, then \Loki can use sex specific maps.  In this case, a sex specific
marker map is specified, and \Loki will then report a male and female map
position for each trait locus.  A check is made that map orders in both maps
are the same.  As would be expected, a trait locus will always be in the
same interval on both maps!  The use of sex specific maps is switched on by
using the sex specific forms of either the POSITION or MAP commands
(Sections \ref{pos1},\ref{map1}).  Note that if some POSITION or MAP
statements use the sex specific forms, and others do not, then sex specific
maps will still be used.  Where only a single figure has been provided for a
map length or marker position, then this figure will be used for both male
and female maps.
\subsection{Setting the prior on trait loci effects}\label{prior1}
The trait loci effects have a normal prior which by default is
$N(0,\tau)$, where $\tau$ is determined by an interaction between the
internal variables \verb+tau_beta+ and \verb+tau_mode+.  These default to
2 and 0 respectively, but can be changed using the commands SET
tau\_mode and SET tau\_beta (for more details on the SET command, see
Section \ref{set1}).  Depending on the value of tau\_mode, $\tau$ is
set as follows:
{\raggedright
\begin{itemize}
\item $\text{tau\_mode}=0$\ (default): $\tau$ is set equal to tau\_beta $\times$\ the \emph{starting}
residual variance.
\item $\text{tau\_mode}=1$: $\tau$ is set equal to tau\_beta.
\item $\text{tau\_mode}=2$: $\tau$ is set equal to tau\_beta $\times$\ the
\emph{current} residual variance at each iteration.
\end{itemize}}
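As an illustration (the value of 2 for tau\_beta is arbitrary), the following
would make $\tau$ equal to twice the \emph{current} residual variance at each
iteration:
\begin{verbatim}
SET tau_mode 2   # tau follows the current residual variance
SET tau_beta 2   # so tau = 2 x current residual variance
\end{verbatim}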
\subsection{Estimation of Breeding Values}
If an unlinked polygenic effect is being fitted, then at each iteration Loki
will calculate the polygenic additive effect, or breeding value, for each
individual.  For some applications these values may be of interest in
themselves, so an option is provided to calculate the average value of the
breeding values and write them to a file.  This option is selected as follows:
\begin{verbatim}
OUTPUT POLYGENIC "filename"
\end{verbatim}
The estimates are only written once, at the end of the analysis run (either
the scheduled end, or when the run is interrupted).  The values are also
written to the dump file, so breeding value estimation runs can be
restarted.  The format of the output file is simply id followed by the mean
and standard deviation of the breeding value for each individual.
\section{Analysis of output}
\label{loki_files}
Normally, output from \Loki is written to the outfiles (by default
\verb+loki.out+, \verb+loki.pos+ and \verb+loki.mpos+ if marker positions
are allowed to vary, but see OUTPUT FILE command, Section
\ref{control_run}). There may be additional output files produced depending
on the options set in the parameter file.  The form of the output from \Loki
depends on which analysis has been chosen and various parameter settings.
Unless the ESTIMATE IBD or ANALYSIS commands (Sections
\ref{estibd},\ref{analysis1}) have been specified in the parameter file, the
default analysis is the quantitative trait locus analysis, which is
discussed below.
\subsection{Quantitative trait locus analysis}
The output consists of a set of realizations of a subset of model parameters
from each sampling iteration that output is requested for (controlled by the
parameter file given to \Loki).  The main output is sent to the file
\verb+loki.out+.  More detailed information on QTL positions is sent to
\verb+loki.pos+, and information on marker positions when they are sampled
is sent to \verb+loki.mpos+ (Section \ref{loki_mpos}).  These last two files
are new to loki\_2.3.
\subsubsection{Output file \texttt{loki.out}}
At the start of \verb+loki.out+ is a header which gives information about
the file format.  First comes information on the version of Loki
used, the date and time the program was run, and the type of output format
requested.  This makes it possible for utilities such as \Lokiext (Section
\ref{scripts}) to process the output file irrespective of which Loki
version is used.  The exact output format depends on the output type
requested, on the model, on the markers included in the analysis, and on
various quantities set in the parameter file.  The examples given below
are for the default output type for loki\_2.4.  Previous versions of Loki
tended to give less information in the header, but were basically similar.
An example of the first two lines would be:
\begin{verbatim}
Created by loki 2.3.0: Fri Aug 18 07:51:14 2000
Output format: 2
\end{verbatim}
After this comes the model being used for the analysis:
\begin{verbatim}
Model: Q(1) = QTL + AGE_AT_EXAM + SX
\end{verbatim}
This is followed by information on any marker loci being modelled. Each linkage
group is listed in map order, with the name and map position of each marker
being displayed.
\begin{verbatim}
Linkage groups:
  1: chrom_1  Map range:  (70cM to 130cM)
    D1S2866 - 98.21
    D1S1665 - 102.02
    D1S1728 - 109.04
  2: chrom_2  Map range:  (0cM to 70cM)
    D2S1400 - 27.60
    D2S305 - 38.87
    D2S144 - 45.30
    D2S1788 - 55.51
Total Map Length: 3500cM
\end{verbatim}
Note that if sex specific maps are being used (Section \ref{sex_spec}), then
each marker will be followed by two numbers, the male and female map positions.

After the marker listing comes information on the column identities.  This
typically starts with the following two columns:
\begin{verbatim}
  1: Iteration count
  2: Residual variance
\end{verbatim}
If the residual variance is fixed, then the corresponding column will be
omitted.  If a polygenic effect
and/or additional random effects are fitted, their variances will appear next:
\begin{verbatim}
  3: Additive variance
  4: Additional variance for LITTER
\end{verbatim}
This is followed by information as to which covariate effects are
being output in which columns. The following is the output when a candidate
gene CG1 is being fitted along with the two level factor SX.  CG1 has three
alleles (A,B,C), and SX has two levels (M,F). Note that the first level of
each factor effect (A,A for CG1, M for SX) is not output as it is always set
equal to zero. In addition to the factor effects, candidate genes have
additional columns giving the allele frequencies (for all but 1 of the
alleles), and the size of the gene's contribution.  For more explanation of
this, see the section on the QTL effect size below.
\begin{verbatim}
 covariate data:
  5: Grand mean
  6: CG1 freq. A
  7: CG1 freq. B
  8: CG1 effect A,B
  9: CG1 effect B,B
  10: CG1 effect A,C
  11: CG1 effect B,C
  12: CG1 effect C,C
  13: CG1 size
  14: SX effect F
\end{verbatim}
Again, if the grand mean is fixed then the corresponding column will be
omitted.  The `size' of a candidate locus is an empirical measure (not used by the sampling
process), which is an attempt to get around some of the problems of
interpreting the gene effects directly.  These problems mainly come when the
gene frequency is extreme (near 1 or 0).  In these situations there is
little or no information about the effects of one of the homozygotes, so the
gene effects can fluctuate wildly, making interpretation difficult.  It is
also useful to have a single measure of gene size.  The measure used here is
the square root of the variance contributed by the trait locus, and is
calculated as below:
\begin{equation*}
s=\sqrt{(1/n)\sum_in_i(\mu_i-\mu)^2},\quad\mu=(1/n)\sum_in_i\mu_i,
\end{equation*}
where $n$\ is the total number of individuals observed for the trait, $n_i$
is the number of observed individuals with genotype $i$, and $\mu_i$ is the
effect of genotype $i$. The size is also calculated for QTLs (see below).
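As a purely illustrative check of this formula (the counts and effects below
are invented), suppose $n=100$ individuals are observed with genotype counts
$n_{11}=25$, $n_{12}=50$, $n_{22}=25$ and genotype effects $\mu_{11}=0$,
$\mu_{12}=1$, $\mu_{22}=2$.  Then
\begin{equation*}
\mu=\frac{25\cdot 0+50\cdot 1+25\cdot 2}{100}=1,\qquad
s=\sqrt{\frac{25\cdot 1+50\cdot 0+25\cdot 1}{100}}=\sqrt{0.5}\approx 0.71.
\end{equation*}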

If genetic groups have been specified (Sections
\ref{group1},\ref{group2}), and the number of groups is greater than 1, then a separate
frequency will be output for each group.  The relevant section of the header
will then look something like this:
\begin{verbatim}
 covariate data:
  6: CG1 [Genetic group 1] freq. A
  7: CG1 [Genetic group 1] freq. B
  8: CG1 [Genetic group 2] freq. A
  9: CG1 [Genetic group 2] freq. B
  10: CG1 effect A,B
  11: CG1 effect B,B
  12: CG1 effect A,C
  13: CG1 effect B,C
  14: CG1 effect C,C
  15: CG1 size
  16: SX effect F
\end{verbatim}
After the covariate effects have been described, there will be a description
of the QTL data blocks if any QTL are being fitted.  The number of such data
blocks depends on the number of QTL in the model at each iteration.  The
description in the header will look something like this:
\begin{verbatim}
QTL data blocks:
  linkage group
  [position if linked]
  freq. 2
  effect 1,2
  effect 2,2
  size
\end{verbatim}
The position field only appears if a QTL is linked.  If sex specific maps
(Section \ref{sex_spec}) are used then there will be 2 fields output for the
male and female map position.  The frequency, effect and size fields are the
same as for when a candidate gene is fitted (see above), with
the same changes occurring when genetic groups have been specified.

At the end of the header there are several lines describing the values for
tau\_mode and tau\_beta (Section \ref{prior1}), total number of covariate
columns, whether the \verb+no_overdominant+ flag is set (Section \ref{lab_nooverdom}),
the LM ratio if it is $>0$ (Section \ref{set1}), the number of genetic groups (Sections
\ref{group1},\ref{group2}), and whether sex specific maps are being used (Section
\ref{sex_spec}).  In addition, if the grand mean and/or the residual variance
have been fixed, their values will be printed here.  The end of the header
is then always marked by a line of at least 2 `\verb+-+' characters \IE
\begin{verbatim}
Tau Mode: 0
Tau Beta: 376.924
No. covariate columns: 11
No. genetic groups: 1
--------------------
\end{verbatim}
After the header comes a series of output records, each record describing a
single iteration.  Not all iterations may have records; this is controlled
using the START OUTPUT and OUTPUT FREQUENCY commands (Section \ref{control_run}).
An output record consists of a number of space separated fields where the
number of fields depends on (a) how many covariate effects have been fitted,
(b) how many trait loci are currently in the model and (c) which output
format is being used.  The format of each record is as described in the
header above.
\subsubsection{Output file \texttt{loki.pos}}
The file \verb+loki.pos+ has a compact format which is more suitable for
being read by a program than by hand.  It lists the positions of all QTLs at
every iteration (after the first output iteration has been reached as defined by the
START OUTPUT command, Section \ref{control_run}).  Loki goes through each linkage
group in turn (including the special unlinked group 0), and prints out the
number of QTL in that group and the positions of each QTL (if not in group
0).  At the end of the line is a colon followed by a number; this is the
number of iterations where the information in the line did not change.  By
only reporting when a line changes the output can be greatly compressed.
A typical file would look something like this:
\begin{verbatim}
0 3:99
0 4:4
0 3 1 1 34.3557:4
0 3 1 1 35.3592:2
\end{verbatim}
This says that there were 3 unlinked QTLs for 99 iterations followed by 4
unlinked QTLs for 4 iterations.  One of those QTLs became linked to linkage
group 1 at position 34.3557 for 4 iterations, and then moved to position
35.3592 for 2 iterations.  There are 2 supplied scripts, \verb+count.pl+ and
\verb+dist.pl+, which interpret this file (Section \ref{scripts}).
\subsubsection{Output file \texttt{loki.mpos}}
\label{loki_mpos}
This file is produced if marker positions are allowed to vary using the
START POSITION command (Section \ref{loki_mpos1}).  The file format is very
simple.  The first line gives the markers whose positions are being sampled,
and subsequent lines give the iteration number and the marker positions in
the same order as the first line.  If sex specific maps are being used then
two positions (female,male) will be output for each marker.
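A purely hypothetical fragment (marker names and values invented for
illustration, sex averaged map) might therefore look like this:
\begin{verbatim}
D1S2866 D1S1665 D1S1728
100 98.21 101.87 109.30
102 98.21 102.34 109.04
\end{verbatim}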
\subsubsection{Marker allele frequency file}
\label{lab.freq}
The OUTPUT FREQUENCY "freqfile" command will cause estimates of the marker
allele frequencies to be output at each output iteration to freqfile (see
Section \ref{lab.freq1}).  This file has a header which describes the markers
and alleles which are being output.  Following the header are a series of
lines, 1 per output iteration, which have the iteration number and allele
frequency realizations.  An example of the file follows:
\begin{verbatim}
Created by loki 2.3.0: Fri Sep 22 11:31:32 2000
0 D1S1597: 6 7 3
1 D1S1612: 5 7 9 6
No. genetic groups: 1
--------------------
100 0.477894 0.230591 0.291515 0.0398045 0.0172819 0.00242949 0.940484
104 0.478429 0.255159 0.266412 0.0346967 0.0140646 0.00211458 0.949124
\end{verbatim}
The line lengths in the file can obviously get very long if many markers are
being analyzed.  Processing of this file to produce allele frequency
estimates can be done using the perl script \verb+freq.pl+ (Section
\ref{lab.freq2}).  Note that if there are multiple genetic groups there will
be multiple estimates for each allele.  The number of columns will therefore
be the total number of alleles multiplied by the number of genetic groups.
\subsection{IBD sharing analysis}\label{ibd_analysis}
Not documented - not in a stable state.

\subsection{Estimation of IBD relationship matrices}\label{ibd_estimation}
This analysis is designed to estimate the IBD relationship matrices for
variance component analysis programs (such as SOLAR).  IBD calculations are
time consuming and expensive, and for large pedigrees the only feasible way
is often to use MCMC methods.  \Loki will supply the IBD matrices in various
formats, to make it easier to use the output with other analysis programs.

The IBD relationship between 2 individuals can be expressed as a vector of 9
probabilities giving the probabilities of the 9 condensed identity
states shown in figure \ref{fig:3}.

\begin{figure}
\begin{center}
\psfig{file=docfig1.eps,width=4in}
\end{center}
\caption{\label{fig:3}
The nine condensed identity states for 2 individuals}
\end{figure}

Let $\Delta_i=p(S_i)$, the probability of identity state $i$, and
$\boldsymbol{\Delta}$ be the vector of identity state probabilities. To save
space, \Loki does not store estimates of all 9 possible states for each
relative pair.  Instead it calculates two functions of
$\boldsymbol{\Delta}$, twice the kinship coefficient
$2\Phi=2(\Delta_1+\frac{1}{2}(\Delta_3+\Delta_5+\Delta_7)+\frac{1}{4}\Delta_8)$
\ and $\Delta_7+\Delta_1$, as these are the quantities required by most
variance component programs fitting additive and dominance variance
components. Note that these quantities are estimated correctly, even in the
case of inbreeding.  (It is not clear, however, whether the
$\Delta_7+\Delta_1$ value is appropriate for the dominance component
in the case of inbreeding, but that is another question{\ldots})
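As a quick sanity check of the first quantity (this small example is mine,
not taken from the program output): for a non-inbred full-sib pair the only
non-zero condensed states are $\Delta_7=\frac{1}{4}$, $\Delta_8=\frac{1}{2}$
and $\Delta_9=\frac{1}{4}$, so
\begin{equation*}
2\Phi=2\left(\tfrac{1}{2}\Delta_7+\tfrac{1}{4}\Delta_8\right)
=2\left(\tfrac{1}{8}+\tfrac{1}{8}\right)=\tfrac{1}{2},
\end{equation*}
giving the familiar kinship coefficient $\Phi=\frac{1}{4}$, while
$\Delta_7+\Delta_1=\frac{1}{4}$.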

If IBD estimates are requested at many positions (for example, every
centimorgan along a chromosome), the accumulation of the IBD sharing
statistics can account for a significant proportion of the running time.  It
can therefore be advantageous to request \Loki to only collect the sharing
statistics every $n$ iterations.  This has the advantage that the samples
that \textit{are}\ collected are less correlated than if collection is
performed at every iteration.  Obviously if IBD sharing is estimated at only
a few positions (or if unlinked markers are being used), the optimum
strategy may still be to collect estimates at every iteration.  As with
the normal \Loki analyses, the starting iteration and collection frequency
are specified using the START OUTPUT and OUTPUT FREQUENCY commands.

\textbf{N.B.}\ In most cases, pedigree pruning should be turned \emph{off}
in \Prep when IBD estimation is performed (Section \ref{set_command}).  If not then
individuals may be missing from the output matrices.  Pruning, though useful
in most other cases, should only be left on with IBD estimation if you know
exactly what you are doing!

\subsubsection{Output options}
In contrast to earlier versions of \Loki, only one set of estimates of the IBD
matrices is calculated (at the end of the sampling run).  If \Loki is
interrupted (by the receipt of a signal), it will generate a set of
estimates before quitting.  Note that this can take some time (as the files
can be very large), and it may not always be possible for \Loki to complete
this successfully in time (for example, if the host computer is shutting
down).

Three output formats are currently offered, the default \Loki format, a
MERLIN type format and a SOLAR type format.  The \Loki and MERLIN outputs
are written to a single file (default name \verb+loki.ibd+), while the SOLAR
output is written to a set of files, one file per position
estimated.  These files are written to a directory (default name
\verb+loki_ibd+).  Within this directory, the file names obey the SOLAR
naming scheme, so the file for position 13.4 on chromosome 11 will be called
\verb+mibd.11.13.4.gz+ (if compressed).  This naming scheme can not be
changed.  Note that to properly follow the SOLAR scheme, the chromosome
names should be numeric - this is not enforced by \Loki.

The default file names can be changed using the following commands:
\begin{verbatim}
OUTPUT IBD FILE "myibdfile" # For MERLIN format
OUTPUT IBD DIR "myibd_dir" # For SOLAR format
\end{verbatim}

To reduce the size of the output files, \Loki can compress the output (for
all formats) using gzip (highly recommended!).  This is
done using the following command:
\begin{verbatim}
COMPRESS IBD OUTPUT
COMPRESS OUTPUT IBD # Alternate command format
\end{verbatim}
When this option is used, \verb+.gz+ will be appended to the output file
name, unless the specified name already ends in \verb+.gz+.

\subsubsection{Output Formats}
The output format for the IBD estimates is selected as in the following
examples:
\begin{verbatim}
OUTPUT IBD LOKI # Default output format
OUTPUT IBD MERLIN
OUTPUT IBD SOLAR
\end{verbatim}

The main output to the screen consists only of the iteration count.  The
other output files (notably \verb+loki.out+) are not produced.  The trait
locus code is not invoked at all.

\subsubsection{Default (\Loki) format}
An example fragment of the default output format is shown below:
\begin{verbatim}
Iteration 200

**Linkage group chromosome 1:

**Position = 0
1 1 1 1
2 2 1 1
4 1 0.5 0
4 2 0.5 0
4 4 1 1
5 1 0.5 0
5 2 0.5 0
5 4 0.3975 0.15
5 5 1 1
6 6 1 1
7 1 0.5 0
7 2 0.5 0
7 4 0.485 0.23
7 5 0.4425 0.21
7 7 1 1
\end{verbatim}
As can be seen, the output starts with a small header giving the iteration
number, linkage group and position that the estimates are for.  This is
then followed by $2\hat{\Phi}$ and $\hat{\Delta_7}$ for each pair of
individuals for whom $\hat{\Phi}>0$.

Note that if family ids are used (so the second form
of the PEDIGREE command is used with \Prep), then there will be an extra
column with the family name before the individual id columns.  

\subsubsection{MERLIN format}
An example fragment of the MERLIN output format is shown below:
\begin{verbatim}
1 1 0 0 0 1
2 1 0 1 0 0
2 2 0 0 0 1
4 1 0 0 1 0
4 2 0 0 1 0
4 4 0 0 0 1
5 1 0 0 1 0
5 2 0 0 1 0
5 4 0 0.24 0.485 0.275
5 5 0 0 0 1
6 1 0 1 0 0
6 2 0 1 0 0
6 4 0 1 0 0
6 5 0 1 0 0
6 6 0 0 0 1
7 1 0 0 1 0
7 2 0 0 1 0
7 4 0 0.165 0.565 0.27
7 5 0 0.265 0.48 0.255
\end{verbatim}
This format has no header, and the position is given in the third column
of the output.  If multiple families are present then the estimates for the
first family at each position are given, then the second family etc.  Unlike
the default format, an output line exists for every pair (even unrelated
pairs).  This can make the output much larger than the default output (even
if compressed).  Three columns of IBD estimates are given, which in the case
of no inbreeding correspond to $\Delta_9, \Delta_8, \Delta_7$, the
probabilities of the pair sharing 0, 1 and 2 genes IBD.  In the case of
inbreeding, this scheme can not accurately portray the IBD relationships
(and MERLIN as of version 0.9.3 does not try to calculate sharing
probabilities correctly with inbreeding).  In the presence of inbreeding,
\Loki will output values such that the last column ($\Delta_7$) is correct,
and the other columns are adjusted so that if $\Phi$ is calculated using the
formula for non-inbred populations
($\Phi=\frac{1}{2}\Delta_7+\frac{1}{4}\Delta_8$), the correct value for
$\Phi$ (allowing for inbreeding) will be given.  Yes, this is confusing and
no, I don't recommend using this format for inbred families.

As before, if family IDs are used, an extra column of output with the
family name will be output.

\subsubsection{SOLAR format}
An example fragment of SOLAR output format is shown below:
\begin{verbatim}
  640   640  1.0000000  1.0000000
  641   641  1.0000000  1.0000000
  773   640  0.5000000  0.0000000
  773   641  0.5000000  0.0000000
  773   773  1.0000000  1.0000000
  643   643  1.0000000  1.0000000
  642   642  1.0000000  1.0000000
  774   643  0.5000000  0.0000000
  774   642  0.5000000  0.0000000
  774   774  1.0000000  1.0000000
  908   640  0.5000000  0.0000000
  908   641  0.0000000  0.0000000
\end{verbatim}
This format has no header, and the position is given in the output filename.
The first 2 columns are the SOLAR IBD IDs, or internal IDs for the pair,
followed by $2\Phi$\ and $\Delta_7$, as with the default format.  The IBD
estimates are always printed to the same precision.  To know the
correspondence between the original IDs and the SOLAR IBD IDs, \Loki needs to
read in two output files from SOLAR, \verb+pedindex.cde+\ and
\verb+pedindex.out+.  These should be put in the current directory.  With
these, \Loki will be able to automatically translate the IDs for SOLAR.
Output lines are only given for related individuals, so the file size tends
to be similar to the default output, and smaller than the MERLIN output.
\section{Analysis Scripts and Utility Programs}
\label{scripts}
It is not straightforward to look directly at the output from Loki and draw
conclusions about whether there is linkage to a region, how many QTLs are in
the model, what size of QTLs have been found etc.  For this reason, several
perl scripts and small utility programs in C have been supplied with Loki.
These can be used to read the output files from Loki, and produce
either a text report, a selection of columns of data or a graphical view of
the data.  The graphical view of the data requires that \verb+gnuplot+ be installed
on the system.  This is a widely available data plotting utility which
can be downloaded from ftp.dartmouth.edu/pub/gnuplot.  A web page for
\verb+gnuplot+ can be found at http://www.cs.dartmouth.edu/gnuplot.
\subsection{Extracting information on effects: \texttt{loki\_ext.pl}}
The perl script \verb+loki_ext.pl+ can be used to extract information from
the \verb+loki.out+ file such that it can be used by other programs and/or
scripts.  Probably the two most useful applications of this script are to
extract information about QTLs and to extract information about
covariate effects.  To do the former, the script is invoked like this:
\begin{verbatim}
loki_ext.pl -x
loki_ext.pl -x -c2
loki_ext.pl -x -c2 -r20.3:25.7
loki_ext.pl -x -c2 -r20.3
loki_ext.pl -x -c2 -r:25.7
loki_ext.pl -x -i2000
loki_ext.pl -x -i2000:5000
loki_ext.pl -x -i:5000
\end{verbatim}
The -x option specifies an extract operation.  The first example extracts
all QTLs linked to the first linkage group (this is the default) and outputs
the relevant columns to stdout.  The -c option in the second and third
examples indicate that QTLs linked to the second chromosome should be
extracted.  The -r option specifies a range of chromosome positions to look
for QTLs.  Note that either or both of a starting and stopping location can
be given.  The -i option specifies a range of iterations to consider.  As
with the -r option, either or both of a starting and stopping
iteration may be given.

The number of columns output depends on whether the specified linkage group
is 0 (meaning the `unlinked' group) in which case no QTL position
information is output, and whether sex specific maps are being used, in
which case both male and female map positions are output.  For a normal
chromosome with a sex averaged map, the output order is: iteration, map
position, frequency of `1' allele, effect of `12' genotype, effect of `22'
genotype, size of QTL, \% genetic variance due to QTL, \% total variance due
to QTL.  Note that the total variance used to calculate the last column is
the variance after adjusting for any environmental covariates.

To extract information about covariate effects the script is invoked as given below:
\begin{verbatim}
loki_ext.pl -C
loki_ext.pl -D
\end{verbatim}
Using the -C option, loki\_ext simply strips off any QTL columns and the
header from \verb+loki.out+, and outputs the remainder.  The column order is
therefore as given in the header to \verb+loki.out+, and will change
depending on which version of Loki and which output type is being used.
Using the -D option will give a `standardized' set of columns which should
be consistent across different versions and output types.  As before, the
-i option can be used to specify a range of iterations.  The output format
for the -D option is as follows:
\begin{enumerate}
  \item Iteration
  \item No. QTL in model
\item No. linked QTL in model
  \item Grand mean
  \item Residual variance
  \item Tau
  \item Total genetic variance
  \item{\ldots}Remainder of non QTL data columns
\end{enumerate}
The output from \verb+loki_ext.pl+ can either be directed to a file and further
analyzed or viewed by programs such as \verb+gnuplot+.  Alternatively the output
can be piped into a program such as \verb+qavg+ to give means and confidence
limits for the columns.
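For example, QTL realizations for the second linkage group could be
summarized directly by piping into \verb+qavg+ (the output filename in the
second line is, of course, arbitrary):
\begin{verbatim}
loki_ext.pl -x -c2 | qavg             # summarize QTLs linked to linkage group 2
loki_ext.pl -x -c2 > qtl_chrom2.dat   # or save the columns for later plotting
\end{verbatim}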
\subsection{Estimating probabilities of linkage to chromosomes: \texttt{count.pl}}
\label{ref_count}
The perl script \verb+count.pl+ reads the files \verb+loki.out+ and
\verb+loki.pos+; because the latter file is not produced by versions of loki
prior to loki\_2.3, this script only works with output produced by
loki\_2.3 or later.  The script is invoked as follows:
\begin{verbatim}
count.pl [-q] [-f outfile] [-p posfile] [-i start:stop] [-b bin-size] [-c step]
\end{verbatim}
where the -f and -p options allow the specification of
alternative input files rather than the defaults of \verb+loki.out+ and
\verb+loki.pos+.  By default the script displays on the screen the iteration count it has
reached; this can be turned off using the -q (for quiet) option.
As with \verb+loki_ext.pl+, a range of iterations can be
specified using the -i option.  Note that unlike \verb+loki_ext.pl+, the
iteration count is from the first iteration in the file, which may not be
the first sampling iteration.  The output of \verb+count.pl+ looks something
like this:
\begin{verbatim}
----------------------------------------------
Model: Q(1) = QTL + AGE_AT_EXAM + SX
Iterations: 9470

Linkage group    Count   Prop. linked    BF
----------------------------------------------
unlinked          9470     1.00000     1.00116
chrom16            173     0.01827     0.12015   132.23   24.53cM
chrom19           5022     0.53031     4.09343     1.02   90.32cM

QTL number       0      1      2      3      4      5      6
--------------------------------------------------------------
Overall       0.0000 0.0024 0.1122 0.4181 0.3219 0.1094 0.0314
unlinked      0.0000 0.0720 0.2492 0.3918 0.2270 0.0492 0.0108
chrom16       0.9817 0.0177 0.0005 0.0000 0.0000 0.0000 0.0000
chrom19       0.4697 0.5087 0.0207 0.0010 0.0000 0.0000 0.0000
\end{verbatim}
The script produces information about the model used and the total number of
iterations measured (this will be the total iteration count less the
iterations discarded at the beginning).  Then follows a table giving for
each linkage group the number of iterations when that linkage group contains
a QTL, the same as a proportion, and an estimate of the Bayes Factor (BF)
for linkage to the chromosome.  This is calculated in the following way. For
each linkage group, the prior probability $p$ of linkage for any single QTL
is $m/t$, where $m$ is the map length of the chromosome and $t$ is the total
map length of the genome.  These values should be set in the parameter file,
and are printed in the header of \verb+loki.out+.  If sex specific maps are
used (Section \ref{sex_spec}), then $p$ is calculated as
$(m_f+m_m)/(t_f+t_m)$, where $m_f$ and $m_m$ are the male and female
chromosome map lengths and $t_f$ and $t_m$ are the male and female total
genome lengths.  For a given iteration, the prior probability that at least
1 QTL is linked to a chromosome is $1-(1-p)^n$, where $n$ is the number of
QTLs in the model at that iteration.  The posterior probability is either $1$
or $0$ depending on whether or not a QTL is linked.  The posterior/prior
ratio for each linkage group is then averaged over iterations to give the
Bayes Factor. An equivalent calculation is performed for the unlinked
region, where its map length is defined as the amount of total genome
remaining after subtraction of all fitted chromosomes.  After this table,
the distribution of the number of QTL in the model is printed, together with
the same information calculated for each linkage group separately.  The last two
columns (or last four columns if a sex specific map is being used) give the
maximum Bayes Factor obtained when each chromosome is split into small bins,
and the location of the bin having the maximum Bayes Factor.  The bin size
defaults to 1cM, but this can be altered using the -b option.  This calculation
of the Bayes Factors for the small regions is the same as performed by
\verb+dist.pl+ described below.
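For reference, the calculation just described can be summarized as follows
(this is simply a restatement of the text, with $I_t$ an indicator for
whether at least one QTL is linked to the group at iteration $t$, $n_t$ the
number of QTL in the model at that iteration, and $T$ the number of recorded
iterations); the estimate is, roughly,
\begin{equation*}
\widehat{\mathrm{BF}}=\frac{1}{T}\sum_{t=1}^{T}\frac{I_t}{1-(1-p)^{n_t}}.
\end{equation*}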

\subsection{Estimating probabilities of linkage to small regions: \texttt{dist.pl}}
The perl script \verb+dist.pl+ reads the same output files as
\verb+count.pl+ (Section \ref{ref_count}) and produces a table with the
estimated Bayes Factor for linkage at a series of locations along the fitted
chromosomes.  If \verb+gnuplot+ is installed, then the script can directly produce
graphs on the screen or produce postscript versions of the graphs. For much
of what is described below, it is assumed that \verb+gnuplot+ is installed and your
PATH is set correctly so it can be found. The calculation of the Bayes
Factor estimate is performed in the same way as described for
\verb+count.pl+, but instead of treating each chromosome in its entirety,
each chromosome is split into equally sized bins, with the Bayes Factor
estimated separately for each bin.  The script has numerous options, which
are described below.  If run without options it will produce a graph on the
screen showing the estimated Bayes Factors along each chromosome in turn
using a bin size of 1cM.  Available options are:
\begin{verbatim}
  -c n           only consider the nth linkage group
  -b n           set bin size to n (cM)
  -d name        use name as the base name for temporary files.  Don't delete
                 temporary files after use.
  -f file        select which output file to read from (defaults to loki.out)
  -p file        select which position file to read from (defaults to loki.pos)
  -o file        write postscript output to file
  -q             do not invoke gnuplot
  -C             produces color postscript plots (only has effect with -o)
  -i start:stop  select range of iterations
\end{verbatim}
The temporary files, which can be examined by use of the \verb+-d+ option,
consist of a \verb+gnuplot+ control file and a data file.  If the script was
invoked as \verb+dist.pl -d junk+, the control file would be called
\verb+junk+ and the data file \verb+junk.dat+.  Typing \verb+gnuplot junk+
should reproduce the graphs.
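The per-bin calculation can be sketched in the same style as the
whole-chromosome calculation shown for \verb+count.pl+ above.  The sketch
below is illustrative only: the per-iteration records are hypothetical, and
\verb+dist.pl+ itself obtains the QTL positions from the output and position
files.
\begin{verbatim}
# Sketch of the per-bin Bayes Factor calculation, following the same
# scheme as for count.pl but with each bin treated in place of the
# whole chromosome.  The per-iteration records are hypothetical.
def binned_bayes_factors(iterations, chrom_len, genome_len, bin_size=1.0):
    """iterations: list of (n_qtl, positions) pairs, where positions
    holds the cM locations of any QTLs on this chromosome at that
    iteration and n_qtl is the total number of QTLs in the model."""
    n_bins = int(chrom_len / bin_size) + 1
    p_bin = bin_size / genome_len        # prior per bin for a single QTL
    sums = [0.0] * n_bins
    used = 0
    for n_qtl, positions in iterations:
        if n_qtl == 0:
            continue                     # as in the earlier sketch
        used += 1
        prior = 1.0 - (1.0 - p_bin) ** n_qtl
        occupied = {int(x / bin_size) for x in positions}
        for b in occupied:               # posterior is 1 in occupied bins
            if b < n_bins:
                sums[b] += 1.0 / prior
    return [s / used for s in sums]      # one BF estimate per bin

# e.g. two iterations on a 120cM chromosome, 3500cM genome, 5cM bins
print(binned_bayes_factors([(2, [24.3]), (3, [24.8, 101.0])], 120.0, 3500.0, 5.0))
\end{verbatim}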
\subsection{Estimating marker allele frequencies: \texttt{freq.pl}}
\label{lab.freq2}
This script takes the raw marker allele frequency realizations produced
using the command \verb+OUTPUT FREQUENCY "freqfile"+ (Sections \ref{lab.freq1} and
\ref{lab.freq}), and produces an allele frequency estimate for each allele.
Optionally it also calculates confidence intervals for the estimates.
Example usages follow:
\begin{verbatim}
freq.pl freqfile
freq.pl -p 0.025 freq.dat
\end{verbatim}
The second form instructs the script to produce the 2.5\% and 97.5\%
confidence limits for the estimates.  The output is fairly
self-explanatory; the second form produces output like this:
\begin{verbatim}
D1S1597
   3        0.2372 (0.2048-0.2759) 
   4        0.0029 (0.0004-0.0084) 
   5        0.0128 (0.0058-0.0219) 
   6        0.4530 (0.4121-0.4888) 
   7        0.2512 (0.2186-0.2829) 
   8        0.0429 (0.0291-0.0579) 
D1S1612
   2        0.0027 (0.0003-0.0080) 
   4        0.1368 (0.1138-0.1669) 
   5        0.3374 (0.3030-0.3754) 
   6        0.1248 (0.1018-0.1485) 
   7        0.2568 (0.2235-0.2904) 
   8        0.0156 (0.0076-0.0261) 
   9        0.1190 (0.0974-0.1436) 
   10       0.0027 (0.0004-0.0068) 
   11       0.0041 (0.0009-0.0087) 
\end{verbatim}
Each marker is presented in turn, followed by its alleles, the frequency
estimate for each allele, and the confidence interval if requested.
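The estimate for each allele is presumably an average over the sampled
realizations, with the requested confidence limits taken as empirical
quantiles of the same samples.  The minimal sketch below (Python, not the
\verb+freq.pl+ source) illustrates such a summary for a single allele; the
list of sampled frequencies is hypothetical and parsing of the frequency
file is not shown.
\begin{verbatim}
# Hypothetical sketch of the per-allele summary: the estimate is taken
# here as the mean of the sampled frequencies and the limits as the
# empirical quantiles requested with -p (e.g. -p 0.025 for 2.5%/97.5%).
# How freq.pl itself forms the estimate is not spelled out above.
def allele_summary(samples, p=0.025):
    xs = sorted(samples)
    n = len(xs)
    estimate = sum(xs) / n
    lo = xs[int(p * n)]                      # lower empirical limit
    hi = xs[min(n - 1, int((1.0 - p) * n))]  # upper empirical limit
    return estimate, lo, hi

# e.g. frequencies of one allele sampled over five iterations
print(allele_summary([0.23, 0.25, 0.21, 0.24, 0.26], 0.025))
\end{verbatim}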
\subsection{Obtaining column averages: \texttt{qavg}}
This is a small utility program which quickly calculates averages and
provides summary statistics for column data.  If invoked with a file name
it will read that file and print summary information about each column;
otherwise it will read from standard input.  The basic information it
provides is the mean, standard deviation, number of records and range of each
column.  An example output follows:
\begin{verbatim}
Col 1 - Mean = 51000 SD = 28869.7 n = 20001 Range 1000 -> 101000
Col 2 - Mean = 1.05272 SD = 0.250372 n = 20001 Range 0.202457 -> 1.96742
Col 3 - Mean = 3.27452 SD = 0.369947 n = 20001 Range 2.11931 -> 4.73246
Col 4 - Mean = 2.59041 SD = 0.448593 n = 20001 Range 0.48428 -> 4.04446
Col 5 - Mean = 65.2765 SD = 1.38383 n = 20001 Range 56.5785 -> 72.829
Col 6 - Mean = -0.412581 SD = 0.183528 n = 20001 Range -1.13458 -> 0.364772
Col 7 - Mean = -0.317538 SD = 0.186775 n = 20001 Range -1.07009 -> 0.483818
Col 8 - Mean = -1.04013 SD = 0.178181 n = 20001 Range -1.68759 -> -0.240899
\end{verbatim}
If the data are log transformed and the averaging should be performed on
the original scale (for example, if the data are log likelihoods), this can
be achieved using the -l option \IE 
\verb+qavg -l+.
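Averaging log-scale values on the original scale is prone to overflow if
done naively.  The sketch below (Python, purely illustrative and not the
\verb+qavg+ source) shows one common, numerically stable way of performing
this average; the exact form in which \verb+qavg -l+ reports its result may
differ.
\begin{verbatim}
# Illustrative only: a numerically stable average of log-scale values
# on the original scale (the "log-mean-exp"), returning the log of the
# result.  The exact output convention of qavg -l may differ.
import math

def log_mean_exp(logs):
    m = max(logs)                            # factor out the maximum to
    s = sum(math.exp(x - m) for x in logs)   # avoid overflow/underflow
    return m + math.log(s / len(logs))

# e.g. three log likelihoods
print(log_mean_exp([-1050.2, -1048.7, -1049.3]))
\end{verbatim}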

In addition, \verb+qavg+ can give empirical confidence limits using the -p
flag; the command \verb+qavg -p0.025+ will give the 2.5\% and 97.5\% limits.
This option will also make the program output the first, second and third
quartiles (Q1, Q2, Q3), and an empirical `p-value'.  The latter is
calculated as follows: if the mean of a column is $>0$, then the p-value is
the proportion of data items which are $\leq 0$, and vice versa when the mean
is $<0$.  This measure is intended as a quick aid to assessing the
significance of covariate estimates.  An example output using this option
follows:
\begin{verbatim}
Col 1 - Mean = 51000 SD = 28869.7 n = 20001 Range 1000 -> 101000
     Q1 = 26000, Q2 = 51000, Q3 = 76000, 2.5% Lim = 3500 -> 98500, p = 0
Col 2 - Mean = 1.05272 SD = 0.250372 n = 20001 Range 0.202457 -> 1.96742
     Q1 = 0.882824, Q2 = 1.06423, Q3 = 1.22872, 2.5% Lim = 0.53308 -> 1.51554, p = 0
Col 3 - Mean = 3.27452 SD = 0.369947 n = 20001 Range 2.11931 -> 4.73246
     Q1 = 3.0134, Q2 = 3.25833, Q3 = 3.52037, 2.5% Lim = 2.59481 -> 4.03333, p = 0
Col 4 - Mean = 2.59041 SD = 0.448593 n = 20001 Range 0.48428 -> 4.04446
     Q1 = 2.34131, Q2 = 2.62509, Q3 = 2.88829, 2.5% Lim = 1.60588 -> 3.38831, p = 0
Col 5 - Mean = 65.2765 SD = 1.38383 n = 20001 Range 56.5785 -> 72.829
     Q1 = 65.0735, Q2 = 65.3966, Q3 = 65.6771, 2.5% Lim = 61.2552 -> 68.02, p = 0
Col 6 - Mean = -0.412581 SD = 0.183528 n = 20001 Range -1.13458 -> 0.364772
     Q1 = -0.537258, Q2 = -0.411998, Q3 = -0.29009, 2.5% Lim = -0.775722 -> -0.0502515, p = 0.0121494
Col 7 - Mean = -0.317538 SD = 0.186775 n = 20001 Range -1.07009 -> 0.483818
     Q1 = -0.443942, Q2 = -0.318493, Q3 = -0.190569, 2.5% Lim = -0.677856 -> 0.0481646, p = 0.0456977
Col 8 - Mean = -1.04013 SD = 0.178181 n = 20001 Range -1.68759 -> -0.240899
     Q1 = -1.16117, Q2 = -1.03919, Q3 = -0.919638, 2.5% Lim = -1.38943 -> -0.691617, p = 0
\end{verbatim}
The use of this option requires the program to hold all of the data in
memory and sort it, which adds to its computational requirements, so it
should not be used on very large data files.
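For reference, the sketch below (Python, not the \verb+qavg+ source)
reproduces the kind of per-column summary shown above; the exact quantile
convention used by \verb+qavg+ is not specified here, so a simple empirical
quantile is used.
\begin{verbatim}
# Illustrative per-column summary in the style of qavg -p; the real qavg
# is a compiled program and its quantile convention may differ slightly.
def column_summary(values, p=0.025):
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n

    def quant(f):                        # simple empirical quantile
        return xs[min(n - 1, int(f * n))]

    q1, q2, q3 = quant(0.25), quant(0.5), quant(0.75)
    lo, hi = quant(p), quant(1.0 - p)
    # empirical `p-value': proportion of items on the other side of zero
    if mean > 0:
        pval = sum(1 for x in xs if x <= 0) / n
    else:
        pval = sum(1 for x in xs if x >= 0) / n
    return mean, q1, q2, q3, (lo, hi), pval

# e.g. one small column of covariate estimates
print(column_summary([-0.41, -0.29, 0.05, -0.53, -0.36], 0.025))
\end{verbatim}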
\section{Acknowledgements}
The author is grateful to Warwick Daw for testing versions and for
helpful discussion and comments.  This work was previously supported by NIH Grant
GM46255 and is currently supported by NIH grant GM58757.
\bibliography{master}
\bibliographystyle{ajhg}
\end{document}