<!-- header fragment for html documentation -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>

<META NAME="description" CONTENT="Estimation of population parameters using
genetic data using a maximum likelihood approach with Metropolis-Hastings
Monte Carlo Markov chain importance sampling">  
<META NAME="keywords" CONTENT="MCMC, Markov chain, Monte Carlo,
Metropolis-Hastings, populat ion, parameters, migration rate, population
size, recombination rate, maximum likelihood">

<TITLE>LAMARC Documentation: Menu</title>
</HEAD>


<BODY BGCOLOR="#FFFFFF"> <!-- coalescent, coalescence, Markov chain Monte
Carlo simulation, migration rate, effective population size, recombination
rate, maximum likelihood -->


<P>(<A HREF="xmlinput.html">Previous</A> | <A
HREF="index.html">Contents</A> | <A HREF="regions.html">Next</A>)</P>
<H2>The Interactive LAMARC menu system</H2>
<H3>Introduction</H3>
<p>
LAMARC's user interface is fairly awkward to use. This is because LAMARC is mainly a batch program and this interface reflects the way LAMARC does things internally, not the way users think about things. LAMARC is designed to run off XML input. What this interface does is read in and appropriately decorate the output of the <A HREF="converter.html">File Conversion Utilities</A> and allow users to tweak existing XML (often the output of this interface) to do different analyses than it was originally created to do. Note that there is almost no overlap between what can be edited in this interface and what can be edited in the File Converter. Our long term goal is to combine this interface with the File Converter and make LAMARC a purely batch program (which has many advantages such as more effective use of parallel computing).
</p>
<p>This interface reflects how LAMARC is organized internally, which is not necessarily obvious. We recommend that you take a tour through all the menus before you do anything to get a sense of where things are. For example, &quot;Migration&quot; is found under &quot;Analysis&quot;, which may seem odd until you realize that if &quot;Migration&quot; is on, the &quot;Analysis&quot; of the data changes. There are a lot of other examples of seemingly normal terms meaning slightly different things in the LAMARC context. A tour of the interface will give you an overview.
</p>

<H3><A NAME="conventions">General Conventions</A></H3>
<p>
LAMARC has what is essentially a command-line interface, which is fairly uncommon in the modern era. Here are some general conventions that will help you understand it:
<UL>
<LI>
When the menu redisplays, the old screen just scrolls up, so make sure you stay at the bottom of the window you are working in or it will get confusing.
</LI>
<LI>
Each line you can interact with has a single character at the left side (for example <b>J</b>), some text to explain what that line is about, and the current value on the right side. In order to change the value on that line, enter that character <b>J</b> at the bottom of the screen. 
</LI>
<LI>
If there are multiple similar lines that can be edited individually, for example rows of a Migration Matrix, they will have numbers in the left column rather than letters.
</LI>
<LI>
Case does not matter. You can enter <b>J</b> or <b>j</b> in the above example and get the same effect.
</LI>
<LI>
Booleans (Yes/No or True/False items) are toggled by entering the character listed at the left. So if an item with an <b>A</b> on the left is "Yes" and you want "No", enter <b>A</b>, the screen will redisplay and <b>A</b> will now be "No".
</LI>
<LI>
Things can come and go in menus in logical, but not necessarily obvious, ways. For example, in Forces, if you only have one population, Migration will not appear, because it cannot happen. This can get a bit confusing, because changing something on one screen can cause something on another screen to appear or disappear. Until you get used to your analysis, it's wise to review everything using the &quot;Overview&quot; pages when you think you are ready to start a run, just to make sure nothing you have done has had unexpected side-effects.
</LI>
</UL>
</p>

<H3>Start Up</H3>
<p>When you start up LAMARC the first thing you will be asked is what your output directory is. You will then be asked for your input file. This may seem a bit backwards, since usually you would want to read data in before putting it out, but reflects the internal workings of LAMARC. If you don't specify an input file, LAMARC will follow the <A HREF="http://evolution.genetics.washington.edu/phylip/">Phylip</A> conventions and look in the output directory for a file called &quot;infile&quot;.
</p>
<p>
The data in the input file defines the kinds of analyses which are possible. If you don't see the kind of analysis you wish to do listed on the &quot;Analysis&quot; menu, you will need to modify your input file so that kind of analysis is possible. For example, if you wish to study migration, you need at least two populations. If &quot;Migration&quot; is not an option in &quot;Analysis&quot;, you only have one population defined in your input file. You will need to fix that, either using the <A HREF="converter.html">file converter</A> or editing the XML directly, before LAMARC can analyze migration.
</p> 
<p> Once the data have been located and processed (which may take several seconds), the first screen you see upon starting LAMARC is the top level menu: </p>
<p><img src="images/LamarcMainScreen.png" alt="LAMARC main screen"/></p>


<P> The menu may appear in a different form depending on your computer
system, but the basic ideas are always the same. You can now review and set values in the following areas:

<UL>
<LI><A HREF="menu.html#data">Data options</A></LI>
<LI><A HREF="menu.html#analysis">Analysis methods</A></LI>
<LI><A HREF="menu.html#search">Search Strategy menu</A></LI>
<LI><A HREF="menu.html#io">Input and Output related tasks</A></LI>
<LI><A HREF="menu.html#current">Overview of current settings</A></LI>
</UL>

<P>On all LAMARC menus, the bottom line will give two options: 
<UL>
<LI><b>Run</b> the program ('.')</LI>
<LI><b>Quit</b> ('q')</LI>
</UL>
</P>
<P> If you are viewing a
sub-menu, you will also have the option to: 
<UL>
<LI><b>Go Up</b> to a previous menu ('&lt;return&gt;')</LI>
</UL>
</P>
<P>If you have made any changes from the initial
setup within a submenu, there will be the:
<UL>
<LI><b>Undo</b> option ('-') which will undo your last change</LI>
</UL>
</P>
<P> If you have performed any Undo operations, there will be the:
<UL>
<LI><b>Redo</b> option ('+') which will redo your last change</LI>
</UL>
</P>
<P> Any time that you can create a new valid LAMARC input file based on the current menu
settings, there will be the:
<UL>
<LI><b>Create</b> option ('&gt;').</LI>
</UL>
</P>

<P> <B>Warning 1:</B>  LAMARC's search defaults are fairly small.  This
allows a first-time user to get visible results right away, but for serious
publishable analysis you will almost surely want to increase some of the
settings in the <A HREF="menu.html#search">"Search Strategy menu"</A>.</P>

<P> <B>Warning 2:</B>  Once you have selected "Run" you will have no further
interaction with the program; you'll have to kill and restart it in order
to change its behavior.  However, Lamarc does save a modified version
of its input file, updated with any changes made via the menu, into
file "menusettings_infile.xml" when you exit the menu.  If you want to re-run LAMARC
starting with the same data and menu options as you last selected, choose
"menusettings_infile.xml" as your input file when restarting LAMARC.  </P> 


<hr>
<H3><A NAME="data">Data options</A></H3>
<p><img src="images/LamarcDataScreen.png" alt="LAMARC data screen"/></p>

<P> This menu allows you to define what your data is and how you want to model it.</P>

<P> The first two items (<b>C</b> and <b>S</b>) define the source of the random number seed used to start the analysis. Normally the seed is set
from the system clock, so this option defaults to "<b>Yes</b>". To toggle it off and use the explicit seed instead, type <b>C</b>.
</P>

<P> A very few systems lack a system clock; on those you will need to set
this value by hand (either here or in the input file).</P>

<P>The explicit random seed is used if you wish to do exactly the same analysis
twice. You can hand-set the seed by entering <b>S</b>; you will be queried for the number to be used.
LAMARC will then find the closest integer of the form 4N+1 to the number
you entered.</P>
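<P> As a concrete illustration of that rounding rule, here is a minimal
Python sketch of the arithmetic (not LAMARC's actual code): given any
requested seed, it returns the closest integer of the form 4N+1.</P>

<PRE>
def closest_4n_plus_1(requested):
    # map the request onto the 4N+1 grid and round to the nearest point
    n = round((requested - 1) / 4.0)
    return 4 * n + 1

print(closest_4n_plus_1(1000))   # prints 1001
</PRE>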

<P> The <b>E</b> option (Effective population size menu) will only appear if you have data from multiple regions. It provides a way to combine
data collected from different regions of the genome that have unique
effective population sizes.  For example, data
from nuclear chromosomes of a diploid organism reflect an effective
population size four times larger than
data from the mitochondrion of the same organism.  Data from sex
chromosomes also have unique effective population sizes--the relative
effective population size ratio for a non-sex chromosome to an X
chromosome to a Y chromosome is 4:3:1.  Selecting <b>E</b> takes you to a sub-menu
where you can select a particular genomic region and set its effective population
size.</P>
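<P> The ratios described above can be summarized as scalars relative to a
nuclear autosome; a small illustrative table in Python (the values follow
directly from the 4:1 and 4:3:1 ratios in the text):</P>

<PRE>
# relative effective population size, with a nuclear autosome as 1.0
relative_ne = {
    "autosome": 1.00,   # baseline: 4*Ne*mu for diploid nuclear data
    "X":        0.75,   # X chromosome, 3/4 of autosomal
    "Y":        0.25,   # Y chromosome, 1/4 of autosomal
    "mtDNA":    0.25,   # mitochondrion, 1/4 of autosomal
}
</PRE>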

<P> The next set of menus allow you to modify the data-analysis model for each segment of your data. You can either modify the model for each segment
individually (<b>1</b>, ...), or you can modify a default model for the different types of
data LAMARC found in your infile.  If you have DNA or RNA or SNP data, <b>N</b> allows you to edit the default data model for all Nucleotide data. If you
have microsatellite data, <b>M</b> allows you to edit that data's default
model. If you have K-Allele data, <b>K</b> allows you to edit that
data's default model.  To assign the appropriate default data model to all
segments, select <b>D</b>. 
</P>

<P> For nucleotide data, you can either choose the Felsenstein '84 model
(F84) or the General Time Reversible model (GTR).  Microsatellite data may
take the Brownian, Stepwise, K-Allele, or Mixed K-Allele/Stepwise models
(MixedKS).  K-Allele data may only take the K-Allele model.</P>

<H4>Options common to all <b>data model</b> submenus</H4>

<P> Several menu options are common to all evolutionary models; for
conciseness these are described here.</P>

<P> If you are editing the data model for a particular segment (and
not for a default), the first line displays the type of data found in that
segment, and you are given the option (<b>D</b>) of using the appropriate
default data model for that segment.  The <b>M</b> option (Data Model) allows you
to cycle through the possible data models appropriate for that data type.
</P>

<P> The next two menu lines (<b>C</b> and <b>A</b>) describe the current state of the
categories model of variation in mutation rates among sites.  LAMARC uses
the Hidden Markov Model of Felsenstein and Churchill (1996).  In this model,
you must determine how many rate categories you wish to assume, then
provide the relative mutation rates for each category and the probability
that a site will fall into that category.  However, you do not need to
specify which category each site actually falls into.  The program will sum
over all possibilities.</P>

<P> If you choose to have multiple categories, select <b>C</b> (Number of
Categories), which will take you to a sub-menu.  Here, you can change the
number of categories with the <b>N</b> option, then select particular
rate/probability pairs to change their values on a further sub-menu.  For
example, if you wish to model two categories with 80% of sites evolving at
a base rate and the remaining 20% evolving ten times faster, you would set
the number of categories to 2, then set one rate/probability pair to 1 and
.8, and the second rate/probability pair to 10 and .2.</P>

<P> Internally, the program will normalize the rates so that the mean rate
across categories, weighted by the category probabilities, is 1.0. </P>

<P> In data modeled with categories of mutation rates, the "mu" value (a
component of various forces such as theta and M) is the weighted average of
the individual mutation rates.  In the above example, if you determine that
mu is 0.028, you can solve the equation:</P>

<P><center>0.028 = (0.8 * 1x) + (0.2 * 10x)<br>
x = 0.028 / 2.8 = 0.01<br>
10x = 0.1</center></P>

<P>and thus determine that the two individual mutation rates are 0.01 and 0.1.</P>
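<P> The same calculation in a short Python sketch, using the example's
numbers (illustrative only):</P>

<PRE>
rates = [1.0, 10.0]    # relative rates entered for the two categories
probs = [0.8, 0.2]     # probability a site falls into each category
mu = 0.028             # overall (weighted-average) mutation rate

# mean relative rate, weighted by the category probabilities
mean_rate = sum(r * p for r, p in zip(rates, probs))   # 2.8
absolute_rates = [mu * r / mean_rate for r in rates]   # [0.01, 0.1]
</PRE>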

<P> The program will slow down approximately in proportion to the number of
rate categories.  We do not recommend using more than five as the gain in
accuracy is unlikely to be worth the loss in speed.  Do not use two
categories with the same rate:  this accomplishes nothing and slows the
program down.</P>

<P> A category rate of zero is legal and can be useful to represent
invariant sites; however, at least one rate must be non-zero.</P>

<P> If you wish to use the popular gamma distribution to guide your rates,
use another program to calculate a set of categories that will approximate
a gamma distribution, then enter the corresponding rates and probabilities
manually into LAMARC.  There is currently no provision to 
infer gamma distributed rate variation within a single segment. For
gamma distributed mutation rate variation across independent regions,
see the <a href="#gamma">gamma parameter</a>
of the <a href="#analysis">analysis menu.</a></P>

<P> The <b>A</b> (Auto-Correlation) option provides an autocorrelation 
coefficient which controls the tendency of rates to "clump".  The
coefficient can be interpreted as the average length of a run of sites with
the same rate.  If you believe there is no clumping (each site has an
independent rate), set this coefficient to 1.  If you believe that, for
example, groups of about 100 sites tend to have the same rate, set it to
100.</P>

<P> While auto-correlation values may be set for any model, it is likely to
make sense biologically only in the case of contiguous DNA or RNA data. It
is not sensible to use it for widely separated SNPs or microsatellites.</P>

<P> After other model-specific options, the <b>R</b> (Relative mutation rate)
option provides a coefficient which controls the comparison of mutation
rates (mu) between segments and/or data types.  If, for example, you have
both microsatellite data and nuclear chromosome data in your file, and you
know that changes accrue in your microsatellite data ten times faster than
changes accrue in the DNA, you can use this option to set the relative mu
rate for the microsat segment(s) to be 10, and the relative mu rate for the
DNA segment(s) to be 1.  Overall estimates of parameters with mu in them (like
Theta) will be reported relative to the DNA values.  If you want overall
estimates reported relative to the microsat values, you can set the
microsat mu rate to 1 and the DNA mu rate to 0.1.</P>
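<P> The two ways of expressing the example's rates are equivalent up to the
reference unit; a minimal sketch with the Theta value chosen purely for
illustration:</P>

<PRE>
# the same 10:1 ratio expressed against two different reference units
dna_reference      = {"microsat": 10.0, "dna": 1.0}   # report in DNA units
microsat_reference = {"microsat": 1.0,  "dna": 0.1}   # report in microsat units

# a Theta of 0.01 in DNA units corresponds to 0.1 in microsat units,
# because Theta = 4*Ne*mu scales with the reference mu
theta_dna_units = 0.01
theta_microsat_units = theta_dna_units * 10   # 0.1
</PRE>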

<H4>Model-specific menus:  nucleotide data</H4>

<H5>F84 model </H5>

<P> The Felsenstein '84 (F84) model is a fairly general nucleotide
evolutionary model, able to accommodate unequal nucleotide frequencies and 
unequal rates of transition and transversion. It has the flexibility to
mimic simpler models such as Kimura's (set all nucleotide frequencies to
0.25) and Jukes and Cantor's (set all nucleotide frequencies to 0.25 and
the transition/transversion ratio to 0.500001).</P>

<P> The <b>T</b> option (TT Ratio) allows you to set the ratio between transitions
(A/G, C/T) and transversions (all other mutations). If bases mutated
completely at random this ratio would be 0.5 (1:2).  If you want a
random-mutation model (corresponding to the model of Jukes and Cantor)
LAMARC will use 0.500001 instead of 0.5, due to a limitation of the
algorithm used in LAMARC that would otherwise divide by zero.</P>

<P> Programs such as <A HREF="http://paup.csit.fsu.edu/">PAUP*</A> can be
used to estimate the transition/transversion ratio from your data.  In
practice it probably does not need to be very exact.</P>

<P> The <b>B</b> option (Base Frequencies) will take you to a submenu where you
can either tell LAMARC to calculate the base frequencies directly from the
data (the <b>F</b> option), or enter the values for the relative base frequencies
yourself. Unless your sequences are very short, it is probably best to
calculate these frequencies from the data.  If a particular nucleotide does
not exist in your data, you may set its frequency to a very small non-zero
value (0.00001 is probably low enough).</P>

<H5> General Time-Reversible (GTR) model </H5>

<P> The GTR model is the most general tractable model for nucleotide 
data.  It allows both unequal nucleotide frequencies and unequal rates for
each pair of nucleotides.  The most practical way to use GTR in LAMARC is
to estimate its rates with another program, such as <A
HREF="http://paup.csit.fsu.edu/">PAUP*</A>.  LAMARC does not have any
facility to estimate the GTR rates itself, but requires them to be
provided.</P>

<P> It is wasteful to use GTR when a simpler model is adequate, since it
runs rather slowly.  PAUP* with 
<a href="http://darwin.uvigo.es/software/modeltest.html">MODELTEST</a> 
can be used to assess the adequacy of simpler models. </P>

<P> The <b>G</b> option (GTR rates) requests input of the six base-specific
mutational rates.  These are symmetrical rates before consideration of
nucleotide frequencies, and can be obtained from PAUP*.  PAUP* may provide
only the first 5 rates, in which case the [GT] rate is always 1.0.</P>
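<P> A small sketch of how the six rates line up (the numeric values here are
hypothetical, not taken from any real data set):</P>

<PRE>
# PAUP* rate order: AC, AG, AT, CG, CT (GT is implicitly fixed at 1.0)
paup_rates = [1.35, 4.20, 0.97, 1.10, 5.80]   # hypothetical estimates
gtr_rates = paup_rates + [1.0]                # append the fixed GT rate
</PRE>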

<P> The <b>B</b> option (Base Frequencies) allows you to set the base frequencies
of the four nucleotides, in the order A, C, G, and T.  
The "Base frequencies computed from data" option is not
given here, since the third-party program you use to determine GTR rates
will also give you base frequencies, and you should use those.</P>

<H5><A NAME="data-uncertainty">Modeling data uncertainty in F84 and GTR models</A></H5>

<P>LAMARC runs on nucleotide data can now model data uncertainty.
This is option <b>P</b> in both the F84 and GTR models.
The per-base error rate gives the rate at which each single instance of a
nucleotide should be assumed to have been miscalled. A value of 0 indicates
that all bases were sequenced correctly; a value of 0.001 indicates that one
in one thousand is incorrect.
The default value is 0.
This feature is in beta test as of December, 2009.
</P>
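<P> One generic way to interpret such a per-base error rate (an
illustrative sketch only, which may not match LAMARC's exact formulation):
with error rate <i>e</i>, an observed base matches the true base with
probability 1 - <i>e</i>, and is any particular wrong base with probability
<i>e</i>/3.</P>

<PRE>
def observation_prob(observed, true_base, e=0.001):
    # probability of reading 'observed' when the true base is 'true_base';
    # a generic miscall model, not necessarily LAMARC's internal one
    return 1.0 - e if observed == true_base else e / 3.0
</PRE>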


<H4>Model-specific menus:  microsatellite data</H4>

<P> Apart from the choice of which model to use, the only options for
the microsatellite models (except for the MixedKS model)
are those common to all models: handling of rate differences among
markers, and normalization.  These are discussed above.  It is not
meaningful or useful to ask for autocorrelation when analyzing only a
single microsatellite marker per segment.  </P>

<H5> Stepwise model </H5>

<P> The stepwise microsatellite model assumes that microsatellite mutations
are always single-step changes, so that the larger observed differences
have been built up via successive single-step mutations.</P>

<H5> Brownian-motion model </H5>

<P>  The Brownian-motion microsatellite model is an approximation of the
stepwise model.  Rather than a discrete model of single mutational steps,
we use a continuous Brownian-motion function and then truncate it to the
nearest step.  This is much faster than the full stepwise model and returns
comparable results on fairly polymorphic data, but is less accurate on
nearly invariant data. </P>

<H5> K-Allele model </H5>

<P> This model assumes that only the alleles detected in the data exist,
and that  mutation from any such allele to any other is equally likely. 
The Jukes-Cantor DNA model, for example, is a K-Allele model for k=4. </P>

<H5> Mixed K-Allele/Stepwise model </H5>

<P> The Mixed K-Allele/Stepwise model (MixedKS) considers both
mutation to adjacent states (like the Stepwise model) and mutation
to arbitrary states (like the K-Allele model).  The relative
frequency of the two is expressed by the proportion constant percent_stepwise,
available as menu option <b>L</b>.  It indicates the proportion
of changes which are stepwise, so that percent_stepwise=0 is K-Allele and
percent_stepwise=1 is Stepwise.  An initial value can be
set here, and either used throughout the run, or optimized at
the end of every chain if the Optimize (<b>O</b>) option is set.  The
program finds the value of percent_stepwise that maximizes the likelihood
of the data on the final genealogy of each chain, using a
bisection approach.</P>

<H4>Model-specific menus:  K-Allele data</H4>

<P> The single model available for K-Allele data is the K-Allele model. 
"K-allele data" is defined as any genetic data collected as discrete units,
such as electrophoretic data or phenotypic data.  As with microsatellite
data, the K-allele model assumes that a single-step mutation from any
state to any other state is equally likely.</P>


<hr>
<H3><A NAME="analysis"> Analysis </A></H3>
<p><img src="images/LamarcAnalysisScreen.png" alt="LAMARC analysis screen"/></p>

<P> The Analysis option leads to a submenu that will allow you to specify
the evolutionary forces you're going to infer, as well as the starting
values, constraints, and profiling options for each force's parameters. 
More or fewer options will appear here depending on your data. If there is
more than one population, you will have an <b>M</b> option for estimating Migration parameters. Similarly, if you have more than one region in your data, you can turn on or
off estimation of varying mutational rates over regions (gamma), and if you
have trait data, you can set up your mapping analysis.</P>

<P> Each force currently in effect is marked as Enabled, and forces not in
effect are marked as Disabled. If you wish to add or remove a force, or to
change the parameters associated with a force, enter that force's
submenu.</P>

<P> One point to bear in mind is that
for nucleotide data the mutation rate mu is always expressed as mutation
per site, not mutation per locus as in many studies.  You may need to do a
conversion in order to compare your results with those from other
studies.</P>

<P> Each force is explained below, and following that is a description of
the various options available on the submenus:  constraints, profiling, and
Bayesian priors.  For more information on evolutionary forces, consult the
<A HREF="forces.html"> forces </A> documentation.</P>

<H4> <A NAME="theta">Theta (Effective Population Size): the "Coalescence"
force </A></H4>

<P> Coalescence is obligatory on all data sets, so there is no provision
for turning it off.</P>

<P> The Theta submenu allows you to customize estimation of Theta, the
number of heritable units in the population times the neutral mutation rate
times two.  This is 4N<sub>e</sub>mu for ordinary diploid data, 
N<sub>e</sub>mu for mitochondrial data, and so forth. </P>

<P> Starting values of Theta cannot be less than or equal to zero.  They 
should not be tiny (less than 0.0001), because the program will take a long 
time to move away from a tiny starting value and explore larger values.</P>

<P> This program provides Watterson and FST estimates for use as starting
values.  It should never be quoted as a correct calculation of
Watterson or FST, because if it finds the result unsatisfactory as a
starting value, it will substitute a default.</P>

<P> The <b>G</b> option allows you to hand-set all of the Thetas to the same initial 
value.  The <b>W</b> option allows you to set all of them to the Watterson value. 
(This will cause re-computation of the Watterson value, and can take
several seconds with a large data set.)  The <b>F</b> option allows you to set all
of them to the FST value.  You can then fine-tune by using the numbered
options to hand-set the starting Thetas of individual populations.  The FST
option is only available if there is more than one population.</P>

<H4> <A NAME="growth">Growth parameters:  the "Growth" force  </A></H4>

<P> This submenu allows you to turn on and off estimation of population
growth rates, and to set starting parameters.  </P>

<P> If there is a single population in your data, Lamarc will estimate a
single growth rate for it.  If there are multiple populations, Lamarc will
estimate one independent growth rate per population.</P>

<P> If we label growth as <i>g</i>, then the relationship between Theta 
at a time <i>t</i> > 0 in the past and Theta at the present day (<i>t</i> = 0) 
is:</P>

    <center>Theta<sub><i>t</i></sub> = Theta<sub>present day</sub> e<sup>-<i>gt</i></sup></center>

<p>This means that a positive value of <i>g</i>
represents a growing population, and a negative value, a shrinking one. </P>

<P> Time is measured in units of mutations (i.e., 1 <i>t</i> is the average
number of generations it takes one site to accumulate one mutation), and
<i>g</i> is measured in the inverse units of time.  If mu is known, multiply
generations by mu to get units of <i>t</i>, or conversely, divide
<i>t</i> by mu to get a number of generations.</P>
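<P> A short numeric sketch of these relationships, with purely illustrative
values for Theta, <i>g</i>, and mu:</P>

<PRE>
import math

theta_now = 0.01    # Theta at the present day (t = 0); illustrative
g = 50.0            # growth rate, in inverse units of mutational time
mu = 1e-8           # neutral mutation rate per site per generation

t = 0.001                                  # a past time, in mutational units
theta_t = theta_now * math.exp(-g * t)     # Theta_t = Theta_now * e**(-g*t)
generations = t / mu                       # 0.001 / 1e-8 = 100,000 generations
</PRE>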

<P> Starting parameter input for growth is similar to that for Theta, 
except that no quick pairwise calculators are available; you will have to 
either accept default values or enter values of your own.  Avoid highly
negative values (less than -10) as these have some risk of producing
infinitely long trees which must then be rejected.</P>

<H4> <A NAME="migration">Migration parameters and model:  the "Migration"
force </A></H4>

<P> This submenu allows you to customize estimation of the migration rates
among your populations.  The rates are reported as <i>M</i> = <i>m</i>/mu,
where <i>m</i> is the immigration rate per generation and mu is the neutral
mutation rate per site per generation.  Note that many other programs
compute 4<i>N<sub>e</sub>m</i> instead; be sure to convert units before
making comparisons with such results.</P>
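<P> Because Theta = 4<i>N<sub>e</sub></i>mu and <i>M</i> = <i>m</i>/mu, the
two scales are related by 4<i>N<sub>e</sub>m</i> = <i>M</i> * Theta, using
the Theta of the receiving population.  A minimal conversion sketch with
hypothetical numbers:</P>

<PRE>
M = 25.0        # hypothetical LAMARC estimate of m/mu into a population
theta = 0.004   # hypothetical Theta of that receiving population
four_Ne_m = M * theta   # 0.1, comparable to other programs' 4*Ne*m
</PRE>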

<P> You do not have the option to turn migration on and off; if there is
only one population migration must be off, and if there is more than one
population then migration must be on.  (Otherwise there is no way for the
genealogy to connect to a common ancestor.) </P>

<P> The main choice for migration is how to set the starting values for 
the migration parameters.  You can use an <i>F<sub>ST</sub></i>-based
estimator or hand-set the values, either hand-setting all to the same
value, or setting each one individually. </P>

<P> The <i>F<sub>ST</sub></i> estimator does not always return a sensible 
result (for example, if there is more within-population than
between-population variability), and in those cases we substitute an
arbitrary default value. If you see strange <i>F<sub>ST</sub></i> 
estimates you may wish to hand-set those values.  Please do not quote
LAMARC as a source of reliable  <i>F<sub>ST</sub></i> estimates, since we
do not indicate which have been replaced by defaults.</P>

<P> The final menu entry sets the maximum number of migrations allowed in a
genealogy.  An otherwise normal run may occasionally propose a genealogy
with a huge number of migrations.  This could exhaust computer memory; in
any case it would slow the analysis down horribly. Therefore, we provide a
way to limit the maximum number of migrations.  This limit should be set
high enough that it disallows only a small proportion of genealogies, or
it will produce a downward bias in the estimate of <i>M</i>.</P>

<P> If you find that you are sometimes running out of memory late in a
program run that involves migration, try setting this limit a bit lower. 
If you find, on examining your runtime reports, that a huge number of
genealogies are being dropped due to excessive events, set it a bit
higher.  (The "runtime reports" are the messages displayed on the screen 
while the Markov chains are evolving; a copy of these messages is provided 
at the end of each output file.)  You may also want to try lower starting 
values if many genealogies are dropped in the early chains.</P>

<H4> <A NAME="recombination">Recombination parameter:  the "Recombination"
force </A></H4>

<P> This submenu allows you to customize estimation of the recombination
rate parameter <i>r</i>, defined as <i>C</i>/mu where <i>C</i> is the 
recombination rate per
site per generation and mu is the neutral mutation rate per site per
generation.  We do not currently allow segment-specific or
population-specific recombination rates; only one value of <i>r</i> will be
estimated.</P>

<P> The first menu line allows you to turn recombination estimation on and
off.  Estimating recombination slows the program down a great deal, but if
recombination is actually occurring in your data, allowing inference of
recombination will not only tell you about recombination, but may improve
inference of all other parameters.</P>

<P> You cannot estimate recombination rate if there is only one site, and
in practice you cannot estimate it unless there is some variation in your
data--at least two variable sites.  Your estimate will be very poor unless
there are many variable sites.</P>

<P> The <b>S</b> option allows you to set a starting value of <i>r</i>. No
pre-calculated value is available, so your choices are to set it yourself
or accept an arbitrary default.</P>

<P> Starting values of <i>r</i> should be greater than zero.  If you do not
want to infer recombination, turn the recombination force off completely
instead.  If you believe that <i>r</i> is zero, but wish to infer it to
test this belief, start with a small non-zero value such as 0.01. It is
unwise to set the starting value of <i>r</i> greater than 1.0, because the
program will probably bog down under huge numbers of recombinations as a
result. A rate of 1 would indicate that recombination is as frequent as
mutation, and this state of affairs cannot generally be distinguished from
complete lack of linkage.</P>

<P> The <b>M</b> option sets the maximum number of recombinations allowed
in a genealogy.  An otherwise normal run may occasionally propose a
genealogy with a huge number of recombinations.  This could exhaust
computer memory; in any case it would slow the analysis down horribly.
Therefore, we provide a way to limit the maximum number of recombinations. 
This limit should be set high enough that it disallows only a small
proportion of genealogies, or it will produce a downward bias in the
estimate of <i>r</i>.</P>

<P> If you find that you are sometimes running out of memory late in a
program run that involves recombination, try setting this limit a bit
lower.  If you find, on examining your runtime reports, that many
genealogies are being dropped due to excessive events, set it a bit
higher.  (You may also want to try lower starting values if many
genealogies are dropped in the early chains.)</P>

<H4> <A NAME="gamma">Gamma parameter: allowing the background mutation rate
to vary over regions</A></H4>

<P> If you suspect that the mutation rate varies between your genomic
regions, but do not know the specifics of how exactly it varies, you can
turn on estimation of this force to allow for gamma-distributed rate
variation.  The shape parameter of the gamma ('alpha') can be estimated, or you
can set it to a value you believe to be reasonable.  While the gamma
function is a convenient way to allow for different types of variation, it
is unlikely that the true variation is drawn from an actual gamma
distribution, so the important thing here is mainly that you allow mutation
rates to
vary, not necessarily which particular value is estimated for the shape
parameter.  For more information, see the section, <A
HREF="gamma.html">"Combining data with different mutation rates"</a>.</P>

<H4> <A NAME="trait"> Trait Data analysis</A></H4>

<P> This section provides the capability to map the location of a
measured trait within one of your genomic regions.  You will need to have
provided trait data in your input file.  For more details about trait
mapping, see the <A HREF="mapping.html">mapping documentation</A>.</P>

<P> The Trait Data analysis menu will show you all of the traits which
you have provided data for and can attempt to map, with an indication
of which genomic region each is in.  To modify the model for a trait,
choose that trait by number; you will be taken to a specific menu
for mapping that trait.  It will start by reminding you of the trait
name, and then show the type of analysis you are using.  The two
kinds of mapping analysis are discussed in more detail in <A HREF="mapping.html">
"Trait Mapping."</A>  As a brief reminder, a "float" analysis 
collects the trees without use of the trait data, and then finds the
best fit of trait data to trees after the fact.  A "jump" analysis
assigns a trial location to the trait and then allows it to be reconsidered
as the run progresses.</P>

<P> In this menu, you can also restrict the range of sites which you
are considering as potential locations for your trait.  For example,
you may be quite sure that the trait is not located in the first 100
sites, but you still wish to analyze them because they may add useful
information about Theta and recombination rate.  You can remove the
range 1-100 from consideration using the <b>R</b> option.  You can also
add sites to consideration using the <b>A</b> option:  for example, if you
know that your trait is either in sites 1-100 or 512-750, one approach
is to remove all sites, then add these two ranges specifically.</P>

<P> If you have turned on a "jump" analysis, the necessary rearrangement
strategies will appear in the Strategy:Rearrangement submenu.  You may
wish to inspect them and make sure that you like the proportion of
effort used for each strategy.</P>

<H4> <A NAME="divergence">Divergence parameters and model:  the "Divergence"
force </A></H4>
<p>
The only value that can be edited in Divergence is 
Epoch Boundary Time (scaled by the mutation rate) of each Divergence event.  You can
set starting values and priors for these as usual.  There are no
constraints available for these parameters.  If you wish 
to redefine the Ancestor/Descendent relationships you need to either return to the <A HREF="converter.html">file converter</A> or edit the input file XML.
</p>

<H4> <A NAME="divergencemigration">Divergence-Migration parameters and model:  the "Divergence-Migration"
force </A></H4>
<p>
This force is presented exactly the same way as a regular migration matrix, except that there
are also entries for Ancestor populations. Note that even though you can potentially enter
migration rates between invalid population pairs (for example, an ancestor and one of its children),
these will be ignored by the calculation.  (Be warned that if you manage to create an XML input
file with values for migration rates between invalid pairs, for example by hand-editing your
XML, the program will produce confused and meaningless results.)  Also note that pairwise calculators
for starting values are not available for cases with divergence.
</p>

<H4> Options common to all Force submenus </H4>

<P> Three options are available on all Forces submenus (except
for Trait mapping and Divergence), and they all behave
in the same fashion.  Constraints allow you to hold parameters constant or
constrain them to be equal to other parameters.  Profiling affects the
reported support intervals around the estimates (and can affect how long it
takes the program to run).  If you are running LAMARC in <A
HREF="bayes.html">Bayesian mode</A>, the Bayesian Priors menu allows you to
set the priors for the parameters.</P>

<h5><A NAME="constraints">Constraints</A></h5>

<P> Beginning with version 2.0 of LAMARC, we allow constraints on all
parameters.  All parameters can be unconstrained (the default, and
equivalent to pre-2.0 runs), constant, or grouped.  Grouped parameters all
have the same starting value, and can either be constrained to be identical
(and vary together as a unit), or be set constant.  In addition, we allow
some parameters to be set 'invalid', which means 'constant, equal to zero,
and not reported on in the output'.</P>

<P> Say, for example, you know that the recombination rate for your
population is 0.03.  In this case, you can set the recombination starting
value to 0.03 and set the recombination constraint to 'constant'.  Or say
you have a set of populations from islands in a river; you may know that
all downstream migration rates will be equal to each other, and that all
upstream migration rates will be equal to each other.  In this case, you
can put all the downstream rates together in one group, all the upstream
rates together in another group, and set each group's constraint to
'identical'.  If you have another set of populations and know that
migration is impossible between two of them, you could set those migration
rates to be 'invalid' (or simply set them constant and set the starting
values to zero).</P>

<P> In general, a LAMARC run with constraints will be somewhat faster than
one without, since fewer parameters have to be estimated.  This can be
particularly helpful for complex systems where you already have some
information, and are interested in estimating just a few parameters. 
Unfortunately, constraints are not available at this time for the
Epoch Boundary Time parameters.</P>

<P> Select 'C' to go to the Constraints sub-menu for any force.  To change
the constraint on a particular parameter, enter that parameter's menu index
number.  To group parameters, pick one of them and enter 'A' (Add a
parameter), then the number of your parameter, then 'N' (for a new group). 
Then pick another parameter that should be grouped with the first one,
enter 'A' again, the number of your new parameter, then the group number of
the group you just created (probably 1).  Groups are created with the
automatic constraint of 'identical', meaning that they will vary, but be
co-estimated.  You may also set a group 'constant', which has the same
effect as setting the individual parameters constant, but guarantees they
will all have the same value.</P>



<h5><A NAME="profiling">Profiling</A></h5>

<P> Each force's Profiling option (<b>P</b>) takes you to a sub-menu where you
can adjust how LAMARC gives you feedback about the support intervals for
your data.  Setting the various profiling options is important in a
likelihood run, since it is the only way to obtain confidence limits of
your estimates, and can drastically affect total program time.  It is less
important in a Bayesian run, since the produced curvefiles have the same
information, and profiling simply reports key points from those curves (and
it takes essentially no time to calculate, as a result).  Profiling is
automatically turned on in a Bayesian run, and it doesn't make a lot of
difference which type of profiling is used in that instance, so most of the
discussion below will be most applicable to a likelihood run.</P>

<P> For each force, you can turn profiling on (<b>A</b>) or off (<b>X</b>) for all
parameters of a given force, though you cannot profile any parameter you
set constant.  The next option (<b>P</b>), toggles between percentile and
fixed-point profiling.  Selecting this option will cause all parameters
with some sort of profiling on to become either percentile or fixed.  You
can turn on and off profiling for individual parameters by selecting that
parameter's menu index number.</P>

<P> Both kinds of profiling try to give some information about the shape of
the likelihood (or posterior, in a Bayesian analysis) curve, including both
how accurate the estimates are, and how tightly correlated estimates for
the different parameters are.</P>

<P> Fixed-point profiling considers each parameter in turn.  For a variety
of values of that parameter (five times the MLE, ten times the MLE, etc.)
it computes the optimal values for all other parameters, and the log
likelihood value at that point.  This gives some indication of the shape of
the curve, but the fixed points are not always very informative.  In the
case of growth, some values are set to multiples of the MLE, while others
are set to some generally-useful values unrelated to the MLE, such as
0. </P>

<P> Percentile profiling, instead of using fixed points, gives you values
which the value of your parameter is X% likely to fall below.  A value
for theta of 0.00323 at the .05 percentile means that the true value of
theta has only a 5% chance of being less than or equal to 0.00323, and a
95% chance of being greater than 0.00323.  In a likelihood run, LAMARC will
then calculate the best values for all other parameters with the first
parameter fixed at that percentile.  If the above example were calculated in
a run estimating recombination and growth rates, LAMARC would
calculate the best values for recombination and growth as if theta had been
0.00323.  This gives a much nicer picture of the shape of the curve, but it
is very slow.  If you use percentile profiling for likelihood, expect 
it to take a significant proportion of your total run time.</P>

<P> The accuracy of the percentile profiling in a likelihood run is
dependent on the likelihood surface for your data fitting a Gaussian in
every dimension.  When the surface is Gaussian, the percentiles for each
parameter can be determined by finding the values which result in
particular log likelihoods.  For example, the .05 percentile is
mathematically defined to be the point at which the log likelihood is
exactly 1.35 less than the log likelihood for the MLE, while the .25
percentile can be found at the point where the log likelihood is exactly
0.23 less.  LAMARC finds the values for which these likelihoods can be
found, but cannot determine whether the actual likelihood surface for your
data has a truly Gaussian shape.  Hence, percentile profiling cannot be
used to report absolute confidence intervals, but it is at least a step in
that direction.</P>
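<P> Those offsets follow from the standard normal quantiles: under a
Gaussian surface, the log-likelihood drop marking percentile <i>p</i> is
<i>z</i><sup>2</sup>/2, where <i>z</i> is the normal quantile for
1 - <i>p</i>.  A quick check in Python:</P>

<PRE>
from statistics import NormalDist

for p in (0.05, 0.25):
    z = NormalDist().inv_cdf(1 - p)
    print(p, round(z * z / 2, 2))   # 0.05 -> 1.35, 0.25 -> 0.23
</PRE>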

<P> You may want to turn off profiling completely, or use fixed-point
profiling, for exploratory runs.  Percentile profiling gives the best
information for final runs, but may be too slow.  If you save your data to
a summary file (see <A HREF="menu.html#summary">summary files</a>), you can
go back and change the profiling options in a subsequent run, which then
won't have to recalculate the Markov chains; it will merely calculate new
profiles.</P>

<P> If you turn off profiling, you will lose both the profile tables
themselves and the approximate confidence intervals in the MLE tables.  A
good compromise is to set the <A HREF="menu.html#output">output file
verbosity</a> to "concise", which causes LAMARC to only calculate two
profiles (for percentile profiling, the 95% support intervals) instead of
about 10.</P>


<h5>Bayesian Priors</h5>

<P> If you are running LAMARC in Bayesian mode (see the <A
HREF="menu.html#search">Search Strategy</a> menu), each force will have the
option to edit the Bayesian priors (<b>B</b>) for that force.  A more detailed
discussion of a Bayesian run can be found <A
HREF="menu.html#search">below</a>, as well as in the <A
HREF="bayes.html">tutorial</a>.

    
<hr>
<H3><A NAME="search"> Search Strategy </A></H3>
<p><img src="images/LamarcSearchScreen.png" alt="LAMARC search strategy screen"/></p>

<P> This menu allows you to fine-tune your search strategy, to get
the best results from LAMARC with the minimal time.  Consider tuning these
settings if you are not satisfied with the performance of your searches. 
For advice on choosing the best settings here, see the article <A
HREF="search.html">"Search Strategies."</A> </P>

<P> The first option in the Search Strategy menu (<b>P</b>, 'Perform Bayesian or
Likelihood analysis') toggles your setup between a likelihood run and a
Bayesian run.  This choice can have a profound impact on the course of your
run, though hopefully both have a reasonable chance of arriving at the
truth at the end.  A likelihood run (the only option for versions of LAMARC
earlier than 2.0) searches tree-space with a fixed set of 'driving values'
per chain, and searches the resulting likelihood surface to find the best
set of parameter estimates.  A Bayesian run searches tree-space at the same
time as it searches parameter-space, then uses its parameter-space search as
a Bayesian posterior to report the best values for individual parameters. 
For more details about a Bayesian search with some comparisons to the
likelihood search, see the <A HREF="bayes.html">Bayesian tutorial</a>.

<h4><a NAME="priors">Bayesian priors menu</a></h4>
<P> If you have elected to run a Bayesian search, you will get the option
(<b>B</b>) to set the priors for the various forces in your data.  Selecting the
option will take you to a sub-menu listing all active forces and a summary
of the current priors for each force.  Once you select a particular force,
you get the option to edit the default prior for that force (<b>D</b>), and a
list of parameters, each of which may be selected to edit that parameter's
prior.</P>

<P> When editing the prior for a particular parameter, you may select
whether you wish to use the default prior with the <b>D</b> option, re-setting the
current prior to the default.  For all priors, you may then set three
options:  the shape of the prior (<b>S</b>), which may be linear or (natural)
logarithmic, and the upper (<b>U</b>) and lower (<b>L</b>) bounds of the prior.  There
is currently no provision for other prior shapes.</P>
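<P> The difference between a linear and a logarithmic prior can be seen in
a small sampling sketch (illustrative only, not LAMARC's internals):</P>

<PRE>
import math, random

lower, upper = 0.0001, 0.1
linear_draw = random.uniform(lower, upper)   # uniform on [lower, upper]
log_draw = math.exp(random.uniform(math.log(lower), math.log(upper)))
# the log prior spreads its draws evenly across orders of magnitude
</PRE>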

<h4><a NAME="rearrangers">Rearrangers menu</a></h4>

<P> Selecting <b>R</b> from the Search Strategy menu takes you to a sub-menu where
you can set the relative frequencies of the various arrangers.  The main
arranger in a LAMARC run is the Topology rearranger (<b>T</b>).  This rearranger works by selecting and breaking a
branch of its current tree, then re-simulating that branch to add it back
to the tree.  It should almost always be set greater than the other tree
rearrangers (the size and haplotype arrangers), and any decrease in its
relative frequency probably requires a concomitant increase in chain
length (see <A HREF="menu.html#chains">sampling strategy</a>, below).</P>

<P> A new arranger for version 2.0 is the Tree-Size rearranger (<b>S</b>).  This
rearranger leaves the topology of the tree constant, but re-samples branch
lengths root-wards below a selected branch (biased with a triangular
distribution towards the root).  Our initial experiments with this
rearranger indicate that it is helpful in a variety of situations, but
particularly helpful for runs with growth and migration.  It should be used
sparingly, however:  we've found setting this rearranger's frequency to 1/5
that of the topology rearranger is a generally good ratio.</p>

<P> If your data appears to have phase-unknown sites,
you will have the option to set the relative frequency of the Haplotype
rearranger (<b>H</b>).  The haplotype rearranger considers new phase assignments
for a pair of haplotypes.  Like the tree-size rearranger, setting this
frequency to 1/5 that of the topology rearranger has been found to produce
good results. </p>
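<P> Relative frequencies are simply weights; a sketch of how the suggested
1/5 ratios translate into proportions of overall effort (illustrative
arithmetic only):</P>

<PRE>
freqs = {"topology": 1.0, "tree_size": 0.2, "haplotype": 0.2}
total = sum(freqs.values())
proportions = {k: v / total for k, v in freqs.items()}
# {'topology': 0.714..., 'tree_size': 0.142..., 'haplotype': 0.142...}
</PRE>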

<P> If you have chosen to do a Bayesian run, you will have the option to
set the relative frequency of the Bayesian rearranger (<b>B</b>).  This
rearranger considers new parameter values on the current tree.  By default,
this is set to the same frequency as the topology rearranger, and this
seems to be adequate for runs with a small number of variable parameters. 
This can be increased for runs with a larger number of parameters, but you
probably don't want a relative frequency of more than 2-3 times that of the
topology arranger--increase the <A HREF="menu.html#chains">length of your
chains</a> instead.

<P> If you are doing trait mapping using the "jump" strategy (in which
the trait alleles are assigned a chromosomal location, and this location is
reconsidered during the run) two additional rearrangers become available.
The Trait haplotypes rearranger (<b>M</b>) allows reconsideration of 
ambiguous trait haplotypes:  for example, it can switch between
DD, Dd and dD as haplotype resolutions for someone showing a dominant
phenotype.  The Trait Location rearranger (<b>L</b>) moves the trait
position within the region.  We have little information about the
best proportions of effort to put into these two rearrangers, but the
Trait Location rearranger probably needs to get at least 20% effort
to function well.  These arrangers are not needed in "float" mapping
or in non-mapping runs and will not be shown.</P>

<H4> <A NAME="chains">Sampling Strategy (chains and replicates)</a></H4>

<P> This sub-menu allows you to adjust the time LAMARC spends sampling
trees.  It can (and should) be adjusted to reflect whether you want an
'exploratory' run vs. a 'production' run, how complicated your parameter
model is, and whether you are performing a likelihood or Bayesian run. 
Options germane to each of the above scenarios will be discussed in turn.

<P> The first option (<b>R</b>) allows you to use replication--repeating the
entire analysis of each genomic region a number of times, and consolidating the
results.  This is more accurate than running LAMARC several times and
attempting to fuse the results by hand, because LAMARC will compute profiles
over the entire likelihood surface, including replicates, instead of
individually.  It will, of course, increase the time taken by the current
run in proportion to the number of replicates chosen (both the time spent
generating chains and, typically, the time spent profiling the
results).  The minimum number of replicates is one, for a single run-through
of LAMARC.  A reasonable upper limit is 5 if your runs produce reasonably
consistent results, though you may want to use a higher number to try to
overcome more inconsistent runs.  Replication is useful for 'production'
runs more than exploratory runs, and can help both likelihood and Bayesian
searches.</P>

<P> LAMARC's search is divided into two phases.  First, the program will
run "initial" chains.  In a likelihood run it is useful to make 
these relatively numerous and short as they
serve mainly to get the driving values and starting genealogy into a
reasonable range.  When all the initial chains are done, the program will
run "final" chains.  These should generally be longer, and are used to
polish the final estimate.  Exploratory runs can have both short initial
and short (but somewhat longer) final chains, and production runs should have
longer sets of both chains.  Because a likelihood run is highly dependent
on the driving values, you will probably need several initial chains (10 is
a reasonable number), and a few final chains (the default of 2 usually
works). A Bayesian run is not at all dependent on the driving values, and
while you might use several initial chains for an exploratory run just to
see what's happening with the estimates, you should probably simply use
zero or one initial chains and one quite-long final chain to obtain your
production estimates.</P>

<P> For both initial and final chains, there are four parameters to set. 
"Number of chains" determines how many chains of that type will be run. 
"Number of recorded genealogies" determines how many genealogies (in a
likelihood run), while "Number of recorded parameter sets" determines how
many sets of parameters (in a Bayesian run) will actually be used to make the
parameter estimates.  "Interval between recorded items" determines how many
rearrangements will be performed between samples.  Thus, if you ask for 100
recorded items per chain, and set the interval between them to 20, the program will
perform a total of 2000 rearrangements, sampling 100 times to make the
parameter estimates.  The total number of samples will determine the length
of your run, and can be shorter for exploratory runs but should be long
enough to produce stable results for production runs.  In a Bayesian run,
as mentioned, you will want one long chain for your production run.  If
you are seeing spurious spikes in your curvefiles, you probably need to
increase the sampling interval, too--because each rearrangement only
changes a single parameter (and also takes time to rearrange trees),
certain parameters can remain unchanged simply through neglect, and will
end up being oversampled in the output.  Increasing the sampling interval
can overcome this artifact.</P>
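
<P> To make the bookkeeping concrete, here is the arithmetic from the
example above written out as a short sketch (the variable names are
illustrative, not actual LAMARC option names):</P>
<pre>
# Python sketch: chain length implied by the sampling settings above.
recorded_items = 100   # "Number of recorded genealogies/parameter sets"
interval       = 20    # "Interval between recorded items"

total_rearrangements = recorded_items * interval
print(total_rearrangements)   # 2000, of which every 20th is sampled
</pre>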

<P> "Number of samples to discard" controls the burn-in phase before
sampling begins.  LAMARC can be biased by its initial genealogy and initial
parameter values, and discarding the first several samples can help to
reduce this bias.  To continue with the example above, if you ask for 100
samples, an interval of 20, and 100 samples to be discarded, the program
will create a total of 2100 samples, throwing away the first 100 and
sampling every 20th one thereafter.  In a likelihood run, you want the
burn-in phase to be long enough to get away from your initial driving
values, which will be longer the more complex your model is.  In a Bayesian
run, you also want the burn-in phase to be long enough to get away from
your initial set of values and the initial genealogy.  Again, this will
need to be longer if you have a more complex model with lots of
parameters.</P>
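
<P> Extending the same sketch to include the burn-in (again, the names
are illustrative only):</P>
<pre>
# Python sketch: total samples including the discarded burn-in.
recorded_items = 100   # samples actually used for the estimates
interval       = 20    # rearrangements between recorded samples
discarded      = 100   # "Number of samples to discard"

total_samples = discarded + recorded_items * interval
print(total_samples)   # 2100: the first 100 are thrown away,
                       # then every 20th one is kept thereafter
</pre>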

<H4> <A NAME="heating">Multiple simultaneous searches with heating</a></H4>

<P> The last menu item on the Search Strategy menu allows you to help your
search explore a wider sampling of trees by setting multiple
"temperatures."  A search through possible trees at a higher temperature
accepts proportionally less likely trees, in the hopes that future
rearrangements will find a new, well-formed tree with a higher 
likelihood.  This approach will often rescue a search that otherwise 
becomes stuck in one group of trees and does not find other
alternatives.</P>

<P> (The reason that the word "temperature" is used here may be understood 
by means of an analogy.  Imagine, on a snowy winter day, that there are 
several snowmen on the lawn in front of a house, and you want to identify 
the tallest one; you do not want to determine the exact height, you just 
want to determine the tallest snowman.  One way of doing this would be to 
raise the temperature so that all of the snowmen melt; you could then 
identify the tallest snowman as the one that disappears last.  Using 
multiple "heated" Markov chains simultaneously provides smoothed-out, 
compressed views of the space of possible genealogy arrangements.)</P>

<P> To set multiple temperatures, select the <b>M</b> menu option (Multiple
simultaneous searches with heating), then select <b>S</b> (Number of
Simultaneous Searches) and enter the number of temperatures you want.  You
will then get a list of new menu options, and be able to set the various
temperatures. For best results, temperatures should progress in value
pseudo-exponentially.  A typical series of temperatures might be "1 1.1 2 3
8", but different data sets might have different optimal magnitudes, from
"1 2 10 20 50" to "1 1.01 1.05 1.1 1.3".  Watching the Swap Rates between
temperatures during the run is important for determining an optimal series
here--rates should fall somewhere between 10 and 40 (the numbers given are
percents).  Below about 5% you are getting little return for a huge
increase in computation, and above 50% the two chains are so close to each
other that they are unlikely to be exploring significantly distinct areas
of parameter space (a process more efficiently handled by using <A
HREF="menu.html#chains">replicates</A>). </P>
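
<P> If you would like to rough out a pseudo-exponential ladder before
typing it into the menu, a sketch like the following may help.  This is
purely an illustration, not part of LAMARC; the function name and the
choice of base are assumptions you would tune against the observed swap
rates:</P>
<pre>
# Python sketch: build a pseudo-exponential temperature ladder.
def temperature_ladder(n, base=1.5):
    """Return n temperatures starting at 1.0 and growing
    roughly exponentially; base controls how fast they spread."""
    return [base ** i for i in range(n)]

print(temperature_ladder(5))             # [1.0, 1.5, 2.25, 3.375, 5.0625]
print(temperature_ladder(5, base=1.05))  # a much tighter ladder
</pre>
<P> Whichever ladder you start with, adjust it until the reported swap
rates between adjacent temperatures fall into the 10-40% range.</P>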

<P> Should finding an optimal series of temperatures by hand become too
difficult, or if the optimal series of temperatures varies during a run,
LAMARC can be told to try to optimize the temperatures automatically, by
switching from "static" to "adaptive" heating (the <b>A</b> option that appears
if you have more than one temperature).  With static heating, the
temperatures you specify will be used throughout the run.  With adaptive
heating, the program will continually adjust the temperatures during the
run to keep swapping rates between 10% and 40%.  We make no guarantees that
adaptive heating will be superior to static heating, but it should at least
drive the values to the right magnitudes, and keep them there during the
course of the run.</P>
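
<P> For intuition only, the feedback rule behind such a scheme can be
sketched as follows; this illustrates the general idea and is not
LAMARC's actual adjustment algorithm:</P>
<pre>
# Python sketch: the general feedback idea behind adaptive heating.
# (Conceptual only -- not LAMARC's internal code.)
def adjust_temperature(temp, swap_rate, factor=1.1):
    """Nudge one temperature so its swap rate with the chain
    below it drifts back toward the 10%-40% target window."""
    if swap_rate < 0.10:       # swapping too rarely: chains too far apart
        return temp / factor   # pull this temperature closer
    if swap_rate > 0.40:       # swapping too often: chains too similar
        return temp * factor   # push this temperature further out
    return temp                # already in the target window
</pre>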

<P> A second option that appears if you have multiple temperatures is the
ability to set the swap interval for different temperatures (<b>I</b>).  The
default is "1", which means LAMARC picks two adjacent temperatures and
checks to see if the higher-temperature chain has a better likelihood than
the lower-temperature chain after each rearrangement.  To check less
frequently, set this value to a higher number (3 for every third
rearrangement, etc.).  A higher value will speed up the program slightly,
but typically does not represent a significant gain.</P>

<P> In general, a run will increase in length proportionally to the number
of temperatures chosen, though the time spent profiling will be the same as
it would be without heating.</P>

<hr>
<H3><A NAME="io"> Input and Output related tasks </A></H3>
<p><img src="images/LamarcIOScreen.png" alt="LAMARC I/O screen"/></p>

<P> This menu controls almost all of the interactions between the program, the
computer, and the user.  You can use it to modify the names and content of
files LAMARC produces, as well as the information printed to the screen
during a LAMARC run.</P>

<h4> Verbosity of Progress Reports </h4>

<P> The first option on the input and output menu controls the reports
that appear on-screen as the program runs.  Type <b>V</b> to toggle among the
four options.  NONE suppresses all output, and might be used when running
LAMARC in the background, after you know what to expect. CONCISE only
periodically reports about LAMARC's progress, noting that a new region
has begun, or that profiling has started.  NORMAL adds some
real-time feedback on the program's progress and additionally guesses at
completion time for each program phase (the guesses are not always
completely reliable, however).  VERBOSE provides the maximum amount of
feedback, reporting additionally on some of the internal states of the
program, including overflow/underflow warnings and the like.  If something
has gone wrong with your LAMARC run, having this option set to VERBOSE is
your best chance at a diagnosis.</P>

<h4><A NAME="output">Output File Options</A></h4>

<P> The next menu item sends you to a submenu where you can set various
aspects of the final output file.  Select <b>O</b> to go to this menu.</P>

<P> Selecting <b>N</b> allows you to set the name of the output report file. 
Please be aware that if you specify an existing file, you will overwrite
(and destroy) its contents.</P>

<P> Selecting <b>V</b> allows you to toggle among the three levels of content
for the output file. VERBOSE will give a very lengthy report with
everything you might possibly want, including a copy of the input data (to
check to make sure the data were read correctly and are aligned).  NORMAL
will give a moderate report with less detail, and CONCISE will give an
extremely bare-bones report with just the parameter estimates and minimal
profiling.  We recommend trying VERBOSE first and going to NORMAL if you
know you don't need the extra information.  CONCISE is mainly useful if
your output file is going to be read by another computer program rather
than a human being, or if speed is of the essence, since it speeds up
profiling in a likelihood run by a factor of 5.</P>

<P> The "Profile likelihood settings" option (<b>P</b>) leads to a new sub-menu
that lists all forces and gives you an overview of how they are going to be
profiled.  You can turn on (<b>A</b>) or off (<b>X</b>) profiling for all parameters
from this menu, or set the type of profiling to percentile (<b>P</b>) or fixed
(<b>F</b>).  The other menu options take you to the force-specific profiling
menus discussed <A HREF="menu.html#profiling">above</a>. </P>

<h4> <A NAME="menuinfile">Name of menu-modified version of input file</A></h4>

<P> The "Name of menu-modified version of input file" option (<b>M</b>) allows
you to change the name of the automatically generated file which will be
created by LAMARC when the menu is exited ("menusettings_infile.xml", by default).  This
file contains all the information in the infile, but also contains any
information that may have been set in the menu.  If you want to repeat
your run with exactly the same options that you chose from the menu this
time, you can rerun using this file as your input file.</P>

<h4> <A NAME="summary">Writing and Reading Summary Files</A></h4>

<P> The next two menu items on the "Input and Output Related Tasks" menu
are used to enable or disable reading and writing of summary files. If
summary file writing is enabled, LAMARC saves the information it calculates
as it goes--enough to recover from a failed run, or to repeat the
numerical analysis of a successful run.  If a run fails while
generating chains, LAMARC will take the parameter estimates from the last
completed chain, use them to generate trees in a new chain, then use those
trees and the original estimates to start a new chain where it had crashed
before.  In this scenario, LAMARC cannot produce numerically identical
results to what it would have produced had the run finished, but should
produce similar results for non-fragile data sets.  However, if profiling
had begun in the failed run, the summary files do contain enough
information to produce numerically identical results, at least to a certain
degree of precision.</P>

<P> To turn on summary file writing, select <b>W</b> from the menu, then <b>X</b> to
toggle activation.  The name of the summary file may be changed with the
<b>N</b> menu option.  This will produce a summary file as LAMARC runs.  To then
read that summary file, turn on summary file reading the next time LAMARC
is run (from the same data set!) with the <b>R</b> option from this menu, then
<b>X</b> to toggle activation, and finally <b>N</b> to set the correct name of the
file.  LAMARC will then begin either where the previous run stopped, or, if
the previous run was successful, will start again from the Profiling
stage.</P>

<P> For particularly long runs on flaky systems, it may be necessary to
both read and write summary files.  If both reading and writing are on,
LAMARC will read from the first file, write that information to the second
file, and then proceed.  If that run is then stopped, the new file may be
used as input to start LAMARC further along its path than before.  If this
option is chosen, be sure to have different names for the input summary file
and the output summary file.</P>

<P> If reading from a summary file, most of the options set when writing
the summary file must remain the same, or unpredictable behavior may occur,
including LAMARC mysteriously crashing or producing unreliable results. 
However, since all profiling occurs after reading in the summary file,
options related to that may be changed freely.  For example, in order to
get preliminary results, you may run LAMARC with "fixed" profiling,
"concise" output reports, and writing summary files on.  Afterwards, if
more detail is needed about the statistical support of your estimates, 
you may run LAMARC again, this time
with summary file reading, "percentile" profiling, and "verbose" output
files.</P>

<H4><A NAME="tracer">Tracer output</A></H4>

<P> LAMARC will automatically write files for the utility <A
HREF="http://tree.bio.ed.ac.uk/software/tracer/">Tracer</a>
written by Drummond and Rambaut (see the <A HREF="tracer.html">"Using
Tracer with LAMARC"</A> documentation in this package).  
LAMARC's Tracer output files are named [prefix]_[region]_[replicate].txt.
You can turn on or off Tracer output and set the prefix here.</P>

<H4><A NAME="newick">NEWICK tree output</A></H4>

<P>If there is no migration or recombination, LAMARC can optionally
write out the tree of highest data likelihood it encounters for each
region, in Newick format.  Options in this menu control whether such
a Newick file will be written, and what its prefix will be.  This
option is not needed for normal use of the program, but it is sometimes
interesting to see what the best tree was, and how it compares with
the best tree found by phylogeny-inference programs.  (Sometimes,
surprisingly, LAMARC is able to outdo normal inference programs.)</P>

<h4> <A NAME="curvefiles">Bayesian curvefiles</A></h4>

<P> A Bayesian run of LAMARC will produce additional output for each
region/parameter combination that details the probability density curve for
that parameter.  Each file can be read into a spreadsheet program (like
Excel) to produce a graphic of your output.  If you decide you don't have
enough disk space for these files, or don't want them for some other
reason, you can turn off this feature by toggling the <b>U</b> option ('Write
Bayesian results to individual files').  You can change the prefix given to
all of these curvefiles with the <b>C</b> option ('Prefix to use for all
Bayesian curvefiles').  Curvefile names are of the format
[prefix]_[region]_[parameter].txt, where '[prefix]' is the option settable
here, '[region]' is the region number, and '[parameter]' is the parameter
in question.  More details about Bayesian curvefiles are available in the
<A HREF="bayes.html#results">Bayesian tutorial</a>.</P>
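
<P> A curvefile can also be plotted outside a spreadsheet.  The sketch
below assumes the file holds whitespace-separated columns of parameter
values and posterior densities, and uses a hypothetical file name; check
your own curvefiles for the exact layout before relying on it:</P>
<pre>
# Python sketch: plot one Bayesian curvefile (column layout assumed).
import matplotlib.pyplot as plt

values, densities = [], []
with open("curvefile_1_Theta1.txt") as f:   # hypothetical name
    for line in f:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue                        # skip any header lines
        values.append(x)
        densities.append(y)

plt.plot(values, densities)
plt.xlabel("parameter value")
plt.ylabel("posterior density")
plt.show()
</pre>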

<H4><A NAME="reclocfiles">Recombination location output</A></H4>

<P> In runs modeling recombination, LAMARC can dump a file listing
each recombination location in every sampled tree in the last final chain.
A recombination between data position <tt>-13</tt> and <tt>-12</tt> is
recorded as <tt>-13</tt>, one between <tt>340</tt> and <tt>341</tt> is
recorded as <tt>340</tt>.
You can read the file into <tt>R</tt> or another statistical computing
tool, and plot a histogram to see where the recombinations are most
often accepted. Keep in mind that there is a slight bias to accept
recombinations near the ends of the input sequences, as there is less
data available to demonstrate that a recombination there is unsupported.
</p>
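
<P> A minimal histogram sketch in Python (it assumes whitespace-separated
integer locations, since the exact file layout is not specified here;
verify against your own reclocfiles first):</P>
<pre>
# Python sketch: histogram of sampled recombination locations.
import matplotlib.pyplot as plt

locations = []
with open("reclocfile_1_1.txt") as f:   # default prefix, region 1, replicate 1
    for line in f:
        for token in line.split():      # tolerate one or many values per line
            locations.append(int(token))

plt.hist(locations, bins=50)
plt.xlabel("recombination location (data position)")
plt.ylabel("count in sampled trees")
plt.show()
</pre>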

<p>
These files are named [prefix]_[region]_[replicate].txt.
You can turn this output on or off and set the prefix here (the default
prefix is 'reclocfile').
This option is off by default, since it can produce a large amount of output.</P>


<hr>
<H3><A NAME="current"> Show current settings </A></H3>
<p><img src="images/LamarcOverviewScreen.png" alt="LAMARC overview screen"/></p>

<P> This menu option provides reports on all current settings, so that you
can see what you've done before starting the program.  You cannot change
the settings here, but each display will indicate which menu should be used
to change the displayed settings. </P>

<P>(<A HREF="xmlinput.html">Previous</A> | <A
HREF="index.html">Contents</A> | <A HREF="regions.html">Next</A>)</P>

<!--
//$Id: menu.html,v 1.52 2013/11/08 23:09:53 mkkuhner Exp $
-->
</BODY>
</HTML>