<!-- header fragment for html documentation -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="description" CONTENT="Estimation of population parameters using
genetic data using a maximum likelihood approach with Metropolis-Hastings
Monte Carlo Markov chain importance sampling">
<META NAME="keywords" CONTENT="MCMC, Markov chain, Monte Carlo,
Metropolis-Hastings, population parameters, migration rate, population
size, recombination rate, maximum likelihood">
<TITLE>LAMARC Documentation: Menu</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF"> <!-- coalescent, coalescence, Markov chain Monte
Carlo simulation, migration rate, effective population size, recombination
rate, maximum likelihood -->
<P>(<A HREF="xmlinput.html">Previous</A> | <A
HREF="index.html">Contents</A> | <A HREF="regions.html">Next</A>)</P>
<H2>The Interactive LAMARC menu system</H2>
<H3>Introduction</H3>
<p>
LAMARC's user interface is fairly awkward to use. This is because LAMARC is mainly a batch program, and this interface reflects the way LAMARC does things internally, not the way users think about things. LAMARC is designed to run off XML input. This interface reads in and appropriately decorates the output of the <A HREF="converter.html">File Conversion Utilities</A>, and allows users to tweak existing XML (often the output of this interface itself) to do analyses other than the one it was originally created for. Note that there is almost no overlap between what can be edited in this interface and what can be edited in the File Converter. Our long-term goal is to combine this interface with the File Converter and make LAMARC a purely batch program (which has many advantages, such as more effective use of parallel computing).
</p>
<p>This interface reflects how LAMARC is organized internally, which is not necessarily obvious. We recommend that you take a tour through all the menus before you do anything to get a sense of where things are. For example, "Migration" is found under "Analysis", which may seem odd until you realize that if "Migration" is on, the "Analysis" of the data changes. There are a lot of other examples of seemingly normal terms meaning slightly different things in the LAMARC context. A tour of the interface will give you an overview.
</p>
<H3><A NAME="conventions">General Conventions</A></H3>
<p>
LAMARC essentially has a command-line interface, which is fairly uncommon in the modern era. Here are some general conventions that will help you understand it:
<UL>
<LI>
When the menu redisplays, the old screen just scrolls up, so make sure you stay at the bottom of the window you are working in or it will get confusing.
</LI>
<LI>
Each line you can interact with has a single character at the left side (for example <b>J</b>), some text to explain what that line is about, and the current value on the right side. In order to change the value on that line, enter that character <b>J</b> at the bottom of the screen.
</LI>
<LI>
If there are multiple similar lines that can be edited individually, for example rows of a Migration Matrix, they will have numbers in the left column rather than letters.
</LI>
<LI>
Case does not matter. You can enter <b>J</b> or <b>j</b> in the above example and get the same effect.
</LI>
<LI>
Booleans (Yes/No or True/False items) are toggled by entering the character listed at the left. So if an item with an <b>A</b> on the left is "Yes" and you want "No", enter <b>A</b>; the screen will redisplay, and <b>A</b> will now be "No".
</LI>
<LI>
Things can come and go in menus in logical, but not necessarily obvious, ways. For example, in Forces, if you have only one population, Migration will not appear, because it cannot happen. This can get a bit confusing, because when you change something on one screen it can cause something on another screen to appear or disappear. Until you get used to your analysis, it's wise to review everything using the "Overview" pages when you think you are ready to start a run, just to make sure nothing you have done has had unexpected side-effects.
</LI>
</UL>
</p>
<H3>Start Up</H3>
<p>When you start up LAMARC, the first thing you will be asked is what your output directory is. You will then be asked for your input file. This may seem a bit backwards, since usually you would want to read data in before writing data out, but it reflects the internal workings of LAMARC. If you don't specify an input file, LAMARC will follow the <A HREF="http://evolution.genetics.washington.edu/phylip/">Phylip</A> conventions and look in the output directory for a file called "infile".
</p>
<p>
The data in the input file defines the kinds of analyses which are possible. If you don't see the kind of analysis you wish to do listed on the "Analysis" menu, you will need to modify your input file so that kind of analysis is possible. For example, if you wish to study migration, you need at least two populations. If "Migration" is not an option in "Analysis", you have only one population defined in your input file. You will need to fix that, either using the <A HREF="converter.html">file converter</A> or editing the XML directly, before LAMARC can analyze migration.
</p>
<p> Once the data have been located and processed (which may take several seconds), the first screen you see upon starting LAMARC is the top level menu: </p>
<p><img src="images/LamarcMainScreen.png" alt="LAMARC main screen"/></p>
<P> The menu may appear in a different form depending on your computer
system, but the basic ideas are always the same. You can now review and set values in the following areas:
<UL>
<LI><A HREF="menu.html#data">Data options</A></LI>
<LI><A HREF="menu.html#analysis">Analysis methods</A></LI>
<LI><A HREF="menu.html#search">Search Strategy menu</A></LI>
<LI><A HREF="menu.html#io">Input and Output related tasks</A></LI>
<LI><A HREF="menu.html#current">Overview of current settings</A></LI>
</UL>
<P>On all LAMARC menus, the bottom line will give two options:
<UL>
<LI><b>Run</b> the program ('.')</LI>
<LI><b>Quit</b> ('q')</LI>
</UL>
</P>
<P> If you are viewing a
sub-menu, you will also have the option to:
<UL>
<LI><b>Go Up</b> to a previous menu ('&lt;return&gt;')</LI>
</UL>
</P>
<P>If you have performed any changes from the initial
setup within a submenu, there will be the:
<UL>
<LI><b>Undo</b> option ('-') which will undo your last change</LI>
</UL>
<P>If you have performed any Undo operations, there will be the:
<UL>
<LI><b>Redo</b> option ('+') which will redo your last change</LI>
</UL>
<P>Any time that you may create a new valid LAMARC input file based on the current menu
settings, there will be the:
<UL>
<LI><b>Create</b> option ('>').</LI>
</UL>
</P>
<P> <B>Warning 1:</B> LAMARC's search defaults are fairly small. This
allows a first-time user to get visible results right away, but for serious
publishable analysis you will almost surely want to increase some of the
settings in the <A HREF="menu.html#search">"Search Strategy menu"</A>.</P>
<P> <B>Warning 2:</B> Once you have selected "Run" you will have no further
interaction with the program; you'll have to kill and restart it in order
to change its behavior. However, LAMARC does save a modified version
of its input file, updated with any changes made via the menu, into
the file "menusettings_infile.xml" when you exit the menu. If you want to re-run LAMARC
starting with the same data and menu options as you last selected, choose
"menusettings_infile.xml" as your input file when restarting LAMARC. </P>
<hr>
<H3><A NAME="data">Data options</A></H3>
<p><img src="images/LamarcDataScreen.png" alt="LAMARC data screen"/></p>
<P> This menu allows you to define what your data is and how you want to model it.</P>
<P> The first two items (<b>C</b> and <b>S</b>) define the source of the random number seed used to start the analysis. Normally the seed is set
from the system clock, so this option defaults to "<b>Yes</b>". To toggle it off and use the explicit seed instead, type <b>C</b>.
</P>
<P> A very few systems lack a system clock; on those you will need to set
this value by hand (either here or in the input file).</P>
<P>The explicit random seed is used if you wish to do exactly the same analysis
twice. You can hand-set the seed by entering <b>S</b>. You will be queried for the number to be used.
LAMARC will then find the closest integer of the form 4N+1 to the number
you entered.
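<P> As an illustration, the rounding rule works like this (a minimal sketch; LAMARC's own tie-breaking may differ):</P>
<PRE>
def closest_4n_plus_1(seed):
    # Return the integer of the form 4N + 1 closest to seed.
    n = round((seed - 1) / 4)   # nearest N
    return 4 * n + 1

print(closest_4n_plus_1(1000))  # -> 1001
print(closest_4n_plus_1(1004))  # -> 1005
</PRE>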
<P> The <b>E</b> option (Effective population size menu) will only appear if you have data from multiple regions. It provides a way to combine
data collected from different regions of the genome that have unique
effective population sizes. For example, data
from nuclear chromosomes of a diploid organism reflect an effective
population size four times larger than
data from the mitochondrion of the same organism. Data from sex
chromosomes also have unique effective population sizes--the relative
effective population size ratio for a non-sex chromosome to an X
chromosome to a Y chromosome is 4:3:1. Selecting <b>E</b> takes you to a sub-menu
where you can select a particular genomic region and set its effective population
size.</P>
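<P> As a rough illustration of these ratios (the code and labels below are hypothetical; the values are those given above):</P>
<PRE>
# Relative effective population sizes by inheritance mode.
relative_size = {
    "autosome": 4,        # nuclear, diploid, biparental
    "X chromosome": 3,
    "Y chromosome": 1,
    "mitochondrion": 1,   # haploid, uniparental: 1/4 the autosomal value
}

# A region's Theta scales with its relative effective size: if the
# autosomal Theta is 0.01, the mitochondrial Theta from the same
# population should be near 0.0025.
print(0.01 * relative_size["mitochondrion"] / relative_size["autosome"])
</PRE>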
<P> The next set of menus allows you to modify the data-analysis model for each segment of your data. You can either modify the model for each segment
individually (<b>1</b>, ...), or you can modify a default model for the different types of
data LAMARC found in your infile. If you have DNA or RNA or SNP data, <b>N</b> allows you to edit the default data model for all Nucleotide data. If you
have microsatellite data, <b>M</b> allows you to edit that data's default
model. If you have K-Allele data, <b>K</b> allows you to edit that
data's default model. To assign the appropriate default data model to all
segments, select <b>D</b>.
</P>
<P> For nucleotide data, you can either choose the Felsenstein '84 model
(F84) or the General Time Reversible model (GTR). Microsatellite data may
take the Brownian, Stepwise, K-Allele, or Mixed K-Allele/Stepwise models
(MixedKS). K-Allele data may only take the K-Allele model.</P>
<H4>Options common to all <b>data model</b> submenus</H4>
<P> Several menu options are common to all evolutionary models; for
conciseness these are described here.</P>
<P> If you are editing the data model for a particular segment (and
not for a default), the first line displays the type of data found in that
segment, and you are given the option (<b>D</b>) of using the appropriate
default data model for that segment. The <b>M</b> option (Data Model) allows you
to cycle through the possible data models appropriate for that data type.
</P>
<P> The next two menu lines (<b>C</b> and <b>A</b>) describe the current state of the
categories model of variation in mutation rates among sites. LAMARC uses
the Hidden Markov Model of Felsenstein and Churchill (1996). In this model,
you must determine how many rate categories you wish to assume, then
provide the relative mutation rates for each category and the probability
that a site will fall into that category. However, you do not need to
specify which category each site actually falls into. The program will sum
over all possibilities.</P>
<P> If you choose to have multiple categories, select <b>C</b> (Number of
Categories), which will take you to a sub-menu. Here, you can change the
number of categories with the <b>N</b> option, then select particular
rate/probability pairs to change their values on a further sub-menu. For
example, if you wish to model two categories with 80% of sites evolving at
a base rate and the remaining 20% evolving ten times faster, you would set
the number of categories to 2, then set one rate/probability pair to 1 and
.8, and the second rate/probability pair to 10 and .2.</P>
<P> Internally, the program will normalize the rates so that the mean rate
across categories, weighted by the category probabilities, is 1.0. </P>
<P> In data modeled with categories of mutation rates, the "mu" value (a
component of various forces such as theta and M) is the weighted average of
the individual mutation rates. In the above example, if you determine that
mu is 0.028, you can solve the equation:</P>
<P><center>0.028 = (0.8 * 1x) + (0.2 * 10x)<br>
x = 0.028 / 2.8 = 0.01<br>
10x = 0.1</center></P>
<P>and thus determine that the two individual mutation rates are 0.01 and 0.1.</P>
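<P> The same arithmetic written out as a minimal sketch (values taken from the example above):</P>
<PRE>
rates = [1.0, 10.0]   # relative rates of the two categories
probs = [0.8, 0.2]    # probability that a site falls in each category

# LAMARC normalizes so the probability-weighted mean rate is 1.0:
mean_rate = sum(r * p for r, p in zip(rates, probs))   # 2.8

# Given an estimated weighted-average mu of 0.028, the individual
# per-category mutation rates are:
mu = 0.028
print([mu * r / mean_rate for r in rates])             # [0.01, 0.1]
</PRE>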
<P> The program will slow down approximately in proportion to the number of
rate categories. We do not recommend using more than five as the gain in
accuracy is unlikely to be worth the loss in speed. Do not use two
categories with the same rate: this accomplishes nothing and slows the
program down.</P>
<P> A category rate of zero is legal and can be useful to represent
invariant sites; however, at least one rate must be non-zero.</P>
<P> If you wish to use the popular gamma distribution to guide your rates,
use another program to calculate a set of categories that will approximate
a gamma distribution, then enter the corresponding rates and probabilities
manually into LAMARC. There is currently no provision to
infer gamma distributed rate variation within a single segment. For
gamma distributed mutation rate variation across independent regions,
see the <a href="#gamma">gamma parameter</a>
of the <a href="#analysis">analysis menu</a>.</P>
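<P> As an illustration of the sort of outside calculation meant here (not something LAMARC does itself), the sketch below discretizes a gamma distribution into equal-probability categories using the median of each bin, in the spirit of Yang (1994); it assumes SciPy is available:</P>
<PRE>
from scipy.stats import gamma

def gamma_categories(alpha, k):
    # k equal-probability categories; each category's rate is the
    # median of its bin, rescaled so the mean rate is 1.0.
    medians = [gamma.ppf((2 * i + 1) / (2.0 * k), alpha, scale=1.0 / alpha)
               for i in range(k)]
    mean = sum(medians) / k
    return [m / mean for m in medians], [1.0 / k] * k

rates, probs = gamma_categories(alpha=0.5, k=4)
print(rates, probs)
</PRE>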
<P> The <b>A</b> (Auto-Correlation) option provides an autocorrelation
coefficient which controls the tendency of rates to "clump". The
coefficient can be interpreted as the average length of a run of sites with
the same rate. If you believe there is no clumping (each site has an
independent rate), set this coefficient to 1. If you believe that, for
example, groups of about 100 sites tend to have the same rate, set it to
100.</P>
<P> While auto-correlation values may be set for any model, it is likely to
make sense biologically only in the case of contiguous DNA or RNA data. It
is not sensible to use it for widely separated SNPs or microsatellites.</P>
<P> After other model-specific options, the <b>R</b> (Relative mutation rate)
option provides a coefficient which controls the comparison of mutation
rates (mu) between segments and/or data types. If, for example, you have
both microsatellite data and nuclear chromosome data in your file, and you
know that changes accrue in your microsatellite data ten times faster than
changes accrue in the DNA, you can use this option to set the relative mu
rate for the microsat segment(s) to be 10, and the relative mu rate for the
DNA segment(s) to be 1. Overall estimates of parameters with mu in them (like
Theta) will be reported relative to the DNA values. If you want overall
estimates reported relative to the microsat values, you can set the
microsat mu rate to 1 and the DNA mu rate to 0.1.
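<P> The underlying bookkeeping is simple scaling (the sketch below is hypothetical, not LAMARC's actual code): because parameters like Theta contain mu, re-expressing them relative to a different segment's mutation rate rescales them by the ratio of the relative rates.</P>
<PRE>
# Estimates reported relative to the DNA mutation rate:
rel_mu = {"dna": 1.0, "microsat": 10.0}

# The same Theta re-expressed relative to the microsat rate is larger
# by the ratio of the two relative rates:
theta_vs_dna = 0.02
print(theta_vs_dna * rel_mu["microsat"] / rel_mu["dna"])   # 0.2
</PRE>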
<H4>Model-specific menus: nucleotide data</H4>
<H5>F84 model </H5>
<P> The Felsenstein '84 (F84) model is a fairly general nucleotide
evolutionary model, able to accommodate unequal nucleotide frequencies and
unequal rates of transition and transversion. It has the flexibility to
mimic simpler models such as Kimura's (set all nucleotide frequencies to
0.25) and Jukes and Cantor's (set all nucleotide frequencies to 0.25 and
the transition/transversion ratio to 0.500001).</P>
<P> The <b>T</b> option (TT Ratio) allows you to set the ratio between transitions
(A/G, C/T) and transversions (all other mutations). If bases mutated
completely at random this ratio would be 0.5 (1:2). If you want a
random-mutation model (corresponding to the model of Jukes and Cantor)
LAMARC will use 0.500001 instead of 0.5, due to a limitation of the
algorithm used in LAMARC that would otherwise divide by zero.</P>
<P> Programs such as <A HREF="http://paup.csit.fsu.edu/">PAUP*</A> can be
used to estimate the transition/transversion ratio from your data. In
practice it probably does not need to be very exact.</P>
<P> The <b>B</b> option (Base Frequencies) will take you to a submenu where you
can either tell LAMARC to calculate the base frequencies directly from the
data (the <b>F</b> option), or enter the values for the relative base frequencies
yourself. Unless your sequences are very short, it is probably best to
calculate these frequencies from the data. If a particular nucleotide does
not exist in your data, you may set its frequency to a very small non-zero
value (0.00001 is probably low enough).</P>
<H5> General Time-Reversible (GTR) model </H5>
<P> The GTR model is the most general tractable model for nucleotide
data. It allows both unequal nucleotide frequencies and unequal rates for
each pair of nucleotides. The most practical way to use GTR in LAMARC is
to estimate its rates with another program, such as <A
HREF="http://paup.csit.fsu.edu/">PAUP*</A>. LAMARC does not have any
facility to estimate the GTR rates itself, but requires them to be
provided.</P>
<P> It is wasteful to use GTR when a simpler model is adequate, since it
runs rather slowly. PAUP* with
<a href="http://darwin.uvigo.es/software/modeltest.html">MODELTEST</a>
can be used to assess the adequacy of simpler models. </P>
<P> The <b>G</b> option (GTR rates) requests input of the six pairwise
mutational rates (one per pair of nucleotides). These are symmetrical rates before consideration of
nucleotide frequencies, and can be obtained from PAUP*. PAUP* may provide
only the first 5 rates, in which case the [GT] rate is always 1.0.</P>
<P> The <b>B</b> option (Base Frequencies) allows you to set the base frequencies
of the four nucleotides, in the order A, C, G, and T.
The "Base frequencies computed from data" option is not
given here, since the third-party program you use to determine GTR rates
will also give you base frequencies, and you should use those.</P>
<H5><A NAME="data-uncertainty">Modeling data uncertainty in F84 and GTR models</A></H5>
<P>LAMARC runs on nucleotide data can now accommodate modeling of data uncertainty.
This is option <b>P</b> in both the F84 and GTR models.
The per-base error rate gives the rate at which each single instance of a
nucleotide should be assumed to have been miscalled. A value of 0 indicates
that all bases were sequenced correctly; a value of 0.001 indicates that one
base call in a thousand is incorrect.
The default value is 0.
This feature is in beta test as of December, 2009.
</P>
<H4>Model-specific menus: microsatellite data</H4>
<P> Apart from the choice of which model to use, the only options for
the microsatellite models (except for the MixedKS model)
are those common to all models: handling of rate differences among
markers, and normalization. These are discussed above. It is not
meaningful or useful to ask for autocorrelation when analyzing only a
single microsatellite marker per segment. </P>
<H5> Stepwise model </H5>
<P> The stepwise microsatellite model assumes that microsatellite mutations
are always single-step changes, so that the larger observed differences
have been built up via successive single-step mutations.</P>
<H5> Brownian-motion model </H5>
<P> The Brownian-motion microsatellite model is an approximation of the
stepwise model. Rather than a discrete model of single mutational steps,
we use a continuous Brownian-motion function and then truncate it to the
nearest step. This is much faster than the full stepwise model and returns
comparable results on fairly polymorphic data, but is less accurate on
nearly invariant data. </P>
<H5> K-Allele model </H5>
<P> This model assumes that only the alleles detected in the data exist,
and that mutation from any such allele to any other is equally likely.
The Jukes-Cantor DNA model, for example, is a K-Allele model for k=4. </P>
<H5> Mixed K-Allele/Stepwise model </H5>
<P> The Mixed K-Allele/Stepwise model (MixedKS) considers both
mutation to adjacent states (like the Stepwise model) and mutation
to arbitrary states (like the K-Allele model). The relative
frequency of the two is expressed by the proportion constant percent_stepwise,
available as menu option <b>L</b>. It indicates the proportion
of changes which are stepwise, so that percent_stepwise=0 is K-Allele and
percent_stepwise=1 is Stepwise. An initial value can be
set here, and either used throughout the run, or optimized at
the end of every chain if the Optimize (<b>O</b>) option is set. The
program finds the value of percent_stepwise that maximizes the likelihood
of the data on the final genealogy of each chain, using a
bisection approach.</P>
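<P> As an aside, a bisection-style maximizer of a one-dimensional unimodal function can be sketched as follows; this illustrates the idea only, since LAMARC's actual likelihood function and stopping rule are internal to the program:</P>
<PRE>
def maximize_unimodal(f, lo=0.0, hi=1.0, tol=1e-6):
    # Repeatedly shrink [lo, hi] around the maximum of a unimodal f.
    while hi - lo > tol:
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(b) > f(a):
            lo = a   # the maximum lies to the right of a
        else:
            hi = b   # the maximum lies to the left of b
    return (lo + hi) / 2

# Toy "likelihood" peaking at percent_stepwise = 0.3:
print(maximize_unimodal(lambda p: -(p - 0.3) ** 2))  # about 0.3
</PRE>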
<H4>Model-specific menus: K-Allele data</H4>
<P> The single model available for K-Allele data is the K-Allele model.
"K-allele data" is defined as any genetic data collected as discrete units,
such as electrophoretic data or phenotypic data. As for microsatellite
data, the K-Allele model assumes that a single mutation from any
state to any other state is equally likely.</P>
<hr>
<H3><A NAME="analysis"> Analysis </A></H3>
<p><img src="images/LamarcAnalysisScreen.png" alt="LAMARC analysis screen"/></p>
<P> The Analysis option leads to a submenu that will allow you to specify
the evolutionary forces you're going to infer, as well as the starting
values, constraints, and profiling options for each force's parameters.
More or fewer options will appear here depending on your data. If there is
more than one population, you will have an <b>M</b> option for estimating Migration parameters. Similarly, if you have more than one region in your data, you can turn on or
off estimation of varying mutational rates over regions (gamma), and if you
have trait data, you can set up your mapping analysis.</P>
<P> Each force currently in effect is marked as Enabled, and forces not in
effect are marked as Disabled. If you wish to add or remove a force, or to
change the parameters associated with a force, enter that force's
submenu.</P>
<P> One point to bear in mind is that
for nucleotide data the mutation rate mu is always expressed as mutation
per site, not mutation per locus as in many studies. You may need to do a
conversion in order to compare your results with those from other
studies.</P>
<P> Each force is explained below, and following that is a description of
the various options available on the submenus: constraints, profiling, and
Bayesian priors. For more information on evolutionary forces, consult the
<A HREF="forces.html"> forces </A> documentation.</P>
<H4> <A NAME="theta">Theta (Effective Population Size): the "Coalescence"
force </A></H4>
<P> Coalescence is obligatory on all data sets, so there is no provision
for turning it off.</P>
<P> The Theta submenu allows you to customize estimation of Theta, the
number of heritable units in the population times the neutral mutation rate
times two. This is 4N<sub>e</sub>mu for ordinary diploid data,
N<sub>e</sub>mu for mitochondrial data, and so forth. </P>
<P> Starting values of Theta cannot be less than or equal to zero. They
should not be tiny (less than 0.0001), because the program will take a long
time to move away from a tiny starting value and explore larger values.</P>
<P> This program provides Watterson and FST estimates for use as starting
values. It should never be quoted as a correct calculation of
Watterson or FST, because if it finds the result unsatisfactory as a
starting value, it will substitute a default.</P>
<P> The <b>G</b> option allows you to hand-set all of the Thetas to the same initial
value. The <b>W</b> option allows you to set all of them to the Watterson value.
(This will cause re-computation of the Watterson value, and can take
several seconds with a large data set.) The <b>F</b> option allows you to set all
of them to the FST value. You can then fine-tune by using the numbered
options to hand-set the starting Thetas of individual populations. The FST
option is only available if there is more than one population.</P>
<H4> <A NAME="growth">Growth parameters: the "Growth" force </A></H4>
<P> This submenu allows you to turn on and off estimation of population
growth rates, and to set starting parameters. </P>
<P> If there is a single population in your data, LAMARC will estimate a
single growth rate for it. If there are multiple populations, LAMARC will
estimate one independent growth rate per population.</P>
<P> If we label growth as <i>g</i>, then the relationship between Theta
at a time <i>t</i> > 0 in the past and Theta at the present day (<i>t</i> = 0)
is:</P>
<center>Theta<sub><i>t</i></sub> = Theta<sub>present day</sub> e<sup>-<i>gt</i></sup></center>
<p>This means that a positive value of <i>g</i>
represents a growing population, and a negative value, a shrinking one. </P>
<P> Time is measured in units of mutations (i.e., 1 <i>t</i> is the average
number of generations it takes one site to accumulate one mutation), and
<i>g</i> is measured in the inverse units of time. If mu is known, multiply
generations by mu to get units of <i>t</i>, or conversely, divide
<i>t</i> by mu to get a number of generations.</P>
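<P> A worked example of the formula above, with hypothetical values:</P>
<PRE>
import math

theta_now = 0.01   # Theta at the present day
g = 100.0          # positive g: the population has been growing
t = 0.005          # a time in the past, in mutational units

print(theta_now * math.exp(-g * t))   # 0.01 * e**-0.5, about 0.00607

# Converting t to generations requires knowing mu (per site per generation):
mu = 1e-8                             # hypothetical neutral mutation rate
print(t / mu)                         # 500,000 generations
</PRE>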
<P> Starting parameter input for growth is similar to that for Theta,
except that no quick pairwise calculators are available; you will have to
either accept default values or enter values of your own. Avoid highly
negative values (less than -10) as these have some risk of producing
infinitely long trees which must then be rejected.</P>
<H4> <A NAME="migration">Migration parameters and model: the "Migration"
force </A></H4>
<P> This submenu allows you to customize estimation of the migration rates
among your populations. The rates are reported as <i>M</i> = <i>m</i>/mu,
where <i>m</i> is the immigration rate per generation and mu is the neutral
mutation rate per site per generation. Note that many other programs
compute 4<i>N<sub>e</sub>m</i> instead; be sure to convert units before
making comparisons with such results.</P>
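<P> For example, since Theta = 4<i>N<sub>e</sub></i>mu and <i>M</i> = <i>m</i>/mu, the product Theta * <i>M</i> equals 4<i>N<sub>e</sub>m</i> (conventionally using the Theta of the receiving population). A sketch with hypothetical numbers:</P>
<PRE>
theta = 0.01   # LAMARC's Theta estimate for the receiving population
M = 250.0      # LAMARC's migration estimate, m/mu

print(theta * M)   # 4Nm = 2.5, comparable to other programs' output
</PRE>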
<P> You do not have the option to turn migration on and off; if there is
only one population migration must be off, and if there is more than one
population then migration must be on. (Otherwise there is no way for the
genealogy to connect to a common ancestor.) </P>
<P> The main choice for migration is how to set the starting values for
the migration parameters. You can use an <i>F<sub>ST</sub></i>-based
estimator or hand-set the values, either hand-setting all to the same
value, or setting each one individually. </P>
<P> The <i>F<sub>ST</sub></i> estimator does not always return a sensible
result (for example, if there is more within-population than
between-population variability), and in those cases we substitute an
arbitrary default value. If you see strange <i>F<sub>ST</sub></i>
estimates you may wish to hand-set those values. Please do not quote
LAMARC as a source of reliable <i>F<sub>ST</sub></i> estimates, since we
do not indicate which have been replaced by defaults.</P>
<P> The final menu entry sets the maximum number of migrations allowed in a
genealogy. An otherwise normal run may occasionally propose a genealogy
with a huge number of migrations. This could exhaust computer memory; in
any case it would slow the analysis down horribly. Therefore, we provide a
way to limit the maximum number of migrations. This limit should be set
high enough that it disallows only a small proportion of genealogies, or
it will produce a downward bias in the estimate of <i>M</i>.</P>
<P> If you find that you are sometimes running out of memory late in a
program run that involves migration, try setting this limit a bit lower.
If you find, on examining your runtime reports, that a huge number of
genealogies are being dropped due to excessive events, set it a bit
higher. (The "runtime reports" are the messages displayed on the screen
while the Markov chains are evolving; a copy of these messages is provided
at the end of each output file.) You may also want to try lower starting
values if many genealogies are dropped in the early chains.</P>
<H4> <A NAME="recombination">Recombination parameter: the "Recombination"
force </A></H4>
<P> This submenu allows you to customize estimation of the recombination
rate parameter <i>r</i>, defined as <i>C</i>/mu where <i>C</i> is the
recombination rate per
site per generation and mu is the neutral mutation rate per site per
generation. We do not currently allow segment-specific or
population-specific recombination rates; only one value of <i>r</i> will be
estimated.</P>
<P> The first menu line allows you to turn recombination estimation on and
off. Estimating recombination slows the program down a great deal, but if
recombination is actually occurring in your data, allowing inference of
recombination will not only tell you about recombination, but may improve
inference of all other parameters.</P>
<P> You cannot estimate recombination rate if there is only one site, and
in practice you cannot estimate it unless there is some variation in your
data--at least two variable sites. Your estimate will be very poor unless
there are many variable sites.</P>
<P> The <b>S</b> option allows you to set a starting value of <i>r</i>. No
pre-calculated value is available, so your choices are to set it yourself
or accept an arbitrary default.</P>
<P> Starting values of <i>r</i> should be greater than zero. If you do not
want to infer recombination, turn the recombination force off completely
instead. If you believe that <i>r</i> is zero, but wish to infer it to
test this belief, start with a small non-zero value such as 0.01. It is
unwise to set the starting value of <i>r</i> greater than 1.0, because the
program will probably bog down under huge numbers of recombinations as a
result. A rate of 1 would indicate that recombination is as frequent as
mutation, and this state of affairs cannot generally be distinguished from
complete lack of linkage.</P>
<P> The <b>M</b> option sets the maximum number of recombinations allowed
in a genealogy. An otherwise normal run may occasionally propose a
genealogy with a huge number of recombinations. This could exhaust
computer memory; in any case it would slow the analysis down horribly.
Therefore, we provide a way to limit the maximum number of recombinations.
This limit should be set high enough that it disallows only a small
proportion of genealogies, or it will produce a downward bias in the
estimate of <i>r</i>.</P>
<P> If you find that you are sometimes running out of memory late in a
program run that involves recombination, try setting this limit a bit
lower. If you find, on examining your runtime reports, that many
genealogies are being dropped due to excessive events, set it a bit
higher. (You may also want to try lower starting values if many
genealogies are dropped in the early chains.)</P>
<H4> <A NAME="gamma">Gamma parameter: allowing the background mutation rate
to vary over regions</A></H4>
<P> If you suspect that the mutation rate varies between your genomic
regions, but do not know the specifics of how exactly it varies, you can
turn on estimation of this force to allow for gamma-distributed rate
variation. The shape parameter of the gamma ('alpha') can be estimated, or you
can set it to a value you believe to be reasonable. While the gamma
function is a convenient way to allow for different types of variation, it
is unlikely that the true variation is drawn from an actual gamma
distribution, so the important thing here is mainly that you allow mutation
rates to
vary, not necessarily which particular value is estimated for the shape
parameter. For more information, see the section, <A
HREF="gamma.html">"Combining data with different mutation rates"</a>.</P>
<H4> <A NAME="trait"> Trait Data analysis</A></H4>
<P> This section provides the capability to map the location of a
measured trait within one of your genomic regions. You will need to have
provided trait data in your input file. For more details about trait
mapping, see the <A HREF="mapping.html">mapping documentation</A>.</P>
<P> The Trait Data analysis menu will show you all of the traits which
you have provided data for and can attempt to map, with an indication
of which genomic region each is in. To modify the model for a trait,
choose that trait by number; you will be taken to a specific menu
for mapping that trait. It will start by reminding you of the trait
name, and then show the type of analysis you are using. The two
kinds of mapping analysis are discussed in more detail in <A HREF="mapping.html">
"Trait Mapping."</A> As a brief reminder, a "float" analysis
collects the trees without use of the trait data, and then finds the
best fit of trait data to trees after the fact. A "jump" analysis
assigns a trial location to the trait and then allows it to be reconsidered
as the run progresses.</P>
<P> In this menu, you can also restrict the range of sites which you
are considering as potential locations for your trait. For example,
you may be quite sure that the trait is not located in the first 100
sites, but you still wish to analyze them because they may add useful
information about Theta and recombination rate. You can remove the
range 1-100 from consideration using the <b>R</b> option. You can also
add sites to consideration using the <b>A</b> option: for example, if you
know that your trait is either in sites 1-100 or 512-750, one approach
is to remove all sites, then add these two ranges specifically.</P>
<P> If you have turned on a "jump" analysis, the necessary rearrangement
strategies will appear in the Strategy:Rearrangement submenu. You may
wish to inspect them and make sure that you like the proportion of
effort used for each strategy.</P>
<H4> <A NAME="divergence">Divergence parameters and model: the "Divergence"
force </A></H4>
<p>
The only value that can be edited in Divergence is the
Epoch Boundary Time (scaled by the mutation rate) of each Divergence event. You can
set starting values and priors for these as usual. There are no
constraints available for these parameters. If you wish
to redefine the Ancestor/Descendant relationships, you need to either return to the <A HREF="converter.html">file converter</A> or edit the input file XML.
</p>
<H4> <A NAME="divergencemigration">Divergence-Migration parameters and model: the "Divergence-Migration"
force </A></H4>
<p>
This force is presented exactly the same way as a regular migration matrix, except that there
are also entries for Ancestor populations. Note that even though you can potentially enter
migration rates between invalid population pairs (for example an ancestor and one of its children)
these will be ignored by the calculation. (Be warned that if you manage to create an XML input
file with values for migration rates between invalid pairs, for example by hand-editing your
XML, the program will produce confused and meaningless results.) Also note that pairwise calculators
for starting values are not available for cases with divergence.
</p>
<H4> Options common to all Force submenus </H4>
<P> Three options are available on all Forces submenus (except
for Trait mapping and Divergence), and they all behave
in the same fashion. Constraints allow you to hold parameters constant or
constrain them to be equal to other parameters. Profiling affects the
reported support intervals around the estimates (and can affect how long it
takes the program to run). If you are running LAMARC in <A
HREF="bayes.html">Bayesian mode</A>, the Bayesian Priors menu allows you to
set the priors for the parameters.</P>
<h5><A NAME="constraints">Constraints</A></h5>
<P> Beginning with version 2.0 of LAMARC, we allow constraints on all
parameters. All parameters can be unconstrained (the default, and
equivalent to pre-2.0 runs), constant, or grouped. Grouped parameters all
have the same starting value, and can either be constrained to be identical
(and vary together as a unit), or be set constant. In addition, we allow
some parameters to be set 'invalid', which means 'constant, equal to zero,
and not reported on in the output'.</P>
<P> Say, for example, you know that the recombination rate for your
population is 0.03. In this case, you can set the recombination starting
value to 0.03 and set the recombination constraint to 'constant'. Or say
you have a set of populations from islands in a river; you may know that
all downstream migration rates will be equal to each other, and that all
upstream migration rates will be equal to each other. In this case, you
can put all the downstream rates together in one group, all the upstream
rates together in another group, and set each group's constraint to
'identical'. If you have another set of populations and know that
migration is impossible between two of them, you could set those migration
rates to be 'invalid' (or simply set them constant and set the starting
values to zero).</P>
<P> In general, a LAMARC run with constraints will be somewhat faster than
one without, since fewer parameters have to be estimated. This can be
particularly helpful for complex systems where you already have some
information, and are interested in estimating just a few parameters.
Unfortunately, constraints are not available at this time for the
Epoch Boundary Time parameters.</P>
<P> Select 'C' to go to the Constraints sub-menu for any force. To change
the constraint on a particular parameter, enter that parameter's menu index
number. To group parameters, pick one of them and enter 'A' (Add a
parameter), then the number of your parameter, then 'N' (for a new group).
Then pick another parameter that should be grouped with the first one,
enter 'A' again, the number of your new parameter, then the group number of
the group you just created (probably 1). Groups are created with the
automatic constraint of 'identical', meaning that they will vary, but be
co-estimated. You may also set a group 'constant', which has the same
effect as setting the individual parameters constant, but guarantees they
will all have the same value.</P>
<h5><A NAME="profiling">Profiling</A></h5>
<P> Each force's Profiling option (<b>P</b>) takes you to a sub-menu where you
can adjust how LAMARC gives you feedback about the support intervals for
your data. Setting the various profiling options is important in a
likelihood run, since it is the only way to obtain confidence limits of
your estimates, and can drastically affect total program time. It is less
important in a Bayesian run, since the produced curvefiles have the same
information, and profiling simply reports key points from those curves (and
it takes essentially no time to calculate, as a result). Profiling is
automatically turned on in a Bayesian run, and it doesn't make a lot of
difference which type of profiling is used in that instance, so most of the
discussion below will be most applicable to a likelihood run.</P>
<P> For each force, you can turn profiling on (<b>A</b>) or off (<b>X</b>) for all
parameters of a given force, though you cannot profile any parameter you
set constant. The next option (<b>P</b>) toggles between percentile and
fixed-point profiling. Selecting this option will cause all parameters
with some sort of profiling on to become either percentile or fixed. You
can turn on and off profiling for individual parameters by selecting that
parameter's menu index number.</P>
<P> Both kinds of profiling try to give some information about the shape of
the likelihood (or posterior, in a Bayesian analysis) curve, including both
how accurate the estimates are, and how tightly correlated estimates for
the different parameters are.</P>
<P> Fixed-point profiling considers each parameter in turn. For a variety
of values of that parameter (five times the MLE, ten times the MLE, etc.)
it computes the optimal values for all other parameters, and the log
likelihood value at that point. This gives some indication of the shape of
the curve, but the fixed points are not always very informative. In the
case of growth, some values are set to multiples of the MLE, while others
are set to some generally-useful values unrelated to the MLE, such as
0. </P>
<P> Percentile profiling, instead of using fixed points, gives you the values
below which your parameter is X% likely to fall. A value
for theta of 0.00323 at the .05 percentile means that the true value of
theta has only a 5% chance of being less than or equal to 0.00323, and a
95% chance of being greater than 0.00323. In a likelihood run, LAMARC will
then calculate the best values for all other parameters with the first
parameter fixed at that percentile. If the above example were calculated in
a run estimating recombination and growth rates, LAMARC would
calculate the best values for recombination and growth if theta had been
0.00323. This gives a much nicer picture of the shape of the curve, but it
is very slow. If you use percentile profiling for likelihood, expect
it to take a significant proportion of your total run time.</P>
<P> The accuracy of the percentile profiling in a likelihood run is
dependent on the likelihood surface for your data fitting a Gaussian in
every dimension. When the surface is Gaussian, the percentiles for each
parameter can be determined by finding the values which result in
particular log likelihoods. For example, the .05 percentile is
mathematically defined to be the point at which the log likelihood is
exactly 1.35 less than the log likelihood for the MLE, while the .25
percentile can be found at the point where the log likelihood is exactly
0.23 less. LAMARC finds the values for which these likelihoods can be
found, but cannot determine whether the actual likelihood surface for your
data has a truly Gaussian shape. Hence, percentile profiling cannot be
used to report absolute confidence intervals, but it is at least a step in
that direction.</P>
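<P> The 1.35 and 0.23 offsets follow from that Gaussian assumption: percentile <i>p</i> sits where the log likelihood has dropped z<sup>2</sup>/2 below its maximum, where z is the standard normal quantile for <i>p</i>. A quick check (assuming SciPy is available):</P>
<PRE>
from scipy.stats import norm

for p in (0.05, 0.25):
    z = norm.ppf(p)
    print(p, z * z / 2)   # 0.05 -> about 1.353, 0.25 -> about 0.227
</PRE>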
<P> You may want to turn off profiling completely, or use fixed-point
profiling, for exploratory runs. Percentile profiling gives the best
information for final runs, but may be too slow. If you save your data to
a summary file (see <A HREF="menu.html#summary">summary files</a>), you can
go back and change the profiling options in a subsequent run, which then
won't have to recalculate the Markov chains; it will merely calculate new
profiles.</P>
<P> If you turn off profiling, you will lose both the profile tables
themselves and the approximate confidence intervals in the MLE tables. A
good compromise is to set the <A HREF="menu.html#output">output file
verbosity</a> to "concise", which causes LAMARC to only calculate two
profiles (for percentile profiling, the 95% support intervals) instead of
about 10.</P>
<h5>Bayesian Priors</h5>
<P> If you are running LAMARC in Bayesian mode (see the <A
HREF="menu.html#search">Search Strategy</a> menu), each force will have the
option to edit the Bayesian priors (<b>B</b>) for that force. A more detailed
discussion of a Bayesian run can be found <A
HREF="menu.html#search">below</a>, as well as in the <A
HREF="bayes.html">tutorial</a>.
<hr>
<H3><A NAME="search"> Search Strategy </A></H3>
<p><img src="images/LamarcSearchScreen.png" alt="LAMARC search strategy screen"/></p>
<P> This menu allows you to fine-tune your search strategy, to get
the best results from LAMARC with the minimal time. Consider tuning these
settings if you are not satisfied with the performance of your searches.
For advice on choosing the best settings here, see the article <A
HREF="search.html">"Search Strategies."</A> </P>
<P> The first option in the Search Strategy menu (<b>P</b>, 'Perform Bayesian or
Likelihood analysis') toggles your setup between a likelihood run and a
Bayesian run. This choice can have a profound impact on the course of your
run, though hopefully both have a reasonable chance of arriving at the
truth at the end. A likelihood run (the only option for versions of LAMARC
earlier than 2.0) searches tree-space with a fixed set of 'driving values'
per chain, and searches the resulting likelihood surface to find the best
set of parameter estimates. A Bayesian run searches tree-space at the same
time as it searches parameter-space, then uses its parameter-space search as
a Bayesian posterior to report the best values for individual parameters.
For more details about a Bayesian search with some comparisons to the
likelihood search, see the <A HREF="bayes.html">Bayesian tutorial</a>.
<h4><a NAME="priors">Bayesian priors menu</a></h4>
<P> If you have elected to run a Bayesian search, you will get the option
(<b>B</b>) to set the priors for the various forces in your data. Selecting the
option will take you to a sub-menu listing all active forces and a summary
of the current priors for each force. Once you select a particular force,
you get the option to edit the default prior for that force (<b>D</b>), and a
list of parameters, each of which may be selected to edit that parameter's
prior.</P>
<P> When editing the prior for a particular parameter, you may select
whether you wish to use the default prior with the <b>D</b> option, re-setting the
current prior to the default. For all priors, you may then set three
options: the shape of the prior (<b>S</b>), which may be linear or (natural)
logarithmic, and the upper (<b>U</b>) and lower (<b>L</b>) bounds of the prior. There
is currently no provision for other prior shapes.</P>
<h4><a NAME="rearrangers">Rearrangers menu</a></h4>
<P> Selecting <b>R</b> from the Search Strategy menu takes you to a sub-menu where
you can set the relative frequencies of the various arrangers. The main
arranger in a LAMARC run is the Topology rearranger (<b>T</b>), which is the
main tree rearranger. This rearranger works by selecting and breaking a
branch of its current tree, then re-simulating that branch to add it back
to the tree. It should almost always be set greater than the other tree
rearrangers (the size and haplotype arrangers), and any decrease in its
relative frequency probably requires a concomitant increase in chain
length (see <A HREF="menu.html#chains">sampling strategy</a>, below).
<P> A new arranger for version 2.0 is the Tree-Size rearranger (<b>S</b>). This
rearranger leaves the topology of the tree constant, but re-samples branch
lengths root-wards below a selected branch (biased with a triangular
distribution towards the root). Our initial experiments with this
rearranger indicate that it is helpful in a variety of situations, but
particularly helpful for runs with growth and migration. It should be used
sparingly, however: we've found setting this rearranger's frequency to 1/5
that of the topology rearranger is a generally good ratio.</p>
<P> If your data appears to have phase-unknown sites,
you will have the option to set the relative frequency of the Haplotype
rearranger (<b>H</b>). The haplotype rearranger considers new phase assignments
for a pair of haplotypes. Like the tree-size rearranger, setting this
frequency to 1/5 that of the topology rearranger has been found to produce
good results. </p>
<P> If you have chosen to do a Bayesian run, you will have the option to
set the relative frequency of the Bayesian rearranger (<b>B</b>). This
rearranger considers new parameter values on the current tree. By default,
this is set to the same frequency as the topology rearranger, and this
seems to be adequate for runs with a small number of variable parameters.
This can be increased for runs with a larger number of parameters, but you
probably don't want a relative frequency of more than 2-3 times that of the
topology arranger--increase the <A HREF="menu.html#chains">length of your
chains</a> instead.
<P> If you are doing trait mapping using the "jump" strategy (in which
the trait alleles are assigned a chromosomal location, and this location is
reconsidered during the run) two additional rearrangers become available.
The Trait haplotypes rearranger (<b>M</b>) allows reconsideration of
ambiguous trait haplotypes: for example, it can switch between
DD, Dd and dD as haplotype resolutions for someone showing a dominant
phenotype. The Trait Location rearranger (<b>L</b>) moves the trait
position within the region. We have little information about the
best proportions of effort to put into these two rearrangers, but the
Trait Location rearranger probably needs to get at least 20% effort
to function well. These arrangers are not needed in "float" mapping
or in non-mapping runs and will not be shown.</P>
<H4> <A NAME="chains">Sampling Strategy (chains and replicates)</a></H4>
<P> This sub-menu allows you to adjust the time LAMARC spends sampling
trees. It can (and should) be adjusted to reflect whether you want an
'exploratory' run vs. a 'production' run, how complicated your parameter
model is, and whether you are performing a likelihood or Bayesian run.
Options germane to each of the above scenarios will be discussed in turn.
<P> The first option (<b>R</b>) allows you to use replication--repeating the
entire analysis of each genomic region a number of times, and consolidating the
results. This is more accurate than running LAMARC several times and
attempting to fuse the results by hand, because LAMARC will compute profiles
over the entire likelihood surface, including all replicates, rather than
replicate by replicate. It will, of course, increase the time taken by the current
run in proportion to the number of replicates chosen (both the time spent
generating chains and, typically, the time spent profiling the
results). The minimum number of replicates is one, for a single run-through
of LAMARC. A reasonable upper limit is 5 if your runs produce reasonably
consistent results, though you may want to use a higher number to try to
overcome more inconsistent runs. Replication is useful for 'production'
runs more than exploratory runs, and can help both likelihood and Bayesian
searches.</P>
<P> LAMARC's search is divided into two phases. First, the program will
run "initial" chains. In a likelihood run it is useful to make
these relatively numerous and short as they
serve mainly to get the driving values and starting genealogy into a
reasonable range. When all the initial chains are done, the program will
run "final" chains. These should generally be longer, and are used to
polish the final estimate. Exploratory runs can have both short initial
and short (but somewhat longer) final chains, and production runs should have
longer sets of both chains. Because a likelihood run is highly dependent
on the driving values, you will probably need several initial chains (10 is
a reasonable number), and a few final chains (the default of 2 usually
works). A Bayesian run is not at all dependent on the driving values, and
while you might use several initial chains for an exploratory run just to
see what's happening with the estimates, you should probably simply use
zero or one initial chains and one quite-long final chain to obtain your
production estimates.</P>
<P> For both initial and final chains, there are four parameters to set.
"Number of chains" determines how many chains of that type will be run.
"Number of recorded genealogies" determines how many genealogies (in a
likelihood run), while "Number of recorded parameter sets" determines how
many sets of parameters (in a Bayesian run) will actually be used to make the
parameter estimates. "Interval between recorded items" determines how many
rearrangements will be performed between samples. Thus, if you ask for 100
recorded items per chain, and set the interval between them to 20, the program will
perform a total of 2000 rearrangements, sampling 100 times to make the
parameter estimates. The total number of samples will determine the length
of your run, and can be shorter for exploratory runs but should be long
enough to produce stable results for production runs. In a Bayesian run,
as mentioned, you will want a single long chain for your production run. If
you are seeing spurious spikes in your curvefiles, you probably need to
increase the sampling interval, too--because each rearrangement only
changes a single parameter (and also takes time to rearrange trees),
certain parameters can stay the same simply by neglect, and will end up
being oversampled in the output. Increasing the sampling interval can
overcome this artifact.</P>
<P> "Number of samples to discard" controls the burn-in phase before
sampling begins. LAMARC can be biased by its initial genealogy and initial
parameter values, and discarding the first several samples can help to
reduce this bias. To continue with the example above, if you ask for 100
samples, an interval of 20, and 100 samples to be discarded, the program
will create a total of 2100 samples, throwing away the first 100 and
sampling every 20th one thereafter. In a likelihood run, you want the
burn-in phase to be long enough to get away from your initial driving
values, which will be longer the more complex your model is. In a Bayesian
run, you also want the burn-in phase to be long enough to get away from
your initial set of values and the initial genealogy. Again, this will
need to be longer if you have a more complex model with lots of
parameters.</P>
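<P> As a concrete check of this arithmetic, here is a minimal Python
sketch (a hypothetical helper, not part of LAMARC) that reproduces the
example above:</P>
<pre>
def total_rearrangements(recorded, interval, discard):
    # Steps discarded as burn-in, plus 'interval' rearrangements
    # for each recorded genealogy or parameter set.
    return discard + recorded * interval

# The example from the text: 100 recorded items, an interval of 20,
# and 100 discarded samples give 2100 rearrangements in all.
print(total_rearrangements(recorded=100, interval=20, discard=100))  # 2100
</pre>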
<H4> <A NAME="heating">Multiple simultaneous searches with heating</a></H4>
<P> The last menu item on the Search Strategy menu allows you to help your
search explore a wider sampling of trees by setting multiple
"temperatures." A search through possible trees at a higher temperature
accepts proportionally less likely trees, in the hopes that future
rearrangements will find a new, well-formed tree with a higher
likelihood. This approach will often rescue a search that otherwise
becomes stuck in one group of trees and does not find other
alternatives.</P>
<P> (The reason that the word "temperature" is used here may be understood
by means of an analogy. Imagine, on a snowy winter day, that there are
several snowmen on the lawn in front of a house, and you want to identify
the tallest one; you do not want to determine the exact height, you just
want to determine the tallest snowman. One way of doing this would be to
raise the temperature so that all of the snowmen melt; you could then
identify the tallest snowman as the one that disappears last. Using
multiple "heated" Markov chains simultaneously provides smoothed-out,
compressed views of the space of possible genealogy arrangements.)</P>
<P> To set multiple temperatures, select the <b>M</b> menu option (Multiple
simultaneous searches with heating), then select <b>S</b> (Number of
Simultaneous Searches) and enter the number of temperatures you want. You
will then get a list of new menu options, and be able to set the various
temperatures. For best results, temperatures should progress in value
pseudo-exponentially. A typical series of temperatures might be "1 1.1 2 3
8", but different data sets might have different optimal magnitudes, from
"1 2 10 20 50" to "1 1.01 1.05 1.1 1.3". Watching the Swap Rates between
temperatures during the run is important for determining an optimal series
here--rates should fall somewhere between 10 and 40 (the numbers given are
percentages). Below about 5% you are getting little return for a huge
increase in computation, and above 50% the two chains are so close to each
other that they are unlikely to be exploring significantly distinct areas
of parameter space (a process more efficiently handled by using <A
HREF="menu.html#chains">replicates</A>). </P>
<P> Should finding an optimal series of temperatures by hand become too
difficult, or if the optimal series of temperatures varies during a run,
LAMARC can be told to try to optimize the temperatures automatically, by
switching from "static" to "adaptive" heating (the <b>A</b> option that appears
if you have more than one temperature). With static heating, the
temperatures you specify will be used throughout the run. With adaptive
heating, the program will continually adjust the temperatures during the
run to keep swapping rates between 10% and 40%. We make no guarantees that
adaptive heating will be superior to static heating, but it should at least
drive the values to the right magnitudes, and keep them there during the
course of the run.</P>
<P> A second option that appears if you have multiple temperatures is the
ability to set the swap interval for different temperatures (<b>I</b>). The
default is "1", which means LAMARC picks two adjacent temperatures and
checks to see if the higher-temperature chain has a better likelihood than
the lower-temperature chain after each rearrangement. To check less
frequently, set this value to a higher number (3 for every third
rearrangement, etc.). A higher value will speed up the program incrementally,
but typically does not represent a significant gain.</P>
<P> In general, a run will increase in length proportionally to the number
of temperatures chosen, though the time spent profiling will be the same as
without heating.</P>
<hr>
<H3><A NAME="io"> Input and Output related tasks </A></H3>
<p><img src="images/LamarcIOScreen.png" alt="LAMARC I/O screen"/></p>
<P> This menu controls almost all of the interactions between the program, the
computer, and the user. You can use it to modify the names and content of
files LAMARC produces, as well as the information printed to the screen
during a LAMARC run.</P>
<h4> Verbosity of Progress Reports </h4>
<P> The first option on the Input and Output menu controls the reports
that appear on-screen as the program runs. Type <b>V</b> to toggle among the
four options. NONE suppresses all output, and might be used when running
LAMARC in the background, after you know what to expect. CONCISE only
periodically reports about LAMARC's progress, noting that a new region
has begun, or that profiling has started. NORMAL adds some
real-time feedback on the program's progress and additionally guesses at
completion time for each program phase (the guesses are not always
completely reliable, however). VERBOSE provides the maximum amount of
feedback, reporting additionally on some of the internal states of the
program, including overflow/underflow warnings and the like. If something
has gone wrong with your LAMARC run, having this option set to VERBOSE is
your best chance at a diagnosis.</P>
<h4><A NAME="output">Output File Options</A></h4>
<P> The next menu item sends you to a submenu where you can set various
aspects of the final output file. Select <b>O</b> to go to this menu.</P>
<P> Selecting <b>N</b> allows you to set the name of the output report file.
Please be aware that if you specify an existing file, you will overwrite
(and destroy) its contents.</P>
<P> Selecting <b>V</b> allows you to toggle between the three levels of content
for the output file. VERBOSE will give a very lengthy report with
everything you might possibly want, including a copy of the input data (to
check to make sure the data were read correctly and are aligned). NORMAL
will give a moderate report with less detail, and CONCISE will give an
extremely bare-bones report with just the parameter estimates and minimal
profiling. We recommend trying VERBOSE first and going to NORMAL if you
know you don't need the extra information. CONCISE is mainly useful if
your output file is going to be read by another computer program rather
than a human being, or if speed is of the essence, since it speeds up
profiling in a likelihood run by a factor of five.</P>
<P> The "Profile likelihood settings" option (<b>P</b>) leads to a new sub-menu
that lists all forces and gives you an overview of how they are going to be
profiled. You can turn on (<b>A</b>) or off (<b>X</b>) profiling for all parameters
from this menu, or set the type of profiling to percentile (<b>P</b>) or fixed
(<b>F</b>). The other menu options take you to the force-specific profiling
menus discussed <A HREF="menu.html#profiling">above</a>. </P>
<h4> <A NAME="menuinfile">Name of menu-modified version of input file</A></h4>
<P> The "Name of menu-modified version of input file" option (<b>M</b>) allows
you to change the name of the automatically-generated file which will be
created by LAMARC when the menu is exited ("menusettings_infile.xml" by default). This
file contains all the information in the original input file, plus any
settings changed in the menu. If you want to repeat
your run with exactly the same options that you chose from the menu this
time, you can rerun using this file as your input file.</P>
<h4> <A NAME="summary">Writing and Reading Summary Files</A></h4>
<P> The next two menu items on the "Input and Output Related Tasks" menu
are used to enable or disable reading and writing of summary files. If
summary file writing is enabled, LAMARC saves the information it calculates
as it goes: enough to recover from a failed run, or to repeat
the numerical analysis of a successful run. If a run fails while
generating chains, LAMARC will take the parameter estimates from the last
completed chain, use them to generate trees in a new chain, then use those
trees and the original estimates to start a new chain where it had crashed
before. In this scenario, LAMARC cannot produce numerically identical
results to what it would have produced had the run finished, but should
produce similar results for non-fragile data sets. However, if profiling
had begun in the failed run, the summary files do contain enough
information to produce numerically identical results, at least to a certain
degree of precision.</P>
<P> To turn on summary file writing, select <b>W</b> from the menu, then <b>X</b> to
toggle activation. The name of the summary file may be changed with the
<b>N</b> menu option. This will produce a summary file as LAMARC runs. To then
read that summary file, turn on summary file reading the next time LAMARC
is run (from the same data set!) with the <b>R</b> option from this menu, then
<b>X</b> to toggle activation, and finally <b>N</b> to set the correct name of the
file. LAMARC will then begin either where the previous run stopped, or, if
the previous run was successful, will start again from the Profiling
stage.</P>
<P> For particularly long runs on flaky systems, it may be necessary to
both read and write summary files. If both reading and writing are on,
LAMARC will read from the first file, write that information to the second
file, and then proceed. If that run is then stopped, the new file may be
used as input to start LAMARC further along its path than before. If this
option is chosen, be sure to have different names for the input summary file
and the output summary file.</P>
<P> If reading from a summary file, most of the options set when writing
the summary file must remain the same, or unpredictable results may occur,
including LAMARC mysteriously crashing or producing unreliable results.
However, since all profiling occurs after the summary file is read,
profiling-related options may be changed freely. For example, in order to
get preliminary results, you may run LAMARC with "fixed" profiling,
"concise" output reports, and writing summary files on. Afterwards, if
more detail is needed about the statistical support of your estimates,
you may run LAMARC again, this time
with summary file reading, "percentile" profiling, and "verbose" output
files.</P>
<H4><A NAME="tracer">Tracer output</A></H4>
<P> LAMARC will automatically write files for the utility <A
HREF="http://tree.bio.ed.ac.uk/software/tracer/">Tracer</a>
written by Drummond and Rambaut (see the <A HREF="tracer.html">"Using
Tracer with LAMARC"</A> documentation in this package).
LAMARC's Tracer output files are named [prefix]_[region]_[replicate].txt.
You can turn on or off Tracer output and set the prefix here.</P>
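<P> If you prefer to inspect a trace outside of Tracer itself, something
like the following Python sketch may work; it assumes the files are
tab-delimited tables with one header row and one column per recorded
quantity (check your own files first, and note that the file name shown
is hypothetical):</P>
<pre>
import pandas as pd

# Assumed format: tab-delimited, one header row, one column per
# recorded quantity; the name follows [prefix]_[region]_[replicate].txt.
trace = pd.read_csv("tracefile_1_1.txt", sep="\t")
print(trace.describe())  # quick numerical summary of each sampled quantity
</pre>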
<H4><A NAME="newick">NEWICK tree output</A></H4>
<P>If there is no migration or recombination, LAMARC can optionally
write out the tree of highest data likelihood it encounters for each
region, in Newick format. Options in this menu control whether such
a Newick file will be written, and what its prefix will be. This
option is not needed for normal use of the program, but it is sometimes
interesting to see what the best tree was, and how it compares with
the best tree found by phylogeny-inference programs. (Sometimes,
surprisingly, LAMARC is able to outdo normal inference programs.)</P>
<h4> <A NAME="curvefiles">Bayesian curvefiles</A></h4>
<P> A Bayesian run of LAMARC will produce additional output for each
region/parameter combination that details the probability density curve for
that parameter. Each file can be read into a spreadsheet program (like
Excel) to produce a graphic of your output. If you decide you don't have
enough disk space for these files, or don't want them for some other
reason, you can turn off this feature by toggling the <b>U</b> option ('Write
Bayesian results to individual files'). You can change the prefix given to
all of these curvefiles with the <b>C</b> option ('Prefix to use for all
Bayesian curvefiles'). Curvefile names are of the format
[prefix]_[region]_[parameter].txt, where '[prefix]' is the option settable
here, '[region]' is the region number, and '[parameter]' is the parameter
in question. More details about Bayesian curvefiles are available in the
<A HREF="bayes.html#results">Bayesian tutorial</a>.</P>
<H4><A NAME="reclocfiles">Recombination location output</A></H4>
<P> In runs modeling recombination, LAMARC can dump a file listing
each recombination location in every sampled tree in the last final chain.
A recombination between data position <tt>-13</tt> and <tt>-12</tt> is
recorded as <tt>-13</tt>, one between <tt>340</tt> and <tt>341</tt> is
recorded as <tt>340</tt>.
You can read the file into <tt>R</tt> or another statistical computing
tool, and plot a histogram to see where the recombinations are most
often accepted. Keep in mind that there is a slight bias toward accepting
recombinations near the ends of the input sequences, as less data is
available there to show that a proposed recombination is unsupported.
</p>
<p>
These files are named [prefix]_[region]_[replicate].txt.
You can turn this output on or off and set the prefix here. (The default
prefix is 'reclocfile'.)
This option is off by default, since these files can grow quite large.</P>
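<P> For example, a histogram of the sampled recombination locations can
be drawn with a few lines of Python (a sketch under assumptions:
whitespace-separated positions, and a hypothetical file name following
the pattern above):</P>
<pre>
import numpy as np
import matplotlib.pyplot as plt

# Assumed format: whitespace-separated integer positions; ravel()
# flattens the array in case positions are spread across columns.
locations = np.loadtxt("reclocfile_1_1.txt").ravel()

plt.hist(locations, bins=50)
plt.xlabel("sequence position")
plt.ylabel("sampled recombinations")
plt.show()
</pre>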
<hr>
<H3><A NAME="current"> Show current settings </A></H3>
<p><img src="images/LamarcOverviewScreen.png" alt="LAMARC overview screen"/></p>
<P> This menu option provides reports on all current settings, so that you
can see what you've done before starting the program. You cannot change
the settings here, but each display will indicate which menu should be used
to change the displayed settings. </P>
<P>(<A HREF="xmlinput.html">Previous</A> | <A
HREF="index.html">Contents</A> | <A HREF="regions.html">Next</A>)</P>
</BODY>
</HTML>