<!-- header fragment for html documentation -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>

<META NAME="description" CONTENT="Estimation of population parameters using genetic data using
a maximum likelihood approach with Metropolis-Hastings Monte Carlo Markov chain importance
sampling">
<META NAME="keywords" CONTENT="MCMC, Markov chain, Monte Carlo, Metropolis-Hastings, population,
parameters, migration rate, population size, recombination rate, maximum likelihood">

<TITLE>LAMARC Documentation: Frequently asked Questions and Answers</title>
</HEAD>


<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<!-- coalescent, coalescence, Markov chain Monte Carlo simulation, migration rate, effective
 population size, recombination rate, maximum likelihood -->


<P>(<A HREF="upcoming.html">Previous</A> | <A HREF="index.html">Contents</A>
| <A HREF="messages.html">Next</A>)</P>

<H2>Troubleshooting LAMARC </H2>

<P> This article lists some common sources of trouble, and suggestions on
how to fix them.</P>

<H3> LIST OF FAQS: </H3>
<OL>
<LI> <A HREF="troubleshooting.html#Q1"> The program will not compile on my machine.</A></LI>
<LI> <A HREF="troubleshooting.html#Q2"> The program says it can't find my data file, but it's right here.</A></LI>
<LI> <A HREF="troubleshooting.html#Q3"> My data file can't be read at all.</A></LI>
<LI> <A HREF="troubleshooting.html#Q3.1"> The data converter mangles my file.
</A></LI>
<LI> <A HREF="troubleshooting.html#Q4"> The program crashes early or complains about lack of memory.</A></LI>
<LI> <A HREF="troubleshooting.html#Q5"> The program runs much too slowly.</A></LI>
<LI> <A HREF="troubleshooting.html#Q6"> How can I tell if I've run the program long enough?</A></LI>
<LI> <A HREF="troubleshooting.html#Q7"> Some of my parameter estimates are ridiculously high--ten or twenty digits.  This can't be right.</A></LI>
<LI> <A HREF="troubleshooting.html#Q8"> My estimates have enormous error bars.</A></LI>
<LI> <A HREF="troubleshooting.html#Q9"> What does Theta mean if I have mtDNA (mitochondrial DNA) instead of nuclear DNA?  Do I need to divide it by four?</A></LI>
<LI> <A HREF="troubleshooting.html#Q10"> The program works when I use a small or low-polymorphism data set, but crashes on a larger or higher-polymorphism data set.</A></LI>
<LI> <A HREF="troubleshooting.html#Q10.0.1"> I get a long warning message stating that my 'data may be difficult to model'--what does this mean?</A></LI>
<LI> <A HREF="troubleshooting.html#Q10.0.2"> My profile likelihood tables look ragged rather than smoothly curved--is something wrong?</A></LI>
<LI> <A HREF="troubleshooting.html#Q10.1"> The program stops with a message 'Unable to create initial tree. Starting parameter values may be too extreme;'--what does this mean?</A></LI>
<LI><A HREF="troubleshooting.html#Q10.2"> Which microsatellite model should I use?</A></LI>
<LI> <A HREF="troubleshooting.html#Q11"> How can I do a likelihood ratio test using LAMARC?</A></LI>
<LI> <A HREF="troubleshooting.html#Q12"> Why can't the program use other data types, other data models, or other evolutionary forces?</A></LI>
<LI> <A HREF="troubleshooting.html#Q13"> What happened to the 'Normalize' option in previous versions of LAMARC?</A></LI>
<LI> <A HREF="troubleshooting.html#Q14"> Does LAMARC use 'site 0'? Do I?</A></LI>
<LI> <A HREF="troubleshooting.html#QLAST"> How can I report a bug or inadequacy of the program or documentation?</A></LI>
</OL>

<OL>
<LI> <A NAME="Q1"><B> "The program will not compile on my machine."</B></A> </LI>

<P> This is covered in a separate article, "<A
HREF="compiling.html">Compiling LAMARC</A>".  You may also want to see if
one of our <A
HREF="http://evolution.gs.washington.edu/lamarc/download.html">pre-made
executables</A> will work for you.</P>

<LI> <A NAME="Q2"><B> "The program says it can't find my data file, but it's right here."
</B></A></LI>

<P> Check whether your filename has a hidden extension; some operating
systems hide extensions such as ".txt" or ".xml" by default.  LAMARC
does not treat "infile.xml" and "infile" as the same file.  Also
check to make sure your file is in the folder or directory you think
it is.</P>
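<P>If you are unsure what the file is really called, the short Python sketch below lists every file in a folder whose name begins with a given stem. It shows the full name, including any extension your file browser may be hiding. The folder "." (the current directory) and the stem "infile" are placeholders; substitute your own.</P>

```python
from pathlib import Path


def files_with_stem(folder, stem):
    """Return the full names of files in `folder` whose name, with every
    extension removed, equals `stem` (so "infile", "infile.xml" and
    "infile.xml.txt" all match the stem "infile")."""
    matches = []
    for path in Path(folder).iterdir():
        # Compare only the part of the name before the first dot.
        if path.is_file() and path.name.split(".")[0] == stem:
            matches.append(path.name)
    return sorted(matches)


if __name__ == "__main__":
    # Run this from the folder where you expect your data file to be.
    for name in files_with_stem(".", "infile"):
        print(repr(name))  # repr() also exposes stray spaces in the name
```

<P>If this prints 'infile.xml.txt', then "infile.xml.txt" is the name you need to give LAMARC, not "infile.xml".</P>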

<LI> <A NAME="Q3"><B> "My data file can't be read at all; the program
crashes immediately or prints errors that have nothing to do with anything
in my file."</B></A></LI>

<P> Did you save your input file as a Word document, RTF, or some other
fancy format?  It needs to be plain unformatted text.</P>
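<P>If you are not sure what format a file was saved in, its first few bytes usually give it away. The Python sketch below checks for a few well-known signatures of formatted documents: RTF, the older binary Word format, and zip-based formats such as .docx. It also looks for NUL bytes, which rarely appear in plain text. It is a rough heuristic, not a complete format detector.</P>

```python
# Leading bytes of some common non-plain-text document formats.
SIGNATURES = {
    b"{\\rtf": "RTF document",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "legacy binary Office document (e.g. .doc)",
    b"PK\x03\x04": "zip-based document (e.g. .docx or .odt)",
}


def describe_format(path, probe=4096):
    """Return a short description of why `path` looks non-plain, or
    "plain text (probably)" if no warning sign is found."""
    with open(path, "rb") as handle:
        head = handle.read(probe)
    for magic, label in SIGNATURES.items():
        if head.startswith(magic):
            return label
    if b"\x00" in head:
        return "binary data (contains NUL bytes)"
    return "plain text (probably)"
```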

<P> An early crash may also be a symptom of lack of memory; see the
<A HREF="troubleshooting.html#Q4"> out of memory</A> section.</P>

<LI> <A NAME="Q3.1"><B> "The data converter mangles my file." </B></A> </LI>

<P>Check to see if you are using the correct option for "interleaved"
versus "sequential" data in conversion.  Interleaved data presents
the first line of sequence 1, then the first line of sequence 2...
and eventually the second line of sequence 1, sequence 2, etc.
Sequential data presents all of sequence 1, then all of sequence
2, and so forth.  Misrepresenting one as the other will cause your
sequence names to be treated as nucleotides and vice versa, with
disastrous results.</P>
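<P>To make the difference concrete, here is a Python sketch that rearranges a small interleaved alignment into sequential order. It assumes a bare-bones layout chosen for illustration. Blocks are separated by blank lines. Each line of the first block is a sequence name followed by bases. Each line of a later block holds only more bases for the sequence in the same position. Real PHYLIP or NEXUS files carry extra header information that this sketch does not handle.</P>

```python
def interleaved_to_sequential(text):
    """Convert a simplified interleaved alignment into {name: full sequence}."""
    # Group non-blank lines into blocks separated by blank lines.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    if not blocks:
        return {}

    # First block: "name bases..." on each line.
    names, pieces = [], []
    for line in blocks[0]:
        name, *bases = line.split()
        names.append(name)
        pieces.append(["".join(bases)])

    # Later blocks: bases only, one line per sequence, in the same order.
    for block in blocks[1:]:
        if len(block) != len(names):
            raise ValueError("each block needs exactly one line per sequence")
        for i, line in enumerate(block):
            pieces[i].append("".join(line.split()))

    return {name: "".join(parts) for name, parts in zip(names, pieces)}


example = """seq1 ACGTA
seq2 ACGTT

CCGA
CCGT
"""
print(interleaved_to_sequential(example))
# {'seq1': 'ACGTACCGA', 'seq2': 'ACGTTCCGT'}
```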

<LI> <A NAME="Q4"><B> "The program crashes early or complains about lack of
memory." </B></A></LI>

<P> On many Macintosh systems you can use the Finder to allocate more
memory to a specific program, and you'll probably need to do this
for LAMARC; the defaults are too low.</P>

<P> In general, if you suspect that there's not enough memory, try a
smaller subset of your data for a trial run.  <B>Important:</B>  if you
decide that you need to produce your final results based on a
subsample of your data, the subsample <B>must</B> be random.  It is
allowable to leave out whole genetic regions or populations, but
if you decide instead to leave out individual sequences or sites,
choose them randomly.  Leaving out the "boring" invariant sites
or identical sequences will severely distort the results.  Similarly,
if you leave out genetic regions, choose them at random; don't
preferentially choose the least polymorphic ones.</P>
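<P>A minimal Python sketch of drawing such a subsample is below. The region names are invented for illustration. The choice comes from a seeded random draw, so you can reproduce and report it. It never depends on which regions are more or less polymorphic. The same function works for lists of sequences or sites.</P>

```python
import random


def random_subset(items, k, seed):
    """Choose k items uniformly at random, without looking at their content.

    A fixed seed makes the choice reproducible, so the same subset can be
    reported and re-used later.
    """
    items = list(items)
    if k > len(items):
        raise ValueError("cannot choose more items than are available")
    rng = random.Random(seed)
    return sorted(rng.sample(items, k))


# Hypothetical region names; do not pre-filter these by polymorphism.
regions = ["adh", "cytb", "gapdh", "hsp70", "mhc1", "pgi", "tpi", "ldh"]
print(random_subset(regions, 4, seed=2012))
```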

<P> Decreasing the number of sampled genealogies will reduce memory
demands somewhat, at a cost in accuracy.  You will want to
increase the interval between samples at the same time, so as to
make each sample as independent (and thus informative) as possible.</P>

<P> LAMARC is a large program and realistic cases will require a computer
with generous memory.  Our development machines have about 2 gigabytes
of RAM.  With less than about 500 megabytes, the program will probably
work only for toy cases.</P>

<P> You may also want to consider whether you are asking for too many
populations and parameters; see below.</P>

<LI><A NAME="Q5"><B> "The program runs much too slowly."</B></A></LI>

<P> If you compiled LAMARC yourself from source code, optimization may
help (though some optimizers produce buggy code, so use at your
own risk).  The executables we supply are optimized to the best
of our ability.</P>

<P> Running a smaller case may help.  Please note that you cannot safely
leave out "boring" data such as invariant sites or identical individuals <A
HREF="troubleshooting.html#Q4">(details here)</A>.  We find that the
information value of additional individuals is quite low beyond twenty
individuals, so if you are running 50 individuals per population you can
probably cut them randomly down to 20 and get good results.</P>
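<P>Applied population by population, random downsampling might look like the Python sketch below. The cap of 20 individuals comes from the paragraph above. The dictionary layout and the sample names are assumptions made for the example.</P>

```python
import random


def cap_per_population(samples_by_pop, cap=20, seed=1):
    """Randomly keep at most `cap` individuals from each population.

    Populations with `cap` or fewer individuals are kept whole. Individuals
    are never chosen by how variable (or "boring") their sequences are.
    """
    rng = random.Random(seed)
    reduced = {}
    for pop in sorted(samples_by_pop):  # fixed order keeps runs reproducible
        individuals = list(samples_by_pop[pop])
        if len(individuals) > cap:
            individuals = rng.sample(individuals, cap)
        reduced[pop] = individuals
    return reduced


# Hypothetical data: 50 individuals in one population, 12 in another.
data = {"north": [f"n{i}" for i in range(50)],
        "south": [f"s{i}" for i in range(12)]}
reduced = cap_per_population(data)
print({pop: len(ids) for pop, ids in reduced.items()})  # {'north': 20, 'south': 12}
```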

<P> If the program has barely enough memory it may "thrash", wasting
a lot of time on memory management.  (You can often tell if
thrashing is occurring by listening to your computer; many will
whirr or rattle from the constant hard disc access.)  Adding
more memory may help.</P>

<P> If you are estimating recombination, and the program runs well
at first but then slows down, it may be adding more and more
recombinations to the genealogies.  You can set the
"maximum number of events" parameter lower, but doing so risks
a downward bias in your estimate.  It's a good solution to rare
slow-down periods, but not a good solution to a whole run full
of highly recombinant trees.  The latter may indicate that
your sequence spans too long a region and the ends are essentially
unlinked.  LAMARC is not able to meaningfully estimate the
recombination rate across unlinked sequences, and will bog down
if asked to try.  You can diagnose this problem by noticing
high "dropped" counts in the runtime reports.  (The "runtime 
reports" are given at the very end of your output file.  These 
contain information about possibly interesting things that 
happened while the program was running.)</P>

<P> Similarly, if you are estimating migration and the program bogs
down, you may have identified two groups as separate populations
which are really one panmictic population.  LAMARC cannot usefully
estimate the migration rate in this situation, and will bog down
trying.  Consolidating the problematic populations together may
get better results.  The program 
<a href="http://pritch.bsd.uchicago.edu/software.html">STRUCTURE</a> 
can be useful for
detecting non-differentiated populations.</P>

<P> Profiling is expensive, and switching to fixed-point rather than
percentile profiles, or eliminating profiles for some or all
parameters, will help considerably.  (But be sure you aren't
eliminating information that you really need.)  You should also
be aware that some profiles take longer than others, and the
estimate of time to finish profiling is very rough--it is not
unusual for profiling to take two or three times as long as
predicted, if the prediction happens to come from an easy
profile and there are several hard profiles in the set.</P>

<P> Setting the output file verbosity to "concise" should drastically
reduce the amount of time profiling takes, since the number of
profiles calculated for each parameter is two instead of eleven.  If you 
are writing a tree summary file, you will be able to re-load that file
and run with different profiling options later.</P>

<P> LAMARC is a computationally intensive approach and simply won't
succeed with really complex problems.  For example, if you have
twenty populations all exchanging migrants, you are trying to
estimate 400 parameters.  The amount of data required to do this
would be very high; the amount of computation would be staggering.
Try breaking your problem into subproblems.  Constraining sets
of these parameters to be zero, or to be identical, can greatly
reduce the complexity of the problem and increase your chance of
a good solution.  </P>
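<P>The 400 in this example is one Theta per population plus one migration rate for each ordered pair of distinct populations: n + n(n - 1) = n<sup>2</sup>. The Python sketch below does this arithmetic. It also shows how much a symmetric-migration constraint would reduce the count; that constraint forces the rate from A to B to equal the rate from B to A. The sketch counts only Theta and migration parameters, not others such as recombination or growth.</P>

```python
def count_parameters(n_pops, symmetric_migration=False):
    """Count Theta plus migration parameters for n_pops populations that
    all exchange migrants (a fully connected migration matrix)."""
    thetas = n_pops
    ordered_pairs = n_pops * (n_pops - 1)  # each direction is its own rate
    migration = ordered_pairs // 2 if symmetric_migration else ordered_pairs
    return thetas + migration


for n in (2, 5, 20):
    print(n, count_parameters(n), count_parameters(n, symmetric_migration=True))
```

<P>For 20 populations this gives 400 parameters, or 210 with symmetric migration. That is still a great deal to estimate.</P>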

<P> Finally, it's worth asking yourself how long the data took to
collect.  If they took several years to collect, an analysis which
takes several weeks shouldn't seem too long.  Run small pilot
cases first to get an idea of the scale of the problem.</P>

<P> Some useful rules of thumb:</P>

<P> Adding more sequence length slows the program down, but less than
linearly with the amount of sequence.  This is the best way to
refine an estimate of recombination rate in a single region.</P>

<P> Adding more individuals slows the program down linearly with the
number of individuals, and you will also need to run more steps in
your chains to get equivalently refined results, as the search
space is bigger.  We find that 20 individuals per population is
usually enough, and we have never seen a use for more than 100.</P>

<P> Adding more genetic regions (loci) slows the program down linearly
with the number of regions.  This is far and away the most effective
way to improve estimation of Theta or migration.  If you can choose
between adding more individuals or adding more regions, always add
more regions once you have 20 individuals per population.</P>

<P>If you have microsatellite data, the Brownian-motion approximation
is much faster than the stepwise model.  It is also a very good 
approximation except when population size is low.  The usual symptom
of breakdown in the Brownian model is data log-likelihood estimates
of 0.0.  If you see many of these, especially in the final chains of
your search, the Brownian approximation is not safe for your data and
will produce an upwards bias.  In all other cases, however, we
recommend it.</P>

<LI><A NAME="Q6"><B> "How can I tell if I've run the program long enough?" 
</B></A></LI>

<P> This is covered in a separate article, <A HREF="search.html">
"Search Strategy."</A></P>

<LI><A NAME="Q7"><B> "Some of my parameter estimates are ridiculously high--ten
or twenty digits.  This can't be right." </B></A></LI>

<P> It is possible for a data set to be so uninformative with
regard to migration (or, more rarely, recombination) that
the likelihood surface is flat, or almost flat.  This can
lead to an almost infinite estimate of the
parameter.</P>

<P> This is particularly common in migration cases where you are
trying to estimate too many parameters from a small amount of
data.  Consider a case where you have only 1 individual from
a certain population, and he turns out to have been a recent
migrant.  How big is that population?  What are its migration
rates to other populations?  LAMARC really can't tell, and this
is reflected by a flat likelihood surface.  You can verify
this by examining the profiling results.</P>

<P> If you think that some parameter really cannot be estimated,
holding it fixed at a reasonable value can rescue your ability
to estimate other parameters.</P>

<P> A second possible explanation is that you've run too few chains
or chains that are too short.  You can try running longer ones.</P>

<P> A third explanation, particularly for huge estimates of Theta,
is that your data aren't correctly aligned and so appear
much more variable than they should.  It can be helpful to
ask the program to echo back the input data, and examine it
for alignment problems.</P>

<P> If some of your estimates are huge, the rest may be all right,
but it is not wise to rely on this.  It's better to reduce
the problem until all estimates are reasonable.</P>

<P> LAMARC's strange behavior with inadequate data is not a program
bug; if the likelihood surface for the given data really is
flat, there's nothing the program can do to get an intelligent
estimate.  Running LAMARC in Bayesian analysis mode will produce text
files containing <A HREF="bayes.html#LnLpictures">portraits</A> of the likelihood surface; these files
can confirm whether the surface is flat.</P>

<LI><A NAME="Q8"> <B>"My estimates have enormous error bars."</B></A></LI>

<P> While this might possibly improve with a longer run, it is usually
an accurate reflection of your data.  (In fact, a too-short run
more often produces error bars that are narrower than they should
be.)  You might also try re-running with multiple replicates or
heating.</P>

<P> If possible, add more genetic regions.  If you can't do that, add
additional data to the regions (longer sequences) or more individuals.
In some cases (e.g. HIV sequences) additional individuals are the
only possible way to improve your data set, and you'll have to
be aware that you may never be able to get a really tight estimate.</P>

<P> Please do not ignore the error bars.  They are there for a reason.</P>

<LI> <A NAME="Q9"><B> "What does Theta mean if I have mtDNA (mitochondrial
DNA) instead of nuclear DNA?  Do I need to divide it by four?"</B></A></LI>

<P> Theta is always "number of heritable copies in the population * 2 * mu".
If you put in mtDNA, the value that comes out will be 2N<sub>f</sub> * mu,
where N<sub>f</sub> is the effective number of females.
You do <B>not</B> need to divide it by four.  A similar argument applies
to Y chromosome DNA.</P>

<P> If you have both mtDNA and nuclear DNA, be sure to indicate
to the program that they have different effective population sizes, either
by setting the effective population size of the mtDNA region to 1 and of the
nuclear DNA region(s) to 4, or by setting the effective population size of the
mtDNA region to .25 and of the nuclear DNA region(s) to 1.</P>
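<P>Written out, the rule above gives the following. Here N is the effective number of diploid individuals, and both kinds of DNA are assumed to share the same mutation rate mu. The last line further assumes an equal sex ratio (N<sub>f</sub> = N/2). The resulting 1 : 4 ratio is why the two ways of setting effective population sizes above are equivalent.</P>

```latex
\begin{aligned}
\Theta_{\text{nuclear}} &= (2N)\cdot 2\mu = 4N\mu \\
\Theta_{\text{mtDNA}} &= N_f \cdot 2\mu = 2N_f\mu \\
\Theta_{\text{mtDNA}}\Big|_{N_f = N/2} &= N\mu = \tfrac{1}{4}\,\Theta_{\text{nuclear}}
\end{aligned}
```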

<P>Also note that if you collected data from different sections of the
mitochondrion, all data should be put in the same genomic region.  If the
relative mutation rates are different, you can put them in different
segments, but then put both segments together in the same region.  You will
seriously underestimate your support intervals if you claim that each
section is its own region.</P>

<LI><A NAME="Q10"><B> "The program works when I use a small or low-polymorphism data
set, but crashes on a larger or higher-polymorphism data set." </B></A></LI>

<P> This may be a symptom of running out of memory (see previous
questions).  You should also check whether your data are aligned correctly;
improperly aligned data will look like excessive polymorphism.</P>

<LI><A NAME="Q10.0.1"><B> "I get a long warning message stating that my
'data may be difficult to model'--what does this mean?"</B></A></LI>

<P>Some of the above items have discussed consequences of
<A HREF="troubleshooting.html#Q5">telling LAMARC that your data comes from
two populations when it really comes from one</A>,
<A HREF="troubleshooting.html#Q7">providing LAMARC with an inadequate amount
of data</A>, and <A HREF="troubleshooting.html#Q10">analyzing highly
polymorphic data</A>. These high-level, big-concept problems can trigger
low-level numerical problems which the program cannot relate to the
big picture; the best it can do is describe the low-level problem.</P>

<P>When performing a maximum-likelihood analysis, 
LAMARC searches the likelihood surface for its maximum height.
It does this once after each Markov chain, and several times 
more if parameter profiles are turned on.  In rare cases,
two shapes of surface can arise that are intractable and lead to warning
messages.</P>

<P>One problem case is a flat surface (discussed
<A HREF="troubleshooting.html#Q7">above</A>), or a surface that continues 
to rise beyond a reasonable value for one or more parameters.  This 
implies that your data has insufficient power to accurately estimate
the population parameters.  The following warning message may appear:
</P>

<P><PRE>
Warning!  Encountered a region of the log-likelihood surface in which the
log-likelihood increases steadily, seemingly without an upper bound.
This implies that your data is difficult to model.  The problematic
parameter is parameter &lt;your parameter name&gt;; it has been increased
or decreased to a value of  &lt;some number&gt;, and the maximum lnL,
if one exists, seems to lie beyond this value.  The maximization routine
is terminating....
</PRE></P>
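<P>The following Python sketch uses invented one-parameter surfaces to illustrate the kind of check that leads to this first warning. It keeps enlarging a parameter while the log-likelihood keeps rising, and stops once the parameter passes a ceiling. It is not LAMARC's maximizer. The doubling schedule, the ceiling, and both test surfaces are assumptions made for the example.</P>

```python
import math


def climb_until_bounded(lnl, start, ceiling=1e6):
    """Double x while lnl(x) keeps increasing.

    Returns ("bounded", x) if the log-likelihood turns down before x passes
    `ceiling`, or ("unbounded", x) if it is still rising when it does.
    """
    x, best = start, lnl(start)
    while x <= ceiling:
        candidate = lnl(2 * x)
        if candidate <= best:
            return "bounded", x
        x, best = 2 * x, candidate
    return "unbounded", x


def peaked(x):
    """Toy surface with its maximum at x = 3."""
    return -(math.log(x) - math.log(3.0)) ** 2


def still_rising(x):
    """Toy surface that keeps creeping upward toward 0 as x grows."""
    return -1.0 / x


print(climb_until_bounded(peaked, 0.1))        # stops near the peak
print(climb_until_bounded(still_rising, 1.0))  # reports "unbounded"
```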

<P>Another type of problem surface is very spiky with multiple
peaks and valleys.
This can result when combinations of
parameter values exceed some machine-specific threshold; for example, their
product can become too large to represent in the fixed number of bits the computer uses for each number, or
their quotient or difference can become too small to be distinguishable
from zero.  The following warning message
may appear:</P>

<P><PRE>
Warning!  Calculated a log-likelihood of &lt;some number&gt; for the
parameter vector p = (&lt;some numbers...&gt;), and determined that
a greater log-likelihood could be found in a certain direction, but
no greater log-likelihood was found in that direction.  This implies
that your data may be difficult to model, or that there is a problem
with lamarc.  The maximization routine is terminating....
</PRE></P>

<P>(Those interested in the math may like to know that the problem is
detected when the surface's gradient becomes inconsistent with the
surface's height.)</P>
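<P>A toy version of that consistency check is sketched below in Python. It is an illustration only, not LAMARC's routine. A backtracking line search trusts the reported slope. It reports an inconsistency if no step in the uphill direction, however small, actually raises the function. The test surface and the deliberately wrong slope are invented for the example.</P>

```python
def ascent_step(f, slope, x, min_step=1e-12):
    """Take one backtracking step uphill from x.

    `slope` is the derivative of f at x as reported by some other
    calculation. If it is nonzero and correct, a small enough step in its
    direction must raise f; if even the smallest step tried does not, the
    reported slope and the height of the surface disagree.
    """
    if slope == 0:
        return x  # already at a flat point; nothing to do
    direction = 1.0 if slope > 0 else -1.0
    height, step = f(x), 1.0
    while step >= min_step:
        candidate = x + direction * step
        if f(candidate) > height:
            return candidate
        step /= 2
    raise ArithmeticError("gradient is inconsistent with the surface height")


def surface(x):
    return -(x - 2.0) ** 2  # true slope at x = 0 is +4


print(ascent_step(surface, 4.0, 0.0))  # 1.0
try:
    ascent_step(surface, -4.0, 0.0)    # deliberately wrong slope
except ArithmeticError as err:
    print("terminating:", err)
```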

<P>If you receive either of these warning messages, or a message very
similar to these, you may be able to ignore it if it appears only in one or two
of the earlier Markov chains in your series of chains.  The more reasonable
the ultimate results are, the safer it is to ignore warnings appearing early
or infrequently in your run.  If you receive this type of message late or
frequently in a run, then the ultimate results should be considered
dubious.</P>

<P>If you receive the "... increases steadily, seemingly without an
upper bound ..." warning, then you may be able to achieve better
results by reducing the number of parameters you are estimating,
or analyzing a subset of your data.  If you receive the "... no greater
log-likelihood was found in that direction ..." warning, then you can
try proceeding in the same manner, but troubleshooting is much more
difficult in this case.  We encourage you to contact us
and provide us with a copy of your data if you encounter this latter
warning: doing so would help us as we continue to research ways of
cleanly coping with these computational challenges.</P>

<LI><A NAME="Q10.0.2"><B> "My profile likelihood tables look ragged rather than smoothly curved--is something wrong?"</B></A></LI>

<P> Occasionally LAMARC, run in likelihood mode, encounters a likelihood
surface it simply can't maximize reliably, often because it has more
than one maximum.  One symptom of this is ragged profile tables where
the values of the parameters jump around from line to line rather
than increasing or decreasing smoothly.  When you see this, none of
your estimates, even the MLE, are completely reliable.  Ideas for
improving the situation include running the program longer (more
chains, longer chains or both) or reducing the number of parameters
you are trying to estimate.</P>
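<P>For a quick, objective check on one column of a profile table, you can count how often its values change direction. This is a rough heuristic, sketched in Python below. It assumes you have already copied the column into a list of numbers. The tolerance is an arbitrary choice for the example, and a smooth profile may legitimately turn once.</P>

```python
def direction_changes(values, tol=0.0):
    """Count sign changes between successive differences in `values`.

    Differences whose size is at most `tol` are ignored, so tiny
    wobbles from rounding are not counted.
    """
    signs = []
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        if abs(diff) > tol:
            signs.append(1 if diff > 0 else -1)
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


smooth = [0.011, 0.012, 0.014, 0.017, 0.021]
ragged = [0.011, 0.019, 0.012, 0.025, 0.013]
print(direction_changes(smooth), direction_changes(ragged))  # 0 3
```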

<P> The Bayesian mode of LAMARC, which samples its parameters one
at a time rather than maximizing them jointly, is less prone to this, but you may
see the very similar symptom of curvefiles with multiple spikes in
them.  Again, collecting more samples by running LAMARC longer, or
simplifying the problem so that fewer samples are needed, are your
best bets.</P> 

<LI><A NAME="Q10.1"><B> "The program stops with a message
'Unable to create initial tree.  Starting parameter values may be too extreme; try using more conservative ones.'
--what does this mean?"</B></A></LI>


<P>The initial tree for the search (also called the "de novo tree")
is created based on the starting parameters (either calculated from
the data or provided by the user).  Attempts to make a de novo
tree may fail because too many migrations or recombinations are
put in.  The program will try 100 times to make a de novo tree,
but if every one of them has too many events it will give up in
order to avoid an infinite loop.</P>
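<P>The toy Python model below illustrates the retry logic just described. Nothing in it is LAMARC code. The "tree" is reduced to a count of events drawn from a Poisson process, whose rate stands in for the starting migration or recombination value. The cap stands in for the maximum number of events. Only the limit of 100 attempts comes from the text above.</P>

```python
import random

MAX_ATTEMPTS = 100  # the number of tries described in the text above


def count_events(rate, rng, max_events):
    """Toy stand-in for building one tree: count the events of a Poisson
    process with the given rate over one unit of tree length, stopping
    early once the count exceeds max_events."""
    if rate <= 0:
        return 0
    n, t = 0, rng.expovariate(rate)
    while t < 1.0:
        n += 1
        if n > max_events:
            break
        t += rng.expovariate(rate)
    return n


def make_initial_tree(rate, max_events, seed=0):
    """Retry up to MAX_ATTEMPTS times; give up rather than loop forever."""
    rng = random.Random(seed)
    for attempt in range(1, MAX_ATTEMPTS + 1):
        n = count_events(rate, rng, max_events)
        if n <= max_events:
            return attempt, n
    raise RuntimeError("Unable to create initial tree.  Starting parameter "
                       "values may be too extreme; try using more "
                       "conservative ones.")


print(make_initial_tree(rate=5.0, max_events=1000))  # succeeds on attempt 1
try:
    make_initial_tree(rate=5000.0, max_events=100)
except RuntimeError as err:
    print(err)
```

<P>With a modest rate, the first attempt succeeds. With a rate far above what the event limit allows, all 100 attempts fail. This is why more conservative starting values help.</P>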

<P>This error suggests that the current starting values for recombination
or migration are far too high, given the currently specified upper limits
on recombination or migration events.  A common cause is breakdown
of the FST calculation for migration rate.  Check the starting
values and make sure they are reasonable.  When in doubt, try
a slightly lower value; the program can adjust it upwards if
necessary.  Don't use extremely low values for migration (below 0.001) however;
these can cause the program to become stuck at low values for
a very long time.</P>

<LI><A NAME="Q10.2"><B> "Which microsatellite model should I use?" </B></A></LI>

<P> Try the Brownian-motion model first, since it is much faster.
Consider switching to the stepwise-mutation model if you see signs,
in the runtime reports, of failure of the Brownian approximation.
These take the form of data log-likelihoods of 0.0.  If many of these
appear, or any appear in the final chains, switch to the stepwise model.
You may want to start with the stepwise model if you know that your
population size(s) are very small, since this is the weak point of
the Brownian approximation.</P>
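<P>The rule of thumb above can be written down as a small Python function. The input format is an assumption: a list with one entry per chain, in run order, each entry holding the data log-likelihoods you noted from the runtime reports. How many zeros count as "many" is left to you; the default of 5 is arbitrary.</P>

```python
def suggest_microsat_model(data_lnls_by_chain, n_final_chains=1, many=5,
                           small_population=False):
    """Suggest "Brownian" or "stepwise" following the rule of thumb:
    start with Brownian unless populations are known to be very small,
    and switch to stepwise if data log-likelihoods of exactly 0.0 appear
    many times, or at all in the final chains."""
    if small_population:
        return "stepwise"
    zeros_per_chain = [sum(1 for lnl in chain if lnl == 0.0)
                       for chain in data_lnls_by_chain]
    final_zeros = sum(zeros_per_chain[-n_final_chains:]) if n_final_chains > 0 else 0
    if sum(zeros_per_chain) >= many or final_zeros > 0:
        return "stepwise"
    return "Brownian"
```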

<LI><A NAME="Q11"><B> "How can I do a likelihood ratio test using LAMARC?" </B></A></LI>

<P> The short answer is that you can't.  The "likelihoods" produced
by the program are relative likelihoods, and they are meaningful
only within one run--there is no way to compare them across runs.
(They represent the answer to the question "How much better do
the sampled trees fit the maximum likelihood values than the
values they were sampled from?")</P>

<P> However, approximate confidence intervals based on the shape of
the curve are possible.  LAMARC presents these in two ways:  as
the percentile profiling in the MLE tables, and as full
profile-likelihood tables (if requested).  These should enable you
to get a picture of the uncertainty in your analysis.</P>
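<P>As an illustration of reading uncertainty from the shape of one run's curve, the Python sketch below applies a common asymptotic approximation. It keeps every profiled value whose log-likelihood is within 1.92 units of the best (half of 3.84, the 95% point of a chi-square with one degree of freedom). This is a generic sketch, not a description of how LAMARC computes its own percentile profiles. It assumes a single-peaked profile. The interval can be no finer than the grid of values in the table, and the example numbers are invented.</P>

```python
def support_interval(profile, drop=1.92):
    """Approximate support interval from (parameter value, lnL) pairs taken
    from one profile in one run: the smallest and largest values whose lnL
    lies within `drop` units of the best one."""
    best = max(lnl for _, lnl in profile)
    inside = [value for value, lnl in profile if lnl >= best - drop]
    return min(inside), max(inside)


profile = [(0.001, -4.1), (0.002, -1.5), (0.005, 0.0), (0.010, -0.9), (0.020, -3.0)]
print(support_interval(profile))  # (0.002, 0.01)
```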

<LI><A NAME="Q12"><B> "Why can't the program use other data types, other data
models, or other evolutionary forces?" </B></A></LI>

<P> For version 2.0 we have included almost all of the 
commonly available mutational models.  We do not have provision for
RFLP or protein sequence data, because the existing maximum 
likelihood algorithms for these are agonizingly slow, or for 
AFLP data, because no one has
yet developed an AFLP maximum likelihood algorithm.  (If you succeed
in doing so, and it runs at a reasonable speed, we will be happy
to add it to LAMARC.)  Most other data
types can be accommodated with the K-Allele model.</P>

<P> New evolutionary forces are more difficult, but we will
be slowly increasing the number of forces supported.   Our
next major project is natural selection.  </P>

<P> If you are a programmer, you may also want to consider adding
new data types or models yourself.  We have tried to write LAMARC
in a modular fashion that will accommodate additions fairly well.
Only time will tell if we've succeeded.  Feel free to write
and ask questions about possible additions.  </P>

<LI><A NAME="Q13"><B> "What happened to the 'Normalize' option in previous
versions of LAMARC?"</B></A></LI>

<P> The program now automatically checks to see if normalization is needed,
and turns it on if so.  Normalization will not be needed for the majority of
data sets, and since it causes a significant decrease in speed if on (and
because the option was confusing to many of our users), we made control of
this option automatic.  If "verbose" runtime reports are selected, LAMARC
will note when this occurs.  If you feel that normalization is necessary
for your data, the option remains to turn it on in the <A HREF="xmlinput.html#normalize">XML</A>.</P>

<LI><A NAME="Q14"><B> "Does LAMARC use 'site 0'?  Do I?" </B></A></LI>

<P> To our consternation, we recently discovered that the common biological
naming convention is to call the site to the left of site 1 "site
-1" instead of site 0.  All versions of LAMARC prior to v2.1 do <B>not</B> follow
this convention, so if you claimed that one of your SNPs was at site -5, and
another SNP was at site 5, LAMARC would assume that the stretch from one SNP
to the other, counting both, spanned 11 nucleotides, and not 10.  This probably doesn't make a huge
difference, but it's probably worth fixing once you know.</P>

<P>As of version 2.1, the converter program lam_conv examines your data, and
if you never use a '0' for a 'map position' or a 'first position scanned'
(aka 'offset'), it assumes that you fall in the majority case, and that all
your negative sites are one base closer to the positive ones than we
previously believed.  When it creates a LAMARC input file, it adds one to
all your negative numbers, so if you tell the converter you have a SNP at
site -5, and then examine the LAMARC input file, you will see '-4' in the
list instead.</P>
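<P>The converter's rule can be sketched in Python as follows. This illustrates the behavior described above; it is not lam_conv's source, and the function names are hypothetical. For simplicity the sketch looks at a single list of positions, whereas the converter checks your map positions and "first position scanned" values.</P>

```python
def uses_site_zero(positions):
    """A zero anywhere means the data already use a numbering system
    that includes site 0."""
    return 0 in positions


def to_lamarc_positions(positions):
    """Shift negative positions one step toward zero, unless the input
    already includes site 0."""
    if uses_site_zero(positions):
        return list(positions)
    return [p + 1 if p < 0 else p for p in positions]


print(to_lamarc_positions([-5, 5]))     # [-4, 5]
print(to_lamarc_positions([-5, 0, 5]))  # [-5, 0, 5]
```

<P>After the shift, -4 and 5 are 9 positions apart on LAMARC's scale. That matches a stretch of 10 sites from -5 to 5, counting both ends, when there is no site 0.</P>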

<P>Because LAMARC usually doesn't report its results in terms of actual
sites, this change will likely be invisible to you, and the only difference
will be that LAMARC will now be a bit more precise.</P>

<P>However, if you're using our 2.1-introduced mapping feature, these
results are reported in terms of the sites where the trait has been mapped. 
As such, it's more important to know whether there is a 'site 0' or not,
assuming you have any negative map positions.  Here, we let you have it both
ways:  in the XML, under the 'format' tag, the converter writes a
'convert-output-to-eliminate-zero' tag, which is set to 'true' unless (as
noted) you ever used a '0' for a map position or first-position-scanned. 
When this is set 'true', LAMARC will assume you are following traditional
biologist convention, and convert its values to the 'non-zero' scale before
displaying them.  This means that if it tells you that your trait might be
mapped to sites "-1:1", it is talking about two sites, and not three.  It
also means that the final list of sites in the output file will skip
directly from -1 to 1:</P>

<pre>
-3         0.00079395
-2         0.00079395
-1         0.00078690
1          0.00078688
2          0.00078688
</pre>
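<P>The display rule is the reverse shift. Internal positions of 0 and below move one step further from zero before printing, so internal 0 is shown as -1 and no "0" ever appears. The Python sketch below (function name hypothetical) reproduces the labels of the listing above from internal positions -2 through 2.</P>

```python
def display_position(internal, eliminate_zero=True):
    """Map an internal (zero-including) site number to the label printed
    when convert-output-to-eliminate-zero is true."""
    if eliminate_zero and internal <= 0:
        return internal - 1
    return internal


print([display_position(p) for p in range(-2, 3)])  # [-3, -2, -1, 1, 2]
```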

<P>So, how can you tell if you yourself are using a system that includes a 0
or not?  If all you have are positive numbers, it makes no difference, and
you can safely ignore it.  If you got your numbers from a genome browser or
the like, it probably does not include a 0.  In fact, you probably only have
0's in your site lists if a) you made a mistake, b) you made up your own
system, or c) you are a tireless crusader for the forces of justice, with a
penchant for attaching yourself to Sisyphean challenges.  If you fall in the
latter category, we'd love to hear from you, if only to commiserate.  Which
brings us to...</P>


<LI><A NAME="QLAST"><B> "How can I report a bug or inadequacy of the program
or documentation?"
</B></A></LI>

<P> The easiest method is email to <A
HREF="mailto:lamarc@u.washington.edu">lamarc@u.washington.edu</a>. Please
tell us the exact symptoms of the bug, the operating system you're using,
and if possible, send a copy of the data file that produces the problem.  We
also appreciate hearing about questions the documentation doesn't adequately address,
and about parts of it that are unclear or hard to find, as this allows us to improve the
documentation for the next release.
</P>

</OL> <P>(<A HREF="upcoming.html">Previous</A> | <A
HREF="index.html">Contents</A> | <A HREF="messages.html">Next</A>)</P>

<!--
//$Id: troubleshooting.html,v 1.34 2012/05/14 19:55:38 ewalkup Exp $
-->
</BODY>
</HTML>