File: converter.html

package info (click to toggle)
lamarc 2.1.10.1%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 77,052 kB
  • sloc: cpp: 112,339; xml: 16,769; sh: 3,528; makefile: 1,219; python: 420; perl: 260; ansic: 40
file content (441 lines) | stat: -rw-r--r-- 17,119 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
<!-- header fragment for html documentation -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>

<META NAME="description" CONTENT="Estimation of population parameters using genetic data usi
ng a maximum likelihood approach with Metropolis-Hastings Monte Carlo Markov chain importanc
e sampling">
<META NAME="keywords" CONTENT="MCMC, Markov chain, Monte Carlo, Metropolis-Hastings, populat
ion, parameters, migration rate, population size, recombination rate, maximum likelihood">

<TITLE>LAMARC Documentation: Data file conversion</title>
</HEAD>


<BODY BGCOLOR="#FFFFFF">
<!-- coalescent, coalescence, Markov chain Monte Carlo simulation, migration rate, effective
 population size, recombination rate, maximum likelihood -->

<P>(<A HREF="compiling.html">Back</A> | <A HREF="index.html">Contents</A>
| <A HREF="genetic_map.html">Next</A>)</P>
<H2>LAMARC File Converter</H2>

<p>
The converter uses a GUI, but can be used non-interactively in 
<a href="#batch-mode">batch mode</a>
with command-line arguments and a supplementary 
<a href="converter_cmd.html">converter command file</a> containing
information about your data.</P>

<P>Below you will find some basic issues to keep in mind when using the
converter, followed by instructions on how to use the converter in both
<a href="#batch-mode">batch</a>
and <a href="#gui-mode">GUI</a> modes.
</P>

<P><B>Important:</B>  If you wish to 
<a href="mapping.html">map trait alleles</a>,
you will need to write a <a href="converter_cmd.html">converter
command file</a> and <a href="#batch-mode">run the converter in 
batch mode</a>.  The capacity to read
and output trait data has not yet been added to the GUI. </P>


<h3>LAMARC File Converter Overview</h3>
<P> Topics covered in this article:</P>

<UL>
<LI><A HREF="converter.html#gettingReady">Getting Data Ready for Conversion </A></LI>
<LI> <A HREF="converter.html#running">Running the converter</a></LI>
<LI> <A HREF="converter.html#simple-example">A Simple Example</a></LI>
</UL>

<H3><A NAME="gettingReady">Getting Your Data Ready for Conversion</A></H3>

<p>
There are several things you must do
before you can produce a lamarc input file that will run successfully: 
<ul>
<li> <a href="converter.html#goodData">Start with good data</a>,</li>
<li> <a href="converter.html#converterCanRead">Use file formats the converter can read</a>,</li>
<li> <a href="converter.html#interleavingStatus">Know how your data files are interleaved</a>, </li>
<li> <a href="converter.html#handleHaploidMsatsRight">Verify haploid data is correctly represented</a>, and</li>
<li> <a href="converter.html#spatialProperties">Be able to model linkage properties and relative mutation rates of your data.</a></li>
</ul>
</p>

<h4><a name="goodData">Start with good data</a></h4>

The quality of a LAMARC analysis is dependent on the quality of
data that is input to it. Of particular import are:
<ul>
<li>Collecting data from different portions of the organism's genome,</li>
<li>Collecting an appropriate number of samples (more is not always better), and</li>
<li>Ensuring random sample selection (duplicates and invariant sequences are meaningful -- don't throw them out) </li>
</ul>

The section
<a href="data_required.html">Suitable data for LAMARC</a>
gives more information on this topic.

<h4><a name="converterCanRead">Use file formats the converter can read</a></h4>


<p> The converter can convert PHYLIP, RECOMBINE and MIGRATE files to the
format used by the LAMARC program. With a tiny amount of hand-editing it can
also convert COALESCE and FLUCTUATE files.
</p>

<p>
If you have a COALESCE or FLUCTUATE file, you will need to edit it
slightly in a text editor first.  
<ul>
<li>If the file has only one chromosomal
region in it, delete the first line (which will say "1").  Then  treat it as
a PHYLIP file.</li>
<li> If the file has multiple chromosomal regions in it, delete the first line
(which will give the number of regions) and divide the file into several
different files, one per region.  Then treat them as PHYLIP files.</li>
</ul>
</P>

<P> If your data file is not in any of these formats, check to see if
the software which produced it has an option to write PHYLIP files.
For example, PAUP* can convert many types of files to the PHYLIP format.</P>

<P> Currently the converter can handle DNA, RNA, and SNP sequence files
in PHYLIP or MIGRATE format, and microsatellite and K-Allele files in 
MIGRATE format.</P>

<h4><a name="interleavingStatus">Know how your data files are interleaved</a></h4>

<P>
PHYLIP, MIGRATE and the LAMARC's predecessor programs
(COALESCE and RECOMBINE) store nucleotide
sequence data in either interleaved or sequential form.  Unfortunately,
these formats don't always contain enough internal evidence for the computer
to guess correctly whether the data are interleaved or not, so you may have
to provide this information.  (The problem is that strings like "CAT" are
both legal sequence names and legal DNA.)</P>

<p><ul>
<li> The data section of
an interleaved file will have the first line of sequence 1, then
the first line of sequence 2, etc, followed by the second line of
all sequences, and so forth.  This format is often produced
by alignment algorithms.</li>
<li> A sequential file will have all of sequence 1, then all of sequence
2, etc.  This format is often produced by database programs.</li>
</ul>
A simple example showing both interleaved and sequential arrangement
in a Phylip input file is given in the table below.
</p>

<center>
<table border="2">
<tr>
<th>Phylip example file with interleaved sequences</th>
<th>Phylip example file with sequential sequences</th>
</tr>
<tr>
<td>
<pre>
3 10
cat       acttg
dog       acttg
pigeon    acttg
gtGca
gtGcT
gAtca
</pre>
</td>
<td>
<pre>
3 10
cat       acttg
gtGca
dog       acttg
gtGcT
pigeon    acttg
gAtca
</pre>
</td>
</tr>
</table>
</center>

<P> The formats should be relatively easy to distinguish if you look at them
yourself. While the converter will try to figure it out algorithmically,
in some cases it will give up, and you must tell it which format is present.</P>

<h4><a name="handleHaploidMsatsRight">Verify haploid data is correctly represented</a></h4>

<P> In the absence of a defined delimiter character,
the MIGRATE file format for microsatellite or elecrophoretic
data assumes that you have collected two alleles per marker per individual.
If this is not the case (perhaps your organism is haploid, or you
collected your data in a way that produces only one allele per marker
per individual) you can make a MIGRATE file with one, rather than two,
entries.  However, you <b>must</b> specify the delimiter character
(even though you will not be using it!) at the top of your MIGRATE
file.  If you don't, your single allele may be interpreted as two
alleles (i.e. a microsatellite of "27" may turn into "2" and "7").
Give an explicit delimiter to avoid this problem.</P>

<p>
If you have diploid (or more) data and some of your data has unresolved phase,
you may need to include <a href="converter_cmd.html#phase">phase resolution</a>
information via a <a href="converter_cmd.html">converter command file</a>.
</p>

<h4><a name="spatialProperties">
Be able to model linkage properties and relative mutation rates of your data</a></h4>

<p>
Previous LAMARC releases allowed combined analyses of data samples
from different regions of an organism's genome only when these
regions were either on separate chromosomes, or far enough separated
on a single chromosome that each
data sample was completely unlinked to the others.
It was also not possible to explicitly represent known variations
in relative mutation rate within a data sample.
</p>

<p>
As of LAMARC 2.1, we have relaxed this restriction, allowing you to
mix and match different data types even when they are linked.  So, for
example, the increasingly-popular data type of 
<a href="genetic_map.html#microsat-snp">microsatellite next to a SNP</a>
may now be modeled in LAMARC and will be analyzed appropriately.</P>

<p>If you have any of the following types of data, 
<ul>
<li>Samples with know differences in relative mutation rates, such as a
sequence containing introns and exons,</li>
<li>Overlapping data, such as fully sequenced DNA within a broader
collection of SNPs, or </li>
<li>Samples with different data types in close proximity, such as a 
microsatellite next to a SNP,</li>
</ul>
you will want to read the section entitled
<a href="genetic_map.html">Modeling Linkage Properties and Relative
Mutation Rates of Your Data</a> before attempting to create a
LAMARC infile that contains all your data.
It is recommended that you first continue reading this section and
make a test LAMARC run using one of your data samples to familiarize
yourself with the converter and the LAMARC program itself.
</p>

<p>
If you have a more straightforward data set such as 
a single DNA sequence or a set of unlinked microsatellites,
reading and following this section should be sufficient to
get you up to speed with the converter.
</p>


<h3> <A NAME="running">Running the converter</a></h3>

<h4>Incompatibilities with earlier versions</h4>

<P> The majority of LAMARC 2.0 and earlier input files should work
unmodified in version 2.1, with the sole exception of those 1.1.1 files with
the "&lt;map_position&gt;" tag, which must be changed to
"&lt;map-position&gt;" (see the <A HREF="changes.html">changes</A>
documentation).</P>

<h4>Settings not handled by the converter</h4>

<P> The file conversion process creates a lamarc input file with just the
data, so when it is read in by LAMARC, defaults will be used for all
parameter estimations.  To get LAMARC to estimate what you want it to
estimate, use the <A HREF="menu.html">LAMARC menu</a>, or <A
HREF="xmlinput.html">edit the XML itself</a>
after you have produced your LAMARC infile.
</P>

<h4>Reverse conversion</h4>

<P>
If you ever
need to get a PHYLIP file from a LAMARC XML file, one way to do so is to run
LAMARC with "normal" or "verbose" output (see the <A
HREF="menu.html#io">menu</A> documentation).  This will cause the input data
to be printed into the output file in a very PHYLIP-like format. You can use
a text editor to move this into a file of its own and make the minor changes
needed to create a PHYLIP file.</P>

<h4><a name="gui-mode">Running the converter in GUI mode<a></h4>
<p>
To run the lamarc file converter in GUI mode on a Linux or Unix system,
the command is:
</p>
<p>
<pre>
  lam_conv [-c &lt;commandfile&gt;] [ &lt;datafile&gt;... ]
</pre>
</p>
<p>
On Windows or a Mac simply double click on the application icon. You can
open command files and data files using the <tt>File</tt> menu of the
application.
</p>

<h4><a name="batch-mode">Running the converter in batch mode<a></h4>

<P> To run on a Linux, Unix, or Mac
in batch mode, you'll need to add command-line options:</P>
<pre>
  lam_conv -b -c &lt;commandfile&gt;
</pre>
</p>
<p>
(The Mac executable can be found in <tt>lam_conv.app/Contents/MacOS/lam_conv</tt>.)
</p>

<P>On a Windows system, use</P>
<pre>
  lam_conv.exe -b -c &lt;commandfile&gt;
</pre>
</p>

<P>The '-b' option tells the converter to run in batch mode, and the '-c'
option tells the converter the name of an XML file you have created that
tells the converter where your data is, and what to do with it.  If you
wish, you can use the "<tt>-c [filename]</tt>" option without the -b option,
and the converter will load in the data according to that file, and then let
you further modify things within the GUI.</P>

<P>The command file is not needed to run the converter in interactive mode;
the single exception (at present) is if you wish to include trait data for
<A HREF="mapping.html">mapping</a>.
</p>


<h3> <A NAME="simple-example">A Simple Example</a></h3>


<p>
This section demonstrates the converter in use on a simple DNA file
in migrate format,
<a href="batch_converter/chrom1.mig">chrom1.mig</a>.
This file is not real data, but was instead constructed to be 
easily inspected and checked for correctness.
</P>

<pre>
   2 1  Example: chromosome 1 with single dna segment
9
6    North
n_ind0_a  ccccccAcc
n_ind0_b  TcccccAcc
n_ind1_a  ccccccTcc
n_ind1_b  TcccccTcc
n_ind2_a  ccccccGcc
n_ind2_b  TcccccGcc
4    South
s_ind0_a  cTccTcccc
s_ind0_b  ccccTcccc
s_ind1_a  cTccccccc
s_ind1_b  ccccccccc
</pre>

<p>Upon reading in the file, the GUI looks like this:
</p>
<p><img src="batch_converter/images/DataPartitionsMigTab.png" alt="GUI converter after reading file chrom1.mig"/></p>
<p>
Note that there are 4 tabs:
<ul>
<li>The <tt>Data Partitions</tt> tab which contains the information read in from the file. </li>
<li>The <tt>Migration Matrix</tt> tab which contains the default migration matrix.</li>
<li>The <tt>Data Files</tt> tab which contains the information about what data files(s) have been read and what was found in them. </li>
<li>The <tt>Debug Log</tt> tab which contains debugging and logging information (not usually very interesting but somtimes useful). </li>
</ul>
</p>
<p>
There are also 2 buttons which are discussed later:
<ul>
<li><tt>Divergence</tt> which turns <A HREF="divergence.html">Divergence</A> off and on (not present if you only have one population). </li>
<li><tt>Use Panels</tt> which turns the <A HREF="panels.html">Panel Correction</A> method off and on.</li>
</ul>
</p>
<p>
Focusing first on the <tt>Data Partitions</tt> the following things are worth noting:
<ul>
<li>The <tt>Divergence</tt> button is &quot;Off&quot;. Unless you are studying Divergence (discussed below) this is correct. Divergence greatly slows down execution of LAMARC so do not turn this on without a good reason. If you have only one population in your data, this button will not be visible.</li>
<li>The box labeled <tt>contiguous segment</tt> in the <tt>Data Partitions</tt> information panel
has data type of <tt>???</tt>, indicating that you must select the specific data type
(in this case, SNP or DNA).
<li>The converter was able to create population names (&quot;North&quot; and &quot;South&quot;)
from the comments in the migrate file.</li>
<li>No panels are defined for any population. This is correct unless you are doing SNP panel correction. Do not define panel member counts unless you need them as they markedly slow down LAMARC.</li>
</p>
The <tt>Migration Matrix</tt> tab contains the default migration matrix 
between each of you populations (if you have only one population, there will be 
no migrations allowed). What these values mean and how they are edited is discussed on a <a href="migration_matrix.html">subsequent page</a>.
</p>
<p><img src="batch_converter/images/MigrationOnlyMatrixTab.png" alt="DataFiles Tab after reading file chrom1.mig"/></p>
<p>
The <tt>Data Files</tt> tab shows that the converter was
able to determine that the data in file chrom1.mig is of type DNA or SNP, </p>
<p><img src="batch_converter/images/DataFilesTab.png" alt="DataFiles Tab after reading file chrom1.mig"/></p>
<p>
<p>
The <tt>Debug Log</tt> tab shows whatever housekeeping comments the software has
output. This is usually not very interesting, but if we have to debug something for
you, it will be vital.</p>
<p><img src="batch_converter/images/DebugLogTab.png" alt="DebugLog Tab after reading file chrom1.mig"/></p>
<p>
<p>
Returning to the <tt>Data Partitions</tt> tab, if you try to convert the file (using <tt>File &gt; Write Lamarc File</tt> from the GUI
menu), you will see the following error message:
</p>
<p><img src="batch_converter/images/lam_conv_chrom1_export_warn_1.png" alt="GUI converter warning: needs data type"/></p>

<p>
The problem is that the converter needs you to tell it whether this
is DNA or SNP data.
To fix this problem, double click on the text inside
<tt>contiguous segment</tt> box in the <tt>Data Partitions</tt> panel.
You will see a new window that looks something like this:
</p>

<p><img src="batch_converter/images/lam_conv_chrom1_segment_panel.png" alt="GUI converter segment panel"/></p>

<p>
Select the <tt>DNA</tt> check box and click <tt>Apply</tt>. Now when you
choose <tt> File &gt; Write Lamarc File </tt> from the file menu, you will get
a directory browser window like this one in Linux:
</p>

<p><img src="batch_converter/images/lam_conv_chrom1_export_file_selection.png" alt="GUI converter export file selection"/></p>

<p>
and like this under OS X:
</p>

<p><img src="batch_converter/images/lam_conv_export_file_mac_minimal.png" alt="GUI converter export file selection"/></p>

<p>
Click the button with the triangle at the top for more complete navigation
through your directory system.
</p>

<p><img src="batch_converter/images/lam_conv_export_file_mac_expanded.png" alt="GUI converter export file selection"/></p>


<p>
The resulting lamarc file will look like <a href="batch_converter/chrom1_lamarc.html">this</a> 
(the actual xml is <a href="batch_converter/chrom1_lamarc.xml">here</a>).
</p>


<P>(<A HREF="compiling.html">Back</A> | <A HREF="index.html">Contents</A>
| <A HREF="genetic_map.html">Next</A>)</P>

<!--
//$Id: converter.html,v 1.46 2012/02/16 17:13:31 jmcgill Exp $
-->
</BODY>
</HTML>