File: FAQ.txt

Q: Is there a multi-core / multi-threaded version of LAST?

A: No, but you can easily do it yourself.  Suppose you want to compare
   sequences in three files (dna1.fa, dna2.fa, dna3.fa) to a lastdb
   database called "mydb".  You can do something like this:

     lastal mydb dna1.fa > out1.maf &
     lastal mydb dna2.fa > out2.maf &
     lastal mydb dna3.fa > out3.maf &

   This will process the files in parallel.  (It will not need three
   times as much memory, because the lastal processes will use shared
   memory for mydb.)
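
   For more than a handful of files, the same idea can be wrapped in
   a small shell function.  This is only a sketch: the function name
   run_parallel, the ALIGNER variable and the output naming
   (dna1.fa -> dna1.maf) are illustrative assumptions, not part of
   LAST.  It starts one background job per file, then waits for all
   of them and reports failure if any job failed.

```shell
# Sketch only: run_parallel and ALIGNER are hypothetical names.
# ALIGNER defaults to lastal; override it to try the function
# without LAST installed.
run_parallel() {
  db=$1; shift
  pids=""
  for f in "$@"; do
    # One background job per query file; output goes to NAME.maf.
    "${ALIGNER:-lastal}" "$db" "$f" > "${f%.fa}.maf" &
    pids="$pids $!"
  done
  status=0
  for p in $pids; do
    # Collect each job's exit status so a failure is not lost.
    wait "$p" || status=1
  done
  return "$status"
}

# Usage: run_parallel mydb dna1.fa dna2.fa dna3.fa
```

   Starting far more jobs than you have CPU cores is unlikely to
   help, so you may want to pass the files in batches.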


Q: Does it matter which sequence is used as the "reference" (given to
   lastdb) and which is used as the "query" (given to lastal)?

A: It can.  In short, LAST tries hard to find alignments for every
   position in the query.  When mapping reads to a genome, you
   probably want the genome to be the reference, and the reads to be
   the query.  That way, for each read, it will search for the
   several most-similar locations in the genome.  The other way
   round, for each location in the genome, it would search for the
   several most-similar reads.  As another example, if you compare a
   genome to a library of repeat sequences, you probably want the
   genome to be the query and the repeat library to be the reference.
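
   As a minimal sketch of the read-mapping case: the genome goes to
   lastdb and the reads go to lastal.  The function name map_reads,
   the LASTDB and LASTAL variables and all file names are
   placeholders chosen for illustration.

```shell
# Sketch only: map_reads, LASTDB and LASTAL are hypothetical names.
# The variables default to the real lastdb and lastal commands.
map_reads() {
  genome=$1   # reference: indexed by lastdb
  reads=$2    # query: aligned by lastal
  db=${3:-mydb}
  "${LASTDB:-lastdb}" "$db" "$genome" &&
  "${LASTAL:-lastal}" "$db" "$reads"
}

# Usage: map_reads genome.fa reads.fa > reads-vs-genome.maf
```

   For the repeat-library case the roles swap: pass the repeat
   library as the first argument and the genome as the second.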


Q: Why is LAST so slow at reading/writing files?

A: Probably because the files are huge and your disk is slow (or lots
   of people are using it).  Try to use a reasonably fast disk.


Q: How does LAST get the sequence names?  How can I get nice, short,
   unique names?

A: The first whitespace-delimited word in the sequence header line is
   used as the name.  You can arbitrarily customise the names using
   standard Unix tools.  For example, this will replace each name with
   a unique serial number:

     awk '/^>/ {$0 = ">" ++n} {print}' queries.fa | lastal mydb -

   Sometimes you can make LAST's output significantly smaller by
   shortening the names.
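
   Serial numbers discard the original names, so you may want to
   keep a lookup table.  The sketch below is one way to do that; the
   function name rename_fasta and the file names are assumptions for
   illustration.  It writes the renamed sequences to one file and a
   tab-separated "number, original name" table to another, taking
   the original name to be the first whitespace-delimited word of
   the header line.

```shell
# Sketch only: rename_fasta is a hypothetical helper.
# $1 = input FASTA, $2 = renamed FASTA, $3 = name table (TSV)
rename_fasta() {
  awk -v map="$3" '
    /^>/ {
      ++n
      # Record "serial number <TAB> original name" before renaming.
      print n "\t" substr($1, 2) > map
      $0 = ">" n
    }
    { print }
  ' "$1" > "$2"
}

# Usage: rename_fasta queries.fa renamed.fa names.tsv
```

   The table can later be used to translate the numbers in LAST's
   output back to the original names.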


Q: How can I find alignments with > 95% identity?

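A: One way is to use a scoring scheme like this: +5 for a match, and
   -95 for a mismatch or a gapped position.  An alignment with a
   positive score then has more than 19 matches for every mismatch
   or gapped position, i.e. more than 95% identity.  You'll also
   need to set the alignment score threshold to a reasonable value.
   In this example we set it to 150, which means that we require at
   least 30 matches: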

     lastal -r5 -q95 -a0 -b95 -e150
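
   The arithmetic behind this scheme can be checked with a small
   helper.  This is a sketch, not part of LAST: the function name
   score_and_identity is made up, and identity is assumed to mean
   matches divided by all alignment columns (matches + mismatches +
   gapped positions).  It prints the alignment score followed by the
   percent identity.

```shell
# Sketch only: score_and_identity is a hypothetical helper.
# $1 = number of matches
# $2 = number of mismatches plus gapped positions
score_and_identity() {
  awk -v m="$1" -v x="$2" 'BEGIN {
    # +5 per match, -95 per mismatch or gapped position
    printf "%d %.2f\n", 5*m - 95*x, 100*m/(m + x)
  }'
}

score_and_identity 30 0   # prints: 150 100.00
score_and_identity 49 1   # prints: 150 98.00
score_and_identity 95 5   # prints: 0 95.00
```

   The last case shows why the cut-off sits at 95%: at exactly 95%
   identity the score is zero, so it can never reach the threshold.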