Q: Is there a multi-core / multi-threaded version of LAST?
A: No, but you can easily do it yourself. Suppose you want to compare
sequences in three files (dna1.fa, dna2.fa, dna3.fa) to a lastdb
database called "mydb". You can do something like this:

  lastal mydb dna1.fa > out1.maf &
  lastal mydb dna2.fa > out2.maf &
  lastal mydb dna3.fa > out3.maf &

This will process the files in parallel. (It will not need three
times as much memory, because the lastal processes will use shared
memory for mydb.)
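
With more query files, the same idea can be written as a loop. The
following is only a minimal sketch: the function name lastal_parallel
and the output naming (dna1.fa becomes dna1.maf) are illustrative
assumptions, not part of LAST.

```shell
# Hypothetical helper: run one lastal process per query file, in parallel.
# Usage: lastal_parallel DB FILE.fa [FILE.fa ...]
lastal_parallel() {
    db=$1
    shift
    for f in "$@"; do
        # "${f%.fa}.maf" turns dna1.fa into dna1.maf (naming is an assumption)
        lastal "$db" "$f" > "${f%.fa}.maf" &
    done
    wait    # return only after every background lastal has finished
}
```

For example: lastal_parallel mydb dna1.fa dna2.fa dna3.fa. This starts
one process per file, so it suits cases where the number of files is
no larger than the number of cores.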
Q: Does it matter which sequence is used as the "reference" (given to
lastdb) and which is used as the "query" (given to lastal)?
A: It may do. In short, LAST tries hard to find alignments for every
position in the query. When mapping reads to a genome, you
probably want the genome to be the reference, and the reads to be
the query. That way, for each read, it will search for several
most-similar locations in the genome. The other way, for each
location in the genome, it will search for several most-similar
reads. As another example, if you compare a genome to a library of
repeat sequences, you probably want the genome to be the query and
the repeat library to be the reference.
Q: Why is LAST so slow at reading/writing files?
A: Probably because the files are huge and your disk is slow (or lots
of people are using it). Try to use a reasonably fast disk.
Q: How does LAST get the sequence names? How can I get nice, short,
unique names?
A: The first whitespace-delimited word in the sequence header line is
used as the name. You can arbitrarily customise the names using
standard Unix tools. For example, this will replace each name with
a unique serial number:

  awk '/^>/ {$0 = ">" ++n} {print}' queries.fa | lastal myDb -

Sometimes you can make LAST's output significantly smaller by
shortening the names.
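
After renaming, the original names are usually still needed to
interpret the output, so it can help to record the mapping at the same
time. The following is a minimal sketch, assuming every header line
starts with ">" immediately followed by the name; the function name
rename_fasta and the mapping file are hypothetical.

```shell
# Hypothetical helper: replace each FASTA name with a serial number and
# record "serial <TAB> original name" in a mapping file.
# Usage: rename_fasta INPUT.fa MAP.tsv   (renamed FASTA goes to stdout)
rename_fasta() {
    awk -v map="$2" '
        /^>/ {
            name = substr($1, 2)      # first word of the header, minus ">"
            n++
            print n "\t" name > map   # remember which serial replaced which name
            $0 = ">" n                # the header becomes just the serial number
        }
        { print }
    ' "$1"
}
```

For example: rename_fasta queries.fa names.tsv | lastal myDb -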
Q: How can I find alignments with > 95% identity?
A: One way is to use a scoring scheme like this: +5 for a match, and
-95 for a mismatch or for each gap position (gap existence cost 0,
gap extension cost 95). You'll also need to set the alignment
score threshold to a reasonable value. In this example we set it
to 150, which means that we require at least 30 matches:

  lastal -r5 -q95 -a0 -b95 -e150

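
The arithmetic behind these numbers: an alignment with m matches and x
mismatches or gap positions scores 5m - 95x. Requiring a score of at
least 150 gives m >= 19x + 30, so m / (m + x) is greater than 95%, and
with x = 0 at least 30 matches are needed. By the same reasoning, a
match score of 100 - P and a cost of P should give more than P%
identity. The following sketch prints the options for a chosen P; the
function name identity_opts is hypothetical, and the generalisation
beyond 95% is an extrapolation from the example above.

```shell
# Hypothetical helper: print lastal scoring options for "> P% identity".
# Usage: identity_opts PERCENT MIN_MATCHES
identity_opts() {
    p=$1                      # identity threshold in percent, an integer 1..99
    min_matches=$2            # matches required when there are no mismatches or gaps
    r=$((100 - p))            # match score
    q=$p                      # cost per mismatch or gap position
    e=$((r * min_matches))    # alignment score threshold
    echo "-r$r -q$q -a0 -b$q -e$e"
}
```

For example, identity_opts 95 30 prints the options used above:
-r5 -q95 -a0 -b95 -e150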