Description of scripts that accompany LAST
==========================================

last-dotplot.py
---------------

This script makes a dotplot, a.k.a. Oxford Grid, of alignments in LAST
tabular format.  It requires the Python Imaging Library to be
installed.  To get a usage message::

  last-dotplot.py --help

To make a png-format dotplot of alignments in a file called "al"::

  last-dotplot.py al al.png

To get a nicer font, try something like::

  last-dotplot.py -f /usr/share/fonts/truetype/freefont/FreeSans.ttf al al.png

If the fonts are located somewhere different on your computer, change
this as appropriate.  To turn off the text and margins completely::

  last-dotplot.py -s0 al al.png

To limit the plot to 500x500 pixels::

  last-dotplot.py -x500 -y500 al al.png

If there are too many chromosomes, the dotplot will be very cluttered,
or the script might give up with an error message.  So you may want to
remove alignments involving fragmentary chromosomes first.  For
example, you could use "grep -v" to remove alignments involving
chromosomes with names like "chr1_random"::

  grep -v 'random' al > plotme
  last-dotplot.py plotme plotme.png


last-reduce-alignments.sh
-------------------------

This script removes "uninteresting" alignments from LAST genome
comparisons in MAF format.  Roughly speaking, it removes paralog
alignments and keeps ortholog alignments.  More precisely, if region A
in genome 1 aligns with region B in genome 2, but A also aligns more
strongly with a different region X, and B also aligns more strongly
with a different region Y, then the alignment of A with B is removed.
This procedure is conservative: it is unlikely to remove one-to-one
orthologs, but it may keep some paralogs, e.g. if the ortholog is
(wholly or partially) deleted in one genome.  The usage is simple::

  last-reduce-alignments.sh my-alignments.maf > reduced-alignments.maf

There is also an option to remove alignments more aggressively: if A
aligns more strongly with X *or* B aligns more strongly with Y, then
the alignment of A with B is removed.  This is still unlikely to
remove one-to-one orthologs, but it may cause some regions that are
alignable to something to be aligned to nothing.  This option is
selected with "-d"::

  last-reduce-alignments.sh -d my-alignments.maf > reduced-alignments.maf
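
The difference between the default rule and the "-d" rule can be
sketched in a few lines of Python.  This is only an illustration of
the logic described above, not the script's actual implementation:
the ``Alignment`` record, the ``reduce_alignments`` function and the
use of plain region labels (instead of overlapping coordinate ranges)
are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record: region labels stand in for coordinate ranges,
# and each (region1, region2) pair is assumed to occur at most once.
Alignment = namedtuple("Alignment", "region1 region2 score")

def reduce_alignments(alignments, aggressive=False):
    """Drop alignments that are dominated, per the rule in the text."""
    best1 = {}  # strongest score seen for each genome-1 region
    best2 = {}  # strongest score seen for each genome-2 region
    for a in alignments:
        best1[a.region1] = max(best1.get(a.region1, a.score), a.score)
        best2[a.region2] = max(best2.get(a.region2, a.score), a.score)
    kept = []
    for a in alignments:
        dominated1 = best1[a.region1] > a.score  # A aligns more strongly elsewhere
        dominated2 = best2[a.region2] > a.score  # B aligns more strongly elsewhere
        if aggressive:
            remove = dominated1 or dominated2    # like the "-d" option
        else:
            remove = dominated1 and dominated2   # conservative default
        if not remove:
            kept.append(a)
    return kept
```

With this sketch, an alignment whose genome-1 region has a stronger
partner, but whose genome-2 region does not, survives the default rule
and is dropped only when ``aggressive=True``.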


maf-join.py
-----------

This script joins two or more sets of pairwise (or multiple)
alignments into multiple alignments::

  maf-join.py aln1.maf aln2.maf aln3.maf > joined.maf

The top genome in each input file should be the same, and the script
simply joins alignment columns that are at the same position in the
top genome.  IMPORTANT LIMITATION: alignment columns with gaps in the
top sequence get joined arbitrarily, and probably wrongly.  Please
disregard such columns in downstream analyses.  Each input file must
have been sorted using maf-sort.sh (but the output of
last-reduce-alignments.sh is already in the right order).  For an
example of using LAST and maf-join.py, see multiMito.sh in the
examples directory.
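
The idea of joining columns by their position in the top genome can be
sketched as follows.  This is a simplified illustration, not the code
of maf-join.py: the ``join_columns`` function and its input layout (one
dict per input, mapping a top-genome position to a pair of bases) are
hypothetical, and keeping only positions covered by every input is an
assumption of the sketch.  Columns with a gap in the top sequence have
no top-genome position, so they are simply left out here.

```python
def join_columns(*alignments):
    """Join columns that share a top-genome position across all inputs.

    Each input is a hypothetical dict:
        top-genome position -> (top base, base in the other genome)
    """
    shared = set(alignments[0])
    for aln in alignments[1:]:
        shared &= set(aln)  # keep positions present in every input
    joined = []
    for pos in sorted(shared):
        top_base = alignments[0][pos][0]
        others = [aln[pos][1] for aln in alignments]
        joined.append((pos, top_base, *others))
    return joined
```

Each output tuple holds the position, the top-genome base, and one
base per input alignment.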


maf-swap.py
-----------

This script changes the order of the sequences in MAF-format
alignments.  You can use option "-n" to move the "n"th sequence to the
top ("n" defaults to 2)::

  maf-swap.py -n3 my-alignments.maf > my-swapped.maf
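
As a rough illustration of the reordering, the sketch below moves the
"n"th sequence line of a single MAF block to the top.  The
``move_to_top`` function is hypothetical and only reorders the "s"
lines; any other adjustments the real script may make are not covered
here.

```python
def move_to_top(block_lines, n=2):
    """Move the n-th "s" line (1-based) of one MAF block to the top."""
    s_idx = [i for i, line in enumerate(block_lines) if line.startswith("s ")]
    if not 1 <= n <= len(s_idx):
        raise ValueError("block has no sequence number %d" % n)
    seqs = [block_lines[i] for i in s_idx]
    seqs.insert(0, seqs.pop(n - 1))  # pull the n-th sequence to the front
    out = list(block_lines)
    for i, line in zip(s_idx, seqs):
        out[i] = line  # non-"s" lines (e.g. the "a" line) stay in place
    return out
```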


maf-sort.sh
-----------

This script sorts MAF-format alignments by sequence name, then strand,
then start position, then end position, of the top sequence.  You can
use option "-n" to sort by the "n"th sequence instead of the top
sequence.
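
The sort order can be pictured as a four-part key built from a MAF "s"
line, whose fields are: name, start, size, strand, source size, and
aligned text.  The sketch below is an illustration only: the
``sort_key`` function is hypothetical, it compares names as plain
strings, and the real script may order ties or names differently.

```python
def sort_key(s_line):
    """Key for one MAF "s" line: name, strand, start, end."""
    fields = s_line.split()
    name = fields[1]
    start = int(fields[2])
    size = int(fields[3])   # number of non-gap bases in the aligned text
    strand = fields[4]
    return (name, strand, start, start + size)

top_lines = [
    "s chr2 5 3 + 100 ACG",
    "s chr1 50 3 - 100 ACG",
    "s chr1 20 5 + 100 ACGTA",
    "s chr1 20 3 + 100 ACG",
]
ordered = sorted(top_lines, key=sort_key)
```

Here ``ordered`` starts with the two "chr1" forward-strand lines (the
shorter one first), then the "chr1" reverse-strand line, then "chr2".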


last-remove-dominated.py
------------------------

This script is used by last-reduce-alignments.sh.  It reads sorted
alignments, and removes alignments of A in genome 1 with B in genome 2
if A also aligns more strongly to a different region X.


Limitations
-----------

1) last-reduce-alignments.sh and last-remove-dominated.py do not work
   with centroid alignments.

2) The scripts that read MAF format work with the simple subset of MAF
   produced by lastal, but they don't necessarily work with more
   complex MAF data from elsewhere.

3) The following scripts do not work for DNA-versus-protein
   alignments: last-dotplot.py, maf-join.py,
   last-reduce-alignments.sh, last-remove-dominated.py.