last-dotplot
============
This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format. It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

Terminology
-----------
last-dotplot shows alignments of one set of sequences against another
set of sequences. This document calls a "set of sequences" a
"genome", though it need not actually be a genome.
Options
-------
  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages & data about the plot.
  -x INT, --width=INT
      Maximum width in pixels.
  -y INT, --height=INT
      Maximum height in pixels.
  -m M, --maxseqs=M
      Maximum number of horizontal or vertical sequences.  If there
      are >M sequences, the smallest ones (after cutting) will be
      discarded.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --alignments=FILE
      Read secondary alignments.  For example: we could use primary
      alignment data with one human DNA read aligned to the human
      genome, and secondary alignment data with the whole chimpanzee
      versus human genomes.  last-dotplot will show the parts of the
      secondary alignments that are near the primary alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2),
      the top-to-bottom order of (the midpoints of) their alignments
      (3).  You can use two colon-separated values, e.g. "2:1" to
      specify 2 for primary and 1 for secondary alignments.
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2),
      the left-to-right order of (the midpoints of) their alignments
      (3).
  --strands1=N
      Put the 1st genome's sequences: in forwards orientation (0), in
      the orientation of most of their aligned bases (1).  In the
      latter case, the labels will be colored (in the same way as the
      alignments) to indicate the sequences' orientations.  You can
      use two colon-separated values for primary and secondary
      alignments.
  --strands2=N
      Put the 2nd genome's sequences: in forwards orientation (0), in
      the orientation of most of their aligned bases (1).
  --max-gap1=FRAC
      Maximum unaligned gap in the 1st genome.  For example, if two
      parts of one DNA read align to widely-separated parts of one
      chromosome, it's probably best to cut the intervening region
      from the dotplot.  FRAC is a fraction of the length of the
      (primary) alignments.  You can specify "inf" to keep all
      unaligned gaps.  You can use two comma-separated values,
      e.g. "0.5,3" to specify 0.5 for end-gaps (unaligned sequence
      ends) and 3 for mid-gaps (between alignments).  You can use two
      colon-separated values (each of which may be comma-separated)
      for primary and secondary alignments.
  --max-gap2=FRAC
      Maximum unaligned gap in the 2nd genome.
  --pad=FRAC
      Length of pad to leave when cutting unaligned gaps.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.
  --margin-color=COLOR
      Color for the margins.
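
The colon- and comma-separated forms accepted by ``--max-gap1`` and
``--max-gap2`` can be illustrated with a small parser.  This is only a
sketch, not the script's own code: the function names are hypothetical,
and it assumes that a single value applies to both end-gaps and
mid-gaps, and that a value without a colon applies to both primary and
secondary alignments.

```python
def parse_gap_values(text):
    """Parse 'X' or 'END,MID' into an (end_gap, mid_gap) pair of floats."""
    # float() accepts "inf", which stands for "keep all unaligned gaps".
    parts = [float(p) for p in text.split(",")]
    if len(parts) == 1:
        # Assumption: one value is used for both end-gaps and mid-gaps.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("expected one or two comma-separated values: " + text)


def parse_max_gap(option):
    """Split a max-gap value into (primary, secondary) settings."""
    fields = option.split(":")
    if len(fields) > 2:
        raise ValueError("expected at most two colon-separated values: " + option)
    primary = parse_gap_values(fields[0])
    # Assumption: without a colon, the secondary setting equals the primary.
    secondary = parse_gap_values(fields[-1])
    return primary, secondary
```

Under these assumptions, ``parse_max_gap("0.5,3")`` gives end-gap 0.5
and mid-gap 3 for both kinds of alignment, while
``parse_max_gap("inf:2,4")`` keeps all gaps for primary alignments and
uses 2 and 4 for secondary ones.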
Text options
~~~~~~~~~~~~
  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --labels1=N
      Label the displayed regions of the 1st genome with their:
      sequence name (0), name:length (1), name:start:length (2),
      name:start-end (3).
  --labels2=N
      Label the displayed regions of the 2nd genome with their:
      sequence name (0), name:length (1), name:start:length (2),
      name:start-end (3).
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).
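
The four label styles selected by ``--labels1`` and ``--labels2`` can be
sketched as follows.  The function name is hypothetical, and the sketch
assumes that "length" means the length of the displayed region and that
coordinates are printed as plain integers; the script may format them
differently.

```python
def region_label(name, start, end, style):
    """Build a label for the displayed region [start, end) of a sequence."""
    length = end - start  # assumption: length of the displayed region
    if style == 0:
        return name
    if style == 1:
        return f"{name}:{length}"
    if style == 2:
        return f"{name}:{start}:{length}"
    if style == 3:
        return f"{name}:{start}-{end}"
    raise ValueError("label style must be 0, 1, 2 or 3")
```

For a region of chr9 from 5000 to 6000, the four styles would give
``chr9``, ``chr9:1000``, ``chr9:5000:1000`` and ``chr9:5000-6000``.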
Annotation options
~~~~~~~~~~~~~~~~~~
These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes. This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.
  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as stripes, with
      coordinates given by the first three BED fields.  The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format.  The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.
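
The color rule for BED annotations can be sketched like this.  It is an
illustration only: the function name and the three RGB triples are
hypothetical (this document names the colors but not their values), and
the sketch assumes the standard BED column order, with the strand in
column 6 and the RGB value in column 9, and treats an RGB field that is
not of the form "R,G,B" as absent.

```python
# Hypothetical RGB values, chosen only for illustration.
PALE_RED = (255, 224, 224)
PALE_BLUE = (224, 224, 255)
PALE_PURPLE = (240, 224, 240)


def bed_stripe(line):
    """Return (sequence name, start, end, color) for one BED line."""
    fields = line.split()
    seq_name, start, end = fields[0], int(fields[1]), int(fields[2])
    strand = fields[5] if len(fields) > 5 else None
    rgb = fields[8] if len(fields) > 8 else None
    if rgb is not None and rgb.count(",") == 2:
        # An explicit "R,G,B" field takes precedence over the strand.
        color = tuple(int(v) for v in rgb.split(","))
    elif strand == "+":
        color = PALE_RED
    elif strand == "-":
        color = PALE_BLUE
    else:
        color = PALE_PURPLE
    return seq_name, start, end, color
```

A three-field line such as ``chr1 100 200`` has neither strand nor RGB
field, so it falls through to pale purple.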
Gene options
~~~~~~~~~~~~
  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.
Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~
Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.
  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.
An unsequenced gap will be shown only if it covers at least one whole
pixel.
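
One way to picture the "whole pixel" rule is the sketch below.  It
assumes a uniform integer scale, with pixel k covering bases k*B up to
(but not including) (k+1)*B, where B is the number of bases per pixel.
Both the function name and this simple scaling are assumptions; the
script's actual layout, especially after cutting unaligned regions, may
differ.

```python
def covers_whole_pixel(gap_start, gap_end, bases_per_pixel):
    """True if the gap [gap_start, gap_end) contains at least one full pixel.

    All arguments are integers; pixel k is assumed to span the bases
    [k * bases_per_pixel, (k + 1) * bases_per_pixel).
    """
    # First pixel whose left edge is at or after the gap start (ceiling division).
    first_pixel = -(-gap_start // bases_per_pixel)
    # Number of pixel boundaries at or before the gap end (floor division).
    end_pixel = gap_end // bases_per_pixel
    return first_pixel < end_pixel
```

With 1000 bases per pixel, a gap from 1000 to 3000 covers two whole
pixels and would be shown, whereas a gap from 1500 to 2500 is equally
long as one pixel but straddles a boundary, so it covers no whole pixel.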
Choosing sequences
------------------
For example, you can exclude sequences with names such as
"chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome. 'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.
==========  =============================
Pattern     Meaning
==========  =============================
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.
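
The pattern syntax in the table is the same as that of Python's
standard ``fnmatch`` module, so the matching rules can be sketched with
it.  The function name is hypothetical, and two details are assumptions
rather than statements from this document: matching is case-sensitive,
and for a name with several dots, the part after the last dot is used.

```python
from fnmatch import fnmatchcase


def name_matches(seq_name, patterns):
    """True if seq_name matches any of the shell-style patterns."""
    candidates = [seq_name]
    if "." in seq_name:
        # Also try the part after the dot, e.g. "chr7" for "hg19.chr7".
        # Assumption: with several dots, use the text after the last one.
        candidates.append(seq_name.rsplit(".", 1)[1])
    return any(
        fnmatchcase(candidate, pattern)
        for candidate in candidates
        for pattern in patterns
    )
```

With this sketch, ``name_matches("chrUn_random522", ["chr[!U]*"])`` is
false because the character after "chr" is "U", while
``name_matches("hg19.chr7", ["chr?"])`` is true because the part after
the dot matches.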
You can specify more than one pattern, e.g. this gets sequences with
names consisting of "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

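
The range example suggests zero-based, end-exclusive coordinates, since
"0-1000" selects exactly 1000 bases.  A minimal sketch of separating
such a range from the name pattern follows; the function name is
hypothetical, and the handling of values that do not look like
"pattern:start-end" is an assumption.

```python
def split_range(spec):
    """Split 'pattern:start-end' into (pattern, start, end).

    Returns (spec, None, None) if spec does not end in a numeric range.
    """
    pattern, colon, coords = spec.rpartition(":")
    if colon:
        beg, dash, end = coords.partition("-")
        if dash and beg.isdigit() and end.isdigit():
            return pattern, int(beg), int(end)
    # No recognizable range: treat the whole value as a name pattern.
    return spec, None, None
```

Here ``split_range("chr9:0-1000")`` yields the pattern ``chr9`` with
start 0 and end 1000, and the number of selected bases is end minus
start.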
Text font
---------
You can improve the font quality by increasing its size, e.g. to 20
points::

  last-dotplot -s20 my-alignments my-plot.png

last-dotplot tries to find a nice font on your computer, but may fail
and use an ugly font.  You can specify a font like this::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

Colors
------
Colors can be specified in `various ways described here
<http://effbot.org/imagingbook/imagecolor.htm>`_.