File: last-dotplot.txt

package info (click to toggle)
last-align 963-2
links: PTS, VCS
area: main
in suites: buster
size: 3,380 kB
sloc: cpp: 41,136; python: 2,744; ansic: 1,240; makefile: 383; sh: 255
file content (222 lines) | stat: -rw-r--r-- 7,997 bytes
last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format.  It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

Terminology
-----------

last-dotplot shows alignments of one set of sequences against another
set of sequences.  This document calls a "set of sequences" a
"genome", though it need not actually be a genome.

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages & data about the plot.
  -x INT, --width=INT
      Maximum width in pixels.
  -y INT, --height=INT
      Maximum height in pixels.
  -m M, --maxseqs=M
      Maximum number of horizontal or vertical sequences.  If there
      are >M sequences, the smallest ones (after cutting) will be
      discarded.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --alignments=FILE
      Read secondary alignments.  For example: we could use primary
      alignment data with one human DNA read aligned to the human
      genome, and secondary alignment data with the whole chimpanzee
      versus human genomes.  last-dotplot will show the parts of the
      secondary alignments that are near the primary alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2),
      the top-to-bottom order of (the midpoints of) their alignments
      (3).  You can use two colon-separated values, e.g. "2:1" to
      specify 2 for primary and 1 for secondary alignments.
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2),
      the left-to-right order of (the midpoints of) their alignments
      (3).
  --strands1=N
      Put the 1st genome's sequences: in forwards orientation (0), in
      the orientation of most of their aligned bases (1).  In the
      latter case, the labels will be colored (in the same way as the
      alignments) to indicate the sequences' orientations.  You can
      use two colon-separated values for primary and secondary
      alignments.
  --strands2=N
      Put the 2nd genome's sequences: in forwards orientation (0), in
      the orientation of most of their aligned bases (1).
  --max-gap1=FRAC
      Maximum unaligned gap in the 1st genome.  For example, if two
      parts of one DNA read align to widely-separated parts of one
      chromosome, it's probably best to cut the intervening region
      from the dotplot.  FRAC is a fraction of the length of the
      (primary) alignments.  You can specify "inf" to keep all
      unaligned gaps.  You can use two comma-separated values,
      e.g. "0.5,3" to specify 0.5 for end-gaps (unaligned sequence
      ends) and 3 for mid-gaps (between alignments).  You can use two
      colon-separated values (each of which may be comma-separated)
      for primary and secondary alignments.
  --max-gap2=FRAC
      Maximum unaligned gap in the 2nd genome.
  --pad=FRAC
      Length of pad to leave when cutting unaligned gaps.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.
  --margin-color=COLOR
      Color for the margins.

Text options
~~~~~~~~~~~~

  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --labels1=N
      Label the displayed regions of the 1st genome with their:
      sequence name (0), name:length (1), name:start:length (2),
      name:start-end (3).
  --labels2=N
      Label the displayed regions of the 2nd genome with their:
      sequence name (0), name:length (1), name:start:length (2),
      name:start-end (3).
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).

Annotation options
~~~~~~~~~~~~~~~~~~

These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes.  This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.

  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as stripes, with
      coordinates given by the first three BED fields.  The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format.  The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.

Gene options
~~~~~~~~~~~~

  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.

Choosing sequences
------------------

For example, you can exclude sequences with names like
"chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome.  'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.

==========  =============================
Pattern     Meaning
----------  -----------------------------
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences with
names starting in "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

Text font
---------

You can improve the font quality by increasing its size, e.g. to 20
points:

  last-dotplot -s20 my-alignments my-plot.png

last-dotplot tries to find a nice font on your computer, but may fail
and use an ugly font.  You can specify a font like this::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

Colors
------

Colors can be specified in `various ways described here
<http://effbot.org/imagingbook/imagecolor.htm>`_.