-----------------------------------------------------------------------
MapView Utility software
Version 1.0
Contact: <mummer-help@lists.sourceforge.net>
Web: http://mummer.sourceforge.net
-----------------------------------------------------------------------
LICENCE: open source, included with MUMmer 3.0 and above
USAGE: see section 4, below.
1. WHAT IS MAPVIEW?
----------------
MapView is a utility program for displaying sequence alignments
as provided by NUCmer or PROmer. For further information regarding these
programs, please see the documentation and code at
http://mummer.sourceforge.net . MapView takes the output from
these programs and converts it to a FIG, PDF or PS file. It can
break the output into multiple files for easier viewing and printing.
Note that for very large reference genomes, FIG files viewed in the
xfig program (Unix) may be the only option that allows the entire
display to be stored in one file.
2. SYSTEM REQUIREMENTS
-------------------
- Perl interpreter version 5.0 or greater.
- fig2dev utility (see www.linux.org for transfig rpm package and
installation documentation)
- xfig viewer to visualize the FIG format (see www.linux.org regarding
xfig rpm package)
- Adobe Acrobat Reader for reading PDF files (free from www.adobe.com)
- Ghostscript PostScript interpreter to view PDF and PostScript documents
(on www.linux.org, look for the 'gv' rpm package)
3. INPUT
-----
The input to MapView is the table generated by the "show-coords"
program in MUMmer. It is important to use the -r -l options in
show-coords in order to have the proper format for MapView. For PROmer
output, it can be very helpful to run show-coords with the -k option as
well, to reduce the redundant matches often found in highly similar
regions. However, this option does not always select the appropriate
reading frame.
Both PROmer and NUCmer write their output in a specific format that
can be found in the *.cluster and *.delta files. To translate this
output into a human-readable format, the "show-coords" program
parses the delta alignment output of either NUCmer or PROmer and
displays summary information for each alignment. (Note that
PROmer and NUCmer include command line options that allow them to
generate the same summary information without running "show-coords"
separately.) The output of show-coords is then used by MapView to
create a FIG, PDF or PS file.
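
The two-step pipeline described above (show-coords, then MapView) can be
sketched as a small Python helper that only assembles the two command
lines and does not run anything. The file names "out.delta" and
"out.coords" are hypothetical, and the sketch assumes the executables are
installed under the names "show-coords" and "mapview". The -r, -l, -k and
-f options are the ones described in this document.

```python
def build_commands(delta_file, coords_file, promer=False, out_format="fig"):
    """Return (show_coords_cmd, mapview_cmd) as argument lists.

    Nothing is executed here; the caller is expected to redirect the
    standard output of show-coords into coords_file before running mapview.
    """
    # -r and -l are required so that the table has the layout MapView expects.
    show_coords = ["show-coords", "-r", "-l"]
    if promer:
        # -k reduces redundant matches in PROmer output (optional).
        show_coords.append("-k")
    show_coords.append(delta_file)

    # -f selects the output format: fig (default), pdf or ps.
    mapview = ["mapview", "-f", out_format, coords_file]
    return show_coords, mapview


if __name__ == "__main__":
    sc, mv = build_commands("out.delta", "out.coords", promer=True, out_format="pdf")
    print(" ".join(sc), ">", "out.coords")
    print(" ".join(mv))
```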
An example of the standard output of show-coords, which is used
directly as input for MapView, is below. This shows just the top
few lines of a large file created by aligning an assembly of
Drosophila pseudoobscura (165 million bases) to chromosome arm 2R of
Drosophila melanogaster:
/usr/local/db/euk/internal/d_melanogaster/na_arm2R_genomic_dmel_RELEASE3.FASTA celera_scaffs.fa
PROMER
    [S1]     [E1] |     [S2]     [E2] |  [LEN 1]  [LEN 2] |  [% IDY]  [% SIM]  [% STP] |  [LEN R]  [LEN Q] |  [COV R]  [COV Q] | [FRM]  [TAGS]
==================================================================================================================================================
    2540     2806 |     3216     3473 |      267      258 |    46.67    50.00     2.78 | 20302755     8916 |     0.00     2.89 |  2  3  2R 3211358
    2540     2806 |     1939     2196 |      267      258 |    46.67    51.11     2.22 | 20302755     2375 |     0.00    10.86 |  2  1  2R 3211430
    2540     2893 |    20172    19852 |      354      321 |    39.52    45.16     3.23 | 20302755    25647 |     0.00     1.25 |  2 -1  2R 3215406
    2806     2534 |     5291     5536 |      273      246 |    41.94    47.31     3.76 | 20302755    12414 |     0.00     1.98 | -3  2  2R 3211507
....
For more information and an explanation of this format, please see
the MUMmer manual http://mummer.sourceforge.net/manual
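
As an illustration of the table layout shown above, here is a minimal
Python sketch that splits one PROmer data row into named fields. The
class and function names are hypothetical and are not part of MapView.
The sketch assumes that the two [FRM] values are the reference and query
reading frames, and that the two [TAGS] values are the reference and
query sequence IDs, in that order.

```python
from typing import NamedTuple


class PromerMatch(NamedTuple):
    s1: int
    e1: int
    s2: int
    e2: int
    len1: int
    len2: int
    pct_idy: float
    pct_sim: float
    pct_stp: float
    len_r: int
    len_q: int
    cov_r: float
    cov_q: float
    frame_r: int   # assumed: reference frame
    frame_q: int   # assumed: query frame
    ref_id: str    # assumed: first tag is the reference ID
    query_id: str  # assumed: second tag is the query ID


def parse_promer_row(line: str) -> PromerMatch:
    """Parse one data row of 'show-coords -r -l' output for PROmer."""
    # The '|' characters separate the seven column groups.
    groups = [g.split() for g in line.split("|")]
    if len(groups) != 7:
        raise ValueError("expected 7 '|'-separated groups, got %d" % len(groups))
    (s1, e1), (s2, e2), (len1, len2), (idy, sim, stp), \
        (len_r, len_q), (cov_r, cov_q), tail = groups
    frame_r, frame_q, ref_id, query_id = tail
    return PromerMatch(
        int(s1), int(e1), int(s2), int(e2), int(len1), int(len2),
        float(idy), float(sim), float(stp), int(len_r), int(len_q),
        float(cov_r), float(cov_q), int(frame_r), int(frame_q),
        ref_id, query_id,
    )


if __name__ == "__main__":
    row = ("2540 2893 | 20172 19852 | 354 321 | 39.52 45.16 3.23 | "
           "20302755 25647 | 0.00 1.25 | 2 -1 2R 3215406")
    print(parse_promer_row(row))
```

Header lines, the "====" separator and NUCmer rows (which have a
different set of columns) are not handled by this sketch.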
4. USAGE
-----
USAGE: mapview [options] <coords file> [UTR coords] [CDS coords]
The optional UTR and CDS coordinates files, which are computed
based on the reference sequence, should be in GFF format. These contain
the coordinates of coding sequences and untranslated regions for
genes on the reference genome, and will be displayed graphically
if provided.
GFF format is a tab-delimited file format with the following columns:
<seq_ID> <source> <exon type> <start> <end> <score> <strand> <frame> <gene_name>
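
To make the nine-column layout concrete, the following Python sketch
reads one tab-delimited line into a record with the column names listed
above. The names GffRecord and parse_gff_line are hypothetical, and the
sample line in the usage part is made-up data for illustration only.
Score and frame are kept as strings because GFF files often use "." for
missing values.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    seq_id: str
    source: str
    exon_type: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    gene_name: str


def parse_gff_line(line: str) -> GffRecord:
    """Split one tab-delimited GFF line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seq_id, source, exon_type, start, end, score, strand, frame, gene_name = fields
    return GffRecord(seq_id, source, exon_type, int(start), int(end),
                     score, strand, frame, gene_name)


if __name__ == "__main__":
    # Hypothetical example line, not taken from a real annotation file.
    sample = "2R\texample\tCDS\t2540\t2806\t.\t+\t0\tgeneA\n"
    print(parse_gff_line(sample))
```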
Options:
-f <output format> : pdf, ps or fig. The default is "fig".
-x1 <left coord> -x2 <right coord> : only display the region on
the reference genome between positions x1 and x2. By default the
whole sequence will be displayed.
-d <no_bp> : the maximum distance (in bp) between two matches for
them to be linked. Default is 50000 bp. To explain:
the query sequence may contain multiple contigs. All matches from
the same contig are linked by drawing lines between each successive
pair of matches. If the matches occur too far apart, then this can
get very messy. Therefore we don't draw a line if the matches are
further apart than specified by this parameter. This is especially
important if the reference genome is very long and all the output
is stored in a single graphical file.
-m <mag> : set the magnification at which the figure is rendered to
mag. The default is 1.0; this is an option for fig2dev which is
used to transform the fig files to pdf or ps files.
-n <no of output files> : the default is 10. The purpose of this
parameter is to avoid making figures that are too 'large', in the
sense that they cannot be converted to PDF by fig2dev.
-p <file name> : the output file prefix.
By default the name of the output file(s) will be
PROMER_graph_<n>.fig, where <n> will be incremented for each output
file. If you choose "-p MyName", for example, then the name of the
first output file will be MyName_0.fig.
-h display this help;
-v verbosely list the files processed;
-g|ref If the input file is provided by 'mgaps', set the
reference sequence ID (as it appears in the first column
of the UTR/CDS coords file)
-I Display the name of query sequences
-Ir Display the name of reference genes
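
The linking rule described for the -d option can be sketched in Python as
follows. This is an illustration, not MapView's actual code: the function
name is hypothetical, and it assumes that the distance is measured on the
reference sequence, from the end of one match to the start of the next,
after sorting the matches of one query contig by reference position.

```python
def linked_pairs(matches, max_dist=50000):
    """Return the successive match pairs that would be joined by a line.

    matches  : list of (ref_start, ref_end) tuples for ONE query contig
    max_dist : value of the -d option (default 50000 bp)
    """
    # Normalize reversed coordinates (start > end) and sort by position.
    ordered = sorted((min(a, b), max(a, b)) for a, b in matches)
    links = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[0] - prev[1]  # assumed definition of "distance"
        if gap <= max_dist:
            links.append((prev, nxt))
    return links


if __name__ == "__main__":
    contig_matches = [(100000, 100500), (300, 400), (200, 100)]
    for left, right in linked_pairs(contig_matches):
        print("link", left, "->", right)
```

With the default of 50000 bp, the first two matches in the example are
linked, while the third one is left unconnected because it lies too far
away.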
5. OUTPUT
------
The output can be FIG, PDF, or PS files.
The program uses fig2dev to transform FIG files to PDF or PS.
If you supply UTR and CDS coords files, then the genes are displayed
first, along the top. Alternatively spliced genes are shown on
different rows, stacked vertically. The CDS regions (i.e., the
protein coding portions of exons) are displayed in light green and the
5' and 3' UTRs are in different colors. (For details, please
see the legend in the left corner below the graphic.)
The reference sequence is displayed in light blue, and on a row immediately
below it are shown the alignment matches.
The alignment matches are displayed again in vertical positions
depending on the percent identity (PID) of each match, ranging from
50% to 100%. Matches with PID < 50% (if any are included in the input
file) are treated as having PID = 50%. For better visualization, the
connecting lines between matches are colored differently, using
randomly chosen colors, from one query sequence to the next. If
these connecting lines are crossed, it indicates that the sequence
has been reverse complemented to achieve the match; however, note that
if a sequence is similar at both the protein and DNA level, we often
detect matches in multiple reading frames. NUCmer and PROmer have options
to display only one match when matches occur in multiple frames, but they
don't always choose the correct orientation.
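
The treatment of percent identity described above can be sketched in
Python. The 50% floor comes from the description; the linear mapping to a
relative height between 0.0 and 1.0 is an assumption made for
illustration, and the function name is hypothetical. MapView's actual
drawing coordinates may be computed differently.

```python
def pid_to_height(pid, min_pid=50.0, max_pid=100.0):
    """Map a percent identity to a relative height in [0.0, 1.0].

    Values below min_pid are treated as min_pid, as described for
    matches with PID < 50%. The linear scale is an assumption.
    """
    clamped = max(min_pid, min(max_pid, pid))
    return (clamped - min_pid) / (max_pid - min_pid)


if __name__ == "__main__":
    for pid in (46.67, 75.0, 100.0):
        print(pid, pid_to_height(pid))
```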
6. KNOWN PROBLEMS
--------------
There is a known problem with the PDF files. Fig2dev has problems if
the FIG file is too big: it consistently produces a PDF with errors
from such a file. We recommend using the PS format for files that are
very big, or else breaking the files up using the -n option above.