.\"                                      Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH MUMMER 1 "May 21, 2005"
.\" Please adjust this date whenever revising the manpage.
.\"
.\" Some roff macros, for reference:
.\" .nh        disable hyphenation
.\" .hy        enable hyphenation
.\" .ad l      left justify
.\" .ad b      justify to both left and right margins
.\" .nf        disable filling
.\" .fi        enable filling
.\" .br        insert line break
.\" .sp <n>    insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
mummer \- package for sequence alignment of multiple genomes
.SH SYNOPSIS
.B mummer-annotate
.RI <gapfile> <datafile> 
.br
.B combineMUMs
.RI <RefSequence> <MatchSequences> <GapsFile>
.br
.B delta-filter
.RI [options]  <deltafile>
.br
.B dnadiff
.RI [options]  <reference>  <query>
or
.RI [options]  -d <delta file>
.br
.B exact-tandems
.RI <file> <min-match-len>
.br
.B gaps
.br
.B mapview
.RI [options]  <coords file>  [UTR coords]  [CDS coords]
.br
.B mgaps
.RI [-d <DiagDiff>] [-f <DiagFactor>] [-l <MatchLen>] [-s <MaxSeparation>]
.br
.B mummer
.RI [ options ] <reference-file> <query-files>
.br
.B mummerplot
.RI [options]  <match file>
.br
.B nucmer
.RI [options]  <Reference>  <Query>
.br
.B nucmer2xfig
.br
.B promer
.RI [options]  <Reference>  <Query>
.br
.B repeat-match
.RI [options]  <genome-file>
.br
.B run-mummer1
.RI <fasta reference> <fasta query> <prefix> [-r]
.br
.B run-mummer3
.RI <fasta reference> <multi-fasta query> <prefix>
.br
.B show-aligns
.RI [options]  <deltafile>  <ref ID>  <qry ID>
.PP
Input is the .delta output of either the "nucmer" or the
"promer" program passed on the command line.
.PP
Output is to stdout, and consists of all the alignments between the
query and reference sequences identified on the command line.
.PP
NOTE: No sorting is done by default, therefore the alignments
will be ordered as found in the <deltafile> input.
.br
.B show-coords
.RI [options]  <deltafile>
.br
.B show-snps
.RI [options]  <deltafile>
.br
.B show-tiling
.RI [options]  <deltafile>

.br
.SH DESCRIPTION

.SH OPTIONS
All tools (except gaps) accept the -h, --help, -V and --version options.
The built-in help of each tool is thorough and takes precedence over the summaries below.
.br
.B combineMUMs
Combines MUMs in <GapsFile> by extending matches off
ends and between MUMs.  <RefSequence> is a fasta file
of the reference sequence.  <MatchSequences> is a
multi-fasta file of the sequences matched against the
reference.
.PP
  -D      Only output to stdout the difference positions
          and characters
  -n      Allow matches only between nucleotides, i.e., ACGTs
  -N num  Break matches at <num> or more consecutive non-ACGTs 
  -q tag  Used to label query match
  -r tag  Used to label reference match
  -S      Output all differences in strings
  -t      Label query matches with query fasta header
  -v num  Set verbose level for extra output
  -W file Reset the default output filename (witherrors.gaps)
  -x      Don't output .cover files
  -e      Set error-rate cutoff to e (e.g. 0.02 is two percent)
.br
.B dnadiff
Run comparative analysis of two sequence sets using nucmer and its
associated utilities with recommended parameters. See MUMmer
documentation for a more detailed description of the
output. Produces the following output files:
.PP       
    .report  - Summary of alignments, differences and SNPs
    .delta   - Standard nucmer alignment output
    .1delta  - 1-to-1 alignment from delta-filter -1
    .mdelta  - M-to-M alignment from delta-filter -m
    .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
    .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
    .snps    - SNPs from show-snps -rlTHC .1delta
    .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
    .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
    .unref   - Unaligned reference IDs and lengths (if applicable)
    .unqry   - Unaligned query IDs and lengths (if applicable)
.PP
MANDATORY:
    reference       Set the input reference multi-FASTA filename
    query           Set the input query multi-FASTA filename
      or
    delta file      Unfiltered .delta alignment file from nucmer
.PP
OPTIONS:
    -d|delta        Provide precomputed delta file for analysis
    -h
    --help          Display help information and exit
    -p|prefix       Set the prefix of the output files (default "out")
    -V
    --version       Display the version information and exit
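.PP
As a rough illustration of how the -p prefix relates to the output files listed above, the following Python sketch builds the expected file names. The function name dnadiff_outputs is hypothetical and not part of MUMmer; the suffixes and the default prefix "out" are copied from the description above.
.PP
.nf
```python
# Suffixes copied from the dnadiff description; nothing is read from MUMmer.
# The unref and unqry files are only written "if applicable".
DNADIFF_SUFFIXES = [
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
]


def dnadiff_outputs(prefix="out"):
    """Return the file names dnadiff is documented to write for a prefix."""
    return [prefix + "." + suffix for suffix in DNADIFF_SUFFIXES]


if __name__ == "__main__":
    # Hypothetical prefix, chosen only for the demonstration.
    for name in dnadiff_outputs("run1"):
        print(name)
```
.fi
.PP
A wrapper script could use such a list to check which files exist after a run.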

.br
.B delta-filter
  -e float    For switches -g -r -q, keep repeats within e percent
              of the best LIS score [0, 100], no repeats by default
  -g          Global alignment using length*identity weighted LIS.
              For every reference-query pair, leave only the aligns
              which form the longest mutually consistent set
  -h          Display help information
  -i float    Set the minimum alignment identity [0, 100], default 0
  -l int      Set the minimum alignment length, default 0
  -q          Query alignment using length*identity weighted LIS.
              For each query, leave only the aligns which form the
              longest consistent set for the query
  -r          Reference alignment using length*identity weighted LIS.
              For each reference, leave only the aligns which form
              the longest consistent set for the reference
  -u float    Set the minimum alignment uniqueness, i.e. percent of
              the alignment matching to unique reference AND query
              sequence [0, 100], default 0
  -o float    Set the maximum alignment overlap for -r and -q options
              as a percent of the alignment length [0, 100], default 100
.PP 
  Reads a delta alignment file from either nucmer or promer and
filters the alignments based on the command-line switches, leaving
only the desired alignments which are output to stdout in the same
delta format as the input. For multiple switches, order of operations
is as follows: -i -l -u -q -r -g. If an alignment is excluded by a
preceding operation, it will be ignored by the succeeding operations.
.PP
  An important distinction between the -g option and the -r -q
options is that -g requires the alignments to be mutually consistent
in their order, while the -r -q options are not required to be
mutually consistent and therefore tolerate translocations,
inversions, etc. Thus, -r provides a one-to-many, -q a many-to-one,
-r -q a one-to-one local mapping, and -g a one-to-one global mapping
of reference and query bases respectively.
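.PP
The order of operations and the mapping types described above can be summarised in a small Python sketch. It only models the documentation and does not call delta-filter. The function names and the label "unfiltered" are hypothetical, and reporting the -g mapping whenever -g is combined with -r or -q is an assumption.
.PP
.nf
```python
# Order in which delta-filter applies its filters, per the text above.
FILTER_ORDER = ["-i", "-l", "-u", "-q", "-r", "-g"]


def application_order(switches):
    """Sort the given filter switches into the order they are applied."""
    unknown = sorted(set(switches) - set(FILTER_ORDER))
    if unknown:
        raise ValueError("not a filter switch: " + " ".join(unknown))
    return [s for s in FILTER_ORDER if s in switches]


def mapping_kind(switches):
    """Name the reference-to-query mapping described for -r, -q and -g."""
    chosen = set(switches)
    if "-g" in chosen:
        # Assumption: -g runs last, so its mapping is the one reported.
        return "one-to-one global"
    if {"-r", "-q"} <= chosen:
        return "one-to-one local"
    if "-r" in chosen:
        return "one-to-many"
    if "-q" in chosen:
        return "many-to-one"
    return "unfiltered"  # hypothetical label for "no LIS filter given"


if __name__ == "__main__":
    print(application_order(["-g", "-l", "-i"]))
    print(mapping_kind(["-q", "-r"]))
```
.fi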
.br
.B mapview
.br
  -h
.br
  --help   Display help information and exit
.br
  -m|mag   Set the magnification at which the figure is rendered,
           this is an option for fig2dev which is used to generate
           the PDF and PS files (default 1.0)
.br
  -n|num   Set the number of output files used to partition the
           output, this is to avoid generating files that are too
           large to display (default 10)
.br
  -p|prefix  Set the output file prefix
           (default "PROMER_graph" or "NUCMER_graph")
.br
  -v
  --verbose  Verbose logging of the processed files
.br
  -V
  --version  Display the version information and exit
.br
  -x1 coord  Set the lower coordinate bound of the display
.br
  -x2 coord  Set the upper coordinate bound of the display
.br
  -g|ref     If the input file is provided by 'mgaps', set the
             reference sequence ID (as it appears in the first column
             of the UTR/CDS coords file)
.br
  -I         Display the name of query sequences
.br
  -Ir        Display the name of reference genes
.br
.B mummer
Find and output (to stdout) the positions and lengths of all
sufficiently long maximal matches between substrings of
<reference-file> and <query-files>.

  -mum           compute maximal matches that are unique in both sequences
  -mumcand       same as -mumreference
  -mumreference  compute maximal matches that are unique in
                 the reference-sequence but not necessarily
                 in the query-sequence (default)
  -maxmatch      compute all maximal matches regardless of their uniqueness
  -n             match only the characters a, c, g, or t
                 they can be in upper or in lower case
  -l             set the minimum length of a match
                 if not set, the default value is 20
  -b             compute forward and reverse complement matches
  -r             only compute reverse complement matches
  -s             show the matching substrings
  -c             report the query-position of a reverse complement match
                 relative to the original query sequence
  -F             force 4 column output format regardless of the number of
                 reference sequence inputs
  -L             show the length of the query sequences on the header line
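.PP
To make the difference between -maxmatch and -mum concrete, here is a brute-force Python sketch. It is not the algorithm mummer uses. It works on one strand only, compares characters exactly, and treats a match as unique when the matched substring occurs exactly once in each sequence; all three points are simplifying assumptions. The function names are hypothetical.
.PP
.nf
```python
def maximal_matches(ref, qry, min_len=20):
    """Return (ref_pos, qry_pos, length) of all maximal exact matches.

    Positions are 1-based. A match is maximal when it cannot be
    extended to the left or to the right.
    """
    hits = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            # Skip starts that could still be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            k = 0
            while (i + k < len(ref) and j + k < len(qry)
                   and ref[i + k] == qry[j + k]):
                k += 1
            if k >= min_len:
                hits.append((i + 1, j + 1, k))
    return hits


def count_occurrences(text, word):
    """Count possibly overlapping occurrences of word in text."""
    last = len(text) - len(word) + 1
    return sum(1 for p in range(last) if text.startswith(word, p))


def unique_matches(ref, qry, min_len=20):
    """Keep only matches whose substring is unique in both sequences."""
    kept = []
    for r, q, n in maximal_matches(ref, qry, min_len):
        word = ref[r - 1:r - 1 + n]
        if count_occurrences(ref, word) == 1 and count_occurrences(qry, word) == 1:
            kept.append((r, q, n))
    return kept


if __name__ == "__main__":
    # Toy sequences, made up for the demonstration.
    print(maximal_matches("ACGTGGACGT", "TTACGTTT", 4))
    print(unique_matches("ACGTGGACGT", "TTACGTTT", 4))
```
.fi
.PP
In the toy input the word ACGT occurs twice in the reference, so it is reported as a maximal match at both positions but is dropped once uniqueness in both sequences is required.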
.br
.B nucmer
    nucmer generates nucleotide alignments between two multi-FASTA input
    files. Two output files are generated. The .cluster output file lists
    clusters of matches between each sequence. The .delta file lists the
    distance between insertions and deletions that produce maximal scoring
    alignments between each sequence.

.I MANDATORY:
    Reference     Set the input reference multi-FASTA filename
    Query         Set the input query multi-FASTA filename

  --mum           Use anchor matches that are unique in both the reference
                  and query
  --mumcand       Same as --mumreference
  --mumreference  Use anchor matches that are unique in the reference
                  but not necessarily unique in the query (default behavior)
  --maxmatch      Use all anchor matches regardless of their uniqueness

  -b|breaklen     Set the distance an alignment extension will attempt to
                  extend poor scoring regions before giving up (default 200)
  -c|mincluster   Set the minimum length of a cluster of matches (default 65)
  --[no]delta     Toggle the creation of the delta file (default --delta)
  --depend        Print the dependency information and exit
  -d|diagfactor   Set the clustering diagonal difference separation factor
                  (default 0.12)
  --[no]extend    Toggle the cluster extension step (default --extend)
  -f
  --forward       Use only the forward strand of the Query sequences
  -g|maxgap       Set the maximum gap between two adjacent matches in a
                  cluster (default 90)
  -h
  --help          Display help information and exit
  -l|minmatch     Set the minimum length of a single match (default 20)
  -o
  --coords        Automatically generate the original NUCmer1.1 coords
                  output file using the 'show-coords' program
  --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                  extension reaches the end of a sequence, it will backtrack
                  to optimize the alignment score instead of terminating the
                  alignment at the end of the sequence (default --optimize)
  -p|prefix       Set the prefix of the output files (default "out")
  -r
  --reverse       Use only the reverse complement of the Query sequences
  --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                  this option off if aligning a sequence to itself to look
                  for repeats (default --simplify)
    
.br
.B promer
    promer generates amino acid alignments between two multi-FASTA DNA input
    files. Two output files are generated. The .cluster output file lists
    clusters of matches between each sequence. The .delta file lists the
    distance between insertions and deletions that produce maximal scoring
    alignments between each sequence. The DNA input is translated into all 6
    reading frames in order to generate the output, but the output coordinates
    reference the original DNA input.

.I MANDATORY:
    Reference     Set the input reference multi-FASTA DNA file
    Query         Set the input query multi-FASTA DNA file

  --mum           Use anchor matches that are unique in both the reference
                  and query
  --mumcand       Same as --mumreference
  --mumreference  Use anchor matches that are unique in the reference
                  but not necessarily unique in the query (default behavior)
  --maxmatch      Use all anchor matches regardless of their uniqueness

  -b|breaklen     Set the distance an alignment extension will attempt to
                  extend poor scoring regions before giving up, measured in
                  amino acids (default 60)
  -c|mincluster   Set the minimum length of a cluster of matches, measured in
                  amino acids (default 20)
  --[no]delta     Toggle the creation of the delta file (default --delta)
  --depend        Print the dependency information and exit
  -d|diagfactor   Set the clustering diagonal difference separation factor
                  (default 0.11)
  --[no]extend    Toggle the cluster extension step (default --extend)
  -g|maxgap       Set the maximum gap between two adjacent matches in a
                  cluster, measured in amino acids (default 30)
  -l|minmatch     Set the minimum length of a single match, measured in amino
                  acids (default 6)
  -m|masklen      Set the maximum bookend masking length, measured in amino
                  acids (default 8)
  -o
  --coords        Automatically generate the original PROmer1.1 ".coords"
                  output file using the "show-coords" program
  --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                  extension reaches the end of a sequence, it will backtrack
                  to optimize the alignment score instead of terminating the
                  alignment at the end of the sequence (default --optimize)

  -p|prefix       Set the prefix of the output files (default "out")
  -x|matrix       Set the alignment matrix number to 1 [BLOSUM 45],
                  2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)
.br
.B repeat-match
Find all maximal exact matches in <genome-file>
  -E    Use exhaustive (slow) search to find matches
  -f    Forward strand only, don't use reverse complement
  -n #  Set minimum exact match length to #
  -t    Only output tandem repeats
  -V #  Set level of verbose (debugging) printing to #
.br
.B show-aligns
  -h      Display help information
  -q      Sort alignments by the query start coordinate
  -r      Sort alignments by the reference start coordinate
  -w int  Set the screen width - default is 60
  -x int  Set the matrix type - default is 2 (BLOSUM 62),
          other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
          note: only has effect on amino acid alignments
.br
.B show-coords
  -b          Merges overlapping alignments regardless of match dir
              or frame and does not display any identity information.
  -B          Switch output to btab format
  -c          Include percent coverage information in the output
  -d          Display the alignment direction in the additional
              FRM columns (default for promer)
  -g          Deprecated option. Please use 'delta-filter' instead
  -h          Display help information
  -H          Do not print the output header
  -I float    Set minimum percent identity to display
  -k          Knockout (do not display) alignments that overlap
              another alignment in a different frame by more than 50%
              of their length, AND have a smaller percent similarity
              or are less than 75% of the size of the other alignment
              (promer only)
  -l          Include the sequence length information in the output
  -L long     Set minimum alignment length to display
  -o          Annotate maximal alignments between two sequences, i.e.
              overlaps between reference and query sequences
  -q          Sort output lines by query IDs and coordinates
  -r          Sort output lines by reference IDs and coordinates
  -T          Switch output to tab-delimited format

  Input is the .delta output of either the "nucmer" or the
"promer" program passed on the command line.
.PP
  Output is to stdout, and consists of a list of coordinates,
percent identity, and other useful information regarding the
alignment data contained in the .delta file used as input.
.PP
  NOTE: No sorting is done by default, therefore the alignments
will be ordered as found in the <deltafile> input.
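.PP
The -k rule described above combines three conditions, and a small Python predicate may make the grouping easier to read. The sketch assumes the rule reads as "overlap AND (lower similarity OR much shorter)", which is one interpretation of the option text and is not taken from the show-coords source. The function and parameter names are hypothetical.
.PP
.nf
```python
def knocked_out(overlap_fraction, similarity, length,
                other_similarity, other_length):
    """Return True if -k, as read above, would hide this alignment.

    overlap_fraction is the share of this alignment (0.0 to 1.0) that
    overlaps the other alignment, which lies in a different frame.
    """
    overlaps_enough = overlap_fraction > 0.5
    weaker = similarity < other_similarity
    much_shorter = length < 0.75 * other_length
    return overlaps_enough and (weaker or much_shorter)


if __name__ == "__main__":
    # Made-up numbers: 60 percent overlap and lower similarity.
    print(knocked_out(0.6, 80.0, 100, 90.0, 100))
```
.fi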
.br
.B show-snps
  -C            Do not report SNPs from alignments with an ambiguous
                mapping, i.e. only report SNPs where the [R] and [Q]
                columns equal 0 and do not output these columns
  -h            Display help information
  -H            Do not print the output header
  -I            Do not report indels
  -l            Include sequence length information in the output
  -q            Sort output lines by query IDs and SNP positions
  -r            Sort output lines by reference IDs and SNP positions
  -S            Specify which alignments to report by passing
                'show-coords' lines to stdin
  -T            Switch to tab-delimited format
  -x int        Include x characters of surrounding SNP context in the
                output, default 0
  
  Input is the .delta output of either the nucmer or promer program
passed on the command line.
.PP
  Output is to stdout, and consists of a list of SNPs (or amino acid
substitutions for promer) with positions and other useful info.
Output will be sorted with -r by default and the [BUFF] column will
always refer to the sequence whose positions have been sorted. This
value specifies the distance from this SNP to the nearest mismatch
(end of alignment, indel, SNP, etc) in the same alignment, while the
[DIST] column specifies the distance from this SNP to the nearest
sequence end. SNPs for which the [R] and [Q] columns are greater than
0 should be evaluated with caution, as these columns specify the
number of other alignments which overlap this position. Use -C to
ensure SNPs are only reported from unique alignment regions.

.B show-tiling
  -a          Describe the tiling path by printing the tab-delimited
              alignment region coordinates to stdout
  -c          Assume the reference sequences are circular, and allow
              tiled contigs to span the origin
  -g int      Set maximum gap between clustered alignments [-1, INT_MAX]
              A value of -1 will represent infinity
              (nucmer default = 1000)
              (promer default = -1)
  -i float    Set minimum percent identity to tile [0.0, 100.0]
              (nucmer default = 90.0)
              (promer default = 55.0)
  -l int      Set minimum length contig to report [-1, INT_MAX]
              A value of -1 will represent infinity
              (common default = 1)
  -p file     Output a pseudo molecule of the query contigs to 'file'
  -R          Deal with repetitive contigs by randomly placing them
              in one of their copy locations (implies -V 0)
  -t file     Output a TIGR style contig list of each query sequence
              that sufficiently matches the reference (non-circular)
  -u file     Output the tab-delimited alignment region coordinates
              of the unusable contigs to 'file'
  -v float    Set minimum contig coverage to tile [0.0, 100.0]
              (nucmer default = 95.0) sum of individual alignments
              (promer default = 50.0) extent of syntenic region
  -V float    Set minimum contig coverage difference [0.0, 100.0]
              i.e. the difference needed to determine one alignment
              is 'better' than another alignment
              (nucmer default = 10.0) sum of individual alignments
              (promer default = 30.0) extent of syntenic region
  -x          Describe the tiling path by printing the XML contig
              linking information to stdout

  Input is the .delta output of the nucmer program, run on very
similar sequence data, or the .delta output of the promer program,
run on divergent sequence data.
.PP
  Output is to stdout, and consists of the predicted location of
each aligning query contig as mapped to the reference sequences.
These coordinates reference the extent of the entire query contig,
even when only a certain percentage of the contig was actually
aligned (unless the -a option is used). Columns are, start in ref,
end in ref, distance to next contig, length of this contig, alignment
coverage, identity, orientation, and ID respectively.
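.PP
The eight output columns listed above can be read into named fields with a short Python sketch. It assumes whitespace-separated columns, integer coordinates and lengths, and floating-point coverage and identity; none of these details is specified above, so check them against real output. Any header lines must be skipped by the caller. The names TilingRow and parse_tiling_line are hypothetical, and the sample row is made up.
.PP
.nf
```python
from collections import namedtuple

# Field names follow the column order given in the text above.
TilingRow = namedtuple("TilingRow", [
    "ref_start", "ref_end", "gap_to_next", "contig_length",
    "coverage", "identity", "orientation", "contig_id",
])


def parse_tiling_line(line):
    """Split one whitespace-separated show-tiling row into named fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError("expected 8 columns, got %d" % len(fields))
    return TilingRow(
        int(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]),
        float(fields[4]), float(fields[5]), fields[6], fields[7],
    )


if __name__ == "__main__":
    # Made-up row for illustration only, not real show-tiling output.
    row = parse_tiling_line("1 500 -20 500 98.50 99.10 + contig7")
    print(row.contig_id, row.ref_start, row.ref_end, row.identity)
```
.fi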

.SH SEE ALSO
.BR http://mummer.sourceforge.net/ 
.br
.PP
Open source MUMmer 3.0 is described in
.br
.I "Versatile and open software for comparing large genomes."
S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg, Genome Biology (2004), 5:R12.
.SH AUTHOR
mummer was written by S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.