.\" Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH MUMMER 1 "May 21, 2005"
.\" Please adjust this date whenever revising the manpage.
.\"
.\" Some roff macros, for reference:
.\" .nh disable hyphenation
.\" .hy enable hyphenation
.\" .ad l left justify
.\" .ad b justify to both left and right margins
.\" .nf disable filling
.\" .fi enable filling
.\" .br insert line break
.\" .sp <n> insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
mummer \- package for sequence alignment of multiple genomes
.SH SYNOPSIS
.B mummer-annotate
.RI "<gapfile> <datafile>"
.br
.B combineMUMs
.RI "<RefSequence> <MatchSequences> <GapsFile>"
.br
.B delta-filter
.RI "[options] <deltafile>"
.br
.B dnadiff
.RI "[options] <reference> <query>"
or
.RI "[options] -d <delta file>"
.br
.B exact-tandems
.RI "<file> <min-match-len>"
.br
.B gaps
.br
.B mapview
.RI "[options] <coords file> [UTR coords] [CDS coords]"
.br
.B mgaps
.RI "[-d <DiagDiff>] [-f <DiagFactor>] [-l <MatchLen>] [-s <MaxSeparation>]"
.br
.B mummer
.RI "[options] <reference-file> <query-files>"
.br
.B mummerplot
.RI "[options] <match file>"
.br
.B nucmer
.RI "[options] <Reference> <Query>"
.br
.B nucmer2xfig
.br
.B promer
.RI "[options] <Reference> <Query>"
.br
.B repeat-match
.RI "[options] <genome-file>"
.br
.B run-mummer1
.RI "<fasta reference> <fasta query> <prefix> [-r]"
.br
.B run-mummer3
.RI "<fasta reference> <multi-fasta query> <prefix>"
.br
.B show-aligns
.RI "[options] <deltafile> <ref ID> <qry ID>"
.PP
Input is the .delta output of either the "nucmer" or the
"promer" program passed on the command line.
.PP
Output is to stdout, and consists of all the alignments between the
query and reference sequences identified on the command line.
.PP
NOTE: No sorting is done by default, therefore the alignments
will be ordered as found in the <deltafile> input.
.br
.B show-coords
.RI "[options] <deltafile>"
.br
.B show-snps
.RI "[options] <deltafile>"
.br
.B show-tiling
.RI "[options] <deltafile>"
.br
.SH DESCRIPTION
.SH OPTIONS
All tools (except gaps) accept the -h, --help, -V and --version options.
Their built-in help is thorough and is usually more complete than this manual page.
.br
.B combineMUMs
Combines MUMs in <GapsFile> by extending matches off
ends and between MUMs. <RefSequence> is a fasta file
of the reference sequence. <MatchSequences> is a
multi-fasta file of the sequences matched against the
reference.
.PP
.nf
-D            Only output to stdout the difference positions
              and characters
-n            Allow matches only between nucleotides, i.e., ACGTs
-N num        Break matches at <num> or more consecutive non-ACGTs
-q tag        Used to label query match
-r tag        Used to label reference match
-S            Output all differences in strings
-t            Label query matches with query fasta header
-v num        Set verbose level for extra output
-W file       Reset the default output filename witherrors.gaps
-x            Don't output .cover files
-e            Set error-rate cutoff to e (e.g. 0.02 is two percent)
.fi
.br
.B dnadiff
Run comparative analysis of two sequence sets using nucmer and its
associated utilities with recommended parameters. See MUMmer
documentation for a more detailed description of the
output. Produces the following output files:
.PP
.nf
\&.report   - Summary of alignments, differences and SNPs
\&.delta    - Standard nucmer alignment output
\&.1delta   - 1-to-1 alignment from delta-filter -1
\&.mdelta   - M-to-M alignment from delta-filter -m
\&.1coords  - 1-to-1 coordinates from show-coords -THrcl .1delta
\&.mcoords  - M-to-M coordinates from show-coords -THrcl .mdelta
\&.snps     - SNPs from show-snps -rlTHC .1delta
\&.rdiff    - Classified ref breakpoints from show-diff -rH .mdelta
\&.qdiff    - Classified qry breakpoints from show-diff -qH .mdelta
\&.unref    - Unaligned reference IDs and lengths (if applicable)
\&.unqry    - Unaligned query IDs and lengths (if applicable)
.fi
.PP
MANDATORY:
.nf
reference       Set the input reference multi-FASTA filename
query           Set the input query multi-FASTA filename
  or
delta file      Unfiltered .delta alignment file from nucmer
.fi
.PP
OPTIONS:
.nf
-d|delta        Provide precomputed delta file for analysis
-h
--help          Display help information and exit
-p|prefix       Set the prefix of the output files (default "out")
-V
--version       Display the version information and exit
.fi
.br
.B delta-filter
.nf
-e float      For switches -g -r -q, keep repeats within e percent
              of the best LIS score [0, 100], no repeats by default
-g            Global alignment using length*identity weighted LIS.
              For every reference-query pair, leave only the aligns
              which form the longest mutually consistent set
-h            Display help information
-i float      Set the minimum alignment identity [0, 100], default 0
-l int        Set the minimum alignment length, default 0
-q            Query alignment using length*identity weighted LIS.
              For each query, leave only the aligns which form the
              longest consistent set for the query
-r            Reference alignment using length*identity weighted LIS.
              For each reference, leave only the aligns which form
              the longest consistent set for the reference
-u float      Set the minimum alignment uniqueness, i.e. percent of
              the alignment matching to unique reference AND query
              sequence [0, 100], default 0
-o float      Set the maximum alignment overlap for -r and -q options
              as a percent of the alignment length [0, 100], default 100
.fi
.PP
Reads a delta alignment file from either nucmer or promer and
filters the alignments based on the command-line switches, leaving
only the desired alignments which are output to stdout in the same
delta format as the input. For multiple switches, order of operations
is as follows: -i -l -u -q -r -g. If an alignment is excluded by a
preceding operation, it will be ignored by the succeeding operations.
.PP
An important distinction between the -g option and the -r -q
options is that -g requires the alignments to be mutually consistent
in their order, while the -r -q options are not required to be
mutually consistent and therefore tolerate translocations,
inversions, etc. Thus, -r provides a one-to-many, -q a many-to-one,
-r -q a one-to-one local mapping, and -g a one-to-one global mapping
of reference and query bases respectively.
.br
.B mapview
.br
-h
.br
--help Display help information and exit
.br
-m|mag Set the magnification at which the figure is rendered,
this is an option for fig2dev which is used to generate
the PDF and PS files (default 1.0)
.br
-n|num Set the number of output files used to partition the
output, this is to avoid generating files that are too
large to display (default 10)
.br
-p|prefix Set the output file prefix
(default "PROMER_graph" or "NUCMER_graph")
.br
-v
.br
--verbose Verbose logging of the processed files
.br
-V
.br
--version Display the version information and exit
.br
-x1 coord Set the lower coordinate bound of the display
.br
-x2 coord Set the upper coordinate bound of the display
.br
-g|ref If the input file is provided by 'mgaps', set the
reference sequence ID (as it appears in the first column
of the UTR/CDS coords file)
.br
-I Display the name of query sequences
.br
-Ir Display the name of reference genes
.br
.B mummer
Find and output (to stdout) the positions and length of all
sufficiently long maximal matches of a substring in
<query-file> and <reference-file>.
.nf
-mum            compute maximal matches that are unique in both sequences
-mumcand        same as -mumreference
-mumreference   compute maximal matches that are unique in
                the reference-sequence but not necessarily
                in the query-sequence (default)
-maxmatch       compute all maximal matches regardless of their uniqueness
-n              match only the characters a, c, g, or t
                they can be in upper or in lower case
-l              set the minimum length of a match
                if not set, the default value is 20
-b              compute forward and reverse complement matches
-r              only compute reverse complement matches
-s              show the matching substrings
-c              report the query-position of a reverse complement match
                relative to the original query sequence
-F              force 4 column output format regardless of the number of
                reference sequence inputs
-L              show the length of the query sequences on the header line
.fi
.br
.B nucmer
nucmer generates nucleotide alignments between two multi-FASTA input
files. Two output files are generated. The .cluster output file lists
clusters of matches between each sequence. The .delta file lists the
distance between insertions and deletions that produce maximal scoring
alignments between each sequence.
.PP
.I MANDATORY:
.nf
Reference       Set the input reference multi-FASTA filename
Query           Set the input query multi-FASTA filename
.fi
.PP
.I OPTIONS:
.nf
--mum           Use anchor matches that are unique in both the reference
                and query
--mumcand       Same as --mumreference
--mumreference  Use anchor matches that are unique in the reference
                but not necessarily unique in the query (default behavior)
--maxmatch      Use all anchor matches regardless of their uniqueness
-b|breaklen     Set the distance an alignment extension will attempt to
                extend poor scoring regions before giving up (default 200)
-c|mincluster   Sets the minimum length of a cluster of matches (default 65)
--[no]delta     Toggle the creation of the delta file (default --delta)
--depend        Print the dependency information and exit
-d|diagfactor   Set the clustering diagonal difference separation factor
                (default 0.12)
--[no]extend    Toggle the cluster extension step (default --extend)
-f
--forward       Use only the forward strand of the Query sequences
-g|maxgap       Set the maximum gap between two adjacent matches in a
                cluster (default 90)
-h
--help          Display help information and exit
-l|minmatch     Set the minimum length of a single match (default 20)
-o
--coords        Automatically generate the original NUCmer1.1 coords
                output file using the 'show-coords' program
--[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                extension reaches the end of a sequence, it will backtrack
                to optimize the alignment score instead of terminating the
                alignment at the end of the sequence (default --optimize)
-p|prefix       Set the prefix of the output files (default "out")
-r
--reverse       Use only the reverse complement of the Query sequences
--[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                this option off if aligning a sequence to itself to look
                for repeats (default --simplify)
.fi
.br
.B promer
promer generates amino acid alignments between two multi-FASTA DNA input
files. Two output files are generated. The .cluster output file lists
clusters of matches between each sequence. The .delta file lists the
distance between insertions and deletions that produce maximal scoring
alignments between each sequence. The DNA input is translated into all 6
reading frames in order to generate the output, but the output coordinates
reference the original DNA input.
.PP
.I MANDATORY:
.nf
Reference       Set the input reference multi-FASTA DNA file
Query           Set the input query multi-FASTA DNA file
.fi
.PP
.I OPTIONS:
.nf
--mum           Use anchor matches that are unique in both the reference
                and query
--mumcand       Same as --mumreference
--mumreference  Use anchor matches that are unique in the reference
                but not necessarily unique in the query (default behavior)
--maxmatch      Use all anchor matches regardless of their uniqueness
-b|breaklen     Set the distance an alignment extension will attempt to
                extend poor scoring regions before giving up, measured in
                amino acids (default 60)
-c|mincluster   Sets the minimum length of a cluster of matches, measured in
                amino acids (default 20)
--[no]delta     Toggle the creation of the delta file (default --delta)
--depend        Print the dependency information and exit
-d|diagfactor   Set the clustering diagonal difference separation factor
                (default 0.11)
--[no]extend    Toggle the cluster extension step (default --extend)
-g|maxgap       Set the maximum gap between two adjacent matches in a
                cluster, measured in amino acids (default 30)
-l|minmatch     Set the minimum length of a single match, measured in amino
                acids (default 6)
-m|masklen      Set the maximum bookend masking length, measured in amino
                acids (default 8)
-o
--coords        Automatically generate the original PROmer1.1 ".coords"
                output file using the "show-coords" program
--[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                extension reaches the end of a sequence, it will backtrack
                to optimize the alignment score instead of terminating the
                alignment at the end of the sequence (default --optimize)
-p|prefix       Set the prefix of the output files (default "out")
-x|matrix       Set the alignment matrix number to 1 [BLOSUM 45],
                2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)
.fi
.br
.B repeat-match
Find all maximal exact matches in <genome-file>.
.nf
-E            Use exhaustive (slow) search to find matches
-f            Forward strand only, don't use reverse complement
-n #          Set minimum exact match length to #
-t            Only output tandem repeats
-V #          Set level of verbose (debugging) printing to #
.fi
.br
.B show-aligns
.nf
-h            Display help information
-q            Sort alignments by the query start coordinate
-r            Sort alignments by the reference start coordinate
-w int        Set the screen width - default is 60
-x int        Set the matrix type - default is 2 (BLOSUM 62),
              other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
              note: only has effect on amino acid alignments
.fi
.br
.B show-coords
.nf
-b            Merges overlapping alignments regardless of match dir
              or frame and does not display any identity information.
-B            Switch output to btab format
-c            Include percent coverage information in the output
-d            Display the alignment direction in the additional
              FRM columns (default for promer)
-g            Deprecated option. Please use 'delta-filter' instead
-h            Display help information
-H            Do not print the output header
-I float      Set minimum percent identity to display
-k            Knockout (do not display) alignments that overlap
              another alignment in a different frame by more than 50%
              of their length, AND have a smaller percent similarity
              or are less than 75% of the size of the other alignment
              (promer only)
-l            Include the sequence length information in the output
-L long       Set minimum alignment length to display
-o            Annotate maximal alignments between two sequences, i.e.
              overlaps between reference and query sequences
-q            Sort output lines by query IDs and coordinates
-r            Sort output lines by reference IDs and coordinates
-T            Switch output to tab-delimited format
.fi
.PP
Input is the .delta output of either the "nucmer" or the
"promer" program passed on the command line.
.PP
Output is to stdout, and consists of a list of coordinates,
percent identity, and other useful information regarding the
alignment data contained in the .delta file used as input.
.PP
NOTE: No sorting is done by default, therefore the alignments
will be ordered as found in the <deltafile> input.
.br
.B show-snps
.nf
-C            Do not report SNPs from alignments with an ambiguous
              mapping, i.e. only report SNPs where the [R] and [Q]
              columns equal 0 and do not output these columns
-h            Display help information
-H            Do not print the output header
-I            Do not report indels
-l            Include sequence length information in the output
-q            Sort output lines by query IDs and SNP positions
-r            Sort output lines by reference IDs and SNP positions
-S            Specify which alignments to report by passing
              'show-coords' lines to stdin
-T            Switch to tab-delimited format
-x int        Include x characters of surrounding SNP context in the
              output, default 0
.fi
.PP
Input is the .delta output of either the nucmer or promer program
passed on the command line.
.PP
Output is to stdout, and consists of a list of SNPs (or amino acid
substitutions for promer) with positions and other useful info.
Output will be sorted with -r by default and the [BUFF] column will
always refer to the sequence whose positions have been sorted. This
value specifies the distance from this SNP to the nearest mismatch
(end of alignment, indel, SNP, etc.) in the same alignment, while the
[DIST] column specifies the distance from this SNP to the nearest
sequence end. SNPs for which the [R] and [Q] columns are greater than
0 should be evaluated with caution, as these columns specify the
number of other alignments which overlap this position. Use -C to
ensure that SNPs are only reported from unique alignment regions.
.br
.B show-tiling
.nf
-a            Describe the tiling path by printing the tab-delimited
              alignment region coordinates to stdout
-c            Assume the reference sequences are circular, and allow
              tiled contigs to span the origin
-g int        Set maximum gap between clustered alignments [-1, INT_MAX]
              A value of -1 will represent infinity
              (nucmer default = 1000)
              (promer default = -1)
-i float      Set minimum percent identity to tile [0.0, 100.0]
              (nucmer default = 90.0)
              (promer default = 55.0)
-l int        Set minimum length contig to report [-1, INT_MAX]
              A value of -1 will represent infinity
              (common default = 1)
-p file       Output a pseudo molecule of the query contigs to 'file'
-R            Deal with repetitive contigs by randomly placing them
              in one of their copy locations (implies -V 0)
-t file       Output a TIGR style contig list of each query sequence
              that sufficiently matches the reference (non-circular)
-u file       Output the tab-delimited alignment region coordinates
              of the unusable contigs to 'file'
-v float      Set minimum contig coverage to tile [0.0, 100.0]
              (nucmer default = 95.0) sum of individual alignments
              (promer default = 50.0) extent of syntenic region
-V float      Set minimum contig coverage difference [0.0, 100.0]
              i.e. the difference needed to determine one alignment
              is 'better' than another alignment
              (nucmer default = 10.0) sum of individual alignments
              (promer default = 30.0) extent of syntenic region
-x            Describe the tiling path by printing the XML contig
              linking information to stdout
.fi
.PP
Input is the .delta output of the nucmer program, run on very
similar sequence data, or the .delta output of the promer program,
run on divergent sequence data.
.PP
Output is to stdout, and consists of the predicted location of
each aligning query contig as mapped to the reference sequences.
These coordinates reference the extent of the entire query contig,
even when only a certain percentage of the contig was actually
aligned (unless the -a option is used). Columns are, start in ref,
end in ref, distance to next contig, length of this contig, alignment
coverage, identity, orientation, and ID respectively.
.SH SEE ALSO
.BR http://mummer.sourceforge.net/
.br
.PP
Open source MUMmer 3.0 is described in
.br
.I "Versatile and open software for comparing large genomes."
S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg, Genome Biology (2004), 5:R12.
.SH AUTHOR
mummer was written by S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.