.TH GMAP "1" "GMAP 2014-10-22" "User Commands"
.SH NAME
gmap \- Genomic Mapping and Alignment Program
.SH SYNOPSIS
.B gmap
[\fI\,OPTIONS\/\fR...] \fI\,<FASTA files\/\fR...\fI\,>, or\/\fR cat <FASTA files...> | gmap [OPTIONS...]
.SH DESCRIPTION
Align the sequences QUERY to the reference, specified with
\fB\-d\fR or \fB\-g\fR.
.SH OPTIONS
.SS Input options (must include \fB\-d\fR or \fB\-g\fR)
.TP
\fB\-D\fR, \fB\-\-dir\fR=\fI\,directory\/\fR
Genome directory
.TP
\fB\-d\fR, \fB\-\-db\fR=\fI\,STRING\/\fR
Genome database. If argument is '?' (with
the quotes), this command lists available databases.
.TP
\fB\-k\fR, \fB\-\-kmer\fR=\fI\,INT\/\fR
kmer size to use in genome database (allowed values: 16 or less).
If not specified, the program will find the highest available
kmer size in the genome database
.TP
\fB\-\-sampling\fR=\fI\,INT\/\fR
Sampling to use in genome database. If not specified, the program
will find the smallest available sampling value in the genome database
within selected k\-mer size
.TP
\fB\-G\fR, \fB\-\-genomefull\fR
Use full genome (all ASCII chars allowed;
built explicitly during setup), not
compressed version
.TP
\fB\-g\fR, \fB\-\-gseg\fR=\fI\,filename\/\fR
User\-supplied genomic segment
.TP
\fB\-1\fR, \fB\-\-selfalign\fR
Align one sequence against itself in FASTA format via stdin
(Useful for getting protein translation of a nucleotide sequence)
.TP
\fB\-2\fR, \fB\-\-pairalign\fR
Align two sequences in FASTA format via stdin, first one being
genomic and second one being cDNA
.TP
\fB\-\-cmdline\fR=\fI\,STRING\/\fR,STRING
Align these two sequences provided on the command line,
first one being genomic and second one being cDNA
.TP
\fB\-q\fR, \fB\-\-part\fR=\fI\,INT\/\fR/INT
Process only the i\-th out of every n sequences
e.g., 0/100 or 99/100 (useful for distributing jobs
to a computer farm).
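.IP
A minimal Python sketch of how an i/n split could divide the input. It assumes sequences are counted from zero and that part i takes every sequence whose index modulo n equals i; the function name select_part and the sample names are hypothetical, and the program's own selection rule may differ in detail.
.nf
```python
def select_part(records, i, n):
    """Return the records that a job started with part i/n would handle.

    Assumption: records are counted from zero, and part i takes every
    record whose index leaves remainder i when divided by n.
    """
    if n <= 0 or not 0 <= i < n:
        raise ValueError("expected 0 <= i < n")
    return [rec for idx, rec in enumerate(records) if idx % n == i]


# Hypothetical query names, split across three jobs (0/3, 1/3, 2/3).
queries = ["seq0", "seq1", "seq2", "seq3", "seq4", "seq5", "seq6"]
parts = [select_part(queries, i, 3) for i in range(3)]
```
.fi
Under this assumption every sequence lands in exactly one part, so the outputs of the separate jobs can be concatenated afterwards.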
.TP
\fB\-\-input\-buffer\-size\fR=\fI\,INT\/\fR
Size of input buffer (program reads this many sequences
at a time for efficiency) (default 1000)
.SS
Computation options
.TP
\fB\-B\fR, \fB\-\-batch\fR=\fI\,INT\/\fR
Batch mode (default = 2)
.nf
Mode         Offsets   Positions       Genome
0            see note  mmap            mmap
1            see note  mmap & preload  mmap
2 (default)  see note  mmap & preload  mmap & preload
3            see note  allocate        mmap & preload
4            see note  allocate        allocate
5            expand    allocate        allocate
.fi
Note: For a single sequence, all data structures use mmap.
If mmap not available and allocate not chosen, then will use fileio (very slow)
.TP
Note about \fB\-\-batch\fR and offsets: Expansion of offsets can be controlled independently by the \fB\-\-expand\-offsets\fR flag.
The \fB\-\-batch\fR=\fI\,5\/\fR option is equivalent
to \fB\-\-batch\fR=\fI\,4\/\fR plus \fB\-\-expand\-offsets\fR=\fI\,1\/\fR
.TP
\fB\-\-expand\-offsets\fR=\fI\,INT\/\fR
Whether to expand the genomic offsets index.
Values: 0 (no, default), or 1 (yes).
Expansion gives faster alignment, but requires more memory.
.TP
\fB\-\-nosplicing\fR
Turns off splicing (useful for aligning genomic sequences
onto a genome)
.TP
\fB\-\-min\-intronlength\fR=\fI\,INT\/\fR
Min length for one internal intron (default 9). Below this size,
a genomic gap will be considered a deletion rather than an intron.
.TP
\fB\-K\fR, \fB\-\-intronlength\fR=\fI\,INT\/\fR
Max length for one internal intron (default 1000000)
.TP
\fB\-w\fR, \fB\-\-localsplicedist\fR=\fI\,INT\/\fR
Max length for known splice sites at ends of sequence
(default 2,000,000)
.TP
\fB\-L\fR, \fB\-\-totallength\fR=\fI\,INT\/\fR
Max total intron length (default 2400000)
.TP
\fB\-x\fR, \fB\-\-chimera\-margin\fR=\fI\,INT\/\fR
Amount of unaligned sequence that triggers
search for the remaining sequence (default 30).
Enables alignment of chimeric reads, and may help
with some non\-chimeric reads. To turn off, set to
zero.
.TP
\fB\-\-no\-chimeras\fR
Turns off finding of chimeras. Same effect as \fB\-\-chimera\-margin\fR=\fI\,0\/\fR
.TP
\fB\-t\fR, \fB\-\-nthreads\fR=\fI\,INT\/\fR
Number of worker threads
.TP
\fB\-c\fR, \fB\-\-chrsubset\fR=\fI\,string\/\fR
Limit search to given chromosome
.TP
\fB\-z\fR, \fB\-\-direction\fR=\fI\,STRING\/\fR
cDNA direction (sense_force, antisense_force,
sense_filter, antisense_filter, or auto (default))
.TP
\fB\-H\fR, \fB\-\-trimendexons\fR=\fI\,INT\/\fR
Trim end exons with fewer than given number of matches
(in nt, default 12)
.TP
\fB\-\-canonical\-mode\fR=\fI\,INT\/\fR
Reward for canonical and semi\-canonical introns:
0=low reward, 1=high reward (default), 2=low reward for
high\-identity sequences and high reward otherwise
.TP
\fB\-\-cross\-species\fR
Use a more sensitive search for canonical splicing, which helps especially
for cross\-species alignments and other difficult cases
.TP
\fB\-\-allow\-close\-indels\fR=\fI\,INT\/\fR
Allow an insertion and deletion close to each other
(0=no, 1=yes (default), 2=only for high\-quality alignments)
.TP
\fB\-\-microexon\-spliceprob\fR=\fI\,FLOAT\/\fR
Allow microexons only if one of the splice site probabilities is
greater than this value (default 0.90)
.TP
\fB\-\-cmetdir\fR=\fI\,STRING\/\fR
Directory for methylcytosine index files (created using cmetindex)
(default is location of genome index files specified using \fB\-D\fR, \fB\-V\fR, and \fB\-d\fR)
.TP
\fB\-\-atoidir\fR=\fI\,STRING\/\fR
Directory for A\-to\-I RNA editing index files (created using atoiindex)
(default is location of genome index files specified using \fB\-D\fR, \fB\-V\fR, and \fB\-d\fR)
.TP
\fB\-\-mode\fR=\fI\,STRING\/\fR
Alignment mode: standard (default), cmet\-stranded, cmet\-nonstranded,
atoi\-stranded, or atoi\-nonstranded. Non\-standard modes require you
to have previously run the cmetindex or atoiindex programs on the genome
.TP
\fB\-p\fR, \fB\-\-prunelevel\fR
Pruning level: 0=no pruning (default), 1=poor seqs,
2=repetitive seqs, 3=poor and repetitive
.SS
Output types
.TP
\fB\-S\fR, \fB\-\-summary\fR
Show summary of alignments only
.TP
\fB\-A\fR, \fB\-\-align\fR
Show alignments
.TP
\fB\-3\fR, \fB\-\-continuous\fR
Show alignment in three continuous lines
.TP
\fB\-4\fR, \fB\-\-continuous\-by\-exon\fR
Show alignment in three lines per exon
.TP
\fB\-Z\fR, \fB\-\-compress\fR
Print output in compressed format
.TP
\fB\-E\fR, \fB\-\-exons\fR=\fI\,STRING\/\fR
Print exons ("cdna" or "genomic")
.TP
\fB\-P\fR, \fB\-\-protein_dna\fR
Print protein sequence (cDNA)
.TP
\fB\-Q\fR, \fB\-\-protein_gen\fR
Print protein sequence (genomic)
.TP
\fB\-f\fR, \fB\-\-format\fR=\fI\,INT\/\fR
Other format for output (also note the \fB\-A\fR and \fB\-S\fR options
and other options listed under Output types):
psl (or 1) = PSL (BLAT) format,
gff3_gene (or 2) = GFF3 gene format,
gff3_match_cdna (or 3) = GFF3 cDNA_match format,
gff3_match_est (or 4) = GFF3 EST_match format,
splicesites (or 6) = splicesites output (for GSNAP splicing file),
introns = introns output (for GSNAP splicing file),
map_exons (or 7) = IIT FASTA exon map format,
map_ranges (or 8) = IIT FASTA range map format,
coords (or 9) = coords in table format,
sampe = SAM format (setting paired_read bit in flag),
samse = SAM format (without setting paired_read bit)
.SS
Output options
.TP
\fB\-n\fR, \fB\-\-npaths\fR=\fI\,INT\/\fR
Maximum number of paths to show (default 5). If set to 1, GMAP
will not report chimeric alignments, since those imply
two paths. If you want a single alignment plus chimeric
alignments, then set this to be 0.
.TP
\fB\-\-suboptimal\-score\fR=\fI\,INT\/\fR
Report only paths whose score is within this value of the
best path. By default, if this option is not provided,
the program prints all paths found.
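.IP
A minimal Python sketch of this cutoff. It assumes that higher scores are better and that "within this value" means a difference no larger than the given value; the function name report_paths and the sample scores are hypothetical.
.nf
```python
def report_paths(scores, suboptimal_score=None):
    """Return the scores of the paths that would be reported."""
    # Without the option, every path that was found is reported.
    if suboptimal_score is None or not scores:
        return list(scores)
    # Assumption: the best path is the one with the highest score.
    best = max(scores)
    return [s for s in scores if best - s <= suboptimal_score]


found = [98, 95, 90, 71]
near_best = report_paths(found, suboptimal_score=5)
everything = report_paths(found)
```
.fi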
.TP
\fB\-O\fR, \fB\-\-ordered\fR
Print output in same order as input (relevant
only if there is more than one worker thread)
.TP
\fB\-5\fR, \fB\-\-md5\fR
Print MD5 checksum for each query sequence
.TP
\fB\-o\fR, \fB\-\-chimera\-overlap\fR
Overlap to show, if any, at chimera breakpoint
.TP
\fB\-\-failsonly\fR
Print only failed alignments, those with no results
.TP
\fB\-\-nofails\fR
Exclude printing of failed alignments
.TP
\fB\-V\fR, \fB\-\-snpsdir\fR=\fI\,STRING\/\fR
Directory for SNPs index files (created using snpindex) (default is
location of genome index files specified using \fB\-D\fR and \fB\-d\fR)
.TP
\fB\-v\fR, \fB\-\-use\-snps\fR=\fI\,STRING\/\fR
Use database containing known SNPs (in <STRING>.iit, built
previously using snpindex) for tolerance to SNPs
.TP
\fB\-\-split\-output\fR=\fI\,STRING\/\fR
Basename for multiple\-file output, separately for nomapping,
uniq, mult, (and chimera, if \fB\-\-chimera\-margin\fR is selected)
.TP
\fB\-\-failed\-input\fR=\fI\,STRING\/\fR
Print completely failed alignments as input FASTA or FASTQ format
to the given file. If the \fB\-\-split\-output\fR flag is also given, this file
is generated in addition to the output in the .nomapping file.
.TP
\fB\-\-append\-output\fR
When \fB\-\-split\-output\fR or \fB\-\-failed\-input\fR is given, this flag will append output
to the existing files. Otherwise, the default is to create new files.
.TP
\fB\-\-output\-buffer\-size\fR=\fI\,INT\/\fR
Buffer size, in queries, for output thread (default 1000). When the number
of results to be printed exceeds this size, the worker threads are halted
until the backlog is cleared
.TP
\fB\-F\fR, \fB\-\-fulllength\fR
Assume full\-length protein, starting with Met
.TP
\fB\-a\fR, \fB\-\-cdsstart\fR=\fI\,INT\/\fR
Translate codons from given nucleotide (1\-based)
.TP
\fB\-T\fR, \fB\-\-truncate\fR
Truncate alignment around full\-length protein, Met to Stop.
Implies \fB\-F\fR flag.
.TP
\fB\-Y\fR, \fB\-\-tolerant\fR
Translates cDNA with corrections for frameshifts
.SS
Options for SAM output
.TP
\fB\-\-no\-sam\-headers\fR
Do not print headers beginning with '@'
.TP
\fB\-\-sam\-use\-0M\fR
Insert 0M in CIGAR between adjacent insertions and deletions.
Required by Picard, but can cause errors in other tools.
.TP
\fB\-\-force\-xs\-dir\fR
For RNA\-Seq alignments, disallows XS:A:? when the sense direction
is unclear, and replaces this value arbitrarily with XS:A:+.
May be useful for some programs, such as Cufflinks, that cannot
handle XS:A:?. However, if you use this flag, the reported value
of XS:A:+ in these cases will not be meaningful.
.TP
\fB\-\-md\-lowercase\-snp\fR
In MD string, when known SNPs are given by the \fB\-v\fR flag,
prints differing nucleotides as lower\-case when they
differ from reference but match a known alternate allele
.TP
\fB\-\-read\-group\-id\fR=\fI\,STRING\/\fR
Value to put into read\-group id (RG\-ID) field
.TP
\fB\-\-read\-group\-name\fR=\fI\,STRING\/\fR
Value to put into read\-group name (RG\-SM) field
.TP
\fB\-\-read\-group\-library\fR=\fI\,STRING\/\fR
Value to put into read\-group library (RG\-LB) field
.TP
\fB\-\-read\-group\-platform\fR=\fI\,STRING\/\fR
Value to put into read\-group platform (RG\-PL) field
.SS
Options for quality scores
.TP
\fB\-\-quality\-protocol\fR=\fI\,STRING\/\fR
Protocol for input quality scores. Allowed values:
.br
illumina (ASCII 64\-126) (equivalent to \fB\-J\fR 64 \fB\-j\fR \-31)
.br
sanger (ASCII 33\-126) (equivalent to \fB\-J\fR 33 \fB\-j\fR 0)
.br
Default is sanger (no quality print shift).
SAM output files should have quality scores in sanger protocol.
Or you can specify the print shift with this flag:
.TP
\fB\-j\fR, \fB\-\-quality\-print\-shift\fR=\fI\,INT\/\fR
Shift FASTQ quality scores by this amount in output
(default is 0 for sanger protocol; to change Illumina input
to Sanger output, select \-31)
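.IP
A minimal Python sketch of the arithmetic behind a print shift of \-31: each quality character is moved down by 31 code points, so the lowest illumina character (ASCII 64) becomes the lowest sanger character (ASCII 33). The function name shift_quality and the sample string are hypothetical and only illustrate the offset.
.nf
```python
def shift_quality(quality, shift):
    """Shift every character of a FASTQ quality string by shift code points."""
    return "".join(chr(ord(ch) + shift) for ch in quality)


# Three characters from the illumina range: ASCII 64, 66 and 104.
illumina = "@Bh"
# After a shift of -31 they become ASCII 33, 35 and 73.
sanger = shift_quality(illumina, -31)
```
.fi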
.SS
External map file options
.TP
\fB\-M\fR, \fB\-\-mapdir\fR=\fI\,directory\/\fR
Map directory
.TP
\fB\-m\fR, \fB\-\-map\fR=\fI\,iitfile\/\fR
Map file. If argument is '?' (with the quotes),
this lists available map files.
.TP
\fB\-e\fR, \fB\-\-mapexons\fR
Map each exon separately
.TP
\fB\-b\fR, \fB\-\-mapboth\fR
Report hits from both strands of genome
.TP
\fB\-u\fR, \fB\-\-flanking\fR=\fI\,INT\/\fR
Show flanking hits (default 0)
.TP
\fB\-\-print\-comment\fR
Show comment line for each hit
.SS
Alignment output options
.TP
\fB\-N\fR, \fB\-\-nolengths\fR
No intron lengths in alignment
.TP
\fB\-I\fR, \fB\-\-invertmode\fR=\fI\,INT\/\fR
Mode for alignments to genomic (\-) strand:
.br
0=Don't invert the cDNA (default)
.br
1=Invert cDNA and print genomic (\-) strand
.br
2=Invert cDNA and print genomic (+) strand
.TP
\fB\-i\fR, \fB\-\-introngap\fR=\fI\,INT\/\fR
Nucleotides to show on each end of intron (default=3)
.TP
\fB\-l\fR, \fB\-\-wraplength\fR=\fI\,INT\/\fR
Wrap length for alignment (default=50)
.SS
Filtering output options
.TP
\fB\-\-min\-trimmed\-coverage\fR=\fI\,FLOAT\/\fR
Do not print alignments with trimmed coverage less than
this value (default=0.0, which means no filtering).
Note that chimeric alignments will be output regardless
of this filter.
.TP
\fB\-\-min\-identity\fR=\fI\,FLOAT\/\fR
Do not print alignments with identity less than
this value (default=0.0, which means no filtering).
Note that chimeric alignments will be output regardless
of this filter.
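.IP
A minimal Python sketch of how the two filters described in this section combine. It assumes that both values are fractions between 0 and 1; the record fields (trimmed_coverage, identity, chimeric) and the function name keep_alignment are hypothetical and do not reflect the program's internal data structures.
.nf
```python
def keep_alignment(aln, min_trimmed_coverage=0.0, min_identity=0.0):
    """Decide whether an alignment record passes both output filters."""
    # Chimeric alignments are output regardless of the filters.
    if aln["chimeric"]:
        return True
    return (aln["trimmed_coverage"] >= min_trimmed_coverage
            and aln["identity"] >= min_identity)


# Hypothetical alignment records.
alignments = [
    {"name": "a", "trimmed_coverage": 0.95, "identity": 0.99, "chimeric": False},
    {"name": "b", "trimmed_coverage": 0.40, "identity": 0.99, "chimeric": False},
    {"name": "c", "trimmed_coverage": 0.40, "identity": 0.80, "chimeric": True},
]
kept = [a["name"] for a in alignments
        if keep_alignment(a, min_trimmed_coverage=0.9, min_identity=0.95)]
```
.fi
With the default thresholds of 0.0, every record passes.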
.SS
Help options
.TP
\fB\-\-version\fR
Show version
.TP
\fB\-\-help\fR
Show this help message
.SH ENVIRONMENT
.TP
\fBGMAPDB\fR
genome directory (equivalent to \fB\-D\fR)
.SH FILES
.TP
~/.gmaprc
configuration file
.SH AUTHOR
Thomas D. Wu and Colin K. Watanabe
.SH "REPORTING BUGS"
Report bugs to Thomas Wu <twu@gene.com>.
.SH COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
.SH "SEE ALSO"
\fBgmap_build\fR(1), \fBgsnap\fR(1)
.br