.TH GMAP "1" "GMAP 2014-10-22" "User Commands"
.SH NAME
gmap \- Genomic Mapping and Alignment Program
.SH SYNOPSIS
.B gmap
[\fI\,OPTIONS\/\fR...] \fI\,<FASTA files>\/\fR..., or cat \fI\,<FASTA files>\/\fR... | \fBgmap\fR [\fI\,OPTIONS\/\fR...]
.SH DESCRIPTION
Align the query sequences in the given FASTA files (or read from standard input)
to a reference, specified with \fB\-d\fR (genome database) or \fB\-g\fR (genomic segment).
.SH OPTIONS
.SS Input options (must include \fB\-d\fR or \fB\-g\fR)
.TP
\fB\-D\fR, \fB\-\-dir\fR=\fI\,directory\/\fR
Genome directory
.TP
\fB\-d\fR, \fB\-\-db\fR=\fI\,STRING\/\fR
Genome database.  If the argument is '?' (with
the quotes), this command lists the available databases.
.TP
\fB\-k\fR, \fB\-\-kmer\fR=\fI\,INT\/\fR
k\-mer size to use in genome database (allowed values: 16 or less).
If not specified, the program will find the highest available
k\-mer size in the genome database.
.TP
\fB\-\-sampling\fR=\fI\,INT\/\fR
Sampling to use in genome database.  If not specified, the program
will find the smallest available sampling value in the genome database
for the selected k\-mer size.
.TP
\fB\-G\fR, \fB\-\-genomefull\fR
Use the full genome (all ASCII characters allowed;
built explicitly during setup) instead of the
compressed version
.TP
\fB\-g\fR, \fB\-\-gseg\fR=\fI\,filename\/\fR
User\-supplied genomic segment
.TP
\fB\-1\fR, \fB\-\-selfalign\fR
Align one sequence, given in FASTA format via stdin, against itself
(useful for getting the protein translation of a nucleotide sequence)
.TP
\fB\-2\fR, \fB\-\-pairalign\fR
Align two sequences given in FASTA format via stdin, the first
being genomic and the second being cDNA
.TP
\fB\-\-cmdline\fR=\fI\,STRING\/\fR,\fI\,STRING\/\fR
Align these two sequences provided on the command line,
the first being genomic and the second being cDNA
.TP
\fB\-q\fR, \fB\-\-part\fR=\fI\,INT\/\fR/\fI\,INT\/\fR
Process only the \fIi\fR\-th out of every \fIn\fR sequences, given as \fIi\fR/\fIn\fR
(e.g., 0/100 or 99/100); useful for distributing jobs
to a computer farm.
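.IP
A minimal Python sketch of one plausible reading of the \fIi\fR/\fIn\fR split, assuming that the \fIk\fR\-th input sequence (counted from 0) is handled by job \fIi\fR when \fIk\fR mod \fIn\fR equals \fIi\fR.  The function names are hypothetical; this illustrates the option and is not GMAP's own code.
.IP
.nf
```python
# Hypothetical sketch of how "-q i/n" might split a FASTA input across jobs.
# Assumption: sequence k (0-based, input order) goes to job i when k % n == i.
# This illustrates the documented idea; it is not GMAP's source code.


def parse_part(spec):
    """Parse an "i/n" string such as "0/100" into (i, n)."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if not 0 <= i < n:
        raise ValueError("expected 0 <= i < n")
    return i, n


def sequences_for_job(seq_ids, spec):
    """Return the sequence IDs that job "i/n" would process."""
    i, n = parse_part(spec)
    return [s for k, s in enumerate(seq_ids) if k % n == i]


if __name__ == "__main__":
    ids = [f"read{k}" for k in range(7)]
    for job in ("0/3", "1/3", "2/3"):
        print(job, sequences_for_job(ids, job))
```
.fi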
.TP
\fB\-\-input\-buffer\-size\fR=\fI\,INT\/\fR
Size of input buffer (program reads this many sequences
at a time for efficiency) (default 1000)
.SS Computation options
.TP
\fB\-B\fR, \fB\-\-batch\fR=\fI\,INT\/\fR
Batch mode (default = 2)
.nf

Mode          Offsets    Positions        Genome
0             see note   mmap             mmap
1             see note   mmap & preload   mmap
2 (default)   see note   mmap & preload   mmap & preload
3             see note   allocate         mmap & preload
4             see note   allocate         allocate
5             expand     allocate         allocate

.fi
Note: For a single sequence, all data structures use mmap.
If mmap is not available and allocate is not chosen, the program
will use fileio (very slow).
.IP
Note about \fB\-\-batch\fR and offsets: expansion of offsets can be controlled independently by the \fB\-\-expand\-offsets\fR flag.
The \fB\-\-batch\fR=\fI\,5\/\fR option is equivalent
to \fB\-\-batch\fR=\fI\,4\/\fR plus \fB\-\-expand\-offsets\fR=\fI\,1\/\fR.
.TP
\fB\-\-expand\-offsets\fR=\fI\,INT\/\fR
Whether to expand the genomic offsets index.
Values: 0 (no, default) or 1 (yes).
Expansion gives faster alignment but requires more memory.
.TP
\fB\-\-nosplicing\fR
Turns off splicing (useful for aligning genomic sequences
onto a genome)
.TP
\fB\-\-min\-intronlength\fR=\fI\,INT\/\fR
Min length for one internal intron (default 9).  Below this size,
a genomic gap will be considered a deletion rather than an intron.
.TP
\fB\-K\fR, \fB\-\-intronlength\fR=\fI\,INT\/\fR
Max length for one internal intron (default 1000000)
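.IP
The two intron\-length thresholds above can be combined as in this Python sketch, which uses the documented defaults.  It assumes that a gap of exactly \fB\-\-min\-intronlength\fR nucleotides counts as an intron and that a gap longer than \fB\-K\fR is rejected as a single intron; both boundary rules and the function name are assumptions, not GMAP internals.
.IP
.nf
```python
# Hypothetical sketch: classify a genomic gap using the documented defaults of
# --min-intronlength (9) and -K/--intronlength (1000000).
# Assumption: gaps longer than the maximum are rejected as single introns;
# GMAP's actual handling may differ.

MIN_INTRON_LENGTH = 9        # default of --min-intronlength
MAX_INTRON_LENGTH = 1000000  # default of -K/--intronlength


def classify_gap(gap_length, min_intron=MIN_INTRON_LENGTH,
                 max_intron=MAX_INTRON_LENGTH):
    """Label a genomic gap as a deletion, an intron, or too long."""
    if gap_length < min_intron:
        return "deletion"
    if gap_length <= max_intron:
        return "intron"
    return "too long for one intron"


if __name__ == "__main__":
    for gap in (3, 9, 5000, 2000000):
        print(gap, classify_gap(gap))
```
.fi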
.TP
\fB\-w\fR, \fB\-\-localsplicedist\fR=\fI\,INT\/\fR
Max length for known splice sites at ends of sequence
(default 2000000)
.TP
\fB\-L\fR, \fB\-\-totallength\fR=\fI\,INT\/\fR
Max total intron length (default 2400000)
.TP
\fB\-x\fR, \fB\-\-chimera\-margin\fR=\fI\,INT\/\fR
Amount of unaligned sequence that triggers
search for the remaining sequence (default 30).
Enables alignment of chimeric reads, and may help
with some non\-chimeric reads.  To turn off, set to
zero.
.TP
\fB\-\-no\-chimeras\fR
Turns off finding of chimeras.  Same effect as \fB\-\-chimera\-margin\fR=\fI\,0\/\fR
.TP
\fB\-t\fR, \fB\-\-nthreads\fR=\fI\,INT\/\fR
Number of worker threads
.TP
\fB\-c\fR, \fB\-\-chrsubset\fR=\fI\,string\/\fR
Limit search to given chromosome
.TP
\fB\-z\fR, \fB\-\-direction\fR=\fI\,STRING\/\fR
cDNA direction (sense_force, antisense_force,
sense_filter, antisense_filter, or auto (default))
.TP
\fB\-H\fR, \fB\-\-trimendexons\fR=\fI\,INT\/\fR
Trim end exons with fewer than given number of matches
(in nt, default 12)
.TP
\fB\-\-canonical\-mode\fR=\fI\,INT\/\fR
Reward for canonical and semi\-canonical introns:
0=low reward, 1=high reward (default), 2=low reward for
high\-identity sequences and high reward otherwise
.TP
\fB\-\-cross\-species\fR
Use a more sensitive search for canonical splicing, which helps especially
for cross\-species alignments and other difficult cases
.TP
\fB\-\-allow\-close\-indels\fR=\fI\,INT\/\fR
Allow an insertion and deletion close to each other
(0=no, 1=yes (default), 2=only for high\-quality alignments)
.TP
\fB\-\-microexon\-spliceprob\fR=\fI\,FLOAT\/\fR
Allow microexons only if one of the splice site probabilities is
greater than this value (default 0.90)
.TP
\fB\-\-cmetdir\fR=\fI\,STRING\/\fR
Directory for methylcytosine index files (created using cmetindex)
(default is location of genome index files specified using \fB\-D\fR, \fB\-V\fR, and \fB\-d\fR)
.TP
\fB\-\-atoidir\fR=\fI\,STRING\/\fR
Directory for A\-to\-I RNA editing index files (created using atoiindex)
(default is location of genome index files specified using \fB\-D\fR, \fB\-V\fR, and \fB\-d\fR)
.TP
\fB\-\-mode\fR=\fI\,STRING\/\fR
Alignment mode: standard (default), cmet\-stranded, cmet\-nonstranded,
atoi\-stranded, or atoi\-nonstranded.  Non\-standard modes require you
to have previously run the cmetindex or atoiindex programs on the genome
.TP
\fB\-p\fR, \fB\-\-prunelevel\fR=\fI\,INT\/\fR
Pruning level: 0=no pruning (default), 1=poor seqs,
2=repetitive seqs, 3=poor and repetitive
.SS
Output types
.TP
\fB\-S\fR, \fB\-\-summary\fR
Show summary of alignments only
.TP
\fB\-A\fR, \fB\-\-align\fR
Show alignments
.TP
\fB\-3\fR, \fB\-\-continuous\fR
Show alignment in three continuous lines
.TP
\fB\-4\fR, \fB\-\-continuous\-by\-exon\fR
Show alignment in three lines per exon
.TP
\fB\-Z\fR, \fB\-\-compress\fR
Print output in compressed format
.TP
\fB\-E\fR, \fB\-\-exons\fR=\fI\,STRING\/\fR
Print exons ("cdna" or "genomic")
.TP
\fB\-P\fR, \fB\-\-protein_dna\fR
Print protein sequence (cDNA)
.TP
\fB\-Q\fR, \fB\-\-protein_gen\fR
Print protein sequence (genomic)
.TP
\fB\-f\fR, \fB\-\-format\fR=\fI\,INT\/\fR
Other format for output (also note the \fB\-A\fR and \fB\-S\fR options
and other options listed under Output types):
 psl (or 1) = PSL (BLAT) format,
 gff3_gene (or 2) = GFF3 gene format,
 gff3_match_cdna (or 3) = GFF3 cDNA_match format,
 gff3_match_est (or 4) = GFF3 EST_match format,
 splicesites (or 6) = splicesites output (for GSNAP splicing file),
 introns = introns output (for GSNAP splicing file),
 map_exons (or 7) = IIT FASTA exon map format,
 map_ranges (or 8) = IIT FASTA range map format,
 coords (or 9) = coords in table format,
 sampe = SAM format (setting paired_read bit in flag),
 samse = SAM format (without setting paired_read bit)
.SS
Output options
.TP
\fB\-n\fR, \fB\-\-npaths\fR=\fI\,INT\/\fR
Maximum number of paths to show (default 5).  If set to 1, GMAP
will not report chimeric alignments, since those imply
two paths.  If you want a single alignment plus chimeric
alignments, then set this to be 0.
.TP
\fB\-\-suboptimal\-score\fR=\fI\,INT\/\fR
Report only paths whose score is within this value of the
best path.  By default, if this option is not provided,
the program prints all paths found.
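.IP
A minimal Python sketch of the filter described above, assuming higher scores are better and that a path is kept when its score is at least the best score minus the given value.  The \fIPath\fR type and the function name are hypothetical.
.IP
.nf
```python
# Hypothetical sketch of the --suboptimal-score filter: keep only paths whose
# score lies within `delta` of the best score. Higher scores are assumed to be
# better; the Path type below is illustrative, not a GMAP data structure.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    score: int


def filter_suboptimal(paths: List[Path], delta: Optional[int]) -> List[Path]:
    """Return paths with score >= best - delta; all paths if delta is None."""
    if delta is None or not paths:
        return list(paths)
    best = max(p.score for p in paths)
    return [p for p in paths if p.score >= best - delta]


if __name__ == "__main__":
    found = [Path("a", 100), Path("b", 97), Path("c", 80)]
    print([p.name for p in filter_suboptimal(found, 5)])
```
.fi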
.TP
\fB\-O\fR, \fB\-\-ordered\fR
Print output in same order as input (relevant
only if there is more than one worker thread)
.TP
\fB\-5\fR, \fB\-\-md5\fR
Print MD5 checksum for each query sequence
.TP
\fB\-o\fR, \fB\-\-chimera\-overlap\fR
Overlap to show, if any, at chimera breakpoint
.TP
\fB\-\-failsonly\fR
Print only failed alignments, those with no results
.TP
\fB\-\-nofails\fR
Exclude printing of failed alignments
.TP
\fB\-V\fR, \fB\-\-snpsdir\fR=\fI\,STRING\/\fR
Directory for SNPs index files (created using snpindex) (default is
location of genome index files specified using \fB\-D\fR and \fB\-d\fR)
.TP
\fB\-v\fR, \fB\-\-use\-snps\fR=\fI\,STRING\/\fR
Use database containing known SNPs (in <STRING>.iit, built
previously using snpindex) for tolerance to SNPs
.TP
\fB\-\-split\-output\fR=\fI\,STRING\/\fR
Basename for multiple\-file output, written separately for nomapping,
uniq, and mult results (and chimera, if \fB\-\-chimera\-margin\fR is selected)
.TP
\fB\-\-failed\-input\fR=\fI\,STRING\/\fR
Print completely failed alignments as input FASTA or FASTQ format
to the given file.  If the \fB\-\-split\-output\fR flag is also given, this file
is generated in addition to the output in the .nomapping file.
.TP
\fB\-\-append\-output\fR
When \fB\-\-split\-output\fR or \fB\-\-failed\-input\fR is given, this flag will append output
to the existing files.  Otherwise, the default is to create new files.
.TP
\fB\-\-output\-buffer\-size\fR=\fI\,INT\/\fR
Buffer size, in queries, for output thread (default 1000).  When the number
of results to be printed exceeds this size, the worker threads are halted
until the backlog is cleared
.TP
\fB\-F\fR, \fB\-\-fulllength\fR
Assume full\-length protein, starting with Met
.TP
\fB\-a\fR, \fB\-\-cdsstart\fR=\fI\,INT\/\fR
Translate codons from given nucleotide (1\-based)
.TP
\fB\-T\fR, \fB\-\-truncate\fR
Truncate alignment around full\-length protein, Met to Stop.
Implies \fB\-F\fR flag.
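.IP
The following Python sketch shows the idea behind \fB\-a\fR and \fB\-T\fR using the standard genetic code: translation starts at a 1\-based nucleotide position, and truncation keeps the stretch from the first Met through the next stop codon.  It assumes an uppercase DNA string and marks unknown codons as X; the function names are hypothetical and this is not GMAP's translation routine.
.IP
.nf
```python
# Hypothetical sketch of translation from a 1-based start nucleotide (-a) and
# of truncating a protein from the first Met to the next stop (-T/-F). It uses
# the standard genetic code and assumes an uppercase DNA string; it is an
# illustration of the options, not GMAP's translation code.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, cdsstart=1):
    """Translate whole codons starting at 1-based position `cdsstart`."""
    start = cdsstart - 1
    codons = (seq[k:k + 3] for k in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def truncate_met_to_stop(protein):
    """Keep the segment from the first M through the next stop (*)."""
    first_met = protein.find("M")
    if first_met < 0:
        return ""
    stop = protein.find("*", first_met)
    return protein[first_met:] if stop < 0 else protein[first_met:stop + 1]


if __name__ == "__main__":
    cdna = "CCATGGCCTAAGG"
    print(translate(cdna))                 # frame starting at nt 1
    print(translate(cdna, cdsstart=3))     # frame starting at nt 3
    print(truncate_met_to_stop(translate(cdna, 3)))
```
.fi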
.TP
\fB\-Y\fR, \fB\-\-tolerant\fR
Translates cDNA with corrections for frameshifts
.SS
Options for SAM output
.TP
\fB\-\-no\-sam\-headers\fR
Do not print headers beginning with '@'
.TP
\fB\-\-sam\-use\-0M\fR
Insert 0M in CIGAR between adjacent insertions and deletions.
Required by Picard, but can cause errors in other tools.
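.IP
A minimal Python sketch of the CIGAR change described above, assuming the alignment is available as a list of (length, operation) pairs and that the 0M is added for both insertion\-then\-deletion and deletion\-then\-insertion pairs.  The input format and function name are illustrative assumptions.
.IP
.nf
```python
# Hypothetical sketch of what --sam-use-0M changes in a CIGAR string: a 0M
# operation is placed between an insertion (I) and a deletion (D) that would
# otherwise be adjacent. The (length, op) list is an illustrative input format.
from typing import List, Tuple


def cigar_string(ops: List[Tuple[int, str]], use_0m: bool = False) -> str:
    parts = []
    prev_op = None
    for length, op in ops:
        # Adjacent I/D or D/I pair: separate them with a zero-length match.
        if use_0m and {prev_op, op} == {"I", "D"}:
            parts.append("0M")
        parts.append(f"{length}{op}")
        prev_op = op
    return "".join(parts)


if __name__ == "__main__":
    ops = [(10, "M"), (2, "I"), (3, "D"), (5, "M")]
    print(cigar_string(ops))               # 10M2I3D5M
    print(cigar_string(ops, use_0m=True))  # 10M2I0M3D5M
```
.fi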
.TP
\fB\-\-force\-xs\-dir\fR
For RNA\-Seq alignments, disallows XS:A:? when the sense direction
is unclear, and replaces this value arbitrarily with XS:A:+.
May be useful for some programs, such as Cufflinks, that cannot
handle XS:A:?.  However, if you use this flag, the reported value
of XS:A:+ in these cases will not be meaningful.
.TP
\fB\-\-md\-lowercase\-snp\fR
In MD string, when known SNPs are given by the \fB\-v\fR flag,
prints differing nucleotides in lower case when they
differ from the reference but match a known alternate allele
.TP
\fB\-\-read\-group\-id\fR=\fI\,STRING\/\fR
Value to put into read\-group id (RG\-ID) field
.TP
\fB\-\-read\-group\-name\fR=\fI\,STRING\/\fR
Value to put into read\-group name (RG\-SM) field
.TP
\fB\-\-read\-group\-library\fR=\fI\,STRING\/\fR
Value to put into read\-group library (RG\-LB) field
.TP
\fB\-\-read\-group\-platform\fR=\fI\,STRING\/\fR
Value to put into read\-group platform (RG\-PL) field
.SS
Options for quality scores
.TP
\fB\-\-quality\-protocol\fR=\fI\,STRING\/\fR
Protocol for input quality scores.  Allowed values:
 illumina (ASCII 64\-126) (equivalent to \fB\-J\fR 64 \fB\-j\fR \-31)
 sanger   (ASCII 33\-126) (equivalent to \fB\-J\fR 33 \fB\-j\fR 0)

Default is sanger (no quality print shift).
SAM output files should have quality scores in the sanger protocol.
Alternatively, you can specify the print shift with the following flag:
.TP
\fB\-j\fR, \fB\-\-quality\-print\-shift\fR=\fI\,INT\/\fR
Shift FASTQ quality scores by this amount in output
(default is 0 for the sanger protocol; to change Illumina input
to Sanger output, select \-31)
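.IP
The print shift is plain ASCII arithmetic, as this Python sketch shows: Illumina scores are offset by 64 and Sanger scores by 33, so a shift of \-31 converts one to the other.  The function name and the range check are assumptions made for illustration.
.IP
.nf
```python
# Sketch of the quality print shift arithmetic (-j/--quality-print-shift):
# each FASTQ quality character is moved by a fixed ASCII offset. Shifting
# Illumina (offset 64) scores by -31 yields Sanger (offset 33) characters.
# The function name is hypothetical.


def shift_quality(qual: str, shift: int) -> str:
    shifted = [ord(ch) + shift for ch in qual]
    # Both protocols use printable ASCII 33-126; reject anything outside it.
    if any(code < 33 or code > 126 for code in shifted):
        raise ValueError("shift moves a score outside printable ASCII 33-126")
    return "".join(chr(code) for code in shifted)


if __name__ == "__main__":
    illumina = "h@J"                     # ASCII 104, 64, 74
    print(shift_quality(illumina, -31))  # I!+
```
.fi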
.SS
External map file options
.TP
\fB\-M\fR, \fB\-\-mapdir\fR=\fI\,directory\/\fR
Map directory
.TP
\fB\-m\fR, \fB\-\-map\fR=\fI\,iitfile\/\fR
Map file.  If argument is '?' (with the quotes),
this lists available map files.
.TP
\fB\-e\fR, \fB\-\-mapexons\fR
Map each exon separately
.TP
\fB\-b\fR, \fB\-\-mapboth\fR
Report hits from both strands of genome
.TP
\fB\-u\fR, \fB\-\-flanking\fR=\fI\,INT\/\fR
Show flanking hits (default 0)
.TP
\fB\-\-print\-comment\fR
Show comment line for each hit
.SS
Alignment output options
.TP
\fB\-N\fR, \fB\-\-nolengths\fR
No intron lengths in alignment
.TP
\fB\-I\fR, \fB\-\-invertmode\fR=\fI\,INT\/\fR
Mode for alignments to genomic (\-) strand:
.br
0=Don't invert the cDNA (default)
.br
1=Invert cDNA and print genomic (\-) strand
.br
2=Invert cDNA and print genomic (+) strand
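.IP
A minimal Python sketch of the inversion step, assuming that inverting a cDNA means taking its reverse complement.  Only A, C, G, T, and N are handled; the function name is hypothetical.
.IP
.nf
```python
# Hypothetical sketch of "inverting" a cDNA for -I/--invertmode, assuming that
# inversion means taking the reverse complement. Only A, C, G, T, N (upper or
# lower case) are complemented; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACGTn"))  # nACGTT
```
.fi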
.TP
\fB\-i\fR, \fB\-\-introngap\fR=\fI\,INT\/\fR
Nucleotides to show on each end of intron (default=3)
.TP
\fB\-l\fR, \fB\-\-wraplength\fR=\fI\,INT\/\fR
Wrap length for alignment (default=50)
.SS
Filtering output options
.TP
\fB\-\-min\-trimmed\-coverage\fR=\fI\,FLOAT\/\fR
Do not print alignments with trimmed coverage less than
this value (default=0.0, which means no filtering).
Note that chimeric alignments will be output regardless
of this filter
.TP
\fB\-\-min\-identity\fR=\fI\,FLOAT\/\fR
Do not print alignments with identity less than
this value (default=0.0, which means no filtering).
Note that chimeric alignments will be output regardless
of this filter
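.IP
A minimal Python sketch of the two filters, assuming coverage and identity are expressed as fractions and that an alignment exactly at a threshold is kept.  Chimeric alignments bypass both filters, as stated above.  The \fIAlignment\fR type and the function names are hypothetical.
.IP
.nf
```python
# Hypothetical sketch of --min-trimmed-coverage and --min-identity: alignments
# below either threshold are dropped, except chimeric alignments, which are
# output regardless. Coverage and identity are assumed to be fractions in
# [0, 1]; the Alignment type is illustrative only.
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    name: str
    trimmed_coverage: float
    identity: float
    chimeric: bool = False


def passes_filters(aln, min_trimmed_coverage=0.0, min_identity=0.0):
    if aln.chimeric:
        return True  # chimeric alignments ignore both thresholds
    return (aln.trimmed_coverage >= min_trimmed_coverage
            and aln.identity >= min_identity)


def filter_alignments(alns: List[Alignment], **thresholds) -> List[Alignment]:
    return [a for a in alns if passes_filters(a, **thresholds)]
```
.fi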
.SS
Help options
.TP
\fB\-\-version\fR
Show version
.TP
\fB\-\-help\fR
Show this help message
.SH ENVIRONMENT
.TP
\fBGMAPDB\fR
genome directory (equivalent to \fB\-D\fR)
.SH FILES
.TP
~/.gmaprc
configuration file
.SH AUTHOR
Thomas D. Wu and Colin K. Watanabe
.SH "REPORTING BUGS"
Report bugs to Thomas Wu <twu@gene.com>.
.SH COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
.SH "SEE ALSO"
\fBgmap_build\fR(1), \fBgsnap\fR(1)
.br