.TH GMAP_SETUP "1" "Nov 2011" "GMAP 2011-11-30" "User Commands"
.SH NAME
gmap_setup \- create a genome database for GMAP or GSNAP
.SH SYNOPSIS
.B gmap_setup
\fB\-d\fR \fIgenomename\fR [\fB\-D\fR \fIdestdir\fR]
[\fB\-o\fR \fIMakefile\fR] \fIFASTA\fR...
.SH OPTIONS
.TP
\fB\-d\fR
genome name
.TP
\fB\-D\fR
destination directory for installation (defaults to gmapdb directory specified at configure time)
.TP
\fB\-o\fR
name of output Makefile (default is "Makefile.<genome>")
.TP
\fB\-M\fR
use coordinates from an .md file (e.g., seq_contig.md file from NCBI)
.TP
\fB\-C\fR
try to parse chromosomal coordinates from each FASTA header
.TP
\fB\-E\fR
interpret argument as a command, instead of a list of FASTA files
.TP
\fB\-O\fR
order chromosomes in numeric/alphabetic order (0 = no, 1 = yes (default))
.SS Advanced options
.TP
\fB\-W\fR
write some output directly to file, instead of using RAM (use only if RAM is limited)
.TP
\fB\-q\fR
GMAP indexing interval (default: 3 nt)
.TP
\fB\-Q\fR
PMAP indexing interval (default: 6 aa)
.SH DESCRIPTION
.PP
If you want to treat each FASTA entry as a separate chromosome (either
because it is in fact an entire chromosome or because you have contigs
without any chromosomal information), you can simply call gmap_setup
like this:
.IP
gmap_setup \fB\-d\fR <genome> <fasta_file>...
.PP
The accession in each FASTA header (the word following the ">") is
used as the name of the corresponding chromosome. GMAP can handle an
unlimited number of "chromosomes", with arbitrarily long names. In
this way, GMAP can be used as a general search program for
near\-identity matches against a FASTA file.
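.PP
To preview the names that would become chromosome names, the accessions
can be listed before running gmap_setup. The following is only a sketch
and is not part of gmap_setup; the function name list_accessions is
hypothetical, and it assumes that the accession follows the ">" directly,
with no intervening space.
.PP
.nf
```shell
# Hypothetical helper: print the first word after ">" on each
# FASTA header line of the file given as the first argument.
list_accessions() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Usage (file name is a placeholder):
#   list_accessions <fasta_file>
```
.fi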
.TP
\fB\-M\fR and \fB\-C\fR
If your sequences are contigs that have mapping information to
specific chromosomal regions, gmap_setup can either try to parse the
chromosomal region from each header (the \fB\-C\fR flag) or read an
\&.md file that contains information about chromosomal regions (the
\fB\-M\fR flag). NCBI releases often provide .md files, but because
their format changes often, gmap_setup prompts you to confirm that it
is parsing the file correctly.
.TP
\fB\-E\fR
If you need to pre\-process the FASTA files before using
these programs, perhaps because they are compressed or because you
need to insert chromosomal information in the header lines, you can
specify a command instead of multiple fasta_files, like these
examples:
.sp
.nf
gmap_setup \fB\-d\fR <genome> \fB\-E\fR 'gunzip \fB\-c\fR genomefiles.gz'
gmap_setup \fB\-d\fR <genome> \fB\-E\fR 'cat *.fa | ./add\-chromosomal\-info.pl'
.fi
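.PP
Before handing a pre\-processing command to gmap_setup, it can help to
confirm that the command really writes FASTA to standard output. The
sketch below is not part of gmap_setup; the function name emits_fasta is
hypothetical, and it assumes that the first line of valid output is a
header line starting with ">".
.PP
.nf
```shell
# Hypothetical helper: succeed if the first output line of the
# command string in the first argument starts with ">".
emits_fasta() {
    sh -c "$1" 2>/dev/null | head -n 1 | grep -q "^>"
}

# Usage, reusing the gunzip example above:
#   emits_fasta "gunzip -c genomefiles.gz" &&
#       gmap_setup -d <genome> -E "gunzip -c genomefiles.gz"
```
.fi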
.TP
\fB\-W\fR
gmap_setup works best on a computer with enough RAM to hold the
entire genome (e.g., 3 gigabytes for a human\- or mouse\-sized
genome). Because the resulting genome files work across all machine
architectures, you can build them on any machine with sufficient RAM
and then transfer them to another machine. (GMAP itself runs fine on
machines with limited RAM.) If no machine with sufficient RAM is
available, you can run gmap_setup with the \fB\-W\fR flag to write
the files directly, but this can be very slow.
.TP
\fB\-q\fR and \fB\-Q\fR
If you specify a smaller indexing interval, you create a
higher\-resolution database, which can be useful for mapping small
oligomers (smaller than 18 nt). However, the corresponding genome
index files will be larger (for example, twice as big if you halve
the interval). These index files may exceed the 2 gigabyte file
offset limit on some computers, and will therefore fail to work on
those computers.
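.PP
A quick size check can show whether an index file would cross that
limit. The sketch below is not part of gmap_setup; the function name
exceeds_2gb is hypothetical, and it assumes that the limit in question
is the largest offset a signed 32\-bit integer can hold, 2147483647
bytes.
.PP
.nf
```shell
# Hypothetical helper: succeed if the byte count in the first
# argument is larger than 2147483647 (assumed 32-bit offset limit).
exceeds_2gb() {
    [ "$1" -gt 2147483647 ]
}

# Usage (file name is a placeholder):
#   exceeds_2gb "$(wc -c < <index_file>)" && echo "too large"
```
.fi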
.SH AUTHOR
Thomas D. Wu and Colin K. Watanabe
.SH "REPORTING BUGS"
Report bugs to Thomas Wu <twu@gene.com>.
.SH COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
.SH "SEE ALSO"
\fBgmap\fR(1), \fBgsnap\fR(1)
.br
http://research-pub.gene.com/gmap/