.TH MAF_PARSE "1" "May 2016" "maf_parse 1.4" "User Commands"
.SH NAME
maf_parse \- Reads a MAF file and performs various operations on it.
.SH DESCRIPTION
Reads a MAF file and performs various operations on it.
Parsing is done block\-by\-block whenever possible,
rather than storing the entire alignment in memory.
Can extract a sub\-alignment from an alignment (by row
or by column).
Can extract features given a GFF, BED, or
genepred file, including sub\-features such as the individual codon
positions of CDS features (CDS1, 2, 3) or fourfold\-degenerate (4d) sites.
Can also perform operations such as gap
stripping or re\-ordering of sequences.
Reads and writes a few common formats, but does not load the input or
output alignment into memory if the output format is MAF.
.SH OPTIONS
.SS Output format
.HP
\fB\-\-out\-format\fR, \fB\-o\fR MAF|PHYLIP|FASTA|MPM|SS
Output file format (default MAF).  SS (sufficient statistics) format is
available only in unordered form.
Some options, such as those that reverse alignments by strand or strip
gaps, cannot produce MAF output and use FASTA by default.
When the output format is not MAF, the entire
output must be loaded into memory.
.HP
\fB\-\-pretty\fR, \fB\-p\fR
Pretty\-print the alignment (use '.' when a character matches the
corresponding character in the first sequence).  Ignored if
\fB\-\-out\-format\fR SS is selected.
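.IP
As an illustration of the substitution \fB\-\-pretty\fR performs, here is a
minimal Python sketch.  It assumes the alignment is held as a list of
equal\-length row strings with the reference row first; the function name
pretty_rows is hypothetical and not part of phast.
.nf
```python
def pretty_rows(rows):
    """Return rows with characters that match the first row replaced by '.'."""
    ref = rows[0]
    out = [ref]  # the first (reference) row is printed unchanged
    for row in rows[1:]:
        # keep a character only where it differs from the same column of the reference
        out.append("".join("." if c == r else c for c, r in zip(row, ref)))
    return out


print(pretty_rows(["ACG-T", "ACGAT", "TCG-A"]))  # ['ACG-T', '...A.', 'T...A']
```
.fi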
.SS Obtaining sub\-alignments and re\-ordering rows
.HP
\fB\-\-start\fR, \fB\-s\fR <start_col>
Start index of sub\-alignment (indexing starts with 1).
Coordinates are in terms of the reference sequence unless
the \fB\-\-no\-refseq\fR option is used, in which case they are in
terms of alignment columns.  Default is 1.
.HP
\fB\-\-end\fR, \fB\-e\fR <end_col>
End index of sub\-alignment.
Default is length of alignment.
Coordinates are defined as in the \fB\-\-start\fR option, above.
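.IP
Because \fB\-\-start\fR and \fB\-\-end\fR count reference bases rather than
alignment columns (unless \fB\-\-no\-refseq\fR is given), gap columns in the
reference row must be skipped when locating a coordinate.  The Python sketch
below shows that mapping for a single in\-memory reference row whose first
base is position 1; the helper name refseq_to_column is hypothetical, and the
sketch ignores the per\-block start offsets that real MAF blocks carry.
.nf
```python
def refseq_to_column(ref_row, pos):
    """Return the 1-based alignment column holding the pos-th (1-based) reference base."""
    seen = 0
    for col, ch in enumerate(ref_row, start=1):
        if ch != "-":  # gap columns do not advance the reference coordinate
            seen += 1
            if seen == pos:
                return col
    raise ValueError("position beyond end of reference sequence")


print(refseq_to_column("AC--GT-A", 3))  # 5
```
.fi
With \fB\-\-no\-refseq\fR no such mapping is needed, since the coordinate is
the column number itself.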
.HP
\fB\-\-seqs\fR, \fB\-l\fR <seq_list>
Comma\-separated list of sequences to include (default) or
exclude (if \fB\-\-exclude\fR).  Indicate sequences by number or name
(numbering starts with 1 and is evaluated \fIafter\fR \fB\-\-order\fR is
applied).
.HP
\fB\-\-exclude\fR, \fB\-x\fR
Exclude rather than include specified sequences.
.HP
\fB\-\-order\fR, \fB\-O\fR <name_list>
Change the order of rows in the alignment to match the sequence names
specified in name_list.  After reordering, the first sequence in the
alignment becomes the reference sequence.
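.IP
The interaction of \fB\-\-order\fR, \fB\-\-seqs\fR and \fB\-\-exclude\fR can be
sketched in Python: rows are reordered first, and only then are sequence
numbers resolved.  The function select_rows and its (name, text) row
representation are hypothetical, and keeping rows that name_list does not
mention (after the listed ones) is an assumption of this sketch, not
documented phast behavior.
.nf
```python
def select_rows(rows, order=None, seqs=None, exclude=False):
    """Hypothetical illustration of reordering followed by row selection.

    rows is a list of (name, text) pairs in input order; seqs is a
    comma-separated string of 1-based row numbers and/or names.
    """
    if order:
        by_name = dict(rows)
        # assumption: rows not named in order are kept, after the listed ones
        rest = [(n, t) for n, t in rows if n not in order]
        rows = [(n, by_name[n]) for n in order if n in by_name] + rest
    if not seqs:
        return rows
    wanted = set()
    for item in seqs.split(","):
        item = item.strip()
        # row numbers are resolved against the already-reordered rows
        wanted.add(rows[int(item) - 1][0] if item.isdigit() else item)
    # keep a row if it was named, or if it was not named and exclude is set
    return [(n, t) for n, t in rows if (n in wanted) != exclude]
```
.fi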
.HP
\fB\-\-no\-refseq\fR, \fB\-n\fR
Do not assume the first sequence in the MAF is the reference sequence.
Instead, use coordinates given by absolute position in the alignment
(starting from 1).
.SS Splitting into multiple MAFs by length
.HP
\fB\-\-split\fR, \fB\-S\fR length
Split the MAF into pieces by length, writing output to
outRootX.maf, where X=1,2,...,numPieces.  outRoot can be
changed with \fB\-\-out\-root\fR, and the minimum number of digits in X
with \fB\-\-out\-root\-digits\fR.
Splits occur only between blocks, so that each output file does not exceed
the specified length.  By default, length is measured as the distance
spanned in the alignment by the refseq, unless \fB\-\-no\-refseq\fR is specified.
.HP
\fB\-\-out\-root\fR, \fB\-r\fR <name>
Filename root for output files produced by \fB\-\-split\fR (default
"maf_parse").
.HP
\fB\-\-out\-root\-digits\fR, \fB\-d\fR <numdigits>
(For use with \fB\-\-split\fR.)  The minimum number of digits used to
index each output file produced by \fB\-\-split\fR.
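.IP
The splitting rule described under \fB\-\-split\fR can be sketched in Python
as a greedy grouping of blocks, with output names built from
\fB\-\-out\-root\fR and zero\-padded to \fB\-\-out\-root\-digits\fR.  The function
split_blocks and its input (one sorted (start, end) refseq interval per MAF
block) are hypothetical, and giving an oversized single block its own piece is
an assumption rather than documented phast behavior.
.nf
```python
def split_blocks(blocks, length, out_root="maf_parse", digits=1):
    """Group blocks into pieces whose refseq span is at most length.

    blocks: sorted (start, end) refseq intervals, one per MAF block.
    Returns {filename: [blocks]}; files are named out_root + X + ".maf",
    with X zero-padded to at least digits characters.
    """
    pieces = []
    for blk in blocks:
        # span of the current piece if this block were added to it
        if pieces and blk[1] - pieces[-1][0][0] <= length:
            pieces[-1].append(blk)
        else:
            # start a new piece; a single block longer than length still
            # gets a piece of its own (an assumption of this sketch)
            pieces.append([blk])
    return {out_root + str(i).zfill(digits) + ".maf": piece
            for i, piece in enumerate(pieces, start=1)}
```
.fi
For example, blocks (0, 40), (50, 90), (95, 130), (200, 260) with a length of
100 and two digits give maf_parse01.maf, maf_parse02.maf and maf_parse03.maf.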
.SS Extracting features from MAF
.HP
\fB\-\-features\fR, \fB\-g\fR <fname>
Annotations file in GFF, BED, or genepred format.
Coordinates are assumed to be in the frame of the first sequence of the
alignment (the reference sequence).  By default, outputs the portions of the
MAF that are annotated in the annotations file.  Can instead be used with
\fB\-\-by\-category\fR, \fB\-\-by\-group\fR, and/or \fB\-\-do\-cats\fR to split the MAF by
annotation type, or with \fB\-\-mask\-features\fR, in which case it is used only
to determine the regions to mask.  Implies \fB\-\-strip\-i\-lines\fR and
\fB\-\-strip\-e\-lines\fR.
.HP
\fB\-\-by\-category\fR, \fB\-L\fR
(Requires \fB\-\-features\fR.)  Split by category, as defined by the
annotations file and (optionally) a category map (see \fB\-\-catmap\fR).
.HP
\fB\-\-do\-cats\fR, \fB\-C\fR <cat_list>
(For use with \fB\-\-by\-category\fR) Output sub\-alignments for only the
specified categories.
.HP
\fB\-\-catmap\fR, \fB\-c\fR <fname>|<string>
(Optional; for use with \fB\-\-by\-category\fR.)  Mapping of feature types to
category numbers.  Either a filename or an "inline"
description of a simple category map may be given, e.g.,
\fB\-\-catmap\fR "NCATS = 3 ; CDS 1\-3" or
\fB\-\-catmap\fR "NCATS = 1; UTR 1".
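.IP
To make the inline form concrete, here is a minimal Python sketch that
parses only the two shapes shown above: an NCATS declaration followed by
feature\-type entries with a single category or a range.  The function
parse_inline_catmap is hypothetical; it simply expands a range such as 1\-3
into the categories it names and does not model how phast assigns those
categories to individual sites (for example, to successive codon positions).
.nf
```python
def parse_inline_catmap(spec):
    """Parse a simple inline category map such as NCATS = 3 ; CDS 1-3.

    Hypothetical sketch: handles only an NCATS declaration plus entries of
    the form <feature type> <cat> or <feature type> <lo>-<hi>.
    """
    ncats, mapping = None, {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("NCATS"):
            ncats = int(part.split("=")[1])
            continue
        ftype, cats = part.split()
        lo, _, hi = cats.partition("-")
        # a single number maps to one category, a range to several
        mapping[ftype] = list(range(int(lo), int(hi or lo) + 1))
    for cats in mapping.values():
        if ncats is None or max(cats) > ncats:
            raise ValueError("category number exceeds NCATS")
    return ncats, mapping


print(parse_inline_catmap("NCATS = 3 ; CDS 1-3"))  # (3, {'CDS': [1, 2, 3]})
```
.fi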
.HP
\fB\-\-by\-group\fR, \fB\-P\fR <tag>
(Requires \fB\-\-features\fR.)  Split by groups in the annotations file, as
defined by the specified tag.
.SS Masking by quality score
.HP
\fB\-\-mask\-bases\fR, \fB\-b\fR <qscore>
Mask all bases with quality score <= \fIqscore\fR.
Note that \fIqscore\fR is in the
same units as displayed in the MAF (ranging from 0\-9), and
represents min(9, floor(PHRED_score/5)).  Bases without any
quality score will not be masked.
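.IP
The quality units used by \fB\-\-mask\-bases\fR follow directly from the
formula above; this Python sketch restates it together with the masking
threshold.  The helper names, the use of None for a base that has no quality
score, and the masking character N are assumptions for illustration.
.nf
```python
import math


def maf_quality_digit(phred):
    """MAF quality digit for a PHRED score: min(9, floor(phred / 5))."""
    return min(9, math.floor(phred / 5))


def mask_row(bases, quals, qscore, mask_char="N"):
    """Mask bases whose quality digit is <= qscore; None means no quality score.

    mask_char is an assumed placeholder, not taken from phast.
    """
    return "".join(
        mask_char if q is not None and q <= qscore else b
        for b, q in zip(bases, quals)
    )


print([maf_quality_digit(p) for p in (0, 12, 30, 60)])  # [0, 2, 6, 9]
print(mask_row("ACGT", [1, 5, None, 9], qscore=5))      # NNGT
```
.fi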
.HP
\fB\-\-masked\-file\fR, \fB\-m\fR <filename>
(For use with \fB\-\-mask\-bases\fR).
Write a file containing all the
regions masked for low quality.
The file will be in 0\-based
coordinates relative to the refseq, with an additional column
giving the name of the species masked.  Note that low\-quality bases
masked at alignment columns with a gap in the reference sequence
may not be represented in the output file.
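.IP
The coordinate conversion behind \fB\-\-masked\-file\fR can be sketched in
Python: masked alignment columns are translated to 0\-based refseq positions,
adjacent positions are merged into runs, and each run is labeled with the
masked species.  Columns with a reference gap have no refseq coordinate and
are skipped, which mirrors the caveat above.  The function masked_intervals
and the half\-open (BED\-style) intervals it returns are assumptions, not the
exact phast output format.
.nf
```python
def masked_intervals(ref_row, ref_start, masked_cols, species):
    """Collapse masked alignment columns into (start, end, species) refseq intervals.

    ref_start is the 0-based refseq position of the first non-gap column.
    Intervals are half-open (BED-style), an assumption for illustration.
    """
    out = []
    pos = ref_start
    for col, ch in enumerate(ref_row):
        if ch == "-":
            continue  # no refseq coordinate for this column, so it is skipped
        if col in masked_cols:
            if out and out[-1][1] == pos:
                out[-1] = (out[-1][0], pos + 1, species)  # extend the current run
            else:
                out.append((pos, pos + 1, species))
        pos += 1
    return out
```
.fi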
.HP
\fB\-\-mask\-features\fR, \fB\-M\fR <spec>
(Requires \fB\-\-features\fR.)  Mask all bases annotated in features in the
given species (<spec> can be a comma\-delimited list of species).
Note that feature coordinates are always in terms of the refseq, even if a
different species is being masked.
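.IP
Because feature coordinates always refer to the refseq, masking another
species means finding the refseq coordinate of each column and masking that
species' base wherever the coordinate falls in a feature.  In this Python
sketch the function mask_features is hypothetical, features are assumed to be
0\-based half\-open intervals, the masking character N is assumed, and columns
where the reference has a gap are left unmasked as a simplifying assumption.
.nf
```python
def mask_features(rows, ref_name, species, features, ref_start=0, mask_char="N"):
    """Mask bases of each species named in species that fall inside refseq features.

    rows: dict name -> aligned row; features: (start, end) half-open 0-based
    refseq intervals (an assumption; GFF input would first need converting).
    """
    # refseq coordinate of every column, or None where the reference has a gap
    coords, pos = [], ref_start
    for ch in rows[ref_name]:
        coords.append(None if ch == "-" else pos)
        if ch != "-":
            pos += 1

    def covered(p):
        return p is not None and any(s <= p < e for s, e in features)

    out = dict(rows)
    for name in species:
        out[name] = "".join(
            mask_char if covered(p) and b != "-" else b
            for b, p in zip(rows[name], coords)
        )
    return out
```
.fi
A comma\-delimited <spec> such as mm,rn would be passed as
"mm,rn".split(",").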
.SS Other
.HP
\fB\-\-strip\-i\-lines\fR, \fB\-I\fR
Remove lines in MAF starting with i.
.HP
\fB\-\-strip\-e\-lines\fR, \fB\-E\fR
Remove lines in MAF starting with e.
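.IP
In the MAF format, lines beginning with i describe the sequence context
before and after the preceding s line, and lines beginning with e describe
empty parts of an alignment block.  The filter below is a hypothetical Python
sketch of what \fB\-\-strip\-i\-lines\fR and \fB\-\-strip\-e\-lines\fR remove,
operating on a list of lines; it is not phast's implementation.
.nf
```python
def strip_lines(maf_lines, strip_i=True, strip_e=True):
    """Return the MAF lines, dropping i and/or e records as requested."""
    drop = set()
    if strip_i:
        drop.add("i")
    if strip_e:
        drop.add("e")
    kept = []
    for line in maf_lines:
        fields = line.split()
        # blank lines separate blocks, so they are always kept
        if fields and fields[0] in drop:
            continue
        kept.append(line)
    return kept
```
.fi
The input list could come from, e.g., open(path).read().splitlines().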
.HP
\fB\-\-help\fR, \fB\-h\fR
Print this help message.