.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH GFFREAD "1" "June 2019" "gffread 0.11.2" "User Commands"
.SH NAME
gffread \- GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction
.SH SYNOPSIS
.B gffread
<input_gff> [\-g <genomic_seqs_fasta> | <dir>][\-s <seq_info.fsize>]
[\-o <outfile.gff>] [\-t <tname>] [\-r [[<strand>]<chr>:]<start>..<end> [\-R]]
[\-CTVNJMKQAFPGUBHZWTOLE] [\-w <exons.fa>] [\-x <cds.fa>] [\-y <tr_cds.fa>]
[\-i <maxintron>] [\-\-sort\-by <refseq_list.txt>]
.SH DESCRIPTION
.IP
Filter and convert GFF3/GTF2 records, extract corresponding sequences etc.
By default (i.e. without \fB\-O\fR) only process transcripts, ignore other features.
.IP
<input_gff> is a GFF file, use '\-' for stdin
.SH OPTIONS
.TP
\fB\-i\fR
discard transcripts having an intron larger than <maxintron>
.TP
\fB\-l\fR
discard transcripts shorter than <minlen> bases
.TP
\fB\-r\fR
only show transcripts overlapping coordinate range <start>..<end>
(on chromosome/contig <chr>, strand <strand> if provided)
.TP
\fB\-R\fR
for \fB\-r\fR option, discard all transcripts that are not fully
contained within the given range
.TP
\fB\-U\fR
discard single\-exon transcripts
.TP
\fB\-C\fR
coding only: discard mRNAs that have no CDS features
.HP
\fB\-\-nc\fR non\-coding only: discard mRNAs that have CDS features
.HP
\fB\-\-ignore\-locus\fR : discard locus features and attributes found in the input
.TP
\fB\-A\fR
use the description field from <seq_info.fsize> and add it
as the value for a 'descr' attribute to the GFF record
.TP
\fB\-s\fR
<seq_info.fsize> is a tab\-delimited file providing this info
for each of the mapped sequences:
<seq\-name> <seq\-length> <seq\-description>
(useful for \fB\-A\fR option with mRNA/EST/protein mappings)
.PP
Sorting: (by default, chromosomes are kept in the order they were found)
.HP
\fB\-\-sort\-alpha\fR : chromosomes (reference sequences) are sorted alphabetically
.HP
\fB\-\-sort\-by\fR : sort the reference sequences by the order in which their
.IP
names are given in the <refseq_list.txt> file
.SS "Misc options:"
.TP
\fB\-F\fR
attempt to preserve all GFF attributes
.HP
\fB\-\-keep\-exon\-attrs\fR : for \fB\-F\fR option, do not attempt to reduce redundant
.IP
exon/CDS attributes
.TP
\fB\-G\fR
do not keep exon attributes, move them to the transcript feature
(for GFF3 output)
.HP
\fB\-\-keep\-genes\fR : in transcript\-only mode (default), also preserve gene records
.HP
\fB\-\-keep\-comments\fR: for GFF3 input/output, try to preserve comments
.TP
\fB\-O\fR
process other non\-transcript GFF records (by default non\-transcript
records are ignored)
.TP
\fB\-V\fR
discard any mRNAs with CDS having in\-frame stop codons (requires \fB\-g\fR)
.TP
\fB\-H\fR
for \fB\-V\fR option, check and adjust the starting CDS phase
if the original phase leads to a translation with an
in\-frame stop codon
.TP
\fB\-B\fR
for \fB\-V\fR option, single\-exon transcripts are also checked on the
opposite strand (requires \fB\-g\fR)
.TP
\fB\-P\fR
add transcript level GFF attributes about the coding status of each
transcript, including partialness or in\-frame stop codons (requires \fB\-g\fR)
.HP
\fB\-\-add\-hasCDS\fR : add a "hasCDS" attribute with value "true" for transcripts
.IP
that have CDS features
.HP
\fB\-\-adj\-stop\fR stop codon adjustment: enables \fB\-P\fR and performs automatic
.IP
adjustment of the CDS stop coordinate if premature or downstream
.TP
\fB\-N\fR
discard multi\-exon mRNAs that have any intron with a non\-canonical
splice site consensus (i.e. not GT\-AG, GC\-AG or AT\-AC)
.TP
\fB\-J\fR
discard any mRNAs that either lack initial START codon
or the terminal STOP codon, or have an in\-frame stop codon
(i.e. only print mRNAs with a complete CDS)
.HP
\fB\-\-no\-pseudo\fR: filter out records matching the 'pseudo' keyword
.HP
\fB\-\-in\-bed\fR: input should be parsed as BED format (automatic if the input
.IP
filename ends with .bed*)
.HP
\fB\-\-in\-tlf\fR: input GFF\-like one\-line\-per\-transcript format without exon/CDS
.IP
features (see \fB\-\-tlf\fR option below); automatic if the input
filename ends with .tlf
.SS "Clustering:"
.HP
\fB\-M\fR/\-\-merge : cluster the input transcripts into loci, discarding
.IP
"duplicated" transcripts (those with the same exact introns
and fully contained or equal boundaries)
.HP
\fB\-d\fR <dupinfo> : for \fB\-M\fR option, write duplication info to file <dupinfo>
.HP
\fB\-\-cluster\-only\fR: same as \fB\-M\fR/\-\-merge but without discarding any of the
.IP
"duplicate" transcripts, only create "locus" features
.TP
\fB\-K\fR
for \fB\-M\fR option: also discard as redundant the shorter, fully contained
.IP
transcripts (intron chains matching a part of the container)
.TP
\fB\-Q\fR
for \fB\-M\fR option, no longer require boundary containment when assessing
redundancy (can be combined with \fB\-K\fR); only introns have to match for
multi\-exon transcripts, and >=80% overlap for single\-exon transcripts
.TP
\fB\-Y\fR
for \fB\-M\fR option, enforce \fB\-Q\fR but also discard overlapping single\-exon
transcripts, even on the opposite strand (can be combined with \fB\-K\fR)
.SS "Output options:"
.HP
\fB\-\-force\-exons\fR: make sure that the lowest level GFF features are considered
.IP
"exon" features
.HP
\fB\-\-gene2exon\fR: for single\-line genes not parenting any transcripts, add an
.IP
exon feature spanning the entire gene (treat it as a transcript)
.TP
\fB\-D\fR
decode url encoded characters within attributes
.TP
\fB\-Z\fR
merge very close exons into a single exon (when intron size<4)
.TP
\fB\-g\fR
full path to a multi\-fasta file with the genomic sequences
for all input mappings, OR a directory with single\-fasta files
(one per genomic sequence, with file names matching sequence names)
.TP
\fB\-w\fR
write a fasta file with spliced exons for each GFF transcript
.TP
\fB\-x\fR
write a fasta file with spliced CDS for each GFF transcript
.TP
\fB\-y\fR
write a protein fasta file with the translation of CDS for each record
.TP
\fB\-W\fR
for \fB\-w\fR and \fB\-x\fR options, write in the FASTA defline the exon
coordinates projected onto the spliced sequence;
for \fB\-y\fR option, write transcript attributes in the FASTA defline
.TP
\fB\-S\fR
for \fB\-y\fR option, use '*' instead of '.' as stop codon translation
.TP
\fB\-L\fR
Ensembl GTF to GFF3 conversion (implies \fB\-F\fR; should be used with \fB\-m\fR)
.TP
\fB\-m\fR
<chr_replace> is a name mapping table for converting reference
sequence names, having this 2\-column format:
<original_ref_ID> <new_ref_ID>
WARNING: all GFF records on reference sequences whose original IDs
are not found in the 1st column of this table will be discarded!
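.IP
For illustration only (the names below are hypothetical), a table renaming
Ensembl\-style reference names to UCSC\-style names could look like this:
.nf
1	chr1
2	chr2
MT	chrM
.fi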
.TP
\fB\-t\fR
use <trackname> in the 2nd column of each GFF/GTF output line
.TP
\fB\-o\fR
print the GFF records to <outfile.gff> (those that passed any
given filters). Use \fB\-o\-\fR to enable printing to stdout
.TP
\fB\-T\fR
for \fB\-o\fR, output will be GTF instead of GFF3
.HP
\fB\-\-bed\fR for \fB\-o\fR, output BED format instead of GFF3
.HP
\fB\-\-tlf\fR for \fB\-o\fR, output "transcript line format" which is like GFF
.IP
but exons, CDS features and related data are stored as GFF
attributes in the transcript feature line, like this:
.IP
exoncount=N;exons=<exons>;CDSphase=<N>;CDS=<CDScoords>
.IP
<exons> is a comma\-delimited list of exon_start\-exon_end coordinates;
<CDScoords> is CDS_start:CDS_end coordinates or a list like <exons>;
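.IP
For example, a transcript with three exons (the coordinates here are made up
for illustration) might carry these attributes:
.IP
exoncount=3;exons=100\-200,300\-400,500\-600;CDSphase=0;CDS=150:550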
.HP
\fB\-v\fR,\-E expose (warn about) duplicate transcript IDs and other potential
.IP
problems with the given GFF/GTF records
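.SH EXAMPLES
The invocations below are illustrative sketches built only from the options
documented above; file names such as annotation.gff and genome.fa are
placeholders, and the region coordinates are arbitrary.
.PP
Convert a GFF3 file to GTF:
.IP
.nf
gffread annotation.gff \-T \-o annotation.gtf
.fi
.PP
Write spliced exon sequences and protein translations, taking the genomic
sequences from the FASTA file given with \fB\-g\fR:
.IP
.nf
gffread annotation.gff \-g genome.fa \-w transcripts.fa \-y proteins.fa
.fi
.PP
Keep only coding transcripts fully contained within a range on the plus
strand of a reference sequence assumed to be named chr1:
.IP
.nf
gffread annotation.gff \-C \-r +chr1:1000..50000 \-R \-o region.gff
.fi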
.SH AUTHOR
This manual page was written by Andreas Tille for the Debian distribution and may be used by others.