# gthbssmtrain(1)
## NAME
gthbssmtrain - train splice site model
## SYNOPSIS
*gthbssmtrain* [option ...] GFF3_file
## DESCRIPTION
Create BSSM training data from annotation given in GFF3_file.
## OPTIONS
*-outdir*::
set name of output directory to which the training files are
written
default: training_data
*-gcdonor*::
extract training data for GC donor sites
default: yes
*-filtertype*::
set type of features to be used for filtering (usually 'exon' or
'CDS')
default: exon
*-goodexoncount*::
set the minimum number of good exons a feature must have to be
included in the training data
default: 1
*-cutoff*::
set the minimum score an exon must have to count towards the
``good exon count'' (exons without a score count as good)
default: 1.00
*-extracttype*::
set type of features to be extracted as exons (usually 'exon' or
'CDS')
default: CDS
*-seqfile*::
set the sequence file from which to take the sequences
default: undefined
*-encseq*::
set the encoded sequence indexname from which to take the
sequences
default: undefined
*-seqfiles*::
set the sequence files from which to extract the features
use '--' to terminate the list of sequence files
*-matchdesc*::
search the sequence descriptions from the input files for the
desired sequence IDs (in GFF3), reporting the first match
default: no
*-matchdescstart*::
exactly match the sequence descriptions from the input files for
the desired sequence IDs (in GFF3) from the beginning to the
first whitespace
default: no
*-usedesc*::
use sequence descriptions to map the sequence IDs (in GFF3) to
actual sequence entries.
If a description contains a sequence range (e.g.,
III:1000001..2000000), the first part is used as sequence ID
('III') and the first range position as offset ('1000001')
default: no
*-regionmapping*::
set file containing sequence-region to sequence file mapping
default: undefined
*-seed*::
set seed for random number generator manually
0 generates a seed from the current time and the process id
default: 0
*-v*::
be verbose
default: no
*-gzip*::
write gzip compressed output files
default: no
*-bzip2*::
write bzip2 compressed output files
default: no
*-force*::
force writing to output files
default: no
*-help*::
display help and exit
*-version*::
display version information and exit
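## EXAMPLES
The following Python sketch illustrates two rules described under OPTIONS; it is not taken from the gthbssmtrain source. It shows how a description containing a sequence range could be split into sequence ID and offset (*-usedesc*), and how *-cutoff* and *-goodexoncount* might combine to decide whether a feature is kept. All function names and the regular expression are hypothetical, the comparison `score >= cutoff` is one reading of "minimum score", and the behavior for descriptions without a range is left open.

```python
import re
from typing import Optional, Sequence, Tuple

# Assumed pattern "<seqid>:<start>..<end>", e.g. "III:1000001..2000000".
_RANGE_RE = re.compile(r"(\S+):(\d+)\.\.(\d+)")


def parse_desc_range(description: str) -> Optional[Tuple[str, int]]:
    """Return (sequence ID, offset) if the description contains a range.

    The first part is used as sequence ID and the first range position
    as offset. Returns None if no range is found (what the tool does in
    that case is not covered by this sketch).
    """
    match = _RANGE_RE.search(description)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def count_good_exons(scores: Sequence[Optional[float]],
                     cutoff: float = 1.0) -> int:
    """Count exons whose score reaches the cutoff.

    Exons without a score (None) count as good.
    """
    return sum(1 for s in scores if s is None or s >= cutoff)


def keep_feature(scores: Sequence[Optional[float]],
                 cutoff: float = 1.0,
                 goodexoncount: int = 1) -> bool:
    """Keep a feature if it has at least `goodexoncount` good exons."""
    return count_good_exons(scores, cutoff) >= goodexoncount


if __name__ == "__main__":
    print(parse_desc_range("III:1000001..2000000"))
    print(keep_feature([0.5, None]))
```

With the default values (*-cutoff* 1.00, *-goodexoncount* 1), a feature whose exons all carry scores below 1.00 would be dropped under this reading, while a feature with at least one unscored exon would be kept.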