File: gthbssmtrain.1.adoc

# gthbssmtrain(1)

## NAME

gthbssmtrain - train splice site model

## SYNOPSIS

*gthbssmtrain* [option ...] GFF3_file

## DESCRIPTION

Create BSSM (Bayesian splice site model) training data from the annotation given in GFF3_file.

## OPTIONS

*-outdir*::
  set the name of the output directory to which the training files are
  written (default: `training_data`)

*-gcdonor*::
  extract training data for GC donor sites (default: yes)

*-filtertype*::
  set the type of features used for filtering (usually `exon` or `CDS`;
  default: `exon`)

*-goodexoncount*::
  set the minimum number of good exons a feature must have to be
  included in the training data (default: 1)

*-cutoff*::
  set the minimum score an exon must have to count towards the
  "good exon count"; exons without a score count as good (default: 1.00)

*-extracttype*::
  set the type of features to be extracted as exons (usually `exon` or
  `CDS`; default: `CDS`)

*-seqfile*::
  set the sequence file from which to take the sequences
  (default: undefined)

*-encseq*::
  set the index name of the encoded sequence from which to take the
  sequences (default: undefined)

*-seqfiles*::
  set the sequence files from which to extract the features; use `--`
  to terminate the list of sequence files

*-matchdesc*::
  search the sequence descriptions of the input files for the desired
  sequence IDs (from the GFF3 file), reporting the first match
  (default: no)

*-matchdescstart*::
  match the sequence descriptions of the input files exactly against the
  desired sequence IDs (from the GFF3 file), comparing from the beginning
  of each description up to its first whitespace (default: no)

*-usedesc*::
  use sequence descriptions to map the sequence IDs (from the GFF3 file)
  to actual sequence entries. If a description contains a sequence range
  (e.g., `III:1000001..2000000`), the first part is used as the sequence
  ID (`III`) and the first range position as the offset (`1000001`);
  default: no

*-regionmapping*::
  set the file containing the mapping from sequence regions to sequence
  files (default: undefined)

*-seed*::
  set the seed for the random number generator manually; 0 generates a
  seed from the current time and the process ID (default: 0)

*-v*::
  be verbose (default: no)

*-gzip*::
  write gzip-compressed output files (default: no)

*-bzip2*::
  write bzip2-compressed output files (default: no)

*-force*::
  force writing to output files (default: no)

*-help*::
  display help and exit

*-version*::
  display version information and exit
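
The interplay of -goodexoncount and -cutoff can be sketched as follows. This is a minimal Python illustration, not the C implementation of gthbssmtrain: the names `Exon`, `is_good_exon`, and `passes_filter` are hypothetical, and treating a score equal to the cutoff as good is an assumption drawn from the phrase "minimum score". How features are selected for filtering (-filtertype) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Exon:
    # Hypothetical record; score is None when the GFF3 score column is ".".
    score: Optional[float]


def is_good_exon(exon: Exon, cutoff: float = 1.0) -> bool:
    # Unscored exons count as good; ">=" is an assumption based on
    # "minimum score" in the -cutoff description.
    return exon.score is None or exon.score >= cutoff


def passes_filter(exons: Sequence[Exon], goodexoncount: int = 1,
                  cutoff: float = 1.0) -> bool:
    # A feature is kept if it has at least `goodexoncount` good exons.
    good = sum(1 for exon in exons if is_good_exon(exon, cutoff))
    return good >= goodexoncount


print(passes_filter([Exon(None)]))                             # True
print(passes_filter([Exon(0.4), Exon(0.9)]))                   # False
print(passes_filter([Exon(0.4), Exon(1.0)], goodexoncount=2))  # False
```

The default arguments mirror the documented defaults (-goodexoncount 1, -cutoff 1.00), so with no options a single unscored exon is enough to keep a feature.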
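
The range handling described for -usedesc can be illustrated with a small parser. It is a sketch under assumptions: the regular expression (an ID without colons or whitespace, a colon, then `start..end`) and the function name `split_desc_range` are hypothetical, and the real tool may accept other description layouts.

```python
import re
from typing import Optional, Tuple

# Assumed range syntax: <ID>:<start>..<end>, e.g. "III:1000001..2000000".
RANGE_RE = re.compile(r"([^\s:]+):(\d+)\.\.(\d+)")


def split_desc_range(desc: str) -> Optional[Tuple[str, int]]:
    # Return (sequence ID, offset) if the description contains a range,
    # otherwise None. The offset is the first range position.
    match = RANGE_RE.search(desc)
    if match is None:
        return None
    seqid, start, _end = match.groups()
    return seqid, int(start)


print(split_desc_range("III:1000001..2000000"))  # ('III', 1000001)
print(split_desc_range("chromosome III"))        # None
```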
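
The difference between -matchdesc and -matchdescstart is easiest to see with IDs that are prefixes of one another. In this hypothetical sketch, -matchdesc is assumed to be a plain substring search; the function names are illustrative only and are not part of GenomeThreader.

```python
import re
from typing import Iterable, Optional


def find_by_substring(seqid: str, descriptions: Iterable[str]) -> Optional[int]:
    # Assumed -matchdesc behaviour: index of the first description that
    # contains seqid anywhere.
    for index, desc in enumerate(descriptions):
        if seqid in desc:
            return index
    return None


def find_by_prefix_token(seqid: str, descriptions: Iterable[str]) -> Optional[int]:
    # -matchdescstart behaviour as documented: the text from the start of
    # the description up to its first whitespace must equal seqid exactly.
    for index, desc in enumerate(descriptions):
        first = re.split(r"\s", desc, maxsplit=1)[0]
        if first == seqid:
            return index
    return None


descs = ["chr10 assembled", "chr1 assembled"]
print(find_by_substring("chr1", descs))     # 0 (found inside "chr10")
print(find_by_prefix_token("chr1", descs))  # 1
```

Under the substring assumption, `chr1` is reported in the first description because `chr10` contains it; the exact comparison of -matchdescstart avoids such accidental hits.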