File: gthbssmtrain.1.adoc

genomethreader 1.7.3+dfsg-10

# gthbssmtrain(1)

## NAME

gthbssmtrain - train splice site model

## SYNOPSIS

*gthbssmtrain* [option ...] GFF3_file

## DESCRIPTION

Create BSSM (Bayesian Splice Site Model) training data from the annotation given in GFF3_file.
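
As a rough illustration of how an invocation follows the synopsis, the Python sketch below assembles an argument list from a few of the options documented under OPTIONS. The helper name `build_gthbssmtrain_argv` and the file names are hypothetical. Passing each option value as a separate argument, and enabling `-force` as a bare flag, are assumptions rather than something this page states. The sketch only builds the list; it does not run the program.

```python
from typing import List, Optional


def build_gthbssmtrain_argv(
    gff3_file: str,
    seqfile: Optional[str] = None,
    outdir: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Assemble arguments in the order: gthbssmtrain [option ...] GFF3_file."""
    argv = ["gthbssmtrain"]
    if seqfile is not None:
        # Assumption: the option value follows the option as its own argument.
        argv += ["-seqfile", seqfile]
    if outdir is not None:
        argv += ["-outdir", outdir]
    if force:
        # Assumption: a bare -force switches the option on.
        argv.append("-force")
    # The GFF3 file is the final positional argument, as in the synopsis.
    argv.append(gff3_file)
    return argv


# Hypothetical file names, used only to show the resulting command line.
example_argv = build_gthbssmtrain_argv(
    "annotation.gff3", seqfile="genome.fa", outdir="training_data"
)
print(" ".join(example_argv))
```

Keeping the arguments as a list, rather than one string, avoids quoting problems if the list is later handed to something like `subprocess.run`.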

## OPTIONS

*-outdir*::
  set name of output directory to which the training files are
                written
                default: training_data

*-gcdonor*::
  extract training data for GC donor sites
                default: yes

*-filtertype*::
  set type of features to use for filtering (usually 'exon' or
                'CDS')
                default: exon

*-goodexoncount*::
  set the minimum number of good exons a feature must have to be
                included in the training data
                default: 1

*-cutoff*::
  set the minimum score an exon must have to count towards the
                ``good exon count'' (exons without a score count as good)
                default: 1.00

*-extracttype*::
  set type of features to be extracted as exons (usually 'exon' or
                'CDS')
                default: CDS

*-seqfile*::
  set the sequence file from which to take the sequences
                default: undefined

*-encseq*::
  set the encoded sequence indexname from which to take the
                sequences
                default: undefined

*-seqfiles*::
  set the sequence files from which to extract the features
                use '--' to terminate the list of sequence files 

*-matchdesc*::
  search the sequence descriptions from the input files for the
                desired sequence IDs (in GFF3), reporting the first match
                default: no

*-matchdescstart*::
  exactly match the sequence descriptions from the input files for
                the desired sequence IDs (in GFF3) from the beginning to the
                first whitespace
                default: no

*-usedesc*::
  use sequence descriptions to map the sequence IDs (in GFF3) to
                actual sequence entries.
                If a description contains a sequence range (e.g.,
                III:1000001..2000000), the first part is used as sequence ID
                ('III') and the first range position as offset ('1000001')
                default: no

*-regionmapping*::
  set file containing sequence-region to sequence file mapping
                default: undefined

*-seed*::
  set seed for random number generator manually
                0 generates a seed from the current time and the process id
                default: 0

*-v*::
  be verbose
                default: no

*-gzip*::
  write gzip compressed output files
                default: no

*-bzip2*::
  write bzip2 compressed output files
                default: no

*-force*::
  force writing to output files
                default: no

*-help*::
  display help and exit

*-version*::
  display version information and exit
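
Two of the rules stated in the option descriptions above can be sketched in code: the "good exon count" test behind `-cutoff` and `-goodexoncount`, and the sequence range that `-usedesc` reads from a description. The Python below only illustrates the documented wording; it is not the program's implementation. All function names are hypothetical. Treating the minimum score as an inclusive comparison (`>=`), representing a missing score as `None`, and the exact range pattern are assumptions.

```python
import re
from typing import Iterable, Optional, Tuple


def is_good_exon(score: Optional[float], cutoff: float = 1.00) -> bool:
    """Exons without a score count as good; scored exons must reach the cutoff."""
    if score is None:
        return True
    return score >= cutoff  # assumption: the minimum score itself is accepted


def passes_good_exon_count(
    exon_scores: Iterable[Optional[float]],
    goodexoncount: int = 1,
    cutoff: float = 1.00,
) -> bool:
    """A feature qualifies if it has at least `goodexoncount` good exons."""
    good = sum(1 for score in exon_scores if is_good_exon(score, cutoff))
    return good >= goodexoncount


# Assumed shape of a range inside a description: <ID>:<start>..<end>
_DESC_RANGE = re.compile(r"([^\s:]+):(\d+)\.\.(\d+)")


def parse_desc_range(description: str) -> Optional[Tuple[str, int]]:
    """Return (sequence ID, offset) if the description contains a range."""
    match = _DESC_RANGE.search(description)
    if match is None:
        return None
    # The first part is the sequence ID, the first range position the offset.
    return match.group(1), int(match.group(2))


print(passes_good_exon_count([0.5, None]))       # one unscored exon counts as good
print(parse_desc_range("III:1000001..2000000"))  # ID and offset from the man page example
```

With the defaults (`-goodexoncount 1`, `-cutoff 1.00`), a feature whose exons carry no scores at all passes this sketch's filter, which matches the note that exons without a score count as good.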