File: README.TXT

This directory contains toy data to test the pipeline autoAug.pl for training and running
AUGUSTUS.

genome.fa          region 2000001..3000000 of chromosome I of C. elegans
cdna.fa            sample cDNA sequences (ESTs)
traingenes.gff     coordinates of the nGASP training genes that fall within the region in genome.fa,
                   coding regions (CDS) only
traingenes.gb      genes and sequences together in GenBank format;
                   this file was created from genome.fa and traingenes.gff with the command
                   gff2gbSmallDNA.pl traingenes.gff genome.fa 1000 traingenes.gb
cdna.psl           EST alignments in PSL format, as produced by BLAT and GMAP
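
Before starting a run, it can help to confirm that the files listed above are actually present. The following is a minimal sketch, not part of the AUGUSTUS distribution: the function name check_toy_data is hypothetical, and it only tests that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: report which toy-data files are missing or empty
# in a directory (default: the current directory).
# The return status is the number of problem files (0 means all present).
check_toy_data() {
    dir="${1:-.}"
    missing=0
    for f in genome.fa cdna.fa traingenes.gff traingenes.gb cdna.psl; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, "check_toy_data . && echo ok" prints "ok" only when all five files are in place.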

1) Check that AUGUSTUS and BLAT are installed correctly by running

autoAug.pl --species=xyz --genome=genome.fa --trainingset=traingenes.gb --cdna=cdna.fa \
           --singleCPU -v -v -v --opt=0 --useexisting --noutr

This runs the training and prediction pipeline on a single machine, without meta-parameter optimization
and without training or predicting untranslated regions (UTRs).
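
If the command above fails immediately, a quick first check is whether the required programs can be found on the PATH. This is a rough sketch under assumptions: the function name check_tools is hypothetical, and the executable names "augustus" and "blat" in the example call are assumed and may differ on your system. Finding a program on the PATH does not prove that it is installed correctly.

```shell
# Hypothetical helper: print every named program that cannot be found
# on PATH. Returns non-zero if at least one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (executable names are assumptions):
#   check_tools autoAug.pl augustus blat
```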


2) To test whether everything, including PASA, is installed correctly, run

autoAug.pl --species=test --genome=genome.fa --cdna=cdna.fa --singleCPU -v -v -v --pasa --useexisting --opt=0

This will run the complete pipeline, including training set generation with PASA.
If something fails, you can fix the problem, delete any incomplete files, and rerun the same command to resume.
This command takes about 40 minutes to finish on my computer.
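
When looking for incomplete files after a failed run, zero-length files are one obvious kind of candidate. The sketch below only lists them and deletes nothing. The function name list_empty_files is hypothetical, the directory to inspect is whatever working directory your run used, and an incomplete file is not necessarily empty, so treat the output as a starting point only.

```shell
# Hypothetical helper: list zero-length regular files below a directory
# (default: the current directory). Nothing is deleted.
list_empty_files() {
    find "${1:-.}" -type f -size 0 -print
}
```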


3) Here is an example of how to predict genes genome-wide using a cluster:

autoAugPred.pl --species=elegans -g genome.fa --hints=hints.E.gff

Then follow the instructions given and submit the job(s) on your cluster, for example:

cd x/y/z/autoAugPred_hints/shells/
qsub aug1

When the jobs have finished, rerun

autoAugPred.pl --species=elegans -g genome.fa --hints=hints.E.gff --useexisting

from the original directory.
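
If there are several job scripts, the submission step can be wrapped in a small loop. This is a sketch under assumptions: the function name submit_aug_jobs and the SUBMIT variable are hypothetical, and it assumes that all job scripts in the shells directory have names starting with "aug" (only aug1 is shown above). Pass the shells directory of your own run as the argument.

```shell
# Hypothetical helper: submit every job script matching aug* in the
# given shells directory. SUBMIT defaults to qsub; set SUBMIT=echo
# for a dry run that only prints the script names.
submit_aug_jobs() {
    shells_dir="$1"
    submit="${SUBMIT:-qsub}"
    (
        # run in a subshell so the caller's working directory is unchanged
        cd "$shells_dir" || exit 1
        for job in aug*; do
            [ -f "$job" ] || continue   # skip if the pattern matched nothing
            "$submit" "$job"
        done
    )
}
```

Running it first with SUBMIT=echo shows which scripts would be submitted without sending anything to the queue.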