<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html><head><title></title>
<link rel="stylesheet" type="text/css" href="augustus.css">
<script src="tutorial.js" type="text/javascript"></script>
</head><body>
<font size=-1>
Navigate to
<a href="ittrain.html">Iterative Trainingset Construction</a>.
<a href="training.html">Training AUGUSTUS</a>.
<!--<a href="prediction.html">Predicting Genes</a>.-->
</font>
<h1>Tutorial on Gene Prediction with <a href="http://bioinf.uni-greifswald.de/augustus/">AUGUSTUS</a></h1>
<h2>UCSC, June 24th, 2015, <a href="http://www.math-inf.uni-greifswald.de/mathe/index.php/mitarbeiter/382-prof-dr-mario-stanke">Mario Stanke</a></h2>
If you want to follow tomorrow in real-time, download the data and install the software.
In this lab session we practice the most common steps when predicting the protein-coding genes in
a eukaryotic genome with <a href="http://bioinf.uni-greifswald.de/augustus/">AUGUSTUS</a>. We will assume the case of a "new"
genome, for which AUGUSTUS has not been trained before, but will use a well-studied species as example because
example data is readily available and visualization is easier.
<h4>Styles</h4>
<p><span class="assignment">Assignments are in this color</span>. The lazy ones may go through this
tutorial very fast by just reading these assignments and cutting and pasting the commands
that follow them (more or less).</p>
<p>
<span class="result">Results are in this color</span>.<br></p>
<p>
<a href="javascript:onoff('explain')" class="dlink"><span id="explain" title="explaind" class="dcross">[+]</span>
<span class="dtitle">Details are hidden...</span></a> <br>
<div id="explaind" class="details" style="display:none;">
You don't have to read this. If you get bored with the speed of the tutorial then you can read these details boxes.
</div></p>
<h3>Example Input Data</h3>
All example files are in the <a href="data/" target="_blank">data directory</a>. I recommend
you work directly in this directory.
<h4><i>Drosophila melanogaster</i></h4>
<ul>
<li> <a href="data/chr2L.sm.fa"><tt>chr2L.sm.fa</tt></a>: softmasked chromosome 2L of assembly dm6 of the genome of the fruit fly</li>
<li> <a href="data/rnaseq1.fq"><tt>rnaseq1.fq</tt></a>, <a href="data/rnaseq2.fq"><tt>rnaseq2.fq</tt></a>: paired RNA-Seq reads, an excerpt of SRR1732756 that maps to the first 10 Mb of chr2L, 2x100 bp, HiSeq 2500</li>
</ul>
<h3>For Cheaters: <span class="result">Result Files</span></h3>
You can use the <a href="results/" target="_blank">files in the results directory</a> to catch up if you are behind or to compare your results.
<br>
<h3>Software</h3>
To run these examples, you will need to have the software below installed. As all important results are in the results folder, you can skip any step or program.
<ul>
<li> <a href="http://bioinf.uni-greifswald.de/augustus/binaries/augustus.current.tar.gz">augustus.current.tar.gz</a>: make sure the binaries <tt>augustus</tt>, <tt>etraining</tt> and <tt>bam2hints</tt> (auxprogs) are compiled
and in your path, as well as the <tt>augustus/scripts</tt> directory. Put <tt>export AUGUSTUS_CONFIG_PATH=/path/to/your/installation/config/</tt> in your <tt>~/.bashrc</tt>.
</li>
<li> <a href="http://code.google.com/p/rna-star/">STAR</a>, an RNA-Seq spliced aligner
<li> <a href="https://github.com/pezmaster31/bamtools"><tt>bamtools</tt></a> (may be a package on your system)
<li> <a href="http://hgdownload.cse.ucsc.edu/admin/exe/"><tt>wigToBigWig</tt></a>
</ul>
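<p>Before starting the exercises, it can save time to check that the programs from the list above are found in your path and that <tt>AUGUSTUS_CONFIG_PATH</tt> points to a directory. The following is a minimal sketch, not part of AUGUSTUS: the function name <tt>missing_prerequisites</tt> is hypothetical, and the executable names <tt>STAR</tt>, <tt>bamtools</tt> and <tt>wigToBigWig</tt> are assumed to match your installation.</p>

```python
import os
import shutil

# Executables named in the software list above (names assumed to match your installation).
PROGRAMS = ["augustus", "etraining", "bam2hints", "STAR", "bamtools", "wigToBigWig"]


def missing_prerequisites(environ=os.environ, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in PROGRAMS:
        # shutil.which returns None when the program is not in the search path.
        if which(name) is None:
            problems.append("program not found in PATH: " + name)
    config = environ.get("AUGUSTUS_CONFIG_PATH")
    if not config:
        problems.append("AUGUSTUS_CONFIG_PATH is not set")
    elif not os.path.isdir(config):
        problems.append("AUGUSTUS_CONFIG_PATH is not a directory: " + config)
    return problems
```

<p>Calling <tt>missing_prerequisites()</tt> without arguments checks the current environment. The two parameters exist only so that the check can be tried out with a fake environment.</p>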
<h3>Exercise 1: <span class="assignment">Compile a Training Set</span></h3>
There are several typical <a href="training.html#trainoptions">options for creating a training set</a>
to estimate the parameters of gene finders. Here we will go through option 6.
We assume that we have RNA-Seq data only and no substantial homology data. We will reuse an existing parameter set for AUGUSTUS.<br>
<ol>
<li><span class="assignment">Follow the tutorial on <a href="ittrain.html">"Iterative Training Set Construction"</a></span>
and create a training set <tt>genes.gb</tt>.
<li><span class="assignment">Partition <tt>genes.gb</tt></span> into a training set and a holdout test set as described in <a href="training.html#split">1.2 Split gene structure set...</a>.
</ol>
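<p>To illustrate what partitioning in step 2 means, here is a minimal sketch of a random split of a GenBank flat file into a holdout test set and a training set. It is not the procedure from the linked training tutorial, which you should follow for the exercise. The function name <tt>split_genbank</tt>, the seed and the test set size are hypothetical, and the sketch assumes that each record ends with a line containing only <tt>//</tt>.</p>

```python
import random


def split_genbank(text, n_test, seed=1):
    """Split GenBank flat-file text into (test_records, train_records).

    Each record is assumed to end with a line containing only '//'.
    Anything after the last '//' line is ignored.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    if n_test > len(records):
        raise ValueError("n_test exceeds the number of records")
    # Shuffle a copy with a fixed seed so that the split is reproducible.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_test], shuffled[n_test:]
```

<p>The test records are held out and never used for parameter estimation, so that the accuracy measured on them is not biased by training on the same genes.</p>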
<h3>Exercise 2: <span class="assignment">Train the Coding Regions of AUGUSTUS</span></h3>
Let's name our species "<tt>bug</tt>". Pretending that there is not already a parameter set of AUGUSTUS for
<i>Drosophila</i> (named "<tt>fly</tt>"), we will estimate the parameters from the training set.
<ol>
<li> <span class="assignment">Create a meta parameters file</span> for <tt>bug</tt> as described in <a href="training.html#meta">2. CREATE A META PARAMETERS FILE...</a>
<li> <span class="assignment">Estimate the parameters</span> using your training set as described in <a href="training.html#etraining">3. MAKE AN INITIAL TRAINING</a>
</ol>
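<p>As a rough orientation for the two steps above, the sketch below assembles the command lines that these steps usually come down to: the AUGUSTUS script <tt>new_species.pl</tt> and the program <tt>etraining</tt>. The linked training tutorial is authoritative for the exact invocation. The helper <tt>training_commands</tt> is hypothetical, and the file name <tt>genes.gb.train</tt> is an assumption about how your training part from Exercise 1 is named.</p>

```python
def training_commands(species, train_file):
    """Return the two command lines as argument lists, without running them."""
    return [
        # Step 1: create the meta parameters files for the new species.
        ["new_species.pl", "--species=" + species],
        # Step 2: estimate the parameters from the training set.
        ["etraining", "--species=" + species, train_file],
    ]


# Print the commands for the species name used in this tutorial.
for argv in training_commands("bug", "genes.gb.train"):
    print(" ".join(argv))
```

<p>Each argument list can be passed to <tt>subprocess.run</tt> once you have checked it against the training tutorial.</p>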
<br>
For further tutorial parts on prediction, hint preparation and homology-based training set construction and prediction, see the <a href="http://bioinf.uni-greifswald.de/augustus/binaries/tutorial/index.html">lab session tutorial</a>.
</body></html>