
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<!-- Version: Multiflex-5.4 / About -->
<!-- Type: Design with sidebar -->
<!-- Date: March 13, 2008 -->
<!-- Design: www.1234.info -->
<!-- License: Fully open source without restrictions. -->
<!-- Please keep footer credits with the words -->
<!-- "Design by 1234.info". Thank you! -->
<head>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-25486066-1']);
_gaq.push(['_trackPageview']);
</script>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="3600" />
<meta name="revisit-after" content="2 days" />
<meta name="robots" content="index,follow" />
<meta name="copyright" content="Copyright (c) 2011 Adam Roberts" />
<meta name="author" content="Designed by www.1234.info / Modified: Adam Roberts" />
<meta name="distribution" content="global" />
<meta name="image" content="http://bio.math.berkeley.edu/eXpress/img/logo.png" />
<meta name="description" content="eXpress is a general quantification tool for target DNA/RNA sequences. While its primary use currently is RNA-Seq, it has the potential for applications in many other areas, such as allele-specific expression and metagenomics. What makes eXpress different is that it is an online (or streaming) algorithm, meaning it only makes one pass through the data. This allows it to be very lightweight and efficient, using a constant amount of memory and time linear in the number of sequenced fragments being processed. Furthermore, it accepts piped SAM/BAM input, allowing users to avoid storing extremely large alignment files. eXpress models fragment biases, fragment lengths, and errors, making it also one of the most accurate quantification methods available." />
<meta name="keywords" content="RNA-Seq, Genomics, transcript,quantification" />
<link rel="stylesheet" type="text/css" media="screen,projection,print" href="./css/mf54_reset.css" />
<link rel="stylesheet" type="text/css" media="screen,projection,print" href="./css/mf54_grid.css" />
<link rel="stylesheet" type="text/css" media="screen,projection,print" href="./css/mf54_content.css" />
<link rel="icon" type="image/x-icon" href="./img/favicon.ico" />
<script type="text/javascript">
// Preload the normal and highlighted logos used by the header rollover.
var logo1 = new Image();
logo1.src = "img/logo.png";
var logo2 = new Image();
logo2.src = "img/logo_yellow.png";
</script>
<title>eXpress • Getting Started</title>
<link rel="shortcut icon" href="favicon.ico" />
<!-- Global IE fix to avoid layout crash when single word size wider than column width -->
<!-- Following line MUST remain as a comment to have the proper effect -->
<!--[if IE]><style type="text/css"> body {word-wrap: break-word;}</style><![endif]-->
</head>
<body>
<!-- CONTAINER FOR ENTIRE PAGE -->
<div class="container">
<!-- A. HEADER -->
<div class="corner-page-top"></div>
<div class="header">
<div class="header-top">
<!-- A.1 SITENAME -->
<div class="sitelogo">
<ul>
<li><a href="#" onMouseOver="document.logo.src=logo2.src" onMouseOut="document.logo.src=logo1.src"><img name="logo" src="img/logo.png"/></a></li>
</ul>
</div>
<div class="sitename">
<h1><a href="#">eXpress</a></h1>
<h2><i>Streaming</i> quantification for high-throughput sequencing</h2>
</div>
<!-- A.2 BUTTON NAVIGATION -->
<div class="navbutton">
<ul>
<li><a href="http://www.berkeley.edu"><img src="img/berkeley_seal.gif"/></a></li>
</ul>
</div>
</div>
<!-- A.4 BREADCRUMB and SEARCHFORM -->
<div class="header-bottom">
</div>
</div>
<div class="corner-page-bottom"></div>
<!-- B. NAVIGATION BAR -->
<div class="corner-page-top"></div>
<div class="navbar">
<!-- Navigation item -->
<ul>
<li><a href="index.html">Home</a></li>
</ul>
<!-- Navigation item -->
<ul>
<li><a href="overview.html">About</a></li>
</ul>
<!-- Navigation item -->
<ul>
<li><a href="#">Download<!--[if IE 7]><!--></a><!--<![endif]-->
<!--[if lte IE 6]><table><tr><td><![endif]-->
<ul>
<li><a href="downloads/express-1.5.1/express-1.5.1-macosx_x86_64.tgz" onClick="_gaq.push(['_trackEvent', 'Downloads', 'Mac', 'Tutorial']);" target="_blank">Mac OS X (64-bit)</a></li>
<li><a href="downloads/express-1.5.1/express-1.5.1-linux_x86_64.tgz" onClick="_gaq.push(['_trackEvent', 'Downloads', 'Linux', 'Tutorial']);" target="_blank">Linux (64-bit)</a></li>
<li><a href="downloads/express-1.5.1/express-1.5.1-win32_x86_64.zip" onClick="_gaq.push(['_trackEvent', 'Downloads', 'Windows', 'Tutorial']);" target="_blank">Windows (64-bit)</a></li>
<li><a href="downloads/express-1.5.1/express-1.5.1-src.tgz" onClick="_gaq.push(['_trackEvent', 'Downloads', 'Source', 'Tutorial']);" target="_blank">Source</a></li>
<li><a href="downloads" onClick="_gaq.push(['_trackEvent', 'Downloads', 'Previous','Tutorial']);">Previous Versions</a></li>
</ul>
<!--[if lte IE 6]></td></tr></table></a><![endif]-->
</li>
</ul>
<ul>
<li><a href="tutorial.html">Getting Started</a></li>
</ul>
<ul>
<li><a href="https://github.com/adarob/eXpress">Source</a></li>
</ul>
<ul>
<li><a href="manual.html">Manual</a></li>
</ul>
<ul>
<li><a href="faq.html">FAQ</a></li>
</ul>
</div>
<!-- C. MAIN SECTION -->
<div class="main">
<h1 class="pagetitle">Getting Started</h1>
<!-- C.1 CONTENT -->
<div class="content">
<!-- CONTENT CELL -->
<div class="corner-content-1col-top"></div>
<div class="content-1col-nobox">
<h1 id="install">Installation</h1>
<h2 id="precomp">Installing a pre-compiled binary release</h2>
<p>To make eXpress easy to install, we provide pre-compiled binary packages that save you the trouble of compiling from source. To use a binary package for Mac OS X or Linux (Windows), download the one for your machine, untar (unzip) it, and make sure the <code>express</code> (<code>express.exe</code>) binary is in a directory listed in your <tt>PATH</tt> environment variable.</p>
<p>If using the Windows binary, you will also need to install the <a href="http://www.microsoft.com/download/en/details.aspx?id=5555">Visual C++ 2010 Runtime Library</a>.</p>
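<p>On Mac OS X or Linux, one way to put the unpacked binary on your <tt>PATH</tt> for the current shell session is sketched below. This is a minimal sketch assuming a POSIX shell such as bash; <code>prepend_to_path</code> is a hypothetical helper, not part of eXpress, and the change only lasts until the shell exits unless you add the call to a startup file such as <tt>~/.bashrc</tt>.</p>
<ol><li><pre class="sc"><code># Hypothetical helper (not part of eXpress): prepend a directory to PATH
# for the current shell session, unless it is already listed there.
prepend_to_path() {
  new_dir=$1
  case ":$PATH:" in
    *":$new_dir:"*) ;;                        # already on PATH: do nothing
    *) PATH="$new_dir:$PATH"; export PATH ;;  # search new_dir first
  esac
}</code></pre></li></ol>
<p>For example, if the <tt>express</tt> binary ended up in a (hypothetical) directory <tt>$HOME/express</tt>, run <code>prepend_to_path "$HOME/express"</code> and then <code>command -v express</code>, which should print the full path to the binary.</p>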
<h2 id="src">Installing from source</h2>
<p><b>Note:</b> Installing from source is often an unnecessary hassle. If a binary is available for your system, it is highly recommended that you use it. If one is not available, please send us a request to <a href="mailto:ask.xprs@gmail.com">ask.xprs@gmail.com</a>.</p>
<p>The commands below are written for Unix, Linux, and Mac OS X. If you are compiling on Windows, use the equivalent DOS commands instead; with the Visual C++ compiler, enter them at the Visual Studio Command Prompt.</p>
<h4>• Install C++ Compiler (if not already available)</h4>
<p>While most Linux systems will already have GCC installed, Mac OS X and Windows do not have a C++ compiler installed by default. Free options include <a href="http://developer.apple.com/tools/xcode">Xcode</a> for Mac OS X and <a href="http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express">Visual C++ Express</a> for Windows.</p>
<h4>• Download and extract the source</h4>
<p>First, download the source code from the Download menu at the top of this page and untar it:</p>
<ol><li><pre class="sc"><code>$ tar -xf express-&lt;EXPRESS_VERSION&gt;-src.tgz</code></pre></li></ol>
<p>From now on, we refer to the path of the directory this creates as <code>&lt;YOUR_EXPRESS_DIR&gt;</code>.</p>
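<p>If you are unsure which directory the archive created, you can ask <code>tar</code> to list it. A minimal sketch, assuming the tarball holds a single top-level directory (as source tarballs usually do); <code>tarball_top_dir</code> is a hypothetical helper name:</p>
<ol><li><pre class="sc"><code># Hypothetical helper: print the top-level directory stored in a tarball,
# i.e. the directory that "tar -xf" will create when unpacking it.
tarball_top_dir() {
  tar -tf "$1" | head -n 1 | cut -d/ -f1
}</code></pre></li></ol>
<p>Usage: <code>$ cd "$(tarball_top_dir express-1.5.1-src.tgz)"</code>. Listing with <code>tar -tf</code> reads the name from the archive itself rather than guessing it from the file name.</p>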
<h4>• Install CMake</h4>
<p>Fortunately, the developers of CMake provide simple installation packages for most architectures. Find the right package for your system on their <a href="http://www.cmake.org/cmake/resources/software.html">website</a>.</p>
<h4>• Install BamTools</h4>
<p>With CMake installed, BamTools installation is straightforward.</p>
<ol>
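<li>Download the source from <a href="https://github.com/pezmaster31/bamtools/tarball/master">here</a> and untar it into <code>&lt;YOUR_EXPRESS_DIR&gt;</code>. The archive may unpack into a directory with a longer name (for example, one that includes a commit ID); if so, rename it to <code>bamtools</code>.</li>
<li>Navigate to <code>&lt;YOUR_EXPRESS_DIR&gt;/bamtools</code>.
<pre class="sc"><code>$ cd &lt;YOUR_EXPRESS_DIR&gt;/bamtools</code></pre></li>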
<li>Make a new directory called build and navigate to it.
<pre class="sc"><code>$ mkdir build<br />$ cd build</code></pre></li>
<li>Have CMake generate the makefile.
<pre class="sc"><code>$ cmake ..</code></pre></li>
<li>Build the BamTools libraries by calling make.
<pre class="sc"><code>$ make</code></pre></li>
</ol>
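<p>Steps 2 to 5 above can be collected into one function, sketched below under the assumption that BamTools sits in <code>&lt;YOUR_EXPRESS_DIR&gt;/bamtools</code>. The name <code>build_bamtools</code> is hypothetical. The body runs in a subshell, so its <code>cd</code> commands do not change your shell's working directory, and the build stops at the first step that fails; <code>mkdir -p</code> lets you rerun it after a partial build.</p>
<ol><li><pre class="sc"><code># Hypothetical helper: build BamTools inside one eXpress source directory ($1).
# The ( ... ) body is a subshell, so the cd calls do not affect the caller.
build_bamtools() (
  cd "$1/bamtools" || exit 1   # step 2: enter the BamTools source
  mkdir -p build || exit 1     # step 3: create the build directory...
  cd build || exit 1           #         ...and move into it
  cmake .. || exit 1           # step 4: generate the makefile
  make                         # step 5: build the libraries
)</code></pre></li></ol>
<p>Usage: <code>$ build_bamtools &lt;YOUR_EXPRESS_DIR&gt;</code>.</p>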
<h4>• Install Boost</h4>
<p>If you have a package manager such as yum (Linux) or MacPorts (Mac OS X) installed, it is probably the easiest way to install Boost. With yum, be sure to install <code>boost-devel</code> so that the headers are included. If you are using Windows, you can use the installer found <a href="http://www.boostpro.com/download/">here</a>. Otherwise, follow the instructions below.</p>
<ol>
<li><a href="http://www.boost.org/users/download/">Download</a>
Boost and the <code>bjam</code> build engine.</li>
<li>Unpack <code>bjam</code> and add it to your <tt>PATH</tt>.</li>
<li>Unpack the Boost tarball and <code>cd</code> to the
Boost source directory. This directory is called
the <code>BOOST_ROOT</code> in some Boost installation
instructions.</li>
<li>Build Boost. You can specify where to install Boost with the <code>--prefix</code> option; the default installation directory is <code>/usr/local</code>.
<ul>
<li>If you are on Mac OS X, type:<br />
<pre class="sc"><code>$ bjam --prefix=&lt;YOUR_BOOST_INSTALL_DIRECTORY&gt; --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install</code></pre></li>
<li>If you are on a 32-bit Linux system, type:<br />
<pre class="sc"><code>$ bjam --prefix=&lt;YOUR_BOOST_INSTALL_DIRECTORY&gt; --toolset=gcc architecture=x86 address-model=32 link=static runtime-link=static stage install</code></pre></li>
<li>If you are on a 64-bit Linux system, type:<br />
<pre class="sc"><code>$ bjam --prefix=&lt;YOUR_BOOST_INSTALL_DIRECTORY&gt; --toolset=gcc architecture=x86 address-model=64 link=static runtime-link=static stage install</code></pre></li>
</ul>
</li></ol>
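<p>To choose between the 32-bit and 64-bit Linux commands, you can check the machine type reported by <code>uname -m</code>. A minimal sketch; <code>boost_address_model</code> is a hypothetical helper, and it only recognizes the x86 machine names these commands target:</p>
<ol><li><pre class="sc"><code># Hypothetical helper: map a machine name (default: output of "uname -m")
# to a value for bjam's address-model property.
boost_address_model() {
  machine=${1:-$(uname -m)}
  case "$machine" in
    x86_64|amd64) echo 64 ;;
    i[3-6]86) echo 32 ;;
    *) return 1 ;;   # not one of the x86 variants these commands target
  esac
}</code></pre></li></ol>
<p>Usage: <code>$ bjam --prefix=&lt;YOUR_BOOST_INSTALL_DIRECTORY&gt; --toolset=gcc architecture=x86 address-model=$(boost_address_model) link=static runtime-link=static stage install</code>.</p>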
<h4>• Install eXpress</h4>
<p>We are now ready to build and install eXpress!</p>
<ol>
<li>Navigate to <code>&lt;YOUR_EXPRESS_DIR&gt;</code>.
<pre class="sc"><code>$ cd &lt;YOUR_EXPRESS_DIR&gt;</code></pre></li>
<li>Make a new directory called build and navigate to it.
<pre class="sc"><code>$ mkdir build<br />$ cd build</code></pre></li>
<li>Have CMake generate the makefile.
<pre class="sc"><code>$ cmake ..</code></pre></li>
<li>Build the eXpress binary by calling make.
<pre class="sc"><code>$ make</code></pre></li>
<li>Install the binary to <tt>/usr/lib/bin</tt> (or alternatively to another directory in your <tt>PATH</tt>).
<pre class="sc"><code>$ sudo make install</code></pre></li>
</ol>
<p>You should now be able to type <tt>express</tt> from any directory and see a print-out of the usage and options. If you do not see this and the compilation produced no errors, double-check that the binary was copied into a directory in your <tt>PATH</tt>.</p>
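<p>A quick way to see which <tt>express</tt> binary your shell will run, or where it looked if it found none, is sketched below. It relies on the POSIX <code>command -v</code> builtin; the function name <code>where_is_express</code> is hypothetical.</p>
<ol><li><pre class="sc"><code># Hypothetical helper: print the path of the express binary the shell
# would run; if there is none, list the PATH directories that were searched.
where_is_express() {
  if command -v express; then
    return 0
  fi
  echo "express not found in any of these PATH directories:"
  echo "$PATH" | tr ':' '\n'
  return 1
}</code></pre></li></ol>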
<p>→ <a href="#top">Back to top.</a></p>
</div>
<div class="corner-content-1col-bottom"></div>
<!-- CONTENT CELL -->
<div class="corner-content-1col-top"></div>
<div class="content-1col-nobox">
<h1 id="usecase">General Use Case: RNA-Seq abundances</h1>
<h2 id="reqin">Required input</h2>
<p>eXpress requires two input files:</p>
<ol>
<li>A multi-FASTA file containing the transcript sequences. If the transcriptome of your organism is not annotated, you can generate this file from your sequencing reads using a <i>de novo</i> transcriptome assembler such as <a href="http://trinityrnaseq.sourceforge.net/">Trinity</a>, <a href="http://www.ebi.ac.uk/~zerbino/oases/">Oases</a>, or <a href="http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss">Trans-ABySS</a>. If your organism has a reference genome, you can assemble transcripts directly from mapped reads using <a href="http://cufflinks.cbcb.umd.edu/">Cufflinks</a>. If your genome is already annotated (in GTF/GFF format), you can generate a multi-FASTA file using the <a href="http://genome.ucsc.edu/">UCSC Genome Browser</a> by uploading your annotation as a track and downloading the sequences under the "Tables" tab.</li>
<li>Read alignments to the multi-FASTA file in SAM or BAM format. These can either be stored in a file or streamed directly from an aligner. It is important that you allow as many multi-mappings as possible. You can also allow many mismatches during mapping, since eXpress builds an error model to probabilistically assign the reads, although this will increase mapping time. If you are combining reads from several library preparations, or from sequencing runs with different read lengths, please see the <a href="manual.html#sam">Manual</a> for important details on how the alignments should be input.</li>
</ol>
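<p>Since SAM/BAM records refer to their targets by name, it can save time to confirm that your alignments were made against the same multi-FASTA file before starting a long run. A minimal sketch, not part of eXpress: the helper names are hypothetical, and it assumes a SAM file with <code>@SQ</code> header lines (Bowtie's <code>-S</code> output includes them).</p>
<ol><li><pre class="sc"><code># Hypothetical helpers for checking that a SAM file was aligned against
# the same multi-FASTA file that you pass to eXpress.

# Sequence names: first word of each '>' header line, sorted.
fasta_names() {
  awk '/^>/ { print substr($1, 2) }' "$1" | sort
}

# Reference names from the @SQ header lines of a SAM file, sorted.
# sed stops reading at the first alignment record (the header comes first).
sam_ref_names() {
  sed '/^[^@]/q' "$1" | grep '^@SQ' | tr '\t' '\n' | grep '^SN:' | cut -c4- | sort
}

# Print names found in only one of the two files; succeed if there are none.
check_ref_names() {
  fa_list=$(mktemp) || return 1
  sam_list=$(mktemp) || return 1
  fasta_names "$1" > "$fa_list"
  sam_ref_names "$2" > "$sam_list"
  mismatches=$(comm -3 "$fa_list" "$sam_list")
  rm -f "$fa_list" "$sam_list"
  [ -z "$mismatches" ] || { echo "$mismatches"; return 1; }
}</code></pre></li></ol>
<p>Usage: <code>$ check_ref_names transcripts.fasta hits.sam</code> prints nothing if the names agree. For a BAM file, first extract its header with <code>$ samtools view -H hits.bam > header.sam</code> and pass <tt>header.sam</tt> instead.</p>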
<h2 id="example">An example</h2>
<p>In the following two sub-sections, you will run eXpress on a sample RNA-Seq dataset with simulated reads from UGT3A2 and the HOXC cluster, using the human genome build hg18. Both the transcript sequences (transcripts.fasta) and the raw reads (reads_1.fastq, reads_2.fastq) can be found in the <tt>&lt;YOUR_EXPRESS_DIR&gt;/sample_data</tt> directory. For this example to work, you will need both <a href="http://bowtie-bio.sourceforge.net">Bowtie</a> and <a href="http://samtools.sf.net">SAMtools</a> installed. In general, however, any aligner will work, and the conversion to BAM is only necessary if you lack the disk space to store the uncompressed SAM.</p>
<p>Before you begin, you must prepare your Bowtie index. Since you wish to allow many multi-mappings, it is useful to build the index with a small offrate (in this case 1). The smaller the offrate, the larger the index and the faster the mapping. If you have disk space to spare, always use an offrate of 1. Build the index with the following commands.</p>
<ol><li><pre class="sc"><code>$ cd &lt;YOUR_EXPRESS_DIR&gt;/sample_data <br />$ bowtie-build --offrate 1 transcripts.fasta transcripts</code></pre></li></ol>
<p>This command populates your directory with several index files that Bowtie uses to align reads to the transcripts.</p>
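<p>To confirm the index build finished, you can check that the index files exist. A minimal sketch, assuming Bowtie 1's naming for small indexes (six files ending in <tt>.1.ebwt</tt> through <tt>.4.ebwt</tt>, <tt>.rev.1.ebwt</tt>, and <tt>.rev.2.ebwt</tt>; very large indexes use <tt>.ebwtl</tt> instead, which this sketch does not check); <code>bowtie_index_present</code> is a hypothetical helper:</p>
<ol><li><pre class="sc"><code># Hypothetical helper: succeed only if all six Bowtie 1 index files for
# the given basename exist and are non-empty (".ebwt" naming assumed).
bowtie_index_present() {
  for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
    [ -s "$1.$suffix" ] || return 1
  done
}</code></pre></li></ol>
<p>Usage: <code>$ bowtie_index_present transcripts; echo $?</code> prints 0 when all six files are present.</p>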
<p>You can now map the reads to the transcript sequences using the following Bowtie command, which outputs SAM (<code>-S</code>), reports all valid alignments so that multi-mappings are unlimited (<code>-a</code>), and allows a maximum insert size of 800 bp between the paired ends (<code>-X 800</code>). These three options (<code>-a</code>, <code>-S</code>, <code>-X</code>) are highly recommended for best results. You should also allow for many mismatches, since eXpress models sequencing errors. When mapping large files, take advantage of multiple processors with the <tt>-p</tt> option. See the <a href="http://bowtie-bio.sourceforge.net/manual.shtml">Bowtie Manual</a> for more details on the available parameters and options.</p>
<p>The SAM output from Bowtie is piped into SAMtools to compress it to BAM format. This conversion is optional, but it greatly reduces the size of the alignment file.</p>
<ol><li><pre class="sc"><code>$ bowtie -aS -X 800 --offrate 1 transcripts -1 reads_1.fastq -2 reads_2.fastq | samtools view -Sb - > hits.bam </code></pre></li></ol>
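<p>One caveat when piping: by default a shell pipeline reports only the exit status of its last command, so a Bowtie failure can be masked if SAMtools exits successfully. Below is a sketch of the same command with the <code>pipefail</code> option, assuming bash; <code>map_pairs_to_bam</code> is a hypothetical wrapper, not part of Bowtie or SAMtools.</p>
<ol><li><pre class="sc"><code>#!/usr/bin/env bash
# Hypothetical wrapper around the Bowtie/SAMtools command above.
# The ( ... ) body is a subshell, so pipefail does not leak into your shell.
map_pairs_to_bam() (
  set -o pipefail   # the pipeline fails if ANY command in it fails
  index=$1 reads_1=$2 reads_2=$3 out=$4
  bowtie -aS -X 800 --offrate 1 "$index" -1 "$reads_1" -2 "$reads_2" \
    | samtools view -Sb - > "$out"
)</code></pre></li></ol>
<p>Usage: <code>$ map_pairs_to_bam transcripts reads_1.fastq reads_2.fastq hits.bam</code>. The same option can be set when piping Bowtie straight into eXpress.</p>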
<h2 id="fromfile">Input from SAM/BAM file</h2>
<p>Once you have aligned your reads to the transcriptome and stored them in a SAM or BAM file, you can run eXpress in default mode with the command:</p>
<ol><li><pre class="sc"><code>$ express transcripts.fasta hits.bam</code></pre></li></ol>
<p>The default settings will be sufficient for most users, but please see the <a href="manual.html#running">Manual</a> for full descriptions of all available options.</p>
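<p>If you run eXpress for several samples from the same directory, each run would write <tt>results.xprs</tt> to the same place. One way to keep them apart is the <code>-o</code> (<code>--output-dir</code>) option; check the Manual for the options available in your version. A sketch, assuming one BAM file per sample; both function names are hypothetical:</p>
<ol><li><pre class="sc"><code># Hypothetical helper: output directory name for one BAM file,
# e.g. path/to/sampleA.bam gives xprs_sampleA
xprs_outdir() {
  echo "xprs_$(basename "$1" .bam)"
}

# Hypothetical helper: run eXpress once per BAM file against the same
# transcript FASTA ($1), sending each run's output to its own directory.
run_express_per_bam() {
  fasta=$1
  shift
  for bam in "$@"; do
    out=$(xprs_outdir "$bam")
    mkdir -p "$out" || return 1
    express -o "$out" "$fasta" "$bam" || return 1
  done
}</code></pre></li></ol>
<p>Usage (hypothetical file names): <code>$ run_express_per_bam transcripts.fasta sampleA.bam sampleB.bam</code>.</p>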
<h2 id="stream">Streaming input from aligner</h2>
<p>If you do not wish to store an intermediate SAM/BAM file, you can pipe the Bowtie output directly into eXpress with the command:</p>
<ol><li><pre class="sc"><code>$ bowtie -aS -X 800 --offrate 1 transcripts -1 reads_1.fastq -2 reads_2.fastq | express transcripts.fasta </code></pre></li></ol>
<p>This gives you exactly the same output as the previous command, while avoiding the need for potentially large amounts of disk space to store the mapped reads.</p>
<h2 id="output">Understanding the output</h2>
<p>eXpress saves its output in an easy-to-parse, tab-delimited file called <tt>results.xprs</tt>. Since no output directory was specified, the file is placed in the working directory (<tt>&lt;YOUR_EXPRESS_DIR&gt;/sample_data</tt>). You can view the output for the run using the command:</p>
<ol><li><pre class="sc"><code>$ less results.xprs</code></pre></li></ol>
<p>Your results should look like this:</p>
<ol><li><pre class="output"><code>bundle_id target_id length eff_length tot_counts uniq_counts est_counts eff_counts ambig_distr_alpha ambig_distr_beta fpkm fpkm_conf_low fpkm_conf_high solvable
1 NM_014620 2300 2123.247776 958 77 520.554767 563.888952 39.404036 38.861284 24516.910965 21256.860893 27776.961037 T
1 NM_153693 2072 1895.275039 481 5 119.916333 131.097934 19.284056 60.593282 6327.120373 4587.418700 8066.822046 T
1 NR_003084 1640 1463.326696 266 0 17.689103 19.824779 6.175215 86.684623 1208.828007 348.657645 2068.998369 T
1 NM_153633 1666 1489.323587 762 10 416.151511 465.518994 64.183691 54.654069 27942.316563 23818.690311 32065.942816 T
1 NM_018953 1612 1435.330044 228 91 212.893398 239.097731 3.148257 0.390173 14832.365459 11791.549012 17873.181907 T
1 NM_004503 1681 1504.321793 384 37 297.794888 332.770029 16.332212 5.398573 19795.956525 16293.650427 23298.262622 T
2 NM_014212 2037 1860.279224 55 55 55.000000 60.224830 0.000000 0.000000 2956.545409 1737.816128 4175.274690 T
3 NM_173860 849 672.421281 962 962 962.000000 1214.622475 0.000000 0.000000 143065.073604 129374.738511 156755.408697 T
4 NM_022658 2288 2111.249211 4881 4881 4881.000000 5289.630397 0.000000 0.000000 231190.139724 221343.835827 241036.443620 T
5 NM_017410 2396 2219.236296 42 42 42.000000 45.345329 0.000000 0.000000 1892.542947 953.138236 2831.947658 T
6 NM_006897 1541 1364.338534 664 664 664.000000 749.978084 0.000000 0.000000 48668.272828 42912.224535 54424.321120 T
7 NM_017409 1959 1782.288551 47 47 47.000000 51.659985 0.000000 0.000000 2637.058964 1581.064918 3693.053009 T
8 NM_001168316 2283 2106.249809 1552 12 443.212928 480.406032 86.571551 222.603290 21042.752189 18083.116378 24002.388000 T
8 NM_174914 2385 2208.237612 1745 38 1049.949880 1133.995024 74.786096 51.366264 47546.961175 43110.530924 51983.391426 T
8 NR_031764 1853 1676.301226 1243 7 270.837192 299.386118 100.878291 371.706966 16156.833161 13233.910476 19079.755847 T
</code></pre></li></ol>
<p>While the file may be difficult to read in your terminal, opening it with R or Excel should help you visualize the columns. An important column is FPKM (<tt>fpkm</tt>), which reports the estimated abundance in expected <b>F</b>ragments <b>P</b>er <b>K</b>ilobase of transcript per <b>M</b>illion mapped fragments. Other fields include the estimated counts (<tt>est_counts</tt>), the parameters of the posterior count distribution (<tt>ambig_distr_alpha</tt>/<tt>ambig_distr_beta</tt>), and the "effective" estimated counts after correction for bias (<tt>eff_counts</tt>). Transcripts are sorted by <tt>bundle_id</tt>, which denotes the multi-mapping group each transcript belongs to and can help identify isoforms and gene families. More details on the output, including full descriptions of all columns, can be found in the <a href="manual.html#expr">Manual</a>.</p>
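<p>Because the file is tab-delimited with a header row, standard command-line tools can also pull out individual columns. A minimal sketch that prints <tt>target_id</tt> with one named column, highest values first; it looks columns up by header name rather than position, so it does not depend on the column order shown above. <code>xprs_column</code> is a hypothetical helper:</p>
<ol><li><pre class="sc"><code># Hypothetical helper: print target_id and one named column (default fpkm)
# from a results.xprs file, highest values first. Prints nothing if either
# column is missing from the header row.
xprs_column() {
  col=${2:-fpkm}
  awk -F'\t' -v want="$col" '
    NR == 1 {
      for (i = 1; NF >= i; i++) {
        if ($i == "target_id") t = i
        if ($i == want) c = i
      }
      if (!t || !c) exit
      next
    }
    { print $t "\t" $c }' "$1" | sort -k2,2gr
}</code></pre></li></ol>
<p>Usage: <code>$ xprs_column results.xprs fpkm | head -n 5</code> lists the five transcripts with the highest FPKM.</p>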
<p>→ <a href="#top">Back to top.</a></p>
<h2 id="diff">Calculating differential expression</h2>
<p>If you have multiple samples and replicates, you may want to discover whether there is a significant change in the abundance of any genes or isoforms under different conditions. Unfortunately, no differential expression tool yet takes advantage of the full distribution over estimated counts that we output, but we are working on one that will be available soon. For now, we recommend inputting the rounded effective counts for your samples into a count-based differential expression tool such as <a href='http://www.bioconductor.org/packages/2.6/bioc/html/DEGseq.html'>DEGseq</a> or <a href='http://www.bioconductor.org/packages/release/bioc/html/edgeR.html'>edgeR</a>.</p>
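<p>As a starting point for such tools, the rounded effective counts from two runs can be combined into one table keyed by <tt>target_id</tt>. A minimal sketch using standard Unix tools: it locates the <tt>eff_counts</tt> column by header name, rounds half up with <code>int(x + 0.5)</code> (counts are non-negative), and uses <code>join</code>, which keeps only targets present in both files. The function names are hypothetical.</p>
<ol><li><pre class="sc"><code># Hypothetical helper: target_id and eff_counts rounded to the nearest
# integer, sorted by target_id, for one results.xprs file.
rounded_eff_counts() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; NF >= i; i++) {
        if ($i == "target_id") t = i
        if ($i == "eff_counts") e = i
      }
      if (!t || !e) exit
      next
    }
    { printf "%s\t%d\n", $t, int($e + 0.5) }' "$1" | sort -k1,1
}

# Hypothetical helper: one row per target found in both files:
# target_id, rounded eff_counts from file 1, rounded eff_counts from file 2.
count_table() {
  a=$(mktemp) || return 1
  b=$(mktemp) || return 1
  rounded_eff_counts "$1" > "$a"
  rounded_eff_counts "$2" > "$b"
  join -t "$(printf '\t')" "$a" "$b"   # join needs both inputs sorted on the key
  rm -f "$a" "$b"
}</code></pre></li></ol>
<p>Usage (hypothetical paths): <code>$ count_table xprs_sampleA/results.xprs xprs_sampleB/results.xprs > counts.tsv</code>, then load <tt>counts.tsv</tt> into R. For more samples, join the result with further files in the same way.</p>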
<p>→ <a href="#top">Back to top.</a></p>
</div>
<div class="corner-content-1col-bottom"></div>
<!-- CONTENT CELL -->
<div class="corner-content-1col-top"></div>
<div class="content-1col-nobox">
<h1 id="more">Additional Resources</h1>
<p><a href="http://www.eecs.berkeley.edu/~pimentel/">Harold Pimentel</a> has made an excellent walkthrough for the <a href="http://qb3.berkeley.edu/qb3/starseq/">*Seq I Meeting</a>, which is available <a href="http://lmcb.wikispaces.com/eXpress+Walkthrough">here</a>. If you are a new user who needs help getting started with eXpress, have a look!</p>
<p>→ <a href="#top">Back to top.</a></p>
</div>
<div class="corner-content-1col-bottom"></div>
</div>
<!-- C.2 SUBCONTENT -->
<div class="subcontent">
<!-- SUBCONTENT CELL -->
<div class="corner-subcontent-top"></div>
<div class="subcontent-box">
<h1 class="menu">Outline</h1>
<div class="sidemenu1">
<!-- CONTENT CELL -->
<ul>
<li><a href="#install">Installation</a>
<ul>
<li><a href="#precomp">→Installing pre-compiled binary</a></li>
<li><a href="#src">→Installing from source</a></li>
</ul></li>
<li><a href="#usecase">General Use Case</a>
<ul>
<li><a href="#reqin">→Required input</a></li>
<li><a href="#example">→An example</a></li>
<li><a href="#fromfile">→Input from SAM/BAM file</a></li>
<li><a href="#stream">→Streaming input from aligner</a></li>
<li><a href="#output">→Understanding the output</a></li>
<li><a href="#diff">→Calculating differential expression</a></li>
</ul></li>
<li><a href="#more">Additional Resources</a></li>
</ul>
</div>
</div>
<div class="corner-subcontent-bottom"></div>
</div>
</div>
<!-- D. FOOTER -->
<div class="footer">
<p>Copyright © 2011 Adam Roberts | All Rights Reserved</p>
<p class="credits">Design by <a href="http://1234.info/" title="Designer Homepage">1234.info</a> | Modified by <a href="http://cs.berkeley.edu/~adarob/">Adam Roberts</a> | <a href="http://validator.w3.org/check?uri=referer" title="Validate XHTML code">XHTML 1.0</a> | <a href="http://jigsaw.w3.org/css-validator/" title="Validate CSS code">CSS 2.0</a></p>
<br />
<p>The eXpress project was funded in part by an NSF graduate fellowship to Adam Roberts and NIH grant 1R01HG006129-01.</p>
</div>
<div class="corner-page-bottom"></div>
</div>
</body>
</html>