1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
|
<!DOCTYPE book [
<!ENTITY % sgml.features "IGNORE">
<!ENTITY % xml.features "INCLUDE">
<!ENTITY % dbcent
PUBLIC "-//OASIS//ENTITIES DocBook Character Entities V4.5//EN"
"/usr/share/xml/docbook/schema/dtd/4.5/dbcentx.mod">
%dbcent;
<!ENTITY % ent-mmlalias
PUBLIC "-//W3C//ENTITIES Aiases for MathML 2.0//EN"
"/usr/share/xml/schema/w3c/mathml/dtd/mmlalias.ent" >
%ent-mmlalias;
]>
<article xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xml:lang="en">
<info><title><application>BAli-Phy</application> Tutorial</title>
<author><personname><firstname>Benjamin</firstname><surname>Redelings</surname></personname></author>
</info>
<section xml:id="intro"><info><title>Introduction</title></info>
<para>
This tutorial contains step-by-step commands, assuming that you have already installed bali-phy in <filename>~/local/</filename>. The <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.bali-phy.org/README.html">manual</link> contains more detailed explanations for many things, so you might want to refer to it during the tutorial.</para>
</section>
<section xml:id="work_directory"><info><title>Setting up the <filename>~/Work</filename> directory</title></info>
<para>Go to your home directory:
% cd ~
Make a directory called Work inside it:
% mkdir Work
Go into the <filename>Work</filename> directory:
% cd Work
Make a shortcut (a symbolic link) to the examples directory:
% ln -s ~/local/share/bali-phy/examples examples
Take a look inside the <filename>examples</filename> directory:
% ls examples
Take a look at an input file:
% less examples/5S-rRNA/5d.fasta
Take a look at an input file:
% alignment-info examples/5S-rRNA/5d.fasta
</para>
</section>
<section xml:id="command_line_options"><info><title>Command line options</title></info>
<para>
What version of bali-phy are you running? When was it compiled? Which compiler? For what computer type?
% bali-phy -v
Look at the list of command line options:
% bali-phy --help
Look at them with the ability to scroll back:
% bali-phy --help | less
Some options have a short form which is a single letter:
% bali-phy -h | less
</para>
<section ><info><title>RNA</title></info>
<para>
Analyze a data set, but don't begin MCMC. (This is useful to know if the analysis works, what model will be used,
compute likelihoods, etc.)
% bali-phy --test examples/5S-rRNA/5d.fasta
Finally, run an analysis! (This is just 10 iterations, so its not a real run.)
% bali-phy examples/5S-rRNA/5d.fasta --iterations=10
If you specify <parameter>--imodel=none</parameter>, then the alignment won't be estimated, and indels will be ignored (just like <application>MrBayes</application>).
% bali-phy examples/5S-rRNA/5d.fasta --iterations=10 --imodel=none
You can specify the alphabet, substitution model, insertion/deletion model, etc.
Defaults are used if you don't specify.
% bali-phy examples/5S-rRNA/5d.fasta --iterations=10 --alphabet=DNA --smodel=TN --imodel=RS07
You can change this to the GTR, if you want:
% bali-phy examples/5S-rRNA/5d.fasta --iterations=10 --alphabet=DNA --smodel=GTR --imodel=RS07
You can add gamma[4]+INV rate heterogeneity:
% bali-phy examples/5S-rRNA/5d.fasta --iterations=10 --alphabet=DNA --smodel=GTR+Rates.Gamma[4]+INV --imodel=RS07
</para>
</section>
<section ><info><title>Amino Acids</title></info>
<para>
When the data set contains amino acids, the default is amino acids:
% bali-phy examples/EF-Tu/12d.fasta --iterations=10
</para>
</section>
<section ><info><title>Codons</title></info>
<para>
What alphabet is used here? What substitution model?
% bali-phy examples/HIV/chain-2005/env-clustal-codons.fasta --test
What happens when trying to use the M0 model (e.g. positive selection)?
% bali-phy examples/HIV/chain-2005/env-clustal-codons.fasta --test --smodel=M0
The M0 model requires a codon alphabet:
% bali-phy examples/HIV/chain-2005/env-clustal-codons.fasta --test --smodel=M0 --alphabet=Codons
</para>
</section>
</section>
<section ><info><title>Output</title></info>
<para>
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.bali-phy.org/README.html#output">Section 6: Output</link> of the manual for more information about this section.
</para>
<para>
Try running an analysis with a few more iterations. The & makes the command run in the background, so that you can run other commands before it finishes.
% bali-phy examples/5S-rRNA/25-muscle.fasta --iterations=200 &
Run another copy of the same analysis:
% bali-phy examples/5S-rRNA/25-muscle.fasta --iterations=200 &
You can take a look at your running jobs:
% jobs
</para>
<section ><info><title>Inspecting output files</title></info>
<para>
Look at the directories that were created to store the output files:
% ls
% ls 25-muscle-1/
% ls 25-muscle-2/
See how many iterations have been completed so far:
% wc -l 25-muscle-1/C1.p 25-muscle-2/C1.p
Wait a second, and repeat the command.
% wc -l 25-muscle-1/C1.p 25-muscle-2/C1.p
See if you can determine the following information from the beginning of the C1.out file:
What command was run?
When was it run?
Which computer was it run on?
Which directory was it run in?
Which directory contains the output files?
What was the process id (PID) of the running bali-phy program?
What random seed was used?
What was the input file?
What alphabet was used to read in the file?
What substitution model was used to analyze the file?
What insertion/deletion model was used to analyze the file?
% less 25-muscle-1/C1.out
Examine the file containing the sampled trees:
% less 25-muscle-1/C1.trees
Examine the file containing the sampled alignments:
% less 25-muscle-1/C1.P1.fastas
Examine the file containing the successive best alignment/tree pairs visited:
% less 25-muscle-1/C1.MAP
</para>
</section>
<section ><info><title>Summarizing the output</title></info>
<para>
Try summarizing the sampled numerical parameters (e.g. not trees and alignments):
% statreport --help
% statreport 25-muscle-1/C1.p 25-muscle-2/C1.p > Report
% statreport 25-muscle-1/C1.p 25-muscle-2/C1.p --mean > Report
% statreport 25-muscle-1/C1.p 25-muscle-2/C1.p --mean --median > Report
% less Report
Now lets examine the summaries using a graphical program. If you are using Windows or Mac, run Tracer, and press the <guilabel>+</guilabel> button to add these files. What kind of ESS do you get? If you are using Linux, do
% tracer 25-muscle-1/C1.p 25-muscle-2/C1.p &
Now lets compute the consensus tree for these two runs:
% trees-consensus --help
% trees-consensus 25-muscle-1/C1.trees 25-muscle-2/C1.trees > c50.PP.tree
% trees-consensus 25-muscle-1/C1.trees 25-muscle-2/C1.trees --report=consensus > c50.PP.tree
% less consensus
% figtree c50.PP.tree &
Now lets see if there is evidence that the two runs have not converged yet:
% trees-bootstrap --help
% trees-bootstrap 25-muscle-1/C1.trees 25-muscle-2/C1.trees > partitions.bs
% less partitions.bs
Now lets use the analysis script to run all the summaries and make a report:
% bp-analyze.pl 25-muscle-1/ 25-muscle-2/
% firefox Results/index.html &
</para>
</section>
</section>
<section><info><title>Multi-gene analyses</title></info>
<para>This is described in more detail in section 4.3 of the <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.bali-phy.org/README.html">manual</link>.</para>
<para>
Download HIV env and pol alignments with the same sequence names:
% wget http://nucleus.biology.duke.edu/~bredelings/alignment_files/env-common.fasta
% wget http://nucleus.biology.duke.edu/~bredelings/alignment_files/pol-common.fasta
Try to analyze these genes jointly under a codon model:
% bali-phy --test --alphabet Codons env-common.fasta pol-common.fasta --smodel M0
Take a look at the lengths of these files, to see whether the sequencealignment lengths are multiples of 3:
% alignment-info pol-common.fasta
% alignment-info env-common.fasta
Now edit the fasta files so that the lengths are a multiple of three:
% alignment-cat -c1-951 env-common.fasta > env-common2.fasta
% alignment-cat -c1-1068 pol-common.fasta > pol-common2.fasta
Try running bali-phy again:
% bali-phy --test --alphabet Codons env-common2.fasta pol-common2.fasta --smodel M0
Now open the fasta files in an editor and replace all nucleotide ambiguity codes K and M with N. This time the run should succeed:
% bali-phy --test --alphabet Codons env-common2.fasta pol-common2.fasta --smodel M0
By default, the M0 substitution model will be selected for both genes. However, each gene will have a separate instance of the model with its own gene-specific values for the model parameters. Try specifying that both genes share one instance of the model parameters:
% bali-phy --iterations=10 --alphabet Codons env-common2.fasta pol-common2.fasta --smodel=1,2:M0
Now try reading one gene as Codons, and the other as DNA:
% bali-phy --iterations=10 --alphabet=1:Codons --alphabet=2:DNA env-common2.fasta pol-common2.fasta
</para>
</section>
</article>
|