.TH "fasttree" "1" "June 2012" "Lawrence Berkeley National Lab" "User Commands"
.SH NAME
fasttree \- create phylogenetic trees from alignments of nucleotide or protein sequences
.SH DESCRIPTION
fasttree infers approximately-maximum-likelihood phylogenetic trees from
alignments of nucleotide or protein sequences. It handles alignments
with up to a million sequences in a reasonable amount of time and memory.
fasttree is more accurate than PhyML 3 with default settings, and much
more accurate than the distance-matrix methods that are traditionally
used for large alignments. fasttree uses the Jukes-Cantor or generalized
time-reversible (GTR) models of nucleotide evolution and the JTT
(Jones-Taylor-Thornton 1992) model of amino acid evolution. To account
for the varying rates of evolution across sites, fasttree uses a single
rate for each site (the "CAT" approximation). To quickly estimate the
reliability of each split in the tree, fasttree computes local support
values with the Shimodaira-Hasegawa test (these are the same as PhyML
3's "SH-like local supports").
.SH SYNOPSIS
.PP
.B fasttree protein_alignment > tree
.PP
.B fasttree \fB\-nt\fR nucleotide_alignment > tree
.PP
.B fasttree \fB\-nt\fR \fB\-gtr\fR < nucleotide_alignment > tree
.PP
fasttree accepts alignments in fasta or phylip interleaved formats.
.SS "Common options (must be before the alignment file):"
.HP
\fB\-quiet\fR to suppress reporting information
.HP
\fB\-nopr\fR to suppress progress indicator
.HP
\fB\-log\fR logfile \fB\-\-\fR save intermediate trees, settings, and model details
.HP
\fB\-fastest\fR \fB\-\-\fR speed up the neighbor\-joining phase and reduce memory usage
(recommended for >50,000 sequences)
.HP
\fB\-n\fR <number> to analyze multiple alignments (phylip format only;
use for global bootstrap, with seqboot and CompareToBootstrap.pl)
.HP
\fB\-nosupport\fR to not compute support values
.HP
\fB\-intree\fR newick_file to set the starting tree(s)
.HP
\fB\-intree1\fR newick_file to use this starting tree for all the alignments
(for faster global bootstrap on huge alignments)
.HP
\fB\-pseudo\fR to use pseudocounts (recommended for highly gapped sequences)
.HP
\fB\-gtr\fR \fB\-\-\fR generalized time\-reversible model (nucleotide alignments only)
.HP
\fB\-wag\fR \fB\-\-\fR Whelan\-And\-Goldman 2001 model (amino acid alignments only)
.HP
\fB\-quote\fR \fB\-\-\fR allow spaces and other restricted characters (but not ' characters) in
sequence names and quote names in the output tree (fasta input only;
fasttree will not be able to read these trees back in)
.HP
\fB\-noml\fR to turn off maximum\-likelihood
.HP
\fB\-nome\fR to turn off minimum\-evolution NNIs and SPRs
.IP
(recommended if running additional ML NNIs with \fB\-intree\fR)
.HP
\fB\-nome\fR \fB\-mllen\fR with \fB\-intree\fR to optimize branch lengths for a fixed topology
.HP
\fB\-cat\fR # to specify the number of rate categories of sites (default 20),
or \fB\-nocat\fR to use constant rates
.HP
\fB\-gamma\fR \fB\-\-\fR after optimizing the tree under the CAT approximation,
rescale the lengths to optimize the Gamma20 likelihood
.HP
\fB\-constraints\fR constraintAlignment to constrain the topology search;
constraintAlignment should have 1s or 0s to indicate splits
.HP
\fB\-expert\fR \fB\-\-\fR see more options
.PP
.SS Detailed usage for fasttree 2.1.4 SSE3:
.nf
\fBfasttree\fR [\-nt] [\-n 100] [\-quote] [\-pseudo | \-pseudo 1.0]
         [\-boot 1000 | \-nosupport]
         [\-intree starting_trees_file | \-intree1 starting_tree_file]
         [\-quiet | \-nopr]
         [\-nni 10] [\-spr 2] [\-noml | \-mllen | \-mlnni 10]
         [\-mlacc 2] [\-cat 20 | \-nocat] [\-gamma]
         [\-slow | \-fastest] [\-2nd | \-no2nd] [\-slownni] [\-seed 1253]
         [\-top | \-notop] [\-topm 1.0 [\-close 0.75] [\-refresh 0.8]]
         [\-matrix Matrix | \-nomatrix] [\-nj | \-bionj]
         [\-wag] [\-nt] [\-gtr] [\-gtrrates ac ag at cg ct gt] [\-gtrfreq A C G T]
         [\-constraints constraintAlignment [\-constraintWeight 100.0]]
         [\-log logfile]
         [alignment_file]
         > newick_tree
.fi
.PP
or
.PP
.nf
\fBfasttree\fR [\-nt] [\-matrix Matrix | \-nomatrix] [\-rawdist] \fB\-makematrix\fR [alignment]
         [\-n 100] > phylip_distance_matrix
.fi
.PP
fasttree supports fasta or phylip interleaved alignments.
By default fasttree expects protein alignments; use \fB\-nt\fR for nucleotides.
fasttree reads standard input if no alignment file is given.
.SS "Input/output options:"
.HP
\fB\-n\fR <number> \fB\-\-\fR read in multiple alignments. This only
works with phylip interleaved format. For example, you can
use it with the output from phylip's seqboot. If you use \fB\-n\fR, fasttree
will write 1 tree per line to standard output.
.HP
\fB\-intree\fR newickfile \fB\-\-\fR read the starting tree in from newickfile.
.IP
Any branch lengths in the starting trees are ignored.
.HP
\fB\-intree\fR with \fB\-n\fR will read a separate starting tree for each alignment.
.HP
\fB\-intree1\fR newickfile \fB\-\-\fR read the same starting tree for each alignment
.HP
\fB\-quiet\fR \fB\-\-\fR do not write to standard error during normal operation (no progress
indicator, no options summary, no likelihood values, etc.)
.HP
\fB\-nopr\fR \fB\-\-\fR do not write the progress indicator to stderr
.HP
\fB\-log\fR logfile \fB\-\-\fR save intermediate trees so you can extract
the trees and restart long\-running jobs if they crash.
\fB\-log\fR also reports the per\-site rates (1 means slowest category).
.HP
\fB\-quote\fR \fB\-\-\fR quote sequence names in the output and allow spaces, commas,
parentheses, and colons in them but not ' characters (fasta files only)
.SS "Distances:"
.IP
Default: for protein sequences, log\-corrected distances and an
amino acid dissimilarity matrix derived from BLOSUM45;
for nucleotide sequences, Jukes\-Cantor distances.
To specify a different matrix, use \fB\-matrix\fR FilePrefix or \fB\-nomatrix\fR.
Use \fB\-rawdist\fR to turn the log\-correction off,
or to use %different instead of Jukes\-Cantor.
.HP
\fB\-pseudo\fR [weight] \fB\-\-\fR use pseudocounts to estimate distances between
sequences with little or no overlap. (Off by default.) Recommended
if the alignment has sequences with little or no overlap.
If the weight is not specified, it is 1.0.
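.PP
As a rough illustration of the two nucleotide distances named above, the Python sketch below computes the uncorrected proportion of differing sites p (the %different that \fB\-rawdist\fR selects, expressed as a fraction) and the textbook Jukes\-Cantor correction d = \-3/4 ln(1 \- 4p/3). The function names and the gap rule (columns with a gap in either sequence are skipped) are assumptions for illustration; fasttree itself computes distances from profiles and handles gaps in its own way.
.nf
```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, ignoring columns where either
    sequence has a gap (an assumed gap rule, not fasttree's)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("the two sequences do not overlap")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"), 4))  # 0.1073
```
.fi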
.SS "Topology refinement:"
.IP
By default, fasttree tries to improve the tree with up to 4*log2(N)
rounds of minimum\-evolution nearest\-neighbor interchanges (NNI),
where N is the number of unique sequences, 2 rounds of
subtree\-prune\-regraft (SPR) moves (also min. evo.), and
up to 2*log(N) rounds of maximum\-likelihood NNIs.
Use \fB\-nni\fR to set the number of rounds of min. evo. NNIs,
and \fB\-spr\fR to set the rounds of SPRs.
Use \fB\-nome\fR to turn off both min\-evo NNIs and SPRs (useful if refining
an approximately maximum\-likelihood tree with further NNIs).
.IP
Use \fB\-sprlength\fR to set the maximum length of an SPR move (default 10).
Use \fB\-mlnni\fR to set the number of rounds of maximum\-likelihood NNIs.
Use \fB\-mlacc\fR 2 or \fB\-mlacc\fR 3 to always optimize all 5 branches at each NNI,
and to optimize all 5 branches in 2 or 3 rounds.
.IP
Use \fB\-mllen\fR to optimize branch lengths without ML NNIs.
Use \fB\-mllen\fR \fB\-nome\fR with \fB\-intree\fR to optimize branch lengths on a fixed topology.
Use \fB\-slownni\fR to turn off heuristics to avoid constant subtrees (affects both
ML and ME NNIs).
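.PP
Each nearest\-neighbor interchange looks at an internal edge that separates four subtrees, A and B on one side and C and D on the other, and considers the two alternative pairings AC|BD and AD|BC. For minimum\-evolution NNIs, a common way to choose among the three is the four\-point comparison: keep the pairing whose within\-pair distances sum lowest. The Python sketch below shows only that comparison on plain numbers; the function name and input layout are hypothetical, and fasttree's ME NNIs work on profile distances while its ML NNIs compare likelihoods instead.
.nf
```python
def nni_choice(dist: dict) -> str:
    """dist maps frozenset({x, y}) -> distance for x, y in "ABCD".
    Returns the quartet split with the smallest sum of within-pair
    distances; ties keep the current topology AB|CD."""
    def d(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    options = {
        "AB|CD": d("A", "B") + d("C", "D"),  # current topology
        "AC|BD": d("A", "C") + d("B", "D"),  # first NNI alternative
        "AD|BC": d("A", "D") + d("B", "C"),  # second NNI alternative
    }
    # min() returns the first minimal key, so AB|CD wins any tie.
    return min(options, key=options.get)


pairs = {("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.9,
         ("B", "C"): 0.9, ("B", "D"): 0.3, ("C", "D"): 0.9}
quartet = {frozenset(p): v for p, v in pairs.items()}
print(nni_choice(quartet))  # AC|BD
```
.fi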
.SS "Maximum likelihood model options:"
.HP
\fB\-wag\fR \fB\-\-\fR Whelan\-And\-Goldman 2001 model instead of (default) Jones\-Taylor\-Thornton 1992 model (a.a. only)
.HP
\fB\-gtr\fR \fB\-\-\fR generalized time\-reversible instead of (default) Jukes\-Cantor (nt only)
.HP
\fB\-cat\fR # \fB\-\-\fR specify the number of rate categories of sites (default 20)
.HP
\fB\-nocat\fR \fB\-\-\fR no CAT model (just 1 category)
.HP
\fB\-gamma\fR \fB\-\-\fR after the final round of optimizing branch lengths with the CAT model,
report the likelihood under the discrete gamma model with the same
number of categories. fasttree uses the same branch lengths but
optimizes the gamma shape parameter and the scale of the lengths.
The final tree will have rescaled lengths. Used with \fB\-log\fR, this
also generates per\-site likelihoods for use with CONSEL; see
GammaLogToPaup.pl and the documentation on the fasttree web site.
.SS "Support value options:"
.IP
By default, fasttree computes local support values by resampling the site
likelihoods 1,000 times and applying the Shimodaira\-Hasegawa test. If you specify \fB\-noml\fR,
it will compute minimum\-evolution bootstrap supports instead.
In either case, the support values are proportions ranging from 0 to 1.
.IP
Use \fB\-nosupport\fR to turn off support values or \fB\-boot\fR 100 to use just 100 resamples.
Use \fB\-seed\fR to initialize the random number generator.
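.PP
To illustrate the resampling idea behind these supports, the Python sketch below takes per\-site log\-likelihoods for the three topologies around one split (the split in the tree and its two NNI alternatives), resamples the sites with replacement, and reports the fraction of resamples in which the tree's topology keeps the highest total. This is a simplified RELL\-style bootstrap, not the full Shimodaira\-Hasegawa procedure, which also centers the resampled values before comparing them. The function name and input layout are assumptions, and the default seed merely mirrors the \fB\-seed\fR value in the usage summary.
.nf
```python
import random


def resampled_support(site_ll, n_resamples=1000, seed=1253):
    """site_ll holds three equal-length lists of per-site log-likelihoods:
    index 0 for the topology in the tree, 1 and 2 for its NNI alternatives.
    Returns the fraction of bootstrap resamples of the sites in which
    topology 0 has the strictly highest total log-likelihood."""
    rng = random.Random(seed)
    n_sites = len(site_ll[0])
    wins = 0
    for _ in range(n_resamples):
        # Draw site indices with replacement and total each topology.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(ll[i] for i in sample) for ll in site_ll]
        if totals[0] > max(totals[1], totals[2]):
            wins += 1
    return wins / n_resamples


# Hypothetical per-site log-likelihoods: topology 0 is better at every site.
better = [[-1.0] * 50, [-2.0] * 50, [-2.5] * 50]
print(resampled_support(better))  # 1.0
```
.fi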
.SS "Searching for the best join:"
.IP
By default, fasttree combines the 'visible set' of fast neighbor\-joining with
local hill\-climbing as in relaxed neighbor\-joining.
.HP
\fB\-slow\fR \fB\-\-\fR exhaustive search (like NJ or BIONJ, but different gap handling).
\fB\-slow\fR takes half an hour instead of 8 seconds for 1,250 proteins.
.HP
\fB\-fastest\fR \fB\-\-\fR search the visible set (the top hit for each node) only.
Unlike the original fast neighbor\-joining, \fB\-fastest\fR updates visible(C)
after joining A and B if join(AB,C) is better than join(C,visible(C)).
\fB\-fastest\fR also updates out\-distances in a very lazy way.
\fB\-fastest\fR sets \fB\-2nd\fR on as well; use \fB\-fastest\fR \fB\-no2nd\fR to avoid this.
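.PP
The join search above is built on the neighbor\-joining criterion. The Python sketch below uses the textbook form Q(i,j) = (r \- 2) d(i,j) \- R(i) \- R(j), where r is the number of active nodes and R(x) is the out\-distance of x, to show an exhaustive pair search and a single node's top hit on a small plain distance matrix. It is illustrative only: the names are hypothetical, out\-distances are recomputed on every call for simplicity, and fasttree works with profiles rather than a full matrix. When every Q value is current, the best pair overall is always some node's top hit, so searching only the visible set loses accuracy when those top hits have gone stale, which is what the visible(C) update rule addresses.
.nf
```python
from itertools import combinations


def q_value(d, nodes, i, j):
    """Textbook NJ criterion Q(i, j) = (r - 2) * d(i, j) - R(i) - R(j), where
    r = len(nodes) and R(x) is the out-distance sum of d(x, k) over nodes."""
    r = len(nodes)
    out_i = sum(d[i][k] for k in nodes)
    out_j = sum(d[j][k] for k in nodes)
    return (r - 2) * d[i][j] - out_i - out_j


def best_join_exhaustive(d, nodes):
    """Compare every pair of active nodes: r * (r - 1) / 2 pairs per join."""
    return min(combinations(nodes, 2), key=lambda p: q_value(d, nodes, *p))


def top_hit(d, nodes, i):
    """Best join partner of node i, i.e. its entry in the visible set."""
    return min((k for k in nodes if k != i),
               key=lambda k: q_value(d, nodes, i, k))


names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {x: dict(zip(names, row)) for x, row in zip(names, rows)}
print(best_join_exhaustive(dist, names))  # ('a', 'b')
print(top_hit(dist, names, "a"))  # b
```
.fi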
.SS "Top-hit heuristics:"
.IP
By default, fasttree uses a top\-hit list to speed up search.
Use \fB\-notop\fR (or \fB\-slow\fR) to turn this feature off
and compare all leaves to each other,
and all new joined nodes to each other.
.HP
\fB\-topm\fR 1.0 \fB\-\-\fR set the top\-hit list size to parameter*sqrt(N).
fasttree estimates the top m hits of a leaf from the
top 2*m hits of a 'close' neighbor, where close is
defined as d(seed,close) < 0.75 * d(seed, hit of rank 2*m),
and updates the top\-hits as joins proceed.
.HP
\fB\-close\fR 0.75 \fB\-\-\fR modify the close heuristic; lower is more conservative.
.HP
\fB\-refresh\fR 0.8 \fB\-\-\fR compare a joined node to all other nodes if its
top\-hit list is less than 80% of the desired length,
or if the age of the top\-hit list is log2(m) or greater.
.HP
\fB\-2nd\fR or \fB\-no2nd\fR to turn 2nd\-level top hits heuristic on or off
.IP
This reduces memory usage and running time but may lead to
marginal reductions in tree quality.
(By default, \fB\-fastest\fR turns on \fB\-2nd\fR.)
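.PP
The numeric rules in the \fB\-topm\fR, \fB\-close\fR, and \fB\-refresh\fR entries can be written out directly. The Python sketch below encodes the top\-hit list length m = topm * sqrt(N), the close test, and the refresh test, using the defaults listed above. The function names and the rounding of m are assumptions made for illustration, and the unit in which the list's age is measured is left abstract; none of this is taken from fasttree's source.
.nf
```python
import math


def top_hit_size(n: int, topm: float = 1.0) -> int:
    """Top-hit list length m = topm * sqrt(N), rounded (rounding assumed)."""
    return max(1, round(topm * math.sqrt(n)))


def is_close(d_seed_neighbor: float, d_seed_rank_2m: float,
             close: float = 0.75) -> bool:
    """A neighbor's top 2*m hits may seed this leaf's list when
    d(seed, neighbor) < close * d(seed, hit of rank 2*m)."""
    return d_seed_neighbor < close * d_seed_rank_2m


def needs_refresh(list_length: int, m: int, age: float,
                  refresh: float = 0.8) -> bool:
    """Compare a joined node to all other nodes when its top-hit list is
    shorter than refresh * m or its age is log2(m) or greater."""
    return list_length < refresh * m or age >= math.log2(m)


m = top_hit_size(10_000)
print(m)  # 100
print(needs_refresh(75, m, age=0))  # True (list shorter than 80)
print(needs_refresh(95, m, age=7))  # True (age 7 >= log2(100), about 6.64)
```
.fi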
.SS "Join options:"
.HP
\fB\-nj\fR: regular (unweighted) neighbor\-joining (default)
.HP
\fB\-bionj\fR: weighted joins as in BIONJ.
fasttree will also weight joins during NNIs.
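.PP
In BIONJ, the weighting enters through how distances to a newly joined node are computed. The Python sketch below uses the textbook neighbor\-joining branch lengths and the BIONJ\-style reduction d(u,k) = lam (d(i,k) \- b_i) + (1 \- lam) (d(j,k) \- b_j), where lam = 1/2 gives plain NJ. Here lam is simply an argument: BIONJ derives it from variance estimates, and fasttree applies its weighting to profiles, neither of which is reproduced. The function names are hypothetical.
.nf
```python
def nj_branch_lengths(d_ij: float, out_i: float, out_j: float, r: int):
    """Textbook NJ lengths of the branches from i and j to their new parent u,
    given the out-distances R(i), R(j) over r active nodes."""
    b_i = d_ij / 2 + (out_i - out_j) / (2 * (r - 2))
    return b_i, d_ij - b_i


def new_node_distance(d_ik: float, d_jk: float, b_i: float, b_j: float,
                      lam: float = 0.5) -> float:
    """Distance from the joined node u to another node k.
    lam = 0.5 reproduces unweighted NJ; BIONJ would pick lam from variances."""
    return lam * (d_ik - b_i) + (1.0 - lam) * (d_jk - b_j)


# Joining a and b in a small example: d(a,b) = 5, R(a) = 31, R(b) = 34, r = 5.
b_a, b_b = nj_branch_lengths(5.0, 31.0, 34.0, 5)
print(b_a, b_b)  # 2.0 3.0
print(new_node_distance(9.0, 10.0, b_a, b_b))  # 7.0, i.e. (9 + 10 - 5) / 2
```
.fi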
.SS "Constrained topology search options:"
.HP
\fB\-constraints\fR alignmentfile \fB\-\-\fR an alignment with values of 0, 1, and \-.
Not all sequences need to be present. A column of 0s and 1s defines a
constrained split. Some constraints may be violated
(see 'violating constraints:' in standard error).
.HP
\fB\-constraintWeight\fR \fB\-\-\fR how strongly to weight the constraints. A value of 1
means a penalty of 1 in tree length for violating a constraint.
Default: 100.0.
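.PP
To make the 0/1/\- encoding concrete, the Python sketch below reads one constraint column, checks whether any edge of a tree separates the sequences marked 1 from those marked 0 (sequences marked \- or absent are ignored), and totals a penalty. The tree is represented as a list of leaf sets, one side of each internal edge. Charging the weight once per violated column is an assumption for illustration, not a description of how fasttree scores violations during its search; all names are hypothetical.
.nf
```python
def constraint_split(column: dict):
    """column maps sequence name -> "0", "1" or "-" for one constraint column."""
    ones = {name for name, v in column.items() if v == "1"}
    zeros = {name for name, v in column.items() if v == "0"}
    return ones, zeros


def satisfied(tree_splits, ones: set, zeros: set) -> bool:
    """True if some edge (given as the leaf set on one side) puts every
    1-sequence on one side and every 0-sequence on the other."""
    for side in tree_splits:
        if ones <= side and not (zeros & side):
            return True
        if zeros <= side and not (ones & side):
            return True
    return False


def constraint_penalty(tree_splits, columns, weight: float = 100.0) -> float:
    """Hypothetical score: weight times the number of violated columns."""
    violated = sum(1 for col in columns
                   if not satisfied(tree_splits, *constraint_split(col)))
    return weight * violated


# The unrooted tree ((A,B),(C,D)) has one internal edge, splitting {A,B} | {C,D}.
splits = [{"A", "B"}]
kept = {"A": "1", "B": "1", "C": "0", "D": "-"}
broken = {"A": "1", "C": "1", "B": "0", "D": "0"}
print(constraint_penalty(splits, [kept, broken]))  # 100.0
```
.fi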
.PP
For more information, see http://www.microbesonline.org/fasttree/
or the comments in the source code.