.TH SUMTREES "1" "June 2015" "sumtrees 4.0.2" "User Commands"
.SH NAME
sumtrees \- Phylogenetic Tree Summarization and Annotation
.SH SYNOPSIS
.B sumtrees
\fI[\-i FORMAT] [\-b BURNIN] [\-\-force\-rooted] [\-\-force\-unrooted]\fR
.SH DESCRIPTION
SumTrees is a program to summarize non-parametric bootstrap or
Bayesian posterior probability support for splits or clades on
phylogenetic trees.
.P
The basis of the support assessment is typically given by a set of
non-parametric bootstrap replicate tree samples produced by programs
such as GARLI or RAxML, or by a set of MCMC tree samples produced by
programs such as MrBayes or BEAST. The proportion of trees out of the
samples in which a particular split is found is taken to be the degree
of support for that split as indicated by the samples. The samples that
are the basis of the support can be distributed across multiple files,
and a burn-in option allows for an initial number of trees in each file
to be excluded from the analysis if they are not considered to be drawn
from the true support distribution.
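.P
As a rough illustration of the calculation described above, the following
Python sketch computes split support from several files with a per\-file
burn\-in. The function name, the toy encoding of each tree as a set of
splits, and the example data are assumptions made for illustration; this is
not the SumTrees implementation.
.nf
```python
from collections import Counter

def split_support(tree_files, burnin=0):
    """Return {split: proportion of retained trees that contain it}.

    tree_files is a list of files; each file is a list of trees; each
    tree is a set of splits (a toy encoding used only for this sketch).
    """
    counts = Counter()
    total = 0
    for trees in tree_files:
        # The burn-in is applied to every file separately.
        for tree in trees[burnin:]:
            total += 1
            counts.update(tree)
    if total == 0:
        raise ValueError("no trees left after burn-in")
    return {split: n / total for split, n in counts.items()}

ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
file1 = [{ac}, {ab, cd}, {ab}]
file2 = [{ac}, {ab, cd}, {ab, cd}]
# With one tree skipped per file, four trees remain in total.
support = split_support([file1, file2], burnin=1)
```
.fi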
.P
SumTrees summarizes collections of trees (e.g., MCMC samples from a posterior
distribution or non\-parametric bootstrap replicates), mapping the posterior
probability, support, or frequency with which splits/clades are found in the
source set of trees onto a target tree.
.SH OPTIONS
.SS "Source Options:"
.TP
TREE\-FILEPATH
Source(s) of trees to summarize. At least one valid
source of trees must be provided. Use '\-' to specify
reading from standard input (note that this requires
the input file format to be explicitly set using the
\&'\-\-source\-format' option).
.TP
\fB\-i\fR FORMAT, \fB\-\-input\-format\fR FORMAT, \fB\-\-source\-format\fR FORMAT
Format of all input trees (defaults to handling either
NEXUS or NEWICK through inspection; it is more
efficient to explicitly specify the format if it is
known).
.TP
\fB\-b\fR BURNIN, \fB\-\-burnin\fR BURNIN
Number of trees to skip from the beginning of *each*
tree file when counting support (default: 0).
.TP
\fB\-\-force\-rooted\fR, \fB\-\-rooted\fR
Treat source trees as rooted.
.TP
\fB\-\-force\-unrooted\fR, \fB\-\-unrooted\fR
Treat source trees as unrooted.
.TP
\fB\-v\fR, \fB\-\-ultrametricity\-precision\fR, \fB\-\-branch\-length\-epsilon\fR
Precision to use when validating ultrametricity
(default: 1e\-05; specify '0' to disable validation).
.TP
\fB\-\-weighted\-trees\fR
Use weights of trees (as indicated by '[&W m/n]'
comment token) to weight contribution of splits found
on each tree to overall split frequencies.
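.IP
One way to picture weighted split frequencies is sketched below in Python.
The function name and data layout are hypothetical, and normalizing by the
total weight is an assumption rather than documented SumTrees behavior.
Weights are written as 'm/n' strings, as in the '[&W m/n]' token.
.nf
```python
from collections import defaultdict
from fractions import Fraction

def weighted_split_frequencies(weighted_trees):
    """weighted_trees: list of (weight, splits); weight is an 'm/n' string."""
    weights = [Fraction(w) for w, _ in weighted_trees]
    total = sum(weights)
    freqs = defaultdict(Fraction)
    for weight, (_, splits) in zip(weights, weighted_trees):
        for split in splits:
            # Assumption: each tree contributes its share of the total weight.
            freqs[split] += weight / total
    return dict(freqs)

trees = [("1/4", {"AB"}), ("3/4", {"AB", "CD"})]
freqs = weighted_split_frequencies(trees)
```
.fi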
.TP
\fB\-\-preserve\-underscores\fR
Do not convert unprotected (unquoted) underscores to
spaces when reading NEXUS/NEWICK format trees.
.TP
\fB\-\-taxon\-name\-filepath\fR FILEPATH
Path to file listing all the taxon names or labels
that will be found across the entire set of source
trees. This file should be a plain text file with a
single name listed on each line. This file is only read
when multiprocessing ('\-M' or '\-m') is requested. When
multiprocessing using the '\-M' or '\-m' options, all
taxon names need to be defined in advance of any
actual tree analysis. By default this is done by
reading the first tree in the first tree source and
extracting the taxon names. At best, this is
inefficient, as it involves an extraneous reading of
the tree. At worst, this can be erroneous, if the
first tree does not contain all the taxa. Explicitly
providing the taxon names via this option can avoid
these issues.
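.IP
The file layout described above (one name per line) could be read as in
this minimal Python sketch. The function name is hypothetical, and skipping
blank lines is an assumption, not documented behavior.
.nf
```python
def read_taxon_names(lines):
    """Return one taxon name per non-empty line, stripped of whitespace."""
    names = []
    for line in lines:
        name = line.strip()
        if name:  # assumption: blank lines are ignored
            names.append(name)
    return names

# Any iterable of lines works, including an open text file.
names = read_taxon_names(["Homo sapiens", "Pan troglodytes", "", "Gorilla gorilla"])
```
.fi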
.SS "Target Tree Topology Options:"
.TP
\fB\-t\fR FILE, \fB\-\-target\-tree\-filepath\fR FILE
Summarize support and other information from the
source trees to topology or topologies given by the
tree(s) described in FILE. If no user\-specified target
topologies are given, then a summary topology will be
used as the target. Use the '\-s' or '\-\-summary\-target'
option to specify the type of summary tree to use.
.TP
\fB\-s\fR SUMMARY\-TYPE, \fB\-\-summary\-target\fR SUMMARY\-TYPE
Construct and summarize support and other information
from the source trees to one of the following summary
topologies:
.RS
.TP
\- 'consensus'
A consensus tree. The minimum frequency
threshold of clades to be included can be
specified using the '\-f' or '\-\-min\-clade\-freq'
flags. This is the DEFAULT if a user\-specified
target tree is not given through the '\-t' or
\&'\-\-target\-tree\-filepath' options.
.TP
\- 'mcct'
The maximum clade credibility tree. The tree
from the source set that maximizes the *product*
of clade posterior probabilities.
.TP
\- 'msct'
The maximum sum of clade credibilities tree. The tree
from the source set that maximizes the *sum*
of clade posterior probabilities.
.RE
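.IP
To make the 'mcct' criterion concrete, the Python sketch below picks the
tree with the largest product of clade probabilities. The function name,
the representation of a tree as a set of clade labels, and the
probabilities are all illustrative assumptions.
.nf
```python
import math

def max_clade_credibility_tree(trees, clade_probs):
    """Pick the tree whose clades have the largest product of probabilities."""
    def log_score(tree):
        # Summing logs ranks trees the same way as multiplying probabilities.
        return sum(math.log(clade_probs[clade]) for clade in tree)
    return max(trees, key=log_score)

clade_probs = {"AB": 0.9, "CD": 0.5, "AC": 0.1, "BD": 0.1}
trees = [{"AB", "CD"}, {"AC", "BD"}]
best = max_clade_credibility_tree(trees, clade_probs)
```
.fi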
.SS "Target Tree Supplemental Options:"
.TP
\fB\-f\fR #.##, \fB\-\-min\-consensus\-freq\fR #.##, \fB\-\-min\-freq\fR #.##, \fB\-\-min\-clade\-freq\fR #.##
If using a consensus tree summarization strategy, then
this is the minimum frequency or probability for a
clade or a split to be included in the resulting tree
(default: > 0.5).
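.IP
The frequency threshold can be pictured with the short Python sketch
below. The function name is hypothetical, and the strict comparison is an
assumption based on the documented default of "> 0.5".
.nf
```python
def consensus_splits(split_freqs, min_freq=0.5):
    """Keep splits whose frequency exceeds the threshold."""
    # Assumption: strictly greater than, mirroring "default: > 0.5".
    return {split for split, freq in split_freqs.items() if freq > min_freq}

kept = consensus_splits({"AB": 1.0, "CD": 0.75, "AC": 0.5})
```
.fi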
.TP
\fB\-\-allow\-unknown\-target\-tree\-taxa\fR
Do not fail with error if target tree(s) have taxa not
previously encountered in source trees or defined in
the taxon discovery file.
.SS "Target Tree Rooting Options:"
.TP
\fB\-\-root\-target\-at\-outgroup\fR TAXON\-LABEL
Root target tree(s) using specified taxon as outgroup.
.TP
\fB\-\-root\-target\-at\-midpoint\fR
Root target tree(s) at midpoint.
.TP
\fB\-\-set\-outgroup\fR TAXON\-LABEL
Rotate the target trees such that the specified taxon is in
the outgroup position, but do not explicitly change
the target tree rooting.
.SS "Target Tree Edge Options:"
.TP
\fB\-e\fR STRATEGY, \fB\-\-set\-edges\fR STRATEGY, \fB\-\-edges\fR STRATEGY
Set the edge lengths of the target or summary trees
based on the specified summarization STRATEGY:
.RS
.TP
\- 'mean\-length'
Edge lengths will be set to the mean of the
lengths of the corresponding split or clade in
the source trees.
.TP
\- 'median\-length'
Edge lengths will be set to the median of the
lengths of the corresponding split or clade in
the source trees.
.TP
\- 'mean\-age'
Edge lengths will be adjusted so that the age of
subtended nodes will be equal to the mean age of
the corresponding split or clade in the source
trees. Source trees will need to be
ultrametric for this option.
.TP
\- 'median\-age'
Edge lengths will be adjusted so that the age of
subtended nodes will be equal to the median age
of the corresponding split or clade in the
source trees. Source trees will need to be
ultrametric for this option.
.TP
\- 'support'
Edge lengths will be set to the support value
for the split represented by the edge.
.TP
\- 'keep'
Do not change the existing edge lengths. This is
the DEFAULT if target tree(s) are sourced from
an external file using the '\-t' or
\&'\-\-target\-tree\-filepath' option.
.TP
\- 'clear'
Edge lengths will be cleared from the target
trees if they are present.
.RE
.IP
Note that the default setting varies according to the
following, in order of preference:
.RS
.IP (1)
If target trees are specified using the '\-t' or
\&'\-\-target\-tree\-filepath' option, then the default edge
summarization strategy is 'keep'.
.IP (2)
If target trees are not specified, but the
\&'\-\-summarize\-node\-ages' option is specified,
then the default edge summarization strategy is
\&'mean\-age'.
.IP (3)
If no target trees are specified and the
node ages are NOT specified to be summarized,
then the default edge summarization strategy is
\&'mean\-length'.
.RE
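.IP
The order of preference for the default strategy can be sketched in Python
as follows. The function and parameter names are hypothetical; only the
three rules themselves come from the text above.
.nf
```python
def default_edge_strategy(target_tree_given, summarize_node_ages):
    """Return the default edge summarization strategy name."""
    if target_tree_given:
        return "keep"          # rule (1): target trees were supplied
    if summarize_node_ages:
        return "mean-age"      # rule (2): node ages are being summarized
    return "mean-length"       # rule (3): neither of the above

strategy = default_edge_strategy(target_tree_given=False, summarize_node_ages=True)
```
.fi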
.TP
\fB\-\-force\-minimum\-edge\-length\fR FORCE_MINIMUM_EDGE_LENGTH
(If setting edge lengths) force all edges to be at
least this length.
.TP
\fB\-\-collapse\-negative\-edges\fR
(If setting edge lengths) force the age of each parent node
to be at least as old as that of its oldest child when
summarizing node ages.
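.IP
Summarized ages can leave a parent younger than one of its children, which
implies a negative edge length. The Python sketch below shows one way such
a constraint could be enforced. The function name and the dictionary\-based
tree encoding are assumptions for illustration, not the SumTrees code.
.nf
```python
def enforce_parent_ages(children, ages, root):
    """Return a copy of ages in which no parent is younger than a child."""
    fixed = dict(ages)

    def visit(node):
        kids = children.get(node, [])
        for kid in kids:
            visit(kid)  # fix the subtree first (post-order)
        if kids:
            oldest_child = max(fixed[kid] for kid in kids)
            fixed[node] = max(fixed[node], oldest_child)

    visit(root)
    return fixed

children = {"root": ["x", "C"], "x": ["A", "B"]}
ages = {"root": 2.0, "x": 2.5, "A": 0.0, "B": 0.0, "C": 0.0}
# The root (age 2.0) is younger than its child x (age 2.5), so it is raised.
fixed = enforce_parent_ages(children, ages, "root")
```
.fi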
.SS "Target Tree Annotation Options:"
.TP
\fB\-\-summarize\-node\-ages\fR, \fB\-\-ultrametric\fR, \fB\-\-node\-ages\fR
Assume that source trees are ultrametric and summarize
node ages (distances from tips).
.TP
\fB\-l\fR {support,keep,clear}, \fB\-\-labels\fR {support,keep,clear}
Set the node labels of the summary or target tree(s):
.RS
.TP
\- 'support'
Node labels will be set to the support value for
the clade represented by the node. This is the
DEFAULT.
.TP
\- 'keep'
Do not change the existing node labels.
.TP
\- 'clear'
Node labels will be cleared from the target
trees if they are present.
.RE
.TP
\fB\-\-suppress\-annotations\fR, \fB\-\-no\-annotations\fR
Do NOT annotate nodes and edges with any summarization
information metadata such as support values, edge
length and/or node age summary statistics, etc.
.SS "Support Expression Options:"
.TP
\fB\-p\fR, \fB\-\-percentages\fR
Indicate branch support as percentages (otherwise,
will report as proportions by default).
.TP
\fB\-d\fR #, \fB\-\-decimals\fR #
Number of decimal places in indication of support
values (default: 8).
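.IP
The effect of these two options on a reported value can be sketched in
Python as below. The function name is hypothetical, and the fixed\-width
formatting with trailing zeros is an assumption; the actual output
formatting of SumTrees may differ.
.nf
```python
def format_support(proportion, percentages=False, decimals=8):
    """Render a support value as a proportion or as a percentage."""
    value = proportion * 100 if percentages else proportion
    return f"{value:.{decimals}f}"

as_proportion = format_support(0.75)
as_percentage = format_support(0.75, percentages=True, decimals=1)
```
.fi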
.SS "Output Options:"
.TP
\fB\-o\fR FILEPATH, \fB\-\-output\-tree\-filepath\fR FILEPATH, \fB\-\-output\fR FILEPATH
Path to output file (if not specified, will print to
standard output).
.TP
\fB\-F\fR {nexus,newick,phylip,nexml}, \fB\-\-output\-tree\-format\fR {nexus,newick,phylip,nexml}
Format of the output tree file (if not specified,
defaults to input format, if this has been explicitly
specified, or 'nexus' otherwise).
.TP
\fB\-x\fR PREFIX, \fB\-\-extended\-output\fR PREFIX
If specified, extended summarization information will
be generated, consisting of the following files:
.RS
.TP
\- '<PREFIX>.topologies.trees'
A collection of topologies found in the sources
reported with their associated posterior
probabilities as metadata annotations.
.TP
\- '<PREFIX>.bipartitions.trees'
A collection of bipartitions, each represented
as a tree, with associated information as
metadata annotations.
.TP
\- '<PREFIX>.bipartitions.tsv'
Table listing bipartitions as a group pattern as
the key column, and information regarding each
of the bipartitions as the remaining columns.
.TP
\- '<PREFIX>.edge\-lengths.tsv'
List of bipartitions and corresponding edge
lengths. Only generated if edge lengths are
summarized.
.TP
\- '<PREFIX>.node\-ages.tsv'
List of bipartitions and corresponding ages.
Only generated if node ages are summarized.
.RE
.TP
\fB\-\-no\-taxa\-block\fR
When writing NEXUS format output, do not include a
taxa block in the output treefile (otherwise will
create taxa block by default).
.TP
\fB\-\-no\-analysis\-metainformation\fR, \fB\-\-no\-meta\-comments\fR
Do not include meta\-information describing the
summarization parameters and execution details.
.TP
\fB\-c\fR ADDITIONAL_COMMENTS, \fB\-\-additional\-comments\fR ADDITIONAL_COMMENTS
Additional comments to be added to the summary file.
.TP
\fB\-r\fR, \fB\-\-replace\fR
Replace/overwrite output file without asking if it
already exists.
.SS "Parallel Processing Options:"
.TP
\fB\-M\fR, \fB\-\-maximum\-multiprocessing\fR
Run in parallel mode using as many processors as
available, up to the number of sources.
.TP
\fB\-m\fR NUM\-PROCESSES, \fB\-\-multiprocessing\fR NUM\-PROCESSES
Run in parallel mode with up to a maximum of
NUM\-PROCESSES processes ('max' or '#' means to run in as
many processes as there are cores on the local
machine; i.e., same as specifying '\-M' or
\&'\-\-maximum\-multiprocessing').
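.IP
How a process count might be derived from these options is sketched in
Python below. The function and parameter names are hypothetical. Capping an
explicit NUM\-PROCESSES at the number of sources is an assumption; the text
above states that cap only for '\-M'.
.nf
```python
import os

def worker_count(num_sources, requested=None, cores=None):
    """Return how many worker processes to start."""
    cores = cores or os.cpu_count() or 1
    if requested in (None, "max", "#"):
        limit = cores          # use every available core
    else:
        limit = int(requested)
    # Never start more workers than there are sources to read.
    return max(1, min(limit, num_sources))

workers = worker_count(num_sources=3, cores=8)
```
.fi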
.SS "Program Logging Options:"
.TP
\fB\-g\fR LOG\-FREQUENCY, \fB\-\-log\-frequency\fR LOG\-FREQUENCY
Tree processing progress logging frequency (default:
500; set to 0 to suppress).
.TP
\fB\-q\fR, \fB\-\-quiet\fR
Suppress ALL logging, progress and feedback messages.
.SS "Program Error Options:"
.TP
\fB\-\-ignore\-missing\-support\fR
Ignore missing support tree files (note that at least
one must exist).
.SS "Program Information Options:"
.TP
\fB\-h\fR, \fB\-\-help\fR
Show help information for program and exit.
.TP
\fB\-\-citation\fR
Show citation information for program and exit.
.TP
\fB\-\-usage\-examples\fR
Show usage examples of program and exit.
.TP
\fB\-\-describe\fR
Show information regarding your DendroPy and Python
installations and exit.
.SH AUTHORS
Jeet Sukumaran and Mark T. Holder
.SH SEE ALSO
If any stage of your work or analyses relies on code or programs from
this library, either directly or indirectly (e.g., through usage of your
own or third\-party programs, pipelines, or toolkits which use, rely on,
incorporate, or are otherwise primarily derivative of code/programs in
this library), please cite:
.IP
Sukumaran, J and MT Holder. 2010. DendroPy: a Python library for
phylogenetic computing. Bioinformatics 26: 1569\-1571.
.IP
Sukumaran, J and MT Holder. SumTrees: Phylogenetic Tree Summarization.
4.0.0 (Jan 31 2015). Available at
https://github.com/jeetsukumaran/DendroPy.
.P
Note that, in the interests of scientific reproducibility, you should
describe in the text of your publications not only the specific
version of the SumTrees program, but also the DendroPy library used in
your analysis. For your information, you are running DendroPy 4.0.2.