1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367
|
.\" -*- coding: utf-8 -*-
.\" ============================================================================
.TH mptp 1 "Sep 11, 2023" "mptp 0.2.5" "USER COMMANDS"
.\" ============================================================================
.SH NAME
mptp \(em single-locus species delimitation
.\" ============================================================================
.SH SYNOPSIS
.\" left justified, ragged right
.ad l
Maximum-likelihood species delimitation:
.RS
\fBmptp\fR \-\-ml (\-\-single | \-\-multi) \-\-tree_file \fInewickfile\fR
\-\-output_file \fIoutputfile\fR [\fIoptions\fR]
.PP
.RE
Species delimitation with support values:
.RS
\fBmptp\fR \-\-mcmc \fIpositive integer\fR (\-\-single | \-\-multi)
(\-\-mcmc_startnull | \-\-mcmc_startrandom | \-\-mcmc_startml) \-\-mcmc_log
\fIpositive integer\fR \-\-tree_file \fInewickfile\fR \-\-output_file
\fIoutputfile\fR [\fIoptions\fR]
.PP
.RE
.\" left and right justified (default)
.ad b
.\" ============================================================================
.SH DESCRIPTION
Species is one of the fundamental units of comparison in virtually all
subfields of biology, from systematics to anatomy, development, ecology,
evolution, genetics and molecular biology. The aim of \fBmptp\fR is to offer
an open source tool to infer species boundaries on a a given phylogenetic tree
based on the Poisson Tree Process (PTP) and the Multiple Poisson Tree Process
(mPTP) models.
.PP
\fBmptp\fR offers two methods for inferring species delimitation. First, a
maximum-likelihood based method that uses a dynamic programming approach to
infer an ML estimate. Second, an mcmc approach for sampling the space of
possible delimitations providing the user with support values on the tree clades.
Both approaches are available in two flavours: the PTP and the mPTP model. The
PTP model is specified by using the \fIsingle\fR switch and the mPTP by using
\fImulti\fR.
.\" ============================================================================
.SS Input
The input for \fBmptp\fR is a newick file that contains one phylogenetic tree,
i.e., branches express the expected number of substitutions per alignment site.
.\" ============================================================================
.SS Options
\fBmptp\fR parses a large number of command-line options. For easier
navigation, options are grouped below by theme.
.PP
General options:
.RS
.TP 9
.B \-\-help
Display help text and exit.
.TP
.B \-\-version
Output version information and exit.
.TP
.B \-\-quiet
Supress all output to stdout except for warnings and fatal error messages.
.TP
.BI \-\-tree_file \0filename
Input newick file that contains a phylogenetic tree. Can be rooted or unrooted.
.TP
.BI \-\-output_file \0filename
Specifies the prefix used for generating output files. For maximum-likelihood
species delimitation two files will be created. First, \fIfilename\fR.txt that
contains the actual delimitation and \fIfilename\fR.svg that contains an SVG
figure of the computed delimitation. For mcmc analyses, a file
\fIfilename\fR.txt is created that contains the newick tree with supports
values.
.TP
.BI \-\-outgroup\~ "comma-separated list of taxa"
All computations for species delimitation are carried out on rooted trees. This
option is used only (and is required) In case an unrooted tree was specified
with the \-\-tree_file option. \fImptp\fR roots the unrooted tree by
splitting the branch leading to the most recent common ancestor (MRCA) of the
comma-separated list of taxa into two branches of equal size and introducing a
new node (the root of the new rooted tree) that connects these two branches.
.TP
.BI \-\-outgroup_crop
Crops taxa specified with the \-\-outgroup option from the the tree.
.TP
.BI \-\-min_br \0real
Any branch lengths in the input tree smaller or equal than \fIreal\fR are
excluded (ignored) from the computations. In addition, for mcmc analyses,
subtrees that exclusively consist of branch lengths smaller or equal to
\fIreal\fR are completely ignored from the proposals (support values for those
clades are set to 0). (default: 0.0001)
.TP
.BI \-\-precision\~ "positive integer"
Specifies the precision of the decimal part of floating point numbers on output
(default: 7)
.TP
.BI \-\-minbr_auto \0filename
Automatically detects the minimum branch length from the p-distances of the
FASTA file \fIfilename\fR.
.TP
.BI \-\-tree_show
Show an ASCII version of the processed input tree (i.e. after it is rooted by,
potentially cropping, the outgroup).
.RE
.PP
.\" ============================================================================
Maximum-likelihood estimations:
.PP
.RS
Estimating the maximum-likelihood delimitation is triggered by the switch
\-\-ml followed by \-\-single (the PTP model) or \-\-ml \-\-multi (the mPTP
model). Note that these two methods affect how options \-\-output_file behaves
and can be controlled using the \-\-min_br switch. Both methods require a
rooted phylogenetic tree, however an unrooted tree may be specified in
conjuction with the option \-\-outgroup. In this case, \fImptp\fR roots it at
that outgroup (see General options, \-\-outgroup for more info). Note that both
methods output an SVG depiction of the ML delimitation. See Visualization for
more information on adjusting and fine-tuning the SVG output.
.PP
Both methods ignore discard branch lengths of size smaller than the size
specified using the \-\-min_br option. The PTP model then attempts to find a
connected subgraph of the rooted tree that (a) contains the root, and (b) the
sum of likelihoods of fitting the edges of that subgraph in one exponential
distribution and the remaining edges in another (exponential distribution) is
maximized. With likelihood we mean the sums of the probability density function
with the mean defined as the reciprocal of the average of edge lengths in the
particular distribution.
.PP
.TP 9
.B \-\-ml \-\-single
Triggers the algorithm for computing an ML estimate of the delimitation using
the PTP model.
.TP
.B \-\-ml \-\-multi
Triggers the algorithm for computing an ML estimate of the delimitation using
the mPTP model.
.TP
.B \-\-pvalue \0real
Only used with the PTP model (specified with \-\-single). Sets the p-value for
performing a likelihood ratio test. Note that, there is no likelihood ratio test
for the mPTP model this test is not done. (default: 0.001)
.RE
.PP
.\" ============================================================================
MCMC method:
.PP
.RS
The MCMC method is triggered with the \-\-mcmc switch combined with either
\-\-single (the PTP model) or \-\-multi (the mPTP model).
.PP
Some more stuff to write
.PP
.TP 9
.B \-\-mcmc\~ "positive integer" \-\-single
Triggers the algorithm for computing support values by taking the specified
number of MCMC samples (delimitations) using the PTP model.
.TP
.B \-\-mcmc\~ "positive integer" \-\-multi
Triggers the algorithm for computing support values by taking the specified
number of MCMC samples (delimitations) using the mPTP model.
.TP
.B \-\-mcmc_sample\~ "positive integer"
Sample only every n-th MCMC step.
.TP
.B \-\-mcmc_log
Log the scores (log-likelihood) for each MCMC sample in a file and create an SVG
plot.
.TP
.B \-\-mcmc_burnin\~ "positive integer"
Ignore all MCMC samples generated before the specified step. (default: 1)
.TP
.B \-\-mcmc_runs\~ "positive integer"
Perform multiple MCMC runs. If more than 1 run is specified, mptp will generate
one seed for each run based on the provided seed using the \-\-seed switch.
Output files will be generated for each run (default: 1)
.TP
.B \-\-mcmc_credible \0real
Specify the probability (0.0 to 1.0) for which to generate the credible interval
i.e., the probability the true number of species will fall within the credible
interval given the observed data. (default: 0.95)
.TP
.B \-\-mcmc_startnull
Start MCMC sampling from the null-model.
.TP
.B \-\-mcmc_startrandom
Start MCMC sampling from a random delimitation.
.TP
.B \-\-mcmc_startrandom
Start MCMC sampling from the ML delimitation.
.TP
.B \-\-seed\~ "positive integer"
Specifies the seed for the pseudo-random number generator. (default: randomly
generated based on system time)
.RE
.PP
.\" ============================================================================
SVG Output:
.PP
.RS
The ML method generates one SVG file that visualizes the processed input tree
(i.e. after it is rooted by, potentially cropping, the outgroup) and marks the
subtrees corresponding to coalescent processes (the detected species groups)
with red color, while the speciation process is colored green.
.PP
The MCMC method generates one SVG file per run visualizing the processed
tree, and indicates the support value for each node, i.e., the percentage of
MCMC samples (delimitations) in which the particular node was part of the
speciation process. A value of 1 means it was always in the speciation process
while a value of 0 means it was always in a coalescent process. The tree
branches are colored according to the support values of descendant nodes; a
support of value of 0 is colored with red, 1 with black, and values in between
are gradients of the two colors. Only support values above 0.5 are shown to
avoid packed numbers in dense branching events. In addition, if \-\-mcmc_log is
specified, an additional SVG image of log-likelihoods plots for each sampled
delimitation is created.
.PP
.TP 9
.B \-\-svg_width\~ "positive integer"
Sets the total width (including margins) of the SVG in pixels. (default: 1920)
.TP
.B \-\-svg_fontsize\~ "positive integer"
Size of font in SVG image. (default: 12)
.TP
.B \-\-svg_tipspacing\~ "positive integer"
Vertical space in pixels between taxa in SVG tree. (default: 20)
.TP
.B \-\-svg_legend_ratio \0real
Ratio (value between 0.0 and 1.0) of total tree length to be displayed as
legend line. (default: 0.1)
.TP
.B \-\-svg_nolengend
Hide legend.
.TP
.B \-\-svg_marginleft\~ "positive integer"
Left margin in pixels. (default: 20)
.TP
.B \-\-svg_marginright\~ "positive integer"
Right margin in pixels. (default: 20)
.TP
.B \-\-svg_margintop\~ "positive integer"
Top margin in pixels. (default: 20)
.TP
.B \-\-svg_marginbottom\~ "positive integer"
Top margin in pixels. (default: 20)
.TP
.B \-\-svg_inner_radius\~ "positive integer"
Radius of inner nodes in pixels. (default: 0)
.RE
.PP
.\" ============================================================================
.SH EXAMPLES
.PP
Compute the maximum likelihood estimate using the mPTP model by discarding all
branches with length below or equal to 0.0001
.PP
.RS
\fBmptp\fR \-\-ml \-\-multi \-\-min_br 0.0001 \-\-tree_file \fInewick.txt\fR
\-\-output_file \fIout\fR
.RE
.PP
Run an MCMC analysis of 100 million steps with the mPTP model, that logs every
one million-th step, ignores the first 2 million steps and discards all branches
with lengths smaller or equal to 0.0001. Use 777 as seed. The chain will start
from the ML delimitation (default).
.PP
.RS
\fBmptp\fR \-\-mcmc 100000000 \-\-multi \-\-min_br 0.0001 \-\-tree_file
\fInewick.txt\fR \-\-output_file \fIout\fR \-\-mcmc_log 1000000 \-\-mcmc_burnin
2000000 -seed 777
.RE
.PP
Perform an MCMC analysis of 5 runs, each of 100 million steps with the mPTP
model, log every one million-th step, ignore the first 2 million steps, and
detect the minimum branch length by specifying the FASTA file alignment.fa that
contains the alignment. Use 777 as seed. Start each run from a random
delimitation.
.PP
.RS
\fBmptp\fR \-\-mcmc 100000000 \-\-multi -\-\-mcmc_runs 5 \-\-mcmc_log 1000000
\-\-minbr_auto \fIalignment.fa\fR \-\-tree_file \fInewick.txt\fR
\-\-output_file \fIout\fR \-\-mcmc_burnin 2000000 -seed 777
\-\-mcmc_startrandom
.RE
.PP
.\"
.\" ============================================================================
.SH AUTHORS
Implementation by Tomas Flouri, Sarah Lutteropp and Paschalia Kapli. Additional
PTP and mPTP model authors include Kassian Kobert, Jiajie Zhang, Pavlos
Pavlidis, and Alexandros Stamatakis.
.SH REPORTING BUGS
Submit suggestions and bug-reports at
<https://github.com/Pas-Kapli/mptp/issues>, or e-mail Tomas Flouri
<Tomas.Flouri@h-its.org>.
.\" ============================================================================
.SH AVAILABILITY
Source code and binaries are available at
<https://github.com/Pas-Kapli/mptp>.
.\" ============================================================================
.SH COPYRIGHT
Copyright (C) 2015-2017, Tomas Flouri, Sarah Lutteropp, Paschalia Kapli
.PP
All rights reserved.
.PP
Contact: Tomas Flouri <Tomas.Flouri@h-its.org>,
Scientific Computing, Heidelberg Insititute for Theoretical Studies,
69118 Heidelberg, Germany
.PP
This software is licensed under the terms of the GNU Affero General Public
License version 3.
.PP
\fBGNU Affero General Public License version 3\fR
.PP
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.
.PP
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU Affero General Public License for more
details.
.PP
You should have received a copy of the GNU Affero General Public License along
with this program. If not, see <http://www.gnu.org/licenses/>.
.SH VERSION HISTORY
New features and important modifications of \fBmptp\fR (short lived or minor
bug releases may not be mentioned):
.RS
.TP
.BR v0.1.0\~ "released June 27th, 2016"
First public release.
.TP
.BR v0.1.1\~ "released July 15th, 2016"
Bug fix (now LRT test is not printed in output file when using --multi)
.TP
.BR v.0.2.0\~ "released September 27th, 2016"
Fixed floating point exception error when constructing random trees, caused
from dividing by zero. Changed allocation from malloc to calloc, as it caused
unititialized variables when converting unrooted trees to rooted when using the
MCMC method. Fixed sample size for the AIC with a correction for finite sample
sizes.
.TP
.BR v.0.2.1\~ "released October 18th, 2016"
Updated ASV to consider only coalescent roots of ML delimitation. Removed
assertion stopping mptp when using random starting delimitations for the MCMC
method.
.TP
.BR v0.2.2\~ "released January 31st, 2017"
Fixed regular expressions to allow scientific notation for branch lengths when
parsing trees. Improved the accuracy of ASV score by also taking into account
tips forming coalescent roots. Fixed memory leaks that occur when parsing
incorrectly formatted trees.
.TP
.BR v0.2.3\~ "released July 25th, 2017"
Replaced hsearch() with custom hashtable. Fixed minor output error messages.
.TP
.BR v0.2.4\~ "released May 14th, 2018"
If we do not manage to generate a random starting delimitation with the wanted
number of species (randomly chosen), we use the currently generated
delimitation instead.
.TP
.BR v0.2.5\~ "released Sep 9th, 2023"
Added likelihood ratio test for the multi method. Added implementation for the
incomplete gamma function, and removed dependency for GNU scientific library.
.RE
.LP
|