.\" -*- coding: utf-8 -*-
.\" ============================================================================
.TH mptp 1 "Sep 11, 2023" "mptp 0.2.5" "USER COMMANDS"
.\" ============================================================================
.SH NAME
mptp \(em single-locus species delimitation
.\" ============================================================================
.SH SYNOPSIS
.\" left justified, ragged right
.ad l
Maximum-likelihood species delimitation:
.RS
\fBmptp\fR \-\-ml (\-\-single | \-\-multi) \-\-tree_file \fInewickfile\fR 
\-\-output_file \fIoutputfile\fR [\fIoptions\fR]
.PP
.RE
Species delimitation with support values:
.RS
\fBmptp\fR \-\-mcmc \fIpositive integer\fR (\-\-single | \-\-multi)
(\-\-mcmc_startnull | \-\-mcmc_startrandom | \-\-mcmc_startml) \-\-mcmc_log
\fIpositive integer\fR \-\-tree_file \fInewickfile\fR \-\-output_file
\fIoutputfile\fR [\fIoptions\fR]
.PP
.RE
.\" left and right justified (default)
.ad b
.\" ============================================================================
.SH DESCRIPTION
Species are among the fundamental units of comparison in virtually all
subfields of biology, from systematics to anatomy, development, ecology,
evolution, genetics and molecular biology. The aim of \fBmptp\fR is to offer
an open source tool to infer species boundaries on a given phylogenetic tree
based on the Poisson Tree Process (PTP) and the Multiple Poisson Tree Process
(mPTP) models.
.PP
\fBmptp\fR offers two methods for inferring species delimitation. The first is a
maximum-likelihood method that uses dynamic programming to infer an ML
estimate. The second is an MCMC approach that samples the space of possible
delimitations and provides the user with support values for the clades of the
tree. Both approaches are available in two flavours: the PTP and the mPTP
model. The PTP model is selected with the \-\-single switch and the mPTP model
with \-\-multi.
.\" ============================================================================
.SS Input
The input for \fBmptp\fR is a newick file that contains one phylogenetic tree
whose branch lengths express the expected number of substitutions per alignment
site.
.\" ============================================================================
.SS Options
\fBmptp\fR parses a large number of command-line options. For easier
navigation, options are grouped below by theme.
.PP
General options:
.RS
.TP 9
.B \-\-help
Display help text and exit.
.TP
.B \-\-version
Output version information and exit.
.TP
.B \-\-quiet
Suppress all output to stdout except for warnings and fatal error messages.
.TP
.BI \-\-tree_file \0filename
Input newick file that contains a phylogenetic tree. Can be rooted or unrooted.
.TP
.BI \-\-output_file \0filename
Specifies the prefix used for generating output files. For maximum-likelihood
species delimitation two files are created: \fIfilename\fR.txt, which
contains the actual delimitation, and \fIfilename\fR.svg, which contains an SVG
figure of the computed delimitation. For MCMC analyses, a file
\fIfilename\fR.txt is created that contains the newick tree with support
values.
.TP
.BI \-\-outgroup\~ "comma-separated list of taxa"
All computations for species delimitation are carried out on rooted trees. This
option is required if (and used only if) an unrooted tree was specified
with the \-\-tree_file option. \fBmptp\fR roots the unrooted tree by
splitting the branch leading to the most recent common ancestor (MRCA) of the
comma-separated list of taxa into two branches of equal length and introducing a
new node (the root of the new rooted tree) that connects these two branches.
.TP
.BI \-\-outgroup_crop
Crops taxa specified with the \-\-outgroup option from the tree.
.TP
.BI \-\-min_br \0real
Branches in the input tree with length smaller than or equal to \fIreal\fR are
excluded (ignored) from the computations. In addition, for MCMC analyses,
subtrees that consist exclusively of branches with length smaller than or equal
to \fIreal\fR are excluded from the proposals (support values for those
clades are set to 0). (default: 0.0001)
.TP
.BI \-\-precision\~ "positive integer"
Specifies the precision of the decimal part of floating point numbers on
output. (default: 7)
.TP
.BI \-\-minbr_auto \0filename
Automatically detects the minimum branch length from the p-distances of the
FASTA file \fIfilename\fR.
.TP
.BI \-\-tree_show
Show an ASCII version of the processed input tree (i.e., after it has been
rooted at the outgroup and the outgroup has optionally been cropped).
.RE
.PP
.\" ============================================================================
Maximum-likelihood estimations:
.PP
.RS
Estimating the maximum-likelihood delimitation is triggered by the switch
\-\-ml combined with either \-\-single (the PTP model) or \-\-multi (the mPTP
model). Both methods determine which files \-\-output_file generates
and can be controlled using the \-\-min_br option. Both methods require a
rooted phylogenetic tree; however, an unrooted tree may be specified in
conjunction with the option \-\-outgroup. In this case, \fBmptp\fR roots it at
that outgroup (see General options, \-\-outgroup for more info). Note that both
methods output an SVG depiction of the ML delimitation. See SVG Output for
more information on adjusting and fine-tuning the SVG output.
.PP
Both methods discard branches whose length is smaller than or equal to the
value specified using the \-\-min_br option. The PTP model then attempts to
find a connected subgraph of the rooted tree that (a) contains the root, and
(b) maximizes the sum of the log-likelihoods of fitting the edges of that
subgraph to one exponential distribution and the remaining edges to another
exponential distribution. Here, the likelihood of a set of edges is computed
from the exponential probability density function evaluated at each edge
length, with the rate parameter set to the reciprocal of the average edge
length in the respective distribution.
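.PP
As a rough illustration of this criterion (a sketch, not the \fBmptp\fR
implementation), the following Python snippet scores one candidate split of the
edge lengths. It assumes that the score is the sum of the log-densities of two
exponential distributions, each with its rate set to the reciprocal of the mean
edge length in its class. The function names and the edge lengths are made up
for the example.
.PP
.nf
```python
import math

def exp_loglik(lengths):
    """Log-likelihood of edge lengths under one exponential distribution.

    The rate is set to its maximum-likelihood value, 1 / mean(lengths).
    """
    if not lengths:
        return 0.0
    rate = len(lengths) / sum(lengths)
    return sum(math.log(rate) - rate * x for x in lengths)

def split_score(speciation_edges, coalescent_edges):
    """Score of one candidate delimitation (hypothetical helper)."""
    return exp_loglik(speciation_edges) + exp_loglik(coalescent_edges)

# Made-up edge lengths: long between-species edges, short within-species ones.
speciation = [0.12, 0.09, 0.15]
coalescent = [0.002, 0.001, 0.003, 0.002]

two_class = split_score(speciation, coalescent)
one_class = exp_loglik(speciation + coalescent)  # null model: a single rate
print(two_class > one_class)  # True
```
.fi
.PP
In this toy case the two-class fit scores higher than the single-rate null
model. Whether such a gain is significant is what the likelihood ratio test
controlled by \-\-pvalue is meant to decide.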
.PP
.TP 9
.B \-\-ml \-\-single
Triggers the algorithm for computing an ML estimate of the delimitation using
the PTP model.
.TP
.B \-\-ml \-\-multi
Triggers the algorithm for computing an ML estimate of the delimitation using
the mPTP model.
.TP
.BI \-\-pvalue \0real
Only used with the PTP model (specified with \-\-single). Sets the p-value for
performing a likelihood ratio test. Note that no likelihood ratio test is
performed for the mPTP model. (default: 0.001)
.RE
.PP
.\" ============================================================================
MCMC method:
.PP
.RS
The MCMC method is triggered with the \-\-mcmc switch combined with either
\-\-single (the PTP model) or \-\-multi (the mPTP model). 
.PP
.TP 9
.B \-\-mcmc\~ "positive integer" \-\-single
Triggers the algorithm for computing support values by taking the specified
number of MCMC samples (delimitations) using the PTP model.
.TP
.B \-\-mcmc\~ "positive integer" \-\-multi
Triggers the algorithm for computing support values by taking the specified
number of MCMC samples (delimitations) using the mPTP model.
.TP
.B \-\-mcmc_sample\~ "positive integer"
Sample only every n-th MCMC step.
.TP
.B \-\-mcmc_log
Log the scores (log-likelihood) for each MCMC sample in a file and create an SVG
plot.
.TP
.B \-\-mcmc_burnin\~ "positive integer"
Ignore all MCMC samples generated before the specified step. (default: 1)
.TP
.B \-\-mcmc_runs\~ "positive integer"
Perform multiple MCMC runs. If more than 1 run is specified, \fBmptp\fR
generates one seed for each run based on the seed provided with the \-\-seed
switch. Output files are generated for each run. (default: 1)
.TP
.B \-\-mcmc_credible \0real
Specify the probability (0.0 to 1.0) for which to generate the credible
interval, i.e., the probability that the true number of species falls within
the credible interval given the observed data. (default: 0.95)
.TP
.B \-\-mcmc_startnull
Start MCMC sampling from the null-model.
.TP
.B \-\-mcmc_startrandom
Start MCMC sampling from a random delimitation. 
.TP
.B \-\-mcmc_startml
Start MCMC sampling from the ML delimitation.
.TP
.B \-\-seed\~ "positive integer"
Specifies the seed for the pseudo-random number generator. (default: randomly
generated based on system time)
.RE
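.PP
The interplay of \-\-mcmc_sample and \-\-mcmc_burnin can be pictured with the
following Python sketch. It is an illustration only: it assumes that steps are
numbered from 1, that steps before the burn-in threshold are ignored, and that
every n-th step is kept. The function name is hypothetical and the exact
bookkeeping inside \fBmptp\fR may differ.
.PP
.nf
```python
def kept_steps(total_steps, sample_every, burnin):
    """Return the MCMC step numbers that are retained as samples.

    Assumption for this sketch: steps are numbered from 1, steps before
    `burnin` are ignored, and every `sample_every`-th step is kept.
    """
    return [step for step in range(1, total_steps + 1)
            if step >= burnin and step % sample_every == 0]

steps = kept_steps(total_steps=100, sample_every=10, burnin=35)
print(steps)  # [40, 50, 60, 70, 80, 90, 100]
```
.fi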
.PP
.\" ============================================================================
SVG Output:
.PP
.RS
The ML method generates one SVG file that visualizes the processed input tree
(i.e., after it has been rooted at the outgroup and the outgroup has optionally
been cropped) and marks the subtrees corresponding to coalescent processes (the
detected species groups) in red, while the speciation process is colored green.
.PP
The MCMC method generates one SVG file per run visualizing the processed
tree, and indicates the support value for each node, i.e., the percentage of
MCMC samples (delimitations) in which the particular node was part of the
speciation process.  A value of 1 means it was always in the speciation process
while a value of 0 means it was always in a coalescent process. The tree
branches are colored according to the support values of descendant nodes; a
support value of 0 is colored red, 1 black, and values in between
are gradients of the two colors. Only support values above 0.5 are shown to
avoid crowded numbers in dense branching events. In addition, if \-\-mcmc_log is
specified, an additional SVG image plotting the log-likelihood of each sampled
delimitation is created.
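.PP
The support values and the coloring described above can be sketched in Python
as follows. This is an illustration under stated assumptions, not the code
used by \fBmptp\fR: the node ids and sampled delimitations are invented, and
the linear red-to-black gradient is only one plausible reading of "gradients
of the two colors".
.PP
.nf
```python
def support_values(samples):
    """Fraction of samples in which each node is in the speciation process.

    `samples` is a list of sets; each set holds the (hypothetical) ids of
    the nodes assigned to the speciation process in one sampled delimitation.
    """
    nodes = set().union(*samples)
    return {node: sum(node in s for s in samples) / len(samples)
            for node in sorted(nodes)}

def branch_color(support):
    """Linear red-to-black gradient: 0 -> #ff0000, 1 -> #000000 (assumed)."""
    red = round(255 * (1.0 - support))
    return "#{:02x}0000".format(red)

samples = [{"root", "a"}, {"root"}, {"root", "a", "b"}, {"root", "a"}]
support = support_values(samples)
print(support)  # {'a': 0.75, 'b': 0.25, 'root': 1.0}

# Only values above 0.5 would be printed next to the nodes.
shown = {node: value for node, value in support.items() if value > 0.5}
print(shown)  # {'a': 0.75, 'root': 1.0}
print(branch_color(0.0), branch_color(1.0))  # #ff0000 #000000
```
.fi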
.PP
.TP 9
.B \-\-svg_width\~ "positive integer"
Sets the total width (including margins) of the SVG in pixels. (default: 1920)
.TP
.B \-\-svg_fontsize\~ "positive integer"
Size of font in SVG image. (default: 12)
.TP
.B \-\-svg_tipspacing\~ "positive integer"
Vertical space in pixels between taxa in SVG tree. (default: 20)
.TP
.B \-\-svg_legend_ratio \0real
Ratio (value between 0.0 and 1.0) of total tree length to be displayed as
legend line.  (default: 0.1)
.TP
.B \-\-svg_nolegend
Hide legend.
.TP
.B \-\-svg_marginleft\~ "positive integer"
Left margin in pixels. (default: 20)
.TP
.B \-\-svg_marginright\~ "positive integer"
Right margin in pixels. (default: 20)
.TP
.B \-\-svg_margintop\~ "positive integer"
Top margin in pixels. (default: 20)
.TP
.B \-\-svg_marginbottom\~ "positive integer"
Bottom margin in pixels. (default: 20)
.TP
.B \-\-svg_inner_radius\~ "positive integer"
Radius of inner nodes in pixels. (default: 0)
.RE
.PP
.\" ============================================================================
.SH EXAMPLES
.PP
Compute the maximum-likelihood estimate using the mPTP model, discarding all
branches with length smaller than or equal to 0.0001.
.PP
.RS
\fBmptp\fR \-\-ml \-\-multi \-\-min_br 0.0001 \-\-tree_file \fInewick.txt\fR
\-\-output_file \fIout\fR
.RE
.PP
Run an MCMC analysis of 100 million steps with the mPTP model that logs every
millionth step, ignores the first 2 million steps, and discards all branches
with length smaller than or equal to 0.0001. Use 777 as seed. The chain will
start from the ML delimitation (default).
.PP
.RS
\fBmptp\fR \-\-mcmc 100000000 \-\-multi \-\-min_br 0.0001 \-\-tree_file
\fInewick.txt\fR \-\-output_file \fIout\fR \-\-mcmc_log 1000000 \-\-mcmc_burnin
2000000 \-\-seed 777
.RE
.PP
Perform an MCMC analysis of 5 runs, each of 100 million steps with the mPTP
model, log every millionth step, ignore the first 2 million steps, and
detect the minimum branch length from the FASTA file alignment.fa that
contains the alignment. Use 777 as seed. Start each run from a random
delimitation.
.PP
.RS
\fBmptp\fR \-\-mcmc 100000000 \-\-multi \-\-mcmc_runs 5 \-\-mcmc_log 1000000
\-\-minbr_auto \fIalignment.fa\fR \-\-tree_file \fInewick.txt\fR
\-\-output_file \fIout\fR \-\-mcmc_burnin 2000000 \-\-seed 777
\-\-mcmc_startrandom
.RE
.PP
.\"
.\" ============================================================================
.SH AUTHORS
Implementation by Tomas Flouri, Sarah Lutteropp and Paschalia Kapli. Additional
PTP and mPTP model authors include Kassian Kobert, Jiajie Zhang, Pavlos
Pavlidis, and Alexandros Stamatakis.
.SH REPORTING BUGS
Submit suggestions and bug reports at
<https://github.com/Pas-Kapli/mptp/issues>, or e-mail Tomas Flouri
<Tomas.Flouri@h-its.org>.
.\" ============================================================================
.SH AVAILABILITY
Source code and binaries are available at
<https://github.com/Pas-Kapli/mptp>.
.\" ============================================================================
.SH COPYRIGHT
Copyright (C) 2015-2017, Tomas Flouri, Sarah Lutteropp, Paschalia Kapli
.PP
All rights reserved.
.PP
Contact: Tomas Flouri <Tomas.Flouri@h-its.org>,
Scientific Computing, Heidelberg Institute for Theoretical Studies,
69118 Heidelberg, Germany
.PP
This software is licensed under the terms of the GNU Affero General Public
License version 3.
.PP
\fBGNU Affero General Public License version 3\fR
.PP
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.
.PP
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.  See the GNU Affero General Public License for more
details.
.PP
You should have received a copy of the GNU Affero General Public License along
with this program.  If not, see <http://www.gnu.org/licenses/>.
.SH VERSION HISTORY
New features and important modifications of \fBmptp\fR (short lived or minor
bug releases may not be mentioned):
.RS
.TP
.BR v0.1.0\~ "released June 27th, 2016"
First public release.
.TP
.BR v0.1.1\~ "released July 15th, 2016"
Bug fix (the LRT is no longer printed in the output file when using \-\-multi).
.TP
.BR v0.2.0\~ "released September 27th, 2016"
Fixed floating point exception error when constructing random trees, caused
by division by zero.  Changed allocation from malloc to calloc, as malloc left
variables uninitialized when converting unrooted trees to rooted trees with the
MCMC method. Fixed sample size for the AIC with a correction for finite sample
sizes.
.TP
.BR v0.2.1\~ "released October 18th, 2016"
Updated ASV to consider only coalescent roots of ML delimitation. Removed
assertion stopping mptp when using random starting delimitations for the MCMC
method.
.TP
.BR v0.2.2\~ "released January 31st, 2017"
Fixed regular expressions to allow scientific notation for branch lengths when
parsing trees.  Improved the accuracy of ASV score by also taking into account
tips forming coalescent roots.  Fixed memory leaks that occur when parsing
incorrectly formatted trees.
.TP
.BR v0.2.3\~ "released July 25th, 2017"
Replaced hsearch() with custom hashtable. Fixed minor output error messages.
.TP
.BR v0.2.4\~ "released May 14th, 2018"
If we do not manage to generate a random starting delimitation with the wanted
number of species (randomly chosen), we use the currently generated
delimitation instead.
.TP
.BR v0.2.5\~ "released Sep 9th, 2023"
Added likelihood ratio test for the multi method. Added an implementation of
the incomplete gamma function, and removed the dependency on the GNU Scientific
Library.
.RE
.LP