# Species delimitation using the multi-rate Poisson Tree Processes (mPTP)

[![License](https://img.shields.io/badge/license-AGPL-blue.svg)](http://www.gnu.org/licenses/agpl-3.0.en.html)
[![Build Status](https://travis-ci.org/Pas-Kapli/mptp.svg?branch=master)](https://travis-ci.org/Pas-Kapli/mptp)

## Introduction

The aim of this project is to implement a fast species delimitation method
based on PTP (Zhang et al. 2013). The new tool should:

* be open source, released under an appropriate open source license.
* have a 64-bit multi-threaded design that handles very large datasets.

We have implemented a tool called mPTP which can handle very large biodiversity
datasets.  It implements a fast method to compute the ML delimitation from an
inferred phylogenetic tree of the samples.  Using MCMC, it also computes the
support values for each clade, which can be used to assess the confidence of
the ML delimitation.

**ML delimitation** mPTP implements two flavours of the point-estimate
solution.  First, it implements the original method of Zhang et al. (2013),
where all within-species processes are modelled with a single exponential
distribution. mPTP uses a dynamic programming implementation which estimates
the ML delimitation faster and more accurately than the original PTP. The
dynamic programming implementation has similar properties to the algorithm of
Gulek and Toroslu (2010).  See the [wiki](https://github.com/Pas-Kapli/mptp/wiki)
for more information. The second method assumes a distinct exponential
distribution for the branching events of each delimited species, allowing it
to fit a wider range of empirical datasets.
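
As a rough illustration of the difference between the two flavours, the sketch
below fits exponential distributions to within-species branch lengths, once
with a single shared rate and once with a separate rate per species. It is not
mPTP's implementation: the function names and the toy branch lengths are
assumptions made for this example, the between-species process is left out,
and the delimitation is taken as given rather than searched for.

```python
import math


def exp_loglik(branch_lengths):
    """Maximised log-likelihood of branch lengths under one exponential rate."""
    n = len(branch_lengths)
    if n == 0:
        return 0.0  # a species without branches contributes nothing here
    total = sum(branch_lengths)
    if total <= 0:
        raise ValueError("branch lengths must sum to a positive value")
    rate = n / total  # ML estimate of the exponential rate
    return n * math.log(rate) - rate * total


def single_rate_loglik(species_branches):
    """All within-species branches share one exponential distribution."""
    pooled = [b for branches in species_branches for b in branches]
    return exp_loglik(pooled)


def multi_rate_loglik(species_branches):
    """Each delimited species gets its own exponential distribution."""
    return sum(exp_loglik(branches) for branches in species_branches)


# Hypothetical within-species branch lengths for two delimited species.
species = [[0.01, 0.02, 0.03], [0.10, 0.20, 0.30]]
print(single_rate_loglik(species), multi_rate_loglik(species))
```

On the same delimitation the multi-rate fit can never score lower than the
single-rate fit, since the pooled rate is one of the choices available to
every species.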

**MCMC method** mPTP generates support values for each clade. They represent
the ratio of the number of samples for which a particular node was in the
between-species process to the total number of samples.
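
The ratio can be written down in a few lines. The sketch below is only an
illustration and not mPTP's code: it assumes each MCMC sample has already been
reduced to the set of node identifiers that fall in the between-species
process, and the function name `support_values` and the node labels are
hypothetical.

```python
from collections import Counter


def support_values(samples):
    """Fraction of samples in which each node is in the between-species process.

    `samples` is a list with one entry per MCMC sample; each entry is an
    iterable of node identifiers (a hypothetical representation).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter()
    for between_nodes in samples:
        counts.update(set(between_nodes))  # count each node once per sample
    total = len(samples)
    return {node: count / total for node, count in counts.items()}


# Four toy samples over the nodes "root", "n1" and "n2".
samples = [{"root", "n1"}, {"root"}, {"root", "n1", "n2"}, {"root", "n1"}]
for node, value in sorted(support_values(samples).items()):
    print(node, value)
```

Nodes that never appear in the between-species process are simply absent from
the returned dictionary in this sketch, which corresponds to a support of zero.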

## Compilation instructions

**Cloning the repo** Clone the repo and build the executable and the documentation using
the following commands.

```bash
git clone https://github.com/Pas-Kapli/mptp.git
cd mptp 
./autogen.sh
./configure
make
make install  # as root, or run sudo make install
```

You will need [GNU Bison](http://www.gnu.org/software/bison/) and
[Flex](http://flex.sourceforge.net/) installed on your system.  When using the
cloned repository version, you will also need
[autoconf](https://www.gnu.org/software/autoconf/autoconf.html) and
[automake](https://www.gnu.org/software/automake/) installed. Optionally, you
will need the [GNU Scientific Library](http://www.gnu.org/software/gsl/) for
the likelihood ratio test. If it is not available on your system, the ratio
test will be disabled.

On a Debian-based Linux system, the required packages can be installed
using the command

```bash
sudo apt-get install libgsl0-dev flex bison autotools-dev autoconf
```

Optionally, you can install bash auto-completion for mptp. To do so,
replace the `./configure` step above with
```bash
./configure --with-bash-completions=DIR
```
where `DIR` is the directory in which bash completion scripts are stored. You
can query it with `pkg-config` as follows:
```bash
./configure --with-bash-completions=`pkg-config --variable=completionsdir bash-completion`
```

**Source distribution** To download the source distribution from a
[release](https://github.com/Pas-Kapli/mptp/releases) and build the executable
and the documentation, use the following commands:

```bash
wget https://github.com/Pas-Kapli/mptp/releases/download/v0.2.4/mptp-src-0.2.4.tar.gz
tar zxvf mptp-src-0.2.4.tar.gz
cd mptp-src-0.2.4
./configure
make
make install  # as root, or run sudo make install
```

Note that, as when cloning the repository, you will need [GNU
Bison](http://www.gnu.org/software/bison/) and
[Flex](http://flex.sourceforge.net/) installed on your system, and optionally,
the [GNU Scientific Library](http://www.gnu.org/software/gsl/).  However, you
do not need [autoconf](https://www.gnu.org/software/autoconf/autoconf.html) and
[automake](https://www.gnu.org/software/automake/) installed (note the missing `./autogen.sh` step).
See also the notes on installing the bash auto-completion, as described in
the *Cloning the repo* section.


**Binary distribution** Starting with version 0.2.0, binary distribution files
(.tar.gz) for GNU/Linux on x86-64 containing pre-compiled binaries as well as
the documentation (man and pdf files) are made available as part of each
[release](https://github.com/Pas-Kapli/mptp/releases). The included executables
are currently not compiled with [`libgsl`](http://www.gnu.org/software/gsl/)
support. This means that the Likelihood Ratio Test (LRT) is disabled for the
single-rate PTP model. We intend to implement dynamic loading for `libgsl`,
which should remove this limitation in upcoming releases. Until then, please
consider compiling from source in order to enable `libgsl`.

To use the pre-compiled binary on a Linux system, download and unpack the
release archive using the following commands:

```bash
wget https://github.com/Pas-Kapli/mptp/releases/download/v0.2.4/mptp-0.2.4-linux-x86_64.tar.gz
tar zxvf mptp-0.2.4-linux-x86_64.tar.gz
```

You will now have the binary distribution in a folder called
`mptp-0.2.4-linux-x86_64` in which you will find three subfolders `bin`, `man`
and `doc`. We recommend making a copy or a symbolic link to the mptp binary
`bin/mptp` in a folder included in your `$PATH`, and a copy or a symbolic link
to the mptp man page `man/mptp.1` in a folder included in your `$MANPATH`. The
PDF version of the manual is available in `doc/mptp_manual.pdf`.

## Implementation details and method description

Please see the manuscript for details:

Kapli P, Lutteropp S, Zhang J, Kobert K, Pavlidis P, Stamatakis A, Flouri T. (2017) Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo. Bioinformatics 33(11):1630-1638. doi:[10.1093/bioinformatics/btx025](https://doi.org/10.1093/bioinformatics/btx025)




## Command-line options

General options:

* `--help`
* `--version`
* `--quiet`
* `--tree_show`
* `--multi`
* `--single`
* `--ml`
* `--mcmc INT`
* `--mcmc_sample INT`
* `--mcmc_log`
* `--mcmc_burnin INT`
* `--mcmc_startnull`
* `--mcmc_startrandom`
* `--mcmc_startml`
* `--mcmc_credible REAL`
* `--mcmc_runs INT`
* `--outgroup TAXA`
* `--outgroup_crop`
* `--minbr REAL`
* `--minbr_auto FILENAME`
* `--pvalue REAL`
* `--precision INT`

Input and output options:

* `--tree_file FILENAME`
* `--output_file FILENAME`

Visualization options:

* `--svg_width INT`
* `--svg_fontsize INT`
* `--svg_tipspacing INT`
* `--svg_legend_ratio <0..1>`
* `--svg_nolegend`
* `--svg_marginleft INT`
* `--svg_marginright INT`
* `--svg_margintop INT`
* `--svg_marginbottom INT`
* `--svg_inner_radius INT`

## Usage example

```bash
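# ML delimitation under the multi-rate model, with taxa A and C as outgroup
mptp --ml --multi --tree_file testTree --output_file out --outgroup A,C --tree_show
# MCMC run under the multi-rate model
mptp --mcmc 50000000 --multi --mcmc_sample 1000000 --mcmc_burnin 1000000 --tree_file tree.newick --output_file out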
```

## Documentation

If `mptp` was installed according to the [Compilation
instructions](https://github.com/Pas-Kapli/mptp#compilation-instructions), you
can access the man page with:

```bash
man mptp
```

Comprehensive documentation is also available in the [wiki](https://github.com/Pas-Kapli/mptp/wiki).

## License and third party licenses

The code is currently licensed under the [GNU Affero General Public License version 3](http://www.gnu.org/licenses/agpl-3.0.en.html).

## Code

|     File            | Description                                                                       |
| --------------------| --------------------------------------------------------------------------------- |
| **arch.c**          | Architecture specific code (Mac/Linux).                                           |
| **auto.c**          | Code for auto-detecting minimum branch length.                                    |
| **aic.c**           | Code for Bayesian Single- and multi-rate PTP.                                     |
| **mptp.c**          | Main file handling command-line parameters and executing corresponding parts.     |
| **mptp.h**          | mPTP header file.                                                                 |
| **dp.c**            | Single- and multi-rate DP heuristics for solving the PTP problem.                 |
| **fasta.c**         | Code for reading FASTA files.                                                     |
| **lex_rtree.l**     | Lexical analyzer parsing newick rooted trees.                                     |
| **lex_utree.l**     | Lexical analyzer parsing newick unrooted trees.                                   |
| **likelihood.c**    | Likelihood-related functions.                                                     |
| **Makefile.am**     | Automake file for generating Makefile.in.                                         |
| **maps.c**          | Character mapping arrays for converting sequences to the internal representation. |
| **multirun.c**      | Functions to execute multiple MCMC runs and compute ASD of support values.        |
| **output.c**        | Output-related functions.                                                         |
| **parse_rtree.y**   | Functions for parsing rooted trees in newick format.                              |
| **parse_utree.y**   | Functions for parsing unrooted trees in newick format.                            |
| **random.c**        | Functions for creating a random delimitation.                                     |
| **rtree.c**         | Rooted tree manipulation functions.                                               |
| **svg.c**           | SVG visualization of delimited tree.                                              |
| **svg_landscape.c** | SVG visualization of likelihood landscape.                                        |
| **util.c**          | Various common utility functions.                                                 |
| **utree.c**         | Unrooted tree manipulation functions.                                             |

## The team

* Paschalia Kapli
* Sarah Lutteropp
* Kassian Kobert
* Pavlos Pavlidis
* Jiajie Zhang
* Alexandros Stamatakis
* Tomáš Flouri

## Citing mPTP

Please cite the following publication if you use mPTP:

Kapli P, Lutteropp S, Zhang J, Kobert K, Pavlidis P, Stamatakis A, Flouri T. (2017) Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo. Bioinformatics 33(11):1630-1638. doi:[10.1093/bioinformatics/btx025](https://doi.org/10.1093/bioinformatics/btx025)

## References

* Zhang J., Kapli P., Pavlidis P., Stamatakis A. (2013)
**A general species delimitation method with applications to phylogenetic placements.**
*Bioinformatics*, 29(22):2869-2876.
doi:[10.1093/bioinformatics/btt499](http://dx.doi.org/10.1093/bioinformatics/btt499)

* Nguyen XV, Epps J., Bailey J. (2010)
**Information Theoretic Measures for Clustering Comparison: Variants, Properties, Normalization and Correction for Chance.**
*Journal of Machine Learning Research*, 11:2837-2854.
[PDF](http://www.jmlr.org/papers/volume11/vinh10a/vinh10a.pdf)

* Gulek M., Toroslu IH. (2010)
**A dynamic programming algorithm for tree-like weighted set packing problem.**
*Information Sciences*, 180(20):3974-3979.
doi:[10.1016/j.ins.2010.06.035](http://dx.doi.org/10.1016/j.ins.2010.06.035)

* Powell JR. (2012)
**Accounting for uncertainty in species delineation during the analysis of environmental DNA sequence data.**
*Methods in Ecology and Evolution*, 3(1):1-11.
doi:[10.1111/j.2041-210X.2011.00122.x](http://dx.doi.org/10.1111/j.2041-210X.2011.00122.x)