File: ancestral.rst

package info (click to toggle)
python-treetime 0.11.4-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 21,316 kB
  • sloc: python: 8,437; sh: 124; makefile: 49
file content (80 lines) | stat: -rw-r--r-- 3,022 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80

Ancestral sequence reconstruction using TreeTime
------------------------------------------------

At the core of TreeTime is a class that models how sequences change along the tree.
This class allows to reconstruct likely sequences of internal nodes of the tree.
On the command-line, ancestral reconstruction can be done via the command

.. code-block:: bash

   treetime ancestral --aln data/h3n2_na/h3n2_na_20.fasta --tree data/h3n2_na/h3n2_na_20.nwk --outdir ancestral_results

This command will save a number of files into the directory `ancestral_results` and generate the output

.. code-block:: bash

   Inferred GTR model:
   Substitution rate (mu): 1.0

   Equilibrium frequencies (pi_i):
     A: 0.2983
     C: 0.1986
     G: 0.2353
     T: 0.2579
     -: 0.01

   Symmetrized rates from j->i (W_ij):
     A C G T -
     A 0 0.8273  2.8038  0.4525  1.031
     C 0.8273  0 0.5688  2.8435  1.0561
     G 2.8038  0.5688  0 0.6088  1.0462
     T 0.4525  2.8435  0.6088  0 1.0418
     - 1.031 1.0561  1.0462  1.0418  0

   Actual rates from j->i (Q_ij):
     A C G T -
     A 0 0.2468  0.8363  0.135 0.3075
     C 0.1643  0 0.1129  0.5646  0.2097
     G 0.6597  0.1338  0 0.1432  0.2462
     T 0.1167  0.7332  0.157 0 0.2686
     - 0.0103  0.0106  0.0105  0.0104  0

   --- alignment including ancestral nodes saved as
      ancestral_results/ancestral_sequences.fasta

   --- tree saved in nexus format as
      ancestral_results/annotated_tree.nexus

TreeTime has inferred a GTR model and used it to reconstruct the most likely ancestral sequences.
The reconstructed sequences will be written to a file ending in ``_ancestral.fasta`` and a tree with mutations mapped to branches will be saved in nexus format in a file ending on ``_mutation.nexus``.
Mutations are added as comments to the nexus file like ``[&mutations="G27A,A58G,A745G,G787A,C1155T,G1247A,G1272A"]``.

Amino acid sequences
^^^^^^^^^^^^^^^^^^^^

Ancestral reconstruction of amino acid sequences works analogously to nucleotide sequences.
However, the user has to either explicitly choose an amino acid substitution model (JTT92)

.. code-block:: bash

   treetime ancestral --tree data/h3n2_na/h3n2_na_20.nwk  --aln data/h3n2_na/h3n2_na_20_aa.fasta --gtr JTT92

or specify that this is a protein sequence alignment using the flag ``--aa``\ :

.. code-block:: bash

   treetime ancestral --tree data/h3n2_na/h3n2_na_20.nwk  --aln data/h3n2_na/h3n2_na_20_aa.fasta --aa

VCF files as input
^^^^^^^^^^^^^^^^^^

In addition to standard fasta files, TreeTime can ingest sequence data in form of vcf files which is common for bacterial data sets where short reads are mapped against a reference and only variable sites are reported.
In this case, an additional argument specifying the mapping reference is required.

.. code-block:: bash

   treetime ancestral --aln data/tb/lee_2015.vcf.gz --vcf-reference data/tb/tb_ref.fasta --tree data/tb/lee_2015.nwk

The ancestral reconstruction is saved as a vcf files with the name ``ancestral_sequences.vcf``.