Likelihood analysis of multiple loci
====================================

.. sectionauthor:: Gavin Huttley

We want to know whether an exchangeability parameter differs between alignments. We specify a null model under which each alignment gets its own motif probabilities while all alignments share branch lengths and the exchangeability parameter kappa (the transition/transversion ratio). We'll split the example alignment into two pieces.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree, LoadTable
    >>> from cogent.evolve.models import HKY85
    >>> from cogent.recalculation.scope import EACH, ALL
    >>> from cogent.maths.stats import chisqprob
    >>> aln = LoadSeqs("data/long_testseqs.fasta")
    >>> half = len(aln)/2
    >>> aln1 = aln[:half]
    >>> aln2 = aln[half:]
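
The two halves should jointly account for every column of the original alignment; a quick sanity check along those lines (a sketch, not required for the analysis) is:

.. code-block:: python

    # the two slices should reconstitute the full alignment length
    assert len(aln1) + len(aln2) == len(aln)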

We provide names for those alignments, then construct the tree and model instances.

.. doctest::

    >>> loci_names = ["1st-half", "2nd-half"]
    >>> loci = [aln1, aln2]
    >>> tree = LoadTree(tip_names=aln.getSeqNames())
    >>> mod = HKY85()
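
Since branch lengths will be constrained to a single shared value below, the tree built from tip names alone, which carries no internal structure, is sufficient. To inspect it (a brief aside; ``getNewick`` is a standard PyCogent tree method):

.. code-block:: python

    # an unresolved tree relating the taxa in the alignment
    print tree.getNewick()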

To make a likelihood function with multiple alignments we provide the list of loci names. We can then specify that a parameter (other than length) is the same across loci (using the imported ``ALL``) or different for each locus (using ``EACH``). We conduct an LR test as before.

.. doctest::

    >>> lf = mod.makeLikelihoodFunction(tree, loci=loci_names, digits=2, space=3)
    >>> lf.setParamRule("length", is_independent=False)
    >>> lf.setParamRule('kappa', loci=ALL)
    >>> lf.setAlignment(loci)
    >>> lf.optimise(local=True, show_progress=False)
    >>> print lf
    Likelihood Function Table
    =========================
       locus   motif   mprobs
    -------------------------
    1st-half       T     0.22
    1st-half       C     0.18
    1st-half       A     0.38
    1st-half       G     0.21
    2nd-half       T     0.24
    2nd-half       C     0.19
    2nd-half       A     0.35
    2nd-half       G     0.22
    -------------------------
    ==============
    kappa   length
    --------------
     3.98     0.13
    --------------
    >>> all_lnL = lf.getLogLikelihood()
    >>> all_nfp = lf.getNumFreeParams()
    >>> lf.setParamRule('kappa', loci=EACH)
    >>> lf.optimise(local=True, show_progress=False)
    >>> print lf
    Likelihood Function Table
    ================
       locus   kappa
    ----------------
    1st-half    4.33
    2nd-half    3.74
    ----------------
    =========================
       locus   motif   mprobs
    -------------------------
    1st-half       T     0.22
    1st-half       C     0.18
    1st-half       A     0.38
    1st-half       G     0.21
    2nd-half       T     0.24
    2nd-half       C     0.19
    2nd-half       A     0.35
    2nd-half       G     0.22
    -------------------------
    ======
    length
    ------
      0.13
    ------
    >>> each_lnL = lf.getLogLikelihood()
    >>> each_nfp = lf.getNumFreeParams()
    >>> LR = 2 * (each_lnL - all_lnL)
    >>> df = each_nfp - all_nfp
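
Under the null, a single kappa is shared across both loci, while the alternative gives each locus its own, so the two fits differ by exactly one free parameter. A minimal sanity check of the quantities just computed (a sketch, assuming the fits above):

.. code-block:: python

    # EACH adds exactly one extra kappa relative to ALL
    assert df == 1
    # the richer (EACH) model cannot fit worse than the null (ALL)
    assert each_lnL >= all_lnL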

To pretty up the result display, I'll print a table of the test statistics, constructed on the fly.

.. doctest::

    >>> print LoadTable(header=['LR', 'df', 'p'],
    ...             rows=[[LR, df, chisqprob(LR, df)]], digits=2, space=3)
    ================
      LR   df      p
    ----------------
    1.59    1   0.21
    ----------------
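
If SciPy happens to be installed (an assumption; the recipe above relies only on PyCogent), the p-value can be cross-checked against the chi-square survival function:

.. code-block:: python

    # hypothetical cross-check of chisqprob using scipy
    from scipy.stats import chi2
    print round(chi2.sf(LR, df), 2)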