1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176
|
/* This is an example HY-PHY Batch File.
It reads in two NEXUS nucleotide dataset data/aspectrin1.nuc
and data/aspectrin2.nuc and performs
the relative ratio test on the 4-taxa tree, using F81 model.
Output is printed out as a Newick Style tree with branch lengths
representing the number of expected substitutions per branch (which
is the default setting for nucleotide models w/o rate variation).
Also, the likelihood ratio statistic is evaluated and the P-value
for the test is reported.
Sergei L. Kosakovsky Pond and Spencer V. Muse
December 1999.
*/
/* 1. Read in the data and store the result in DataSet variables.*/
DataSet nucleotideSequence1 = ReadDataFile ("data/aspectrin1.nuc");
DataSet nucleotideSequence2 = ReadDataFile ("data/aspectrin2.nuc");
/* 2. Filter the data, specifying that all of the data is to be used
and that it is to be treated as nucleotides. */
DataSetFilter filteredData1 = CreateFilter (nucleotideSequence1,1);
DataSetFilter filteredData2 = CreateFilter (nucleotideSequence2,1);
/* 3. Collect observed nucleotide frequencies from the filtered data. observedFreqs will
store the vector of frequencies. */
HarvestFrequencies (observedFreqs1, filteredData1, 1, 1, 1);
HarvestFrequencies (observedFreqs2, filteredData2, 1, 1, 1);
/* 4. Define the F81 substitution matrix. '*' is defined to be -(sum of off-diag row elements) */
F81RateMatrix =
{{*,mu,mu,mu}
{mu,*,mu,mu}
{mu,mu,*,mu}
{mu,mu,mu,*}};
/*5. Define the F81 models, by combining the substitution matrix with the vector of observed
(equilibrium) frequencies. We define one for each block, since the equilibrium
frequencies will differ. */
Model F81Block1 = (F81RateMatrix, observedFreqs1);
Model F81Block2 = (F81RateMatrix, observedFreqs2);
/*6. Now we can define 2 4-taxa trees - one for each data block. We use appropriate models for each one.
Note the use of TRY_NUMERIC_SEQUENCE_MATCH constant to instruct HyPhy to map sequence numbers to
tree leaves (by default, HyPhy expects sequence names and leaf names to match).*/
TRY_NUMERIC_SEQUENCE_MATCH = 1;
UseModel (F81Block1);
Tree fourTaxaTree1 = ((1,2),3,4);
UseModel (F81Block2);
Tree fourTaxaTree2 = ((1,2),3,4);
/*7. Since all the likelihood function ingredients (data, tree, equilibrium frequencies)
have been defined we are ready to construct the likelihood function. We
combine both datasets into one likelihood function. */
LikelihoodFunction theLnLik = (filteredData1, fourTaxaTree1, filteredData2, fourTaxaTree2);
/*8. Maximize the likelihood function, storing parameter values in the matrix paramValues.
We also store the resulting ln-lik and the number of model parameters. */
Optimize (paramValues, theLnLik);
unconstrainedLnLik = paramValues[1][0];
paramCount = paramValues[1][1];
/*9. Print the tree with optimal branch lengths to the console. */
fprintf (stdout, "\n0). UNCONSTRAINED MODEL:", theLnLik);
/*10. We now constrain the rate of evolution along each branch to be proportional in both trees.
R will represent the ratio. We use ReplicateConstraint to automatically
attach the same constraint to all branches of the tree. */
global R;
ReplicateConstraint ("this1.?.mu:=R*this2.?.mu", fourTaxaTree2, fourTaxaTree1);
Optimize (paramValues, theLnLik);
/*11. Now we compute the ln-lik ratio statistic and the P-Value, using the Chi^2 dist'n
with an appropriate degree of freedom. */
lnlikDelta = 2 (unconstrainedLnLik-paramValues[1][0]);
pValue = 1-CChi2 (lnlikDelta, paramCount - paramValues[1][1]);
fprintf (stdout, "\n\n1). Relative ratio constraint; the P-value is:", pValue, "\n", theLnLik);
|