1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
|
/* This is an example HY-PHY Batch File.
It reads in a '#' nucleotide dataset data/hiv.nuc and estimates
maximum ln-likelihood based on the tree contained in the data file,
using the General Reversible model with the parameters
shared by all branches.
Output is printed out as a Newick Style tree with branch lengths
representing the number of expected substitutions per branch (which
is the default setting for nucleotide models w/o rate variation).
Sergei L. Kosakovsky Pond and Spencer V. Muse
December 1999.
*/
/* 1. Read in the data and store the result in a DataSet variable.*/
DataSet nucleotideSequences = ReadDataFile ("data/hiv.nuc");
/* 2. Filter the data, specifying that all of the data is to be used
and that it is to be treated as nucleotides.*/
DataSetFilter filteredData = CreateFilter (nucleotideSequences,1);
/* 3. Collect observed nucleotide frequencies from the filtered data. observedFreqs will
store the vector of frequencies. */
HarvestFrequencies (observedFreqs, filteredData, 1, 1, 1);
/* 4. Define the KHY substitution matrix. '*' is defined to be -(sum of off-diag row elements).
AG,AT,CG,CT,GT are the shared parameters, representing the ratio
of corresponding substitution rates to the AC rate.
t is the "branch length"*/
global AG;
global AT;
global CG;
global CT;
global GT;
REVRateMatrix =
{{*,t,AG*t,AT*t}
{t,*,CG*t,CT*t}
{AG*t,CG*t,*,GT*t}
{AT*t,CT*t,GT*t,*}};
/*5. Define the REV model, by combining the substitution matrix with the vector of observed (equilibrium)
frequencies. */
Model REV = (REVRateMatrix, observedFreqs);
/*6. Now we can define the tree variable, using the tree string read from the data file,
and, by default, assigning the last defined model (REV) to all tree branches. */
Tree givenTree = DATAFILE_TREE;
/*7. Since all the likelihood function ingredients (data, tree, equilibrium frequencies)
have been defined we are ready to construct the likelihood function. */
LikelihoodFunction theLnLik = (filteredData, givenTree);
/*8. Maximize the likelihood function, storing parameter values in the matrix paramValues */
Optimize (paramValues, theLnLik);
/*9. Print the tree with optimal branch lengths to the console. */
fprintf (stdout, theLnLik);
|