File: Non-Rev.bf

package info (click to toggle)
hyphy 2.3.14%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 30,972 kB
  • sloc: cpp: 122,584; sh: 12,310; python: 3,026; xml: 467; lisp: 341; makefile: 313; ansic: 308; sql: 119
file content (100 lines) | stat: -rw-r--r-- 2,942 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
/* This is an example HY-PHY Batch File.



   It reads in a '#' nucleotide dataset data/hiv.nuc and estimates
   maximum ln-likelihood based on the tree contained in the data file,
   using the general non-reversible 12 parameter (with 3 constraints)
   model. The tree from the data file is unrooted. 

   

   Output is printed out as a Newick Style tree with branch lengths
   representing the number of expected substitutions per branch (which
   is the default setting for nucleotide models w/o rate variation).


   Sergei L. Kosakovsky Pond and Spencer V. Muse 

   November 2002. 

*/

/* 1. Read in the data and store the result in a DataSet variable.*/

DataSet 		nucleotideSequences = ReadDataFile ("data/hiv.nuc");

/* 2. Filter the data, specifying that all of the data is to be used
	  and that it is to be treated as nucleotides.*/
	 
DataSetFilter	filteredData = CreateFilter (nucleotideSequences,1);

/* 3. Collect observed nucleotide frequencies from the filtered data. observedFreqs will
	  store the vector of frequencies. */

HarvestFrequencies (observedFreqs, filteredData, 1, 1, 1);

/* 4. Define the 12 parameter non-reversible substitution matrix. 
The constraints on some of the substitution rates are necessary to
ensure that observed nucleotide frequencies are equilibrium 
frequencies for the model.

'*' is defined to be -(sum of off-diag row elements) */

/* frequency ratios */
	 
r0 = observedFreqs[0]/observedFreqs[3];
r1 = observedFreqs[1]/observedFreqs[3];
r2 = observedFreqs[2]/observedFreqs[3];

/* All the global rate parameters are defined relative to 
the rate for A->C. For instance, CG represents the ratio
of the rates C->G/A->C. */

global AG;
global AT;
global CA;
global CG;
global CT;
global GA;
global GC;
global GT;

/* note that these constraints are 
satisfied if we restrict the model to the
general reversible case */

global TA:=AT+(1-CA)*r1__+(AG-GA)*r2__;
global TC:=CT+(CA-1)*r0__+(CG-GC)*r2__;
global TG:=GT+(GA-AG)*r0__+(GC-CG)*r1__;


NRRateMatrix = 
		{{*,t,t*AG,t*AT}
		 {t*CA,*,t*CG,t*CT}
		 {t*GA,t*GC,*,t*GT}
		 {t*TA,t*TC,t*TG,*}};

/*5.  Define the Non-Rev model, by combining the substitution matrix with the vector of observed (equilibrium)
	  frequencies. */
	  
Model 	NRM = (NRRateMatrix, observedFreqs);

/*6.  Now we can define the tree variable, using the tree string read from the data file,
	  and, by default, assigning the last defined model (NRM) to all tree branches. */

Tree	givenTree = DATAFILE_TREE;

/*7.  Since all the likelihood function ingredients (data, tree, equilibrium frequencies)
	  have been defined we are ready to construct the likelihood function. */

LikelihoodFunction  theLnLik = (filteredData, givenTree);

/*8.  Maximize the likelihood function, storing parameter values in the matrix paramValues */

Optimize (paramValues, theLnLik);

/*9.  Print the tree with optimal branch lengths to the console. */

fprintf  (stdout, theLnLik);