1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147

<html>
<head>
<title>Significance Testing</title>
</head>
<body>
<h1>Significance Testing</h1>
<p>
Open the file <tt>examples/cain/ImmigrationDeath.xml</tt>. The
immigrationdeath process has a single species <em>X</em> and two reactions:
immigration and death. The immigration reaction is 0→<em>X</em>
with a propensity factor <em>k<sub>1</sub> = 1</em>.
The death reaction is <em>X</em>→0
and has the propensity factor <em>k<sub>2</sub> = 0.1</em>.
Since both reactions use massaction kinetic
laws, the propensities are <em>1</em> and <em>0.1 X</em>, respectively.
</p>
<p>
The immigrationdeath model is presented in section 6.3.D of
<a href="Bibliography.htm#gillespie1992">Markov Processes</a>.
There it is called the payroll process; hiring and leaving corresponds to
immigration and death. The text shows how to derive the mean and
variance of the population by solving first order differential
equations. These are listed below for a starting time of
<em>t<sub>0</sub></em> and an initial population of <em>X<sub>0</sub></em>.
</p>
<p align="center">
mean(<em>X</em>(<em>t</em>)) = <em>k</em><sub>1</sub>/<em>k</em><sub>2</sub> +
(<em>X</em><sub>0</sub><em>k</em><sub>1</sub>/<em>k</em><sub>2</sub>)
exp(<em>k</em><sub>2</sub>(<em>t</em><em>t</em><sub>0</sub>))
</p>
<p align="center">
var(<em>X</em>(<em>t</em>)) = (<em>k</em><sub>1</sub>/<em>k</em><sub>2</sub>)
(1exp(<em>k</em><sub>2</sub>(<em>t</em><em>t</em><sub>0</sub>)))
(1+(<em>k</em><sub>2</sub><em>X</em><sub>0</sub>/<em>k</em><sub>1</sub>)
exp(<em>k</em><sub>2</sub>(<em>t</em><em>t</em><sub>0</sub>)))
</p>
<p>
The analytical solution for the transient behavior is associated with
the "Ref., Transient" method. The steady state solution
is associated with the "Ref., Steady St." method.
Note that the solutions are listed in the simulation output panel.
Select the
"ImmigrationDeath" model and the "Direct" method
and then generate 10,000 trajectories.
Now we will compare the empirical solution
and the reference solution.
Click the plot button in the simulation output panel and then
go to the "Time
Series" tab in the plot configuration window. Unselect the
"Std Dev" field in the grid by right clicking on the column
header. Set the line style to "dot", and the marker style
to "Circle". Match the face and edge color to the line color
by left clicking on the column headers. (See the
<a href="VisualizationPlottingTimeSeries.htm">Plotting Time Series Data</a>
section for information on configuring plots.)
Change the title to "ImmigrationDeath, Mean".
Then click the "New plot button" to
plot the empirical mean. Then select the "Statistics" tab.
Right click on the "Std Dev" field in the grid to turn off
the plotting of the standard deviation as error bars. Clear
the title and axes labels. Finally, click the plot button to show the
reference solution (plotted with a solid line) along with the
empirical solution. The result is shown below.
</p>
<p align="center">
<img src="VisualizationImmigrationDeathMean.jpg">
</p>
<p>
One may do the same for the standard deviation by selecting the
"Std. Dev." radio button and following the same procedure.
The result is shown below.
</p>
<p align="center">
<img src="VisualizationImmigrationDeathStdDev.jpg">
</p>
<p>
From the plots above we see that the mean and standard deviation of
empirical solution is very close to the analytical solution. While
using the
"eyeball norm" is useful, it is better still to use
statistical tools to analyze the solutions. Specifically, we use
the onesample version of
<a href="http://en.wikipedia.org/wiki/Student's_ttest">Student's
ttest</a> to test the null hypotheses that the empirical means are
equal to the analytical means. (We use the plural because we will
apply the test for each time frame.) The test yields a
<a href="http://en.wikipedia.org/wiki/Pvalue">pvalue</a> that
is the probability of obtaining a test statistics that is at least
as extreme as the observed result, assuming that the null hypothesis
is true. For instance, suppose that the
empirical mean and analytical mean differ by an amount <em>d</em>.
A pvalue of 0.25 would mean that there is a 25% chance that the
empirical mean would differ from the analytical mean by at least
<em>d</em>. Put another way, if you generated many empirical
solutions, using the same number of trajectories in each, you would
expect about a quarter of them to differ by at least <em>d</em> from
the analytical mean. This is, of course, assuming that the null
hypothesis is true and two means are equal. A small pvalue
indicates that the null hypothesis is not true. For example, a pvalue
of 0.001 would mean that there is only a one in a thousand chance of
observing an empirical mean that differs from the analytical mean
by the calculated amount. In this context, this would lead us to
suspect that the simulation method used to generate the empirical
solution is incorrect. Of course, such an extreme difference is still
possible with a correct method. If we were to repeat the experiment
with another empirical solution and obtain another small pvalue, we
would confidently state that the method is incorrect.
</p>
<p>
Click the pvalue button <img src="pValue.png"> in the
simulation output panel to open the pvalue analysis window shown below.
In the left column select the "ImmigrationDeath, Direct"
output. In the right column select the reference solution
"ImmigrationDeath, Ref., Transient". Click the
"Calculate" button to compute the pvalue for all of the
species and all of the frames. The row headers list the frame times.
The column headers list the species.
</p>
<p>
<img src="VisualizationSignificancePValueAnalysis.png">
</p>
<p>
Click the "Plot" button to show a plot of pvalue versus
frame number. We see that the pvalues are consistent with a
correct stochastic simulation method.
</p>
<p>
<img src="VisualizationSignificancePValue.jpg">
</p>
</body>
</html>
