<html>
<head>
<title>File Formats</title>
</head>
<body>
<h1>File Formats</h1>
<p>
<b>XML</b><br>
Cain stores models, methods, simulation output, and random number state
in an XML format. See the <a href="Xml.htm">Cain XML File Format</a>
section for the specification.
</p>
<p>
<b>SBML</b><br>
Cain can import and export SBML models. However, it has limited ability
to parse kinetic laws; complicated expressions may not be parsed. In that case
you have to enter the propensity function in the Reaction Editor.
If the SBML model has reversible reactions, each will be split into
two irreversible reactions. (The stochastic simulation algorithms only work
for irreversible reactions.) You will need to correct the propensity
functions of the resulting reactions. Also, only mass action
kinetic laws can be exported to SBML. Other kinetic laws are omitted.
</p>
<p>
<b>Input for solvers.</b><br>
For batch processing, you can export a text file for input to one of the
solvers. The solver inputs describe the model, the simulation method,
the random number state, and the number of trajectories to generate.
The different categories of solvers require slightly different inputs.
However, the input for each of the solvers starts with the following:
</p>
<pre><should print information>
<number of species>
<number of reactions>
<list of initial amounts>
<packed reactions>
<list of propensity factors>
<number of species to record>
<list of species to record>
<number of reactions to record>
<list of reactions to record>
<maximum allowed steps>
<number of solver parameters>
<list of solver parameters>
<starting time></pre>
<p>
To make the text processing easier and to make the files easier to read,
each term in brackets occupies a single line. Note the following about
the input fields:
<ul>
<li>
The first line indicates whether the solver should print information
about the method in the first line of its output. When running jobs
interactively, the value of this field is 0 and the first line of
output is blank. When generating batch jobs, the value of this field is 1,
and the solver writes a Python dictionary whose elements describe
the method on the first line of output. This dictionary allows Cain to perform
consistency checks when importing simulation output. It may also help
users identify which model and method were used in a simulation.
<li>
Each reaction in the list of
packed reactions is defined by its reactants, products, and
dependencies. The dependencies are the species indices on which the
propensity function depends.
The format for a reaction with <em>R</em> reactants, <em>P</em>
products and <em>D</em> dependencies is:
<pre><number of reactants> <index1>
<stoichiometry1> ... <indexR> <stoichiometryR>
<number of products> <index1> <stoichiometry1>
... <indexP> <stoichiometryP>
<number of dependencies> <dependency1> ... <dependencyD></pre>
An empty set of reactants, products, or dependencies is indicated
with a single zero.
<li>
A value of zero indicates there is no limit on the maximum allowed steps.
(More precisely, the limit is
<tt>std::numeric_limits<std::size_t>::max()</tt>.)
</ul>
</p>
<p>
Below are a few examples of packed reactions (here we assume
mass-action kinetic laws):
<ul>
<li>
0 → X: 0 1 0 1 0<br>
<li>
X → 0 : 1 0 1 0 1 0<br>
<li>
X → Y : 1 0 1 1 1 1 1 0<br>
<li>
X → 2 X : 1 0 1 1 0 2 1 0<br>
<li>
X → Y, Y → X, Y → Z : 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1
1 1 1 1 2 1 1 1
</ul>
</p>
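<p>
The packing rule above can be sketched in a few lines of Python. This is only
an illustration, not code taken from Cain: the function name
<tt>pack_reaction</tt> and its argument layout are hypothetical. Reactants and
products are given as (species index, stoichiometry) pairs and dependencies as
a list of species indices.
</p>

```python
def pack_reaction(reactants, products, dependencies):
    """Return the packed string for one reaction (hypothetical helper).

    reactants and products are lists of (species index, stoichiometry)
    pairs; dependencies is a list of species indices.
    """
    fields = [len(reactants)]
    for index, stoichiometry in reactants:
        fields += [index, stoichiometry]
    fields.append(len(products))
    for index, stoichiometry in products:
        fields += [index, stoichiometry]
    # An empty set contributes only its count, i.e. a single zero.
    fields.append(len(dependencies))
    fields += dependencies
    return ' '.join(str(field) for field in fields)


def pack_reactions(reactions):
    """Concatenate several packed reactions into one line."""
    return ' '.join(pack_reaction(*reaction) for reaction in reactions)


# X -> Y with X at index 0 and Y at index 1; the propensity depends on X.
print(pack_reaction([(0, 1)], [(1, 1)], [0]))
```

<p>
Calling <tt>pack_reaction([], [(0, 1)], [])</tt> yields the string for
0 → X listed above.
</p>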
<p>
Next comes solver-specific data.
There are four kinds of data that the various solvers generate:
<ul>
<li> <b>Time series data recorded at specified frames (points in time).</b>
One specifies the frames with the following:
<pre><number of frames>
<list of frame times></pre>
<li> <b>Time series data in which every reaction event is recorded.</b>
For this one specifies the equilibration time (the amount of time to run the
simulation prior to recording) and the recording time.
<pre><equilibration time>
<recording time></pre>
<li> <b>Histograms that record the state at specified frames.</b>
These are used to study the transient behavior of a system. In this case one
specifies the frames as well as the number of bins in the histograms.
<pre><number of frames>
<list of frame times>
<number of bins in histograms>
<histogram multiplicity></pre>
<li> <b>Histograms that record the time-averaged species populations.</b>
These are used to study the steady state behavior of a system. Here one
specifies the equilibration time, the recording time, and the number of bins
in the histograms.
<pre><equilibration time>
<recording time>
<number of bins in histograms>
<histogram multiplicity></pre>
</ul>
</p>
<p>
Finally, one specifies the initial state of the Mersenne Twister and the
number of trajectories to generate.
</p>
<pre><list of MT 19937 state>
for each task:
  <number of trajectories></pre>
<p>
The state of the Mersenne Twister 19937 is a list of 624 32-bit unsigned
integers followed by an array index that specifies the current position
in the list. Thus the state is defined with 625 integers. When a solver is
run in batch mode, the total number of trajectories is given on a single line.
When Cain is driving the solvers, it repeatedly directs the solver to generate
a small number of trajectories. In this way the GUI application can track the
progress of the simulation and also manage multiple solver processes.
</p>
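<p>
As an illustration of the 625-integer layout, the sketch below formats the
state of Python's own Mersenne Twister as a single line. In CPython,
<tt>random.Random.getstate()</tt> returns 624 state words followed by a
position index, which matches the layout described above. Whether a state
produced this way is interchangeable with the one Cain exports is an
assumption, not something this document confirms, and the function name
<tt>mt19937_state_line</tt> is hypothetical.
</p>

```python
import random


def mt19937_state_line(seed):
    """Format a Mersenne Twister state as one line of 625 integers."""
    generator = random.Random(seed)
    # getstate() returns (version, internal state, gauss_next). In CPython the
    # internal state is 624 32-bit words followed by the position index.
    _version, internal, _gauss_next = generator.getstate()
    if len(internal) != 625:
        raise ValueError('unexpected Mersenne Twister state length')
    return ' '.join(str(word) for word in internal)


line = mt19937_state_line(42)
fields = line.split()
print(len(fields))
```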
<p>
In Cain the solvers are grouped into five categories. Below we consider
the specifics of the input data for each one.
<ul>
<li> <b>Time Series, Uniform</b><br>
These solvers generate time series data recorded at specified frames.
One specifies the times at which to record
the state. Like all of the other stochastic methods, these solvers
use the Mersenne Twister state to initialize the random number
generator. The final input field is the number of trajectories to
generate.
<ul>
<li> <b>Direct, Next Reaction, or First Reaction</b><br>
These exact methods do not have any solver parameters.
<li> <b>Tau-Leaping or Hybrid Direct/Tau-Leaping</b><br>
These approximate methods have one solver parameter, either an allowed
error or a step size.
</ul>
<li> <b>Time Series, All Reactions</b><br>
The direct method is used when recording every reaction event.
There are no solver parameters.
<li> <b>Time Series, Deterministic</b><br>
These solvers use ODE integration to generate time series data recorded at
specified frames. Note that these deterministic solvers do not use random
numbers. The input file has a blank line instead of the Mersenne Twister
state.
<li> <b>Histograms, Transient Behavior</b><br>
These solvers record the state in histograms at specified time frames.
All of the solvers use exact methods; there are no solver parameters.
<li> <b>Histograms, Steady State</b><br>
These solvers record the time-averaged species populations in histograms.
Again, all of the solvers use exact methods so there are no solver parameters.
</ul>
</p>
<p>
Consider the following simple problem with one species and two reactions:
immigration 0 → X and death X → 0. Let the propensity
factors be 1 and 0.1, respectively. Let the initial population
of X be 10. We wish to use the direct method to simulate the process.
We let the system equilibrate for 100 seconds and then record the species
population and reaction counts for 20 seconds. We set the number of frames
to 11. Enter this model in Cain, set the number of trajectories to
2, and export it as a batch job with
the file name <tt>input.txt</tt>. To do this,
click the disk icon <img src="filesave.png"> in the Launcher
panel. Below is the resulting data file (with most of the Mersenne Twister
state omitted).
</p>
<pre>1
1
2
10
0 1 0 1 1 0 1 0
1.0 0.10000000000000001
1
0
2
0 1
0
0
0.0
11
100.0 102.0 104.0 106.0 108.0 110.0 112.0 114.0 116.0 118.0 120.0
1499117434 2949980591 ... 4162027047 3277342478 449
2</pre>
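<p>
The frame times in the file above follow from the equilibration time, the
recording time, and the number of frames. The short sketch below reproduces
that line, assuming the frames are evenly spaced and start at the end of
equilibration, as they are in this example. The function name
<tt>frame_times</tt> is hypothetical.
</p>

```python
def frame_times(equilibration_time, recording_time, number_of_frames):
    """Return evenly spaced frame times covering the recording interval."""
    if number_of_frames < 2:
        return [float(equilibration_time)]
    step = recording_time / (number_of_frames - 1)
    return [equilibration_time + i * step for i in range(number_of_frames)]


# Equilibrate for 100 seconds, record for 20 seconds, use 11 frames.
times = frame_times(100.0, 20.0, 11)
frame_line = ' '.join(str(time) for time in times)
print(frame_line)
```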
<p>
<b>Solver output.</b><br>
The different categories of solvers produce different output. Each
produces an information line that is either a blank line or a Python
dictionary that contains information about the method. If present, the
dictionary may be used to check consistency. This check is unnecessary
when Cain is running simulations interactively, but is useful when
importing the results of batch jobs. When doing the latter, the user
selects a model and method and then specifies files that contain
simulation output. By using the dictionary, Cain can check that the
specified files actually correspond to the selected model and method.
</p>
<p>
Each format is specified below.
<ul>
<li> <b>Time series data recorded at specified frames.</b><br>
Time series data is reported for each of the tasks that the solver
was given. (A task is defined by the
number of trajectories to generate.) The Mersenne Twister state at the
beginning of each trajectory is reported. This makes each trajectory
reproducible, so in the case of strange behavior or an error one can
rerun it to diagnose the issue. If the simulation of the trajectory is successful,
a blank line is written, and the following two lines list
the species populations and reaction counts. Otherwise,
an error message is printed. At the end of each task
(set of trajectories) the Mersenne Twister state is printed. This state
can then be used as the initial state of subsequent simulations.
Note that for deterministic solvers the Mersenne Twister state is not
reported. Instead a blank line is written.
<pre><dictionary of information>
for each task:
  <number of trajectories>
  for each trajectory:
    <list of initial MT 19937 state>
    if successful:
      <blank line>
      <list of populations>
      <list of reaction counts>
    else:
      <error message>
  <list of final MT 19937 state></pre>
<li> <b>Time series data in which every reaction event is recorded.</b><br>
Note that recording the species populations at each reaction event
would be wasteful. Instead we record only the index of the reaction
and the time of the reaction. This is much more efficient than
recording the <em>N</em> species populations and <em>M</em> reaction
counts. The drawback of this approach is that one must then use
the list of reaction indices and times to compute the state.
For each trajectory one lists the initial
populations (because the equilibration time may be nonzero), the
list of reaction indices, and the list of reaction times.
<pre><dictionary of information>
for each task:
  <number of trajectories>
  for each trajectory:
    <list of initial MT 19937 state>
    if successful:
      <blank line>
      <list of initial amounts>
      <list of reaction indices>
      <list of reaction times>
    else:
      <error message>
  <list of final MT 19937 state></pre>
<li> <b>Histograms that record the state at specified frames.</b><br>
The output first lists the
number of trajectories in each task. These are written so that Cain
can interact with the solver. Note that the trajectories from
all tasks are combined to form a single set of trajectories. This
means that if the simulation of any trajectory fails then the whole set of
tasks fails. (This is a good thing. If the simulation of a trajectory
were to fail then the statistics collected over only the
successful simulations would be incorrect.) Along with the histograms,
statistical information about the populations is recorded.
Specifically, the cardinality, sum of the weights, mean and variance
are recorded. (Actually, the summed
second centered moment Σ(<em>x</em> - μ)<sup>2</sup> is reported.
This quantity may be used to compute the variance.)
Note that
histograms are recorded in two parts. This lets one estimate the
error in the solution. By computing the distance between the two
halves one can get an indication of the distance between the
combined result and a converged solution.
<pre>for each task:
  <number of trajectories in task>
<dictionary of information>
if successful:
  <blank line>
  <total number of trajectories>
  <histogram multiplicity>
  for each frame:
    for each recorded species:
      <cardinality>
      <sumOfWeights>
      <mean>
      <summed second centered moment>
      <lower bound>
      <bin width>
      for each histogram:
        <list of weighted probabilities>
else:
  <error message>
<list of final MT 19937 state></pre>
<li> <b>Histograms that record the time-averaged species populations.</b>
<pre>for each task:
  <number of trajectories in task>
<dictionary of information>
if successful:
  <blank line>
  <total number of trajectories>
  <histogram multiplicity>
  for each recorded species:
    <cardinality>
    <sumOfWeights>
    <mean>
    <summed second centered moment>
    <lower bound>
    <bin width>
    for each histogram:
      <list of weighted probabilities>
else:
  <error message>
<list of final MT 19937 state></pre>
</ul>
</p>
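<p>
When every reaction event is recorded, the state has to be rebuilt from the
initial amounts and the lists of reaction indices and times. The sketch below
shows one way to do this. It assumes you supply the net state change of each
reaction yourself, since the solver output does not contain it. The function
name <tt>replay</tt> and the event data are made up for illustration and are
not real solver output.
</p>

```python
def replay(initial_amounts, state_changes, reaction_indices, reaction_times):
    """Rebuild the populations after each recorded reaction event.

    state_changes[r] is a list of (species index, net change) pairs for
    reaction r. Returns a list of (time, populations) tuples.
    """
    populations = list(initial_amounts)
    history = []
    for reaction, time in zip(reaction_indices, reaction_times):
        for species, change in state_changes[reaction]:
            populations[species] += change
        history.append((time, tuple(populations)))
    return history


# Immigration (0 -> X) is reaction 0 and death (X -> 0) is reaction 1.
state_changes = [[(0, +1)], [(0, -1)]]
# Illustrative event lists, not taken from a real simulation.
history = replay([10], state_changes, [0, 1, 1, 0], [100.3, 100.9, 101.4, 102.2])
for time, populations in history:
    print(time, populations)
```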
<p>
We can use the input file that we exported above to generate two trajectories
with the direct method.
<pre>./solvers/HomogeneousDirect2DSearch.exe <input.txt >output.txt</pre>
The contents of the output file are shown below. The first line is blank.
Again most of the Mersenne Twister state is omitted.
<pre>
2
1499117434 2949980591 ... 4162027047 3277342478 449
15 13 10 10 11 8 7 8 9 10 9
102 97 105 102 106 106 108 108 112 111 113 115 115 118 116 118 119 120 121 121 122 123
78190480 1101697099 ... 623132007 3059824252 322
9 11 10 9 13 15 11 9 14 12 9
93 94 97 96 97 97 99 100 104 101 108 103 109 108 109 110 114 110 116 114 117 118
3440222609 2495225278 ... 3271950001 217431822 171
</pre>
</p>
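<p>
As a sanity check on the first trajectory above, the recorded populations
should equal the initial population plus the immigration count minus the death
count at each frame. The sketch below assumes the reaction counts are listed
frame by frame with the two reactions interleaved. That reading is inferred
from the sample output rather than stated in the format description.
</p>

```python
initial_population = 10
populations = [int(x) for x in '15 13 10 10 11 8 7 8 9 10 9'.split()]
counts = [int(x) for x in (
    '102 97 105 102 106 106 108 108 112 111 113 115 '
    '115 118 116 118 119 120 121 121 122 123').split()]

# Assumed layout: for each frame, the immigration count then the death count.
immigration = counts[0::2]
death = counts[1::2]

reconstructed = [initial_population + born - died
                 for born, died in zip(immigration, death)]
print(reconstructed == populations)
```

<p>
Note that the counts at the first frame are already large, which suggests that
events during the equilibration period are included in the reaction counts.
</p>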
</body>
</html>