# Solvers

In this section we enumerate and briefly describe each of Cain's solvers. See the Simulation Methods section for more details and derivations. In the method editor the solvers are grouped according to four hierarchical categories: time dependence, simulation output, algorithm, and option. One chooses a solver by selecting items from each of these in turn.

## Time Dependence

The first category differentiates between time homogeneous and time inhomogeneous systems. In a time homogeneous problem the reaction propensities do not have explicit time dependence. Time inhomogeneous problems typically arise when the volume or temperature has a prescribed time dependence. This implies that the reaction rates will explicitly depend on time. Cain has many solvers for time homogeneous problems, but few for time inhomogeneous ones. While the two cases are conceptually very similar, they require different algorithms. Thus in choosing a solver one first selects the time dependence category.

## Simulation Output

Next one selects the desired simulation output. The various kinds of output are described in the Method Editor section. To recap, one can collect either time series or histogram data. For time series data one can either record the state at uniformly spaced time frames or record every reaction event. Although Cain is primarily for stochastic simulations, it does offer a few simple deterministic solvers. These generate time series data and are grouped in a separate category. When the species populations are recorded in histograms, the mean and standard deviation are also recorded. This is useful because unless the bin width is unity one can only approximately calculate these statistics from the histogram data. For histogram output one can study either transient or steady state behavior. For the former one records the state at uniformly spaced frames in time. For the latter one averages the populations over a specified time interval. Thus, in summary, there are five simulation output categories:

• Time Series, Uniform. Here one takes snapshots of the state at uniformly spaced frames in time. One can use either exact methods like the direct method, first reaction method, and next reaction method, or approximate methods such as tau-leaping and hybrid direct/tau-leaping. Recently there has been much research in optimized exact methods. (See reference [Mauch 2011] for an analysis.) Thus we offer many different formulations of the direct and next reaction methods. You may experiment to see how the various formulations perform. However, the default formulation has good performance for most problems.
• Time Series, All Reactions. Recording every reaction event is useful when one wants a visual representation of the amount of noise in a system. By contrast, when one records the state at frames one does not get a picture of the magnitude of the random fluctuations. Note that recording every reaction is expensive, and plotting the results is even more so. For this output option we use the 2-D search formulation of the direct method.
• Time Series, Deterministic. Deterministic systems are modeled with a set of ordinary differential equations. Cain uses numerical integration to approximately solve the system. The state is recorded at uniformly spaced time frames.
• Histograms, Transient Behavior. If one records the state at time frames one can of course plot the trajectories. One can also analyze the species populations at these frames. One can compute the mean and standard deviation of the population or one can plot the distribution of populations in a histogram. Obtaining accurate statistics or distribution data usually requires generating many trajectories. In this case, the state information may require a lot of storage capacity. It is more economical to dynamically update the population statistics and histograms of the species populations as the trajectories are being generated. For this output category, population statistics and histogram data are stored at uniformly spaced frames in time. To generate the trajectories we use the 2-D search formulation of the direct method.
• Histograms, Steady State. If one records every reaction event, one can analyze the steady state behavior of a system. One could take the time average of the species populations to determine the mean and standard deviation of the steady state populations. One could also record the time weighted populations in a histogram to determine the distribution of steady state populations. However, storing every reaction event typically requires a lot of storage. It is more efficient to dynamically update statistics and histogram data as the trajectories are generated. For this output option the time averaged statistics and histogram data are collected. Again we use the 2-D search formulation of the direct method to generate the trajectories.
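
The dynamic updating of statistics described above can be sketched in Python. This is an illustrative class (the name `RunningHistogram`, the fixed bin count, and the clamping of overflow into the last bin are assumptions, not Cain's actual data structures); it combines Welford's numerically stable running mean/variance with a fixed-width histogram so that no per-trajectory state needs to be stored.

```python
import math

class RunningHistogram:
    """Online mean/variance (Welford's method) plus a fixed-bin histogram.

    Hypothetical sketch of updating population statistics as trajectories
    are generated, rather than storing every recorded state.
    """
    def __init__(self, num_bins=32, bin_width=1.0):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations
        self.bins = [0] * num_bins
        self.bin_width = bin_width

    def add(self, x):
        # Welford's update: numerically stable, one pass over the data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        # Clamp overflow into the last bin rather than growing the table.
        i = min(int(x / self.bin_width), len(self.bins) - 1)
        self.bins[i] += 1

    def std_dev(self):
        return math.sqrt(self._m2 / self.count) if self.count > 1 else 0.0
```

Storing exact running statistics alongside the histogram is what lets the mean and standard deviation stay accurate even when the bin width is larger than unity.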

## Algorithms

The third category in selecting a solver is the simulation algorithm. There are algorithms for exact and approximate stochastic simulation. Below is a list with some implementation details.

• Direct. Gillespie's direct method [Gillespie 1976] [Gillespie 1977]. A reaction dependency graph [Gibson 2000] is used to minimize the cost of recomputing propensities. Unless otherwise indicated, the sum of the propensities and associated quantities are dynamically updated. At each step the time increment is determined by drawing an exponential deviate. The mean of the distribution is the reciprocal of the sum of the propensities. The reaction is chosen by generating a discrete deviate whose probability mass function is proportional to the reaction propensities. There are many options for generating the discrete deviate. The 2-D search formulation is the default method.
• Next Reaction. Gibson and Bruck's next reaction method [Gibson 2000] stores reaction times in a priority queue. There are several options for implementing this data structure.
• First Reaction. Gillespie's first reaction method [Gillespie 1976] [Gillespie 1977].
• Tau-Leaping. We implement explicit tau-leaping with the step size selection presented in reference [Cao 2006a].
• Hybrid Direct/Tau-Leaping. This hybrid algorithm uses the direct method for the volatile (low population) species and tau-leaping for the rest.
• ODE, Integrate Reactions. For ODE integration we use Runge-Kutta schemes.
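
To make the direct method concrete, here is a minimal sketch of a single step, assuming the propensities have already been computed. The function name and the use of a plain linear search are illustrative; the exponential waiting time and propensity-weighted reaction choice are the parts the algorithm list above describes.

```python
import math
import random

def direct_method_step(propensities, rng=random):
    """One step of Gillespie's direct method (sketch, not Cain's code).

    Returns (time_increment, reaction_index), or None if no reaction
    can fire.
    """
    total = sum(propensities)
    if total <= 0.0:
        return None
    # Exponential waiting time whose mean is 1/total.
    # (1 - u lies in (0, 1], so the logarithm is always defined.)
    tau = -math.log(1.0 - rng.random()) / total
    # Discrete deviate weighted by the propensities (linear search).
    target = rng.random() * total
    partial = 0.0
    for i, a in enumerate(propensities):
        partial += a
        if target < partial:
            return tau, i
    return tau, len(propensities) - 1
```

The linear search here is the simplest formulation; the Options section below lists the faster search structures Cain offers for the same choice.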

## Options

Below is a list of the available solvers grouped according to the hierarchical categories. Each of the solvers uses sparse arrays for the state change vectors [Li 2006]. The methods that require exponential deviates use the ziggurat method [Marsaglia 2000].

• ### Time Homogeneous

• #### Time Series, Uniform

• ##### Direct
The computational complexities for generating discrete deviates and for modifying propensities are indicated for each method in terms of the number of reactions M.
• 2-D search - Generate O(√M). Modify O(1). The propensities are stored in a 2-D table that stores the sum of each row. The number of rows (and hence the number of columns) is O(√M). The discrete deviate is generated by first determining the appropriate row with a linear search and then performing a linear search within that row [Mauch 2011].
• 2-D search, sorted - Generate O(√M). Modify O(1). The propensities are periodically ordered so the larger values have smaller Manhattan distance from the lower corner of the table than smaller values [Mauch 2011]. For some problems this may reduce the costs of searching.
• 2-D search, bubble sort - Generate O(√M). Modify O(1). The propensities are dynamically ordered each time they are modified [Mauch 2011].
• Composition rejection - Generate O(1). Modify O(1). Propensities are placed in groups. In each group the propensities differ by at most a factor of two. A deviate is generated by a linear search on the groups followed by selecting a reaction within a group with the method of rejection. This solver implements a slightly modified version of the composition rejection method presented in [Slepoy 2008].
• Binary search, full CMF - Generate O(log M). Modify O(M). The cumulative mass function (CMF) is stored in an array. This allows one to generate a discrete deviate with a binary search on that array. At each time step the CMF is regenerated. This is an implementation of the logarithmic direct method [Li 2006].
• Binary search, sorted CMF - Generate O(log M). Modify O(M). In this solver the reactions are arranged in descending order according to the sum of the propensities of the influencing reactions [Mauch 2011]. This minimizes the portion of the CMF that one has to recompute after firing a reaction and updating the influenced propensities.
• Binary search, recursive CMF - Generate O(log M). Modify O(log M). Instead of storing the full CMF, a partial, recursive CMF is used. This approach is equivalent to the method presented in [Gibson 2000]. A deviate can be generated with a binary search. Modifying a propensity affects at most log₂ M elements of the partial, recursive CMF.
• Linear search - Generate O(M). Modify O(1). We use a linear search on the PMF to generate a deviate. The sum of the propensities is dynamically updated.
• Linear search, delayed update - Generate O(M). Modify O(1). This solver recomputes the sum of the propensities at each time step as done in the original formulation of the direct method [Gillespie 1977].
• Linear search, sorted - Generate O(M). Modify O(1). The propensities are periodically sorted in descending order [McCollum 2005].
• Linear search, bubble sort - Generate O(M). Modify O(1). The propensities are dynamically ordered by using swaps each time a propensity is modified [Mauch 2011].
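
The 2-D search above (the default formulation) can be sketched as follows. This is an illustrative, simplified version: it rebuilds the row sums on every call for clarity, whereas a real implementation maintains them incrementally to get the O(1) modify cost.

```python
import math
import random

def two_d_search(propensities, rng=random):
    """Pick a reaction index via a 2-D search (sketch).

    The propensities are viewed as a table of roughly sqrt(M) rows.
    The row sums are searched linearly first, then the chosen row is
    searched linearly, so each search touches only O(sqrt(M)) values.
    """
    m = len(propensities)
    ncols = max(1, int(math.ceil(math.sqrt(m))))
    rows = [propensities[r:r + ncols] for r in range(0, m, ncols)]
    row_sums = [sum(row) for row in rows]
    target = rng.random() * sum(row_sums)
    for ri, s in enumerate(row_sums):
        if target < s:
            # Linear search within the selected row.
            for ci, a in enumerate(rows[ri]):
                if target < a:
                    return ri * ncols + ci
                target -= a
            return ri * ncols + len(rows[ri]) - 1
        target -= s
    return m - 1          # guard against floating point round-off
```

Because a modified propensity only requires updating one table entry and one row sum, the modify cost stays constant while the search cost grows only as the square root of the number of reactions.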
• ##### Next Reaction
Each formulation below uses a different data structure to implement the priority queue. The computational complexities for removing an element and modifying an element are given in terms of the number of reactions M.
• Hashing - Pop O(1). Modify O(1). The indexed priority queue is implemented with a hash table [Cormen 2001, Mauch 2011]. We use hashing with chaining, which means that each bin in the hash table contains a list of elements. The hash function is a linear function of the putative reaction times, truncated to an integer to obtain a bin index. The hash function is dynamically adapted to obtain the target load. (The load is the average number of elements in each bin.)
• Binary heap, pointer - Pop O(log M). Modify O(log M). This solver uses a binary heap that is implemented with an array of pointers to the putative reaction times. This is the data structure used in [Gibson 2000].
• Binary heap, pair - Pop O(log M). Modify O(log M). We use a binary heap implemented with an array of index/time pairs [Mauch 2011]. The pair version of the binary heap typically has better performance than the pointer version on 64-bit architectures; the reverse holds on 32-bit architectures.
• Partition - Pop O(√M). Modify O(√M). A splitting value is used to divide the reactions into two categories [Mauch 2011]. The splitting value is chosen so that initially there are O(√M) reactions in the lower partition. The remainder are in the upper partition. In determining the minimum putative reaction time, one only has to examine reactions in the lower partition. When the lower partition is empty, a new splitting value is chosen.
• Linear search - Pop O(M). Modify O(1). The putative reaction times are stored in an array. We use a linear search to find the minimum [Mauch 2011].
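
The priority queue of putative reaction times is the heart of all these formulations. The sketch below uses Python's `heapq` with (time, index) pairs, which corresponds loosely to the "binary heap, pair" option. It is deliberately simplified: it redraws every putative time after each firing instead of reusing deviates, so it illustrates the data structure rather than Gibson and Bruck's full bookkeeping.

```python
import heapq
import math
import random

def next_reaction_sketch(propensities, t_final, rng=random):
    """Fire reactions by repeatedly popping the minimum putative time
    from a binary heap of (time, index) pairs (illustrative sketch).

    Returns the list of (time, reaction_index) firings up to t_final.
    """
    t = 0.0
    firings = []
    while t < t_final:
        # Putative absolute firing times for all active reactions.
        heap = [(t - math.log(1.0 - rng.random()) / a, i)
                for i, a in enumerate(propensities) if a > 0.0]
        if not heap:
            break
        heapq.heapify(heap)
        t, i = heapq.heappop(heap)   # the minimum putative time fires
        if t < t_final:
            firings.append((t, i))
    return firings
```

In the real next reaction method, only the times of influenced reactions are updated after a firing, which is what makes the O(log M) modify cost of the heap matter.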
• ##### First Reaction
Gillespie's first reaction method [Gillespie 1976] [Gillespie 1977].
• Simple - A simple implementation of the method.
• Reaction influence - We use the reaction influence data structure to minimize recomputing the propensities.
• Absolute time - We use absolute times instead of waiting times for each reaction. This allows us to reduce the number of exponential deviates used. After firing a reaction, we only generate new reaction times for the influenced reactions.
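
For contrast with the direct method, a single step of the simple first reaction formulation can be sketched as below (the function name is illustrative). Every reaction draws its own tentative waiting time and the smallest one fires, which is why the method uses many more exponential deviates per step than the direct method.

```python
import math
import random

def first_reaction_step(propensities, rng=random):
    """One step of Gillespie's first reaction method (simple sketch).

    Draw a tentative exponential waiting time for each reaction and
    fire the one with the minimum. Returns (waiting_time, index),
    or None if no reaction can fire.
    """
    best = None
    for i, a in enumerate(propensities):
        if a > 0.0:
            tau = -math.log(1.0 - rng.random()) / a
            if best is None or tau < best[0]:
                best = (tau, i)
    return best
```

The "absolute time" option listed above avoids most of this per-step deviate cost by keeping the unfired reactions' times across steps.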
• ##### Tau-Leaping
All of the tau-leaping solvers have first order accuracy. They differ in how they calculate the expected propensities, whether they correct negative populations, and how they calculate the time step. For calculating the expected propensities one may use a first, second, or fourth order Runge-Kutta scheme. (The first and second order schemes are commonly called forward and midpoint, respectively.) While all yield a first order stochastic method, using a higher order scheme is typically more efficient. Use the adaptive step size solvers unless you are studying the effect of step size. With a fixed step size it is difficult to predict the accuracy of the simulation. With the adaptive step size solvers you may choose whether or not to correct negative populations. Without correction, some portion of the simulations may fail.
• Runge-Kutta, 4th order
• Midpoint
• Forward
• Runge-Kutta, 4th order, no correction
• Midpoint, no correction
• Forward, no correction
• Runge-Kutta, 4th order, fixed step size
• Midpoint, fixed step size
• Forward, fixed step size
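
A single explicit tau-leaping step can be sketched as follows. This corresponds loosely to the "Forward, fixed step size" variant with no negative-population correction; the function names, the Knuth Poisson generator, and the calling convention are assumptions for illustration only.

```python
import math
import random

def _poisson(mean, rng=random):
    # Knuth's product-of-uniforms method; fine for modest means.
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1

def tau_leap_step(state, stoichiometry, propensity_fn, tau, rng=random):
    """One explicit tau-leaping step (sketch: fixed step, forward
    propensities, no negative-population correction).

    Each reaction i fires Poisson(a_i * tau) times and the state is
    advanced by the summed stoichiometric changes.
    """
    a = propensity_fn(state)
    new_state = list(state)
    for i, ai in enumerate(a):
        k = _poisson(ai * tau, rng)          # firings of reaction i
        for j, nu in enumerate(stoichiometry[i]):
            new_state[j] += k * nu
    return new_state
```

The adaptive solvers differ from this sketch by choosing tau from the step size criterion of [Cao 2006a] and, optionally, by rejecting leaps that drive a population negative.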
• ##### Hybrid Direct/Tau-Leaping
With these solvers you can choose the scheme for calculating the expected propensities.
• Runge-Kutta, 4th order
• Midpoint
• Forward
• #### Time Series, All Reactions

The direct method with a 2-D search is used when recording each reaction event.
• ##### Direct
• 2-D search
• #### Time Series, Deterministic

When generating a single deterministic trajectory one can either use a built-in solver or export the problem to a Mathematica notebook. The solvers use Runge-Kutta schemes. For ordinary work, use the adaptive step size Cash-Karp variant, which is a fifth order method. The rest of the solvers are only useful in studying how the order and step size affect the solution.
• ##### ODE, Integrate Reactions
• Runge-Kutta, Cash-Karp
• Runge-Kutta, Cash-Karp, fixed step size
• Runge-Kutta, 4th order, fixed step size
• Midpoint, fixed step size
• Forward, fixed step size
• ##### Mathematica
• NDSolve
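
The fixed step Runge-Kutta schemes listed above all have the same shape; here is the classic 4th order scheme for a scalar equation dy/dt = f(t, y). The function name and scalar interface are illustrative (Cain's built-in solvers integrate the reaction ODEs), but the scheme itself is standard.

```python
def rk4_fixed(f, y0, t0, t1, n_steps):
    """Classic 4th order Runge-Kutta with a fixed step size (sketch)."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        # Four stage evaluations per step give 4th order accuracy.
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y
```

The Cash-Karp variant adds an embedded lower order estimate, which is what allows the adaptive solver to control its step size; that is why it is recommended for ordinary work.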
• #### Histograms, Transient Behavior

• ##### Direct
• 2-D search

• #### Histograms, Steady State

By recording the average value of species populations you can determine the steady state solution (if it exists).
• ##### Direct
Each of the following use the direct method with a 2-D search.
• Elapsed time - This is the most efficient method for most problems. The algorithm keeps track of when species change values in order to minimize the cost of recording the populations.
• Time steps - This is the traditional method. The species populations are recorded at every time step.
• All possible states - This is an implementation of the all possible states method [Lipshtat 2007]. It is usually less efficient than the traditional method.
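
The time averaging that the elapsed time and time steps options perform can be sketched as below. The function and its event-list interface are illustrative only: the population is piecewise constant between reaction events, so its time average is a sum of population values weighted by how long each value was held, which is what lets the mean be accumulated without storing every reaction event.

```python
def time_averaged_mean(events, t_start, t_end):
    """Time-weighted average of a piecewise-constant population (sketch).

    `events` is a list of (time, population) pairs; each population
    holds from its time until the next event's time.
    """
    total = 0.0
    for (t0, x), (t1, _) in zip(events, events[1:] + [(t_end, 0)]):
        # Weight each population by its overlap with [t_start, t_end].
        lo, hi = max(t0, t_start), min(t1, t_end)
        if hi > lo:
            total += x * (hi - lo)
    return total / (t_end - t_start)
```

The same weighting, applied per histogram bin instead of to a running sum, yields the time weighted population distributions described above.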