Fitting

Nonlinear optimization

This is the core. We have a set of observations (data points) and we want to fit a model (a sum of functions) that depends on adjustable parameters to these observations. Let me quote Numerical Recipes, chapter 15.0, page 656 (if you do not know the book, visit http://www.nr.com):

The basic approach in all cases is usually the same: You choose or design a figure-of-merit function (merit function, for short) that measures the agreement between the data and the model with a particular choice of parameters. The merit function is conventionally arranged so that small values represent close agreement. The parameters of the model are then adjusted to achieve a minimum in the merit function, yielding best-fit parameters. The adjustment process is thus a problem in minimization in many dimensions. [...] however, there exist special, more efficient, methods that are specific to modeling, and we will discuss these in this chapter. There are important issues that go beyond the mere finding of best-fit parameters. Data are generally not exact. They are subject to measurement errors (called noise in the context of signal-processing). Thus, typical data never exactly fit the model that is being used, even when that model is correct. We need the means to assess whether or not the model is appropriate, that is, we need to test the goodness-of-fit against some useful statistical standard. We usually also need to know the accuracy with which parameters are determined by the data set. In other words, we need to know the likely errors of the best-fit parameters. Finally, it is not uncommon in fitting data to discover that the merit function is not unimodal, with a single minimum. In some cases, we may be interested in global rather than local questions. Not, "how good is this fit?" but rather, "how sure am I that there is not a very much better fit in some corner of parameter space?"

Our merit function is WSSR, the weighted sum of squared residuals, also called chi-square:


    \chi^2 = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i; \mathbf{a})}{\sigma_i} \right]^2
           = \sum_{i=1}^{N} w_i \left[ y_i - y(x_i; \mathbf{a}) \right]^2

Weights are based on standard deviations, w_i = 1/\sigma_i^2. You can learn why squares of residuals are minimized, e.g., from chapter 15.1 of Numerical Recipes. So we are looking for the global minimum of \chi^2. The large field of numerical research concerned with finding minima (or maxima) is usually called optimization; our case is non-linear, global optimization. Fityk implements three very different optimization methods. All are well known and described in many books.
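
To make the formula concrete, here is a minimal Python sketch (illustrative only, not part of Fityk) that evaluates WSSR for a hypothetical Gaussian model; the names wssr and gaussian are made up for this example:

    import numpy as np

    def wssr(a, x, y, sigma, model):
        # chi^2 = sum_i w_i * (y_i - y(x_i; a))^2  with  w_i = 1 / sigma_i^2
        w = 1.0 / sigma**2
        r = y - model(x, a)
        return np.sum(w * r**2)

    def gaussian(x, a):                     # a hypothetical model function
        return a[0] * np.exp(-(x - a[1])**2 / (2.0 * a[2]**2))

    x = np.linspace(-5.0, 5.0, 50)
    y = gaussian(x, [1.0, 0.0, 1.0]) + 0.05 * np.random.randn(50)
    sigma = np.full(50, 0.05)
    print(wssr([1.0, 0.0, 1.0], x, y, sigma, gaussian))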

Fitting related commands

You can switch between fitting methods using the command:

f.method [f]

where f is one of the following lower-case letters: g - Genetic Algorithm, m - Levenberg-Marquardt gradient method, s - simplex method. If f is not specified, the current fitting method is displayed.
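
For example,

f.method s

switches to the simplex method, and f.method alone prints which method is currently selected.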

Note

The choice made with f.method affects all the other f.xxx commands.

To run a fit, use the commands

f.run [n]

and

f.continue [n]

The first command initializes and runs the fitting algorithm; the second continues fitting without re-initialization. n is the maximum number of iterations. All non-linear fitting methods are iterative, and there are two common stopping criteria. The first is the number of iterations: if it is not specified with the command, the value of the default-max-iterations option is used. The second is the number of evaluations of the objective function (WSSR), specified by the max-wssr-evaluations option; it is approximately proportional to computation time, because most of the time in the fitting process is spent evaluating WSSR. There are also other criteria, different for each method. Note that the values of all options, even those common to all methods, are set separately for every method.
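
The interplay of the two common criteria can be sketched in Python; the loop below is only a schematic of the behaviour described above, not Fityk's code, and the step callback is a stand-in for one iteration of any of the methods:

    def fit_loop(step, max_iterations, max_wssr_evaluations):
        # step() performs one iteration and returns how many times it
        # evaluated WSSR; both limits mirror the options described above
        iterations = 0
        evaluations = 0
        while iterations < max_iterations and evaluations < max_wssr_evaluations:
            evaluations += step()
            iterations += 1
        return iterations, evaluations

    # dummy step: pretend each iteration costs 3 WSSR evaluations
    print(fit_loop(lambda: 3, max_iterations=100, max_wssr_evaluations=30))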

If you give the f.run command an n that is too small and the fit stops because of that limit rather than because of convergence, it makes sense to use the f.continue command to perform further iterations. [TODO: how to stop fit interactively]
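
For example, f.run 20 performs at most 20 iterations; if the fit has not converged by then, f.continue 20 performs up to 20 more without re-initializing the method.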

When a fit is running, every iteration prints some information. If it scrolls too fast, you can reduce the output n-fold by assigning n to the output-one-of option.

Setting o.set autoplot = on-fit-iteration will redraw the plot after every iteration, to visualize progress (see autoplot).

The command [1]:

f.info [[*] | [**]]

can be used to display the WSSR, symmetric errors and the variance-covariance matrix.
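
For least-squares fits, symmetric errors are conventionally the square roots of the diagonal elements of the variance-covariance matrix. A minimal sketch, assuming a covariance matrix C has already been computed:

    import numpy as np

    C = np.array([[0.04, 0.01],
                  [0.01, 0.09]])             # hypothetical covariance matrix
    symmetric_errors = np.sqrt(np.diag(C))   # one error per fitted parameter
    print(symmetric_errors)                  # -> [0.2 0.3]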

The available methods can be mixed together; e.g., it is sensible to obtain initial parameter estimates using the simplex method and then refine them using the Levenberg-Marquardt method. The s.history command can be useful for trying various methods with different options and/or initial parameters and choosing the best solution.

Some fitting methods use a random number generator. In some situations one may want repeatable and predictable fitting results, e.g., to make a presentation. The seed for a new sequence of pseudo-random numbers is set at the beginning of fitting initialization (when f.run is called). If the value of the pseudo-random-seed option is -1, the seed is based on the system time, and the sequence of pseudo-random numbers is different every time. Otherwise, if pseudo-random-seed has a non-negative value, that value is used as the seed. Remember that, like all other options, the value of pseudo-random-seed is set independently for each fitting method.
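
The seeding rule amounts to the following (a Python sketch of the behaviour, not Fityk's implementation):

    import random, time

    def seed_rng(pseudo_random_seed):
        if pseudo_random_seed == -1:
            random.seed(time.time())           # different sequence on every run
        else:
            random.seed(pseudo_random_seed)    # reproducible sequence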

Levenberg-Marquardt

It is the standard nonlinear least-squares routine. It involves computing the first derivatives of the functions. For a description of the L-M method see Numerical Recipes, chapter 15.5, or Siegmund Brandt, Data Analysis, chapter 10.15 (there is a Polish translation of the latter). In a few words: it combines the inverse-Hessian method (sometimes called the Gauss-Newton method) with the steepest-descent method by introducing a lambda factor. When lambda is equal to 0, the method is equivalent to the inverse-Hessian method. As lambda increases, the shift vector is rotated toward the direction of steepest descent and its length decreases. (The shift vector is the vector that is added to the parameter vector.) If a better fit is found in an iteration, lambda is decreased: it is divided by the value of the lambda-down-factor option (default: 10). Otherwise, lambda is multiplied by the value of lambda-up-factor (default: 10). The initial value can be changed with the lambda-starting-value option.
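
The lambda schedule itself is simple enough to show as a sketch; the function below mirrors the lambda-down-factor and lambda-up-factor options described above, but is not Fityk's code:

    def update_lambda(lam, improved, lambda_down_factor=10.0, lambda_up_factor=10.0):
        # a successful iteration shifts the method toward inverse-Hessian
        # steps; a failed one shifts it toward steepest descent
        return lam / lambda_down_factor if improved else lam * lambda_up_factor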

The Marquardt method has one stopping criterion in addition to the common ones: if the relative decrease of the objective function (WSSR) is smaller than the value of the stop-rel-change option twice in a row, the fit is considered converged and is stopped.
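
A sketch of this convergence test, assuming a list of WSSR values from successive iterations:

    def converged(wssr_history, stop_rel_change):
        # True if the relative WSSR decrease was below the threshold
        # in each of the last two iterations
        if len(wssr_history) < 3:
            return False
        a, b, c = wssr_history[-3:]
        return (a - b) / a < stop_rel_change and (b - c) / b < stop_rel_change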

The L-M method finds a minimum quickly. The question is whether it is the global minimum. It can be a good idea to add a small random vector to the vector of parameters and try again. This small shift vector is added when the value of the shake-before option is positive (by default it is 0). Each parameter's shift is independent and randomly drawn from a distribution of the type specified by the shake-type option (see the distrib-type option in the simplex method). The expected magnitude of a parameter's shift is directly proportional to both the value of the shake-before option and the width of the parameter's domain.
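
A sketch of such a perturbation, assuming a uniform shake-type; the function name and exact scaling are illustrative, only the proportionality to shake-before and to the domain width is taken from the description above:

    import random

    def shake(params, domains, shake_before):
        # shift every parameter independently; the expected magnitude of
        # the shift grows with shake_before and with the domain width
        return [p + random.uniform(-1.0, 1.0) * shake_before * (hi - lo)
                for p, (lo, hi) in zip(params, domains)]

    print(shake([5.0, 0.2], [(0.0, 10.0), (-1.0, 1.0)], shake_before=0.1))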

Nelder-Mead downhill simplex method

This time I am quoting chapter 4.8.3, p. 86 of Peter Gans, Data Fitting in the Chemical Sciences by the Method of Least Squares:

A simplex is a geometrical entity that has n+1 vertices corresponding to variations in n parameters. For two parameters the simplex is a triangle, for three parameters the simplex is a tetrahedron and so forth. The value of the objective function is calculated at each of the vertices. An iteration consists of the following process. Locate the vertex with the highest value of the objective function and replace this vertex by one lying on the line between it and the centroid of the other vertices. Four possible replacements can be considered, which I call contraction, short reflection, reflection and expansion.[...]

It starts with an arbitrary simplex. Neither the shape nor position of this are critically important, except insofar as it may determine which one of a set of multiple minima will be reached. The simplex then expands and contracts as required in order to locate a valley if one exists. Then the size and shape of the simplex is adjusted so that progress may be made towards the minimum. Note particularly that if a pair of parameters are highly correlated, both will be simultaneously adjusted in about the correct proportion, as the shape of the simplex is adapted to the local contours.[...]

Unfortunately it does not provide estimates of the parameter errors, etc. It is therefore to be recommended as a method for obtaining initial parameter estimates that can be used in the standard least squares method.

This method is also described in the previously mentioned Numerical Recipes (chapter 10.4) and in Data Analysis (chapter 10.8).
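
To experiment with the method's behaviour outside Fityk, a readily available implementation can be used; for instance, SciPy's (this is SciPy's Nelder-Mead, not Fityk's):

    from scipy.optimize import minimize

    def rosen(p):                   # a classic test function with a curved valley
        return (1.0 - p[0])**2 + 100.0 * (p[1] - p[0]**2)**2

    result = minimize(rosen, x0=[-1.2, 1.0], method="Nelder-Mead")
    print(result.x)                 # close to the minimum at (1, 1)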

After changing the current fitting method to the Nelder-Mead simplex (which, as you already know, can be done either from the menu or with the command f.method s), there are a few method-specific options that can be changed. One of these is the stopping criterion min-fract-range: if the value of the expression 2(M-m)/(M+m), where M and m are the values of the worst and best vertices respectively (the values of the objective function at those vertices, to be precise), is smaller than the value of the min-fract-range option, the fit is stopped. In other words, it is stopped when all vertices are at almost the same level.
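
The criterion itself is a one-liner; a sketch in Python:

    def fract_range(vertex_values):
        # 2(M - m) / (M + m), where M and m are the objective-function
        # values at the worst and best vertices
        M, m = max(vertex_values), min(vertex_values)
        return 2.0 * (M - m) / (M + m)

    # the fit stops when fract_range(values) < min-fract-range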

The remaining options are related to the initialization of the simplex. Before starting iterations, we have to choose a set of points in the parameter space, called vertices. Unless the move-all option is set, one of these points will be the current point, i.e. the values the parameters have at this moment. All other vertices are drawn as follows: each parameter of each vertex is drawn separately, from a distribution centered at the center of the parameter's domain, with width proportional to both the domain width and the value of the move-multiplier option. The distribution type can be set with the distrib-type option to one of: uniform, Gaussian, Lorentzian and bound. The last one sets the parameter to either the greatest or the smallest value of its domain, i.e. one of the two domain bounds (assuming that move-multiplier equals 1).
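
A sketch of this initialization for a single vertex (the Lorentzian case is omitted for brevity; the bound case assumes move-multiplier equal to 1, as noted above):

    import random

    def init_vertex(domains, move_multiplier, distrib_type="uniform"):
        # each coordinate is drawn from a distribution centered at the
        # middle of the parameter's domain, with width proportional to
        # the domain width times move-multiplier
        vertex = []
        for lo, hi in domains:
            center = 0.5 * (lo + hi)
            width = (hi - lo) * move_multiplier
            if distrib_type == "uniform":
                v = random.uniform(center - width / 2, center + width / 2)
            elif distrib_type == "gaussian":
                v = random.gauss(center, width / 2)
            elif distrib_type == "bound":
                v = random.choice([lo, hi])   # one of the two domain bounds
            else:
                raise ValueError(distrib_type)
            vertex.append(v)
        return vertex

    # one random vertex for two parameters with domains [0, 10] and [-1, 1]
    print(init_vertex([(0.0, 10.0), (-1.0, 1.0)], move_multiplier=1.0))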

Genetic Algorithms

[TODO]



[1] The syntax of this command will be changed; other error estimates and measures of goodness-of-fit will be added.