File: mpfittut.html

package info (click to toggle)
mpfit 1.84%2B2016.05.19-1
links: PTS, VCS
area: main
in suites: stretch
size: 784 kB
ctags: 45
sloc: python: 126; makefile: 16; sh: 13
file content (272 lines) | stat: -rw-r--r-- 14,625 bytes
parent folder | download | duplicates (3)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!--
     Copyright (c) 1997-2000 Craig B. Markwardt
     Author:   Craig Markwardt (craigm@cow.physics.wisc.edu)
     Modified: 2007-03-22 11:06:33.
     Generated from ``mpfittut.wml'' via WML 2.0.11 (19-Aug-2006).
               by Craig Markwardt (craigm@cow.physics.wisc.edu)
               on 2007-03-22 11:18:50.

     DO NOT EDIT THIS FILE DIRECTLY! INSTEAD EDIT ``mpfittut.wml''.
-->
<meta charset="utf-8">
<meta name="Copyright" content="1997-2000 Craig B. Markwardt">
<meta name="Author"    content="Craig Markwardt, craigm@cow.physics.wisc.edu">
<meta name="Generator" content="WML 2.0.11 (19-Aug-2006)">
<meta name="Modified"  content="2007-03-22 11:06:33">
<title>Tutorial: 1D Curve Fitting in IDL using MPFITFUN and
MPFITEXPR</title>
</head>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EF" VLINK="#55188A" ALINK="#FF0000" LANG="EN">
<h1>Tutorial: 1D Curve Fitting in IDL using MPFITFUN and
MPFITEXPR</h1>
<br><br><hr align="center" noshade><br>

<fchange>
<P>I have found curve-fitting in IDL to be somewhat of a frustrating
process. There are a number of hoops you have to jump through that
just make data analysis a pain. Furthermore, the IDL supplied curve
fitting routine, called CURVEFIT, is not as robust as I would like. I
have found that I can crash the entire IDL session with some fairly
simple data and models. I have translated MINPACK-1, a very nice curve
fitting package into IDL. The fitting programs are called MPFIT,
MPFITFUN, and MPFITEXPR (and can be downloaded <a
href="http://cow.physics.wisc.edu/~craigm/idl/fitting.html">here</a>). Here
I present a short tutorial on how to use MPFITFUN and MPFITEXPR.</P>
<P>MPFITEXPR is generally the easiest to use interactively at the
command line, while MPFITFUN is most commonly used in programs.</P>
<H2>Collecting your data</H2>
<P>The first step in any analysis process is to collect your data. You
are in charge of that, since only you know the specific details of how
your experiment is run. In general, you will have three sets of numbers:</P>
<UL>
<LI><B>The &quot;X&quot; values</B> - these are the independent variables
of the experiment.</LI>
<LI><B>The &quot;Y&quot; values</B> - these are the &quot;measured&quot;&nbsp;dependent
variables.</LI>
<LI><B>The &quot;Error&quot; values</B> - this is typically the 1-sigma
uncertainty in your measurement.</LI>
</UL>
<P>Of course you will have your own data, but I&nbsp;will provide some
<A HREF="examples/fakedata.sav">sample data </A>(6 kb), in the form of an
IDL &quot;SAVE&quot; file. It contains three variables, which you can
imagine might represent the rate measured by a
detector:&nbsp;<TTT>t,</TTT> <TTT>r</TTT>, and <TTT>rerr</TTT>,
corresponding to a <I>time</I>, <I>rate,</I> and <I>error </I>in the
rate. The error is simply the Poisson statistical error. My example
below will use these variables.</P>
<P>Here is a plot of the data with errors:</P>
<P><IMG SRC="images/xdata.gif" ALT="Plot of data and errors"></p>
<H3>What to do if you don't have error bars</H3>
<P> A proper experimenter should always assign error bars to their
data. After all, a data point with larger errors should be weighted
less in the fit, compared to a point with small error bars. However,
I can foresee that under circumstances which only you can judge, the
error bars may not be relevant. In that case, you may set the "Error"
term to unity and proceed with the fit. Be aware that your data may
not be properly weighted, and the error estimates produced by
MPFITFUN/EXPR will not be correct. <A
HREF="#refbevington">Bevington</A> (Ch. 6.4) has an approach that
allows you to assign error bars once you know the best-fit
sum-of-squares. This number is returned through the BESTNORM
keyword.</P>
<H2>Choosing a Model</H2>
<P>By fitting a curve to your data, you are assuming that a particular
<I>model</I> best represents the data. This again is up to you because
of course, only you can assign an interpretation to your own data. Interpretations
aside however, we can try to see how well a particular model fits by, well,
just fitting it!</P>
<P>In this example, it is pretty clear that there is a fairly constant
level of about 1000, with a &quot;hump&quot; near 2.5. I speculate
that a <B>constant plus Gaussian</B> will fit that curve quite
nicely. [ I in fact generated the data using that model. ]</P>
<P>How do you&nbsp;construct a model that IDL will understand? When
you use MPFITEXPR, you only need to supply an IDL <I>expression</I>
which computes the model. Here's how I would do the constant plus
Gaussian model:</P>
<pre>
IDL&gt; expr = 'P[0] +&nbsp;GAUSS1(X, P[1:3])'
</pre>
<P>The variable <TTT>expr </TTT>now contains an IDL expression which
takes a constant value &quot;<TTT>P[0]</TTT>&quot; and adds a Gaussian
&quot;<TTT>GAUSS1(X, P[1:3])</TTT>&quot;. The GAUSS1 function is a one
dimensional Gaussian curve, whose source code can be <a
href="#download">downloaded</a>.</p>
<P>There are a few important things to notice here. First, <B>the name
of the independent variable is always &quot;X&quot;</B>, no matter what
it is called in your session. When MPFITEXPR executes your expression,
it substitutes the correct independent variable for &quot;X&quot; in the
expression. Second, <B>all of the parameters are stored in a single array
variable called &quot;P&quot;</B>. Again, you are free to name the parameter
array anything you like in your own session, but in the <I>expression</I> it
must appear as &quot;P&quot;.</P>
<p>When you use MPFITFUN instead, you need to construct an IDL
function which does the same thing as the expression above. You
should deposit the following function definition into a text file
called mygauss.pro:</p>
<pre>
FUNCTION MYGAUSS, X, P
  RETURN, P[0] + GAUSS1(X, P[1:3])
END
</pre>
<p>and compile with:</p>
<pre>
IDL&gt; .comp mygauss
</pre>
<P>You will need to decide for yourself how to arrange your parameter
values. In my example, I decided that parameter 0, the first
parameter, would be the constant value, while parameters 1 through 3
would be the parameters of the Gaussian (the three parameters to
<TTT>GAUSS1</TTT> are, in order: mean, sigma, and area under curve). If
two parts of the expression require the same parameter value, then
just type it in that way! This is a very elegant way to share
parameter values between several different model components.</P>
<H2>Choosing Starting Values</H2>
<P>You need to at least give MPFITFUN/EXPR a starting point in the
parameter space. A rough guess is fine for most problems. I can enter
my guess into the IDL session like this:</P>
<pre>
IDL&gt; start = [950.D, 2.5, 1., 1000.]
</pre>
<P>Those four numbers mean that the constant value will start at 950,
and the Gaussian will start with a mean of 2.5, a sigma of 1, and an
area of 1000. Since the data is double precision, I force the starting
values to be double as well (or else MPFIT will complain). It is the
fitting program's job to iterate until it finds the best solution it
can.</P>
<P>Choosing the starting values can be somewhat of an art. For some
particularly nasty problems with deep local minima, the proper choice
of the starting parameters may mean the difference between converging
to the global minimum or a local one. Again, only you can make this
judgment.</P>
<H2>Fitting the Curve</H2>
<P>Finally we can fit the curve using MPFITEXPR or MPFITFUN on the
command line:</P>
<pre>
IDL&gt; result = MPFITEXPR(expr, t, r, rerr, start) <i>or</i>
IDL&gt; result = MPFITFUN('MYGAUSS', t, r, rerr, start)
</pre>
<P>This will tell MPFITEXPR or MPFITFUN to fit the
<I>time/rate/error</I> data using the model specified in the
expression <TTT>expr</TTT> and starting at <TTT>start</TTT>. The routine
will print diagnostic messages showing its progress, and finally it
should converge to an answer. When it is done, we can print the
results:</P>
<pre>
IDL&gt; print, result
  997.61864 2.1550703 1.4488421 3040.2411
</pre>
<P>which means that the best-fit constant level is 997, the mean of the
&quot;hump&quot; is 2.15 with a width of 1.45, and the area under the hump
is 3040. That's all there is too it!</P>
<H3>Verifying the Fit</H3>
<P>As a final step in the fitting process, we can make a plot of the
data and overlay a fitted model:</P>
<pre>
IDL&gt; ploterr, t, r, rerr
IDL&gt; oplot, t, result(0)+gauss1(t, result(1:3)), color=50, thick=5
</pre>
<P>In the <TTT>oplot</TTT> command above, I substituted the proper names
for the independent variable and the parameter array. The
<TTT>color</TTT> and <TTT>thick</TTT> keywords make the fitted curve stand
out a little better. The results are excellent:</P>
<P><IMG SRC="images/xfit.gif" ALT="Plot of fitted curve and data"></P>
<H2>Advanced Topics: Constraining Parameters</H2>
<H3>Fixing Parameters</H3> <P>Now let's say that I have learned that
the constant level should be fixed at 1000 exactly. I need to redo the
analysis, and &quot;freeze&quot; the constant to 1000. One way to do
that would be to rewrite the expression, and hard-code the value of
1000. Another more natural way to achieve the same thing is to
&quot;fix&quot; the value to 1000 within the logic of MPFIT
itself. All of the MPFIT functions understand a special keyword called
PARINFO which allows you to do this.</P>
<P>You pass an array of structures through the PARINFO keyword, one
structure for each parameter. The structure describes which parameters
should be fixed, and also whether any constraints should be imposed on
the parameter (such as lower or upper bounds). The structures must
have a few required fields. You can do this by replicating a single
one like this:</P>
<pre>
IDL&gt; pi = replicate({fixed:0, limited:[0,0], limits:[0.D,0.D]},4)
</pre>
<P>A total of <B>four</B> structures are made because there are four
parameters. Once we have the blank template, then we can fill in any
values we desire. For example, we want to fix the first parameter, the
constant:</P>
<pre>
IDL&gt; pi(0).fixed = 1
IDL&gt; start(0) = 1000
</pre>
<P>I have reset the starting value to 1000 (the desired value), and
&quot;<B>fixed&quot; that parameter by setting it to one</B>. If
<TTT>fixed</TTT> is zero for a particular parameter, then it is allowed
to vary. Now we run the fit again, but pass <TTT>pi</TTT> to the fitter
using the PARINFO keyword:</P>
<pre>
IDL&gt; result = MPFITEXPR(expr, t, r, rerr, start, PARINFO=pi)
IDL&gt; result = MPFITFUN('MYGAUSS', t, r, rerr, start, PARINFO=pi)
</pre>
<P>You interpret the results the same way as before. It should be
clear that the first parameter remained fixed at 1000 rather than
varying to 997.</P>
<H3>Specifying Constraining Bounds</H3>
<P>All of the fitting procedures here also allow you to impose lower
and upper bounding constraints on any combination of the parameters
you choose. This might be important, say, if you need to require a
certain parameter to be positive, or be constrained between two fixed
values. The technique again uses the PARINFO keyword. You see above
that in addition to the <TTT>fixed </TTT>entry, there are some others,
including <TTT>limited </TTT>and <TTT>limits</TTT>. They work in a
similar fashion to <TTT>fixed</TTT>.</P>
<P>For example, let us say we know a priori that the Gaussian mean
must be <I>above</I> a value of 2.3. I need to fill that information
into the PARINFO structure like this:</P>
<pre>
IDL&gt; pi(1).limited(0) = 1
IDL&gt; pi(1).limits(0) = 2.3
</pre>
<P>Here, for parameter number 1, I have set <TTT>limited(0)</TTT> equal
to 1. The <TTT>limited </TTT>entry has <I>two</I> values corresponding
to the lower and upper boundaries, respectively. If
<TTT>limited(0)</TTT> is set to 1, then the lower boundary is
activated. The boundary itself is found in <TTT>limits(0)</TTT>, where I
entered the value of 2.3. The same logic applies to the upper limits
(which for each parameter are specified in <TTT>limited(1)</TTT> and
<TTT>limits(1)</TTT>). You can have any combination of lower and upper
limits for each parameter. Just make sure that you set <I>both</I> the
<TTT>limited </TTT>and <TTT>limits </TTT>entries: one enables the bound,
and the other gives the actual boundary value.</P>
<H2>Concluding Remarks</H2>
<P>Well, those are the basics of fitting with MPFITEXPR and
MPFITFUN. You will need some practice before you can feel comfortable,
which is true for anything new. I have documented the usage of each
function in the header of the program file. If you need to find more
information about the techniques I used above, you may find it
there. If you are concerned about error analysis, then you will want
to check the parameters called PERROR, COVAR and BESTNORM, which
return the 1-sigma parameter errors, covariance matrix and the
best-fit chi squared value.</P>
<H3>References</H3>
<P><A NAME="refbevington">
Bevington, P. R. and Robinson, D. K. 1992, <EM>Data Reduction and
Error Analysis for the Physical Sciences</EM>, 2nd Ed., McGraw-Hill,
Inc.</a></P>
<H3><a name="download">Files</a></H3>
<table width="100%" cellspacing="0" cellpadding="2" border="0" summary="">
<tr bgcolor="#c0c0ff"><td width="90" align="right">Nov 24 2006</td><td width="50" align="right">112 kb&nbsp;</td><td width="150"><a href="/usr/share/gnudatalanguage/mpfit/mpfit.pro"><font face="Courier">mpfit.pro</font></a></td><td width="150">REQUIRED&nbsp;&nbsp;</td></tr>
<tr><td width="90" align="right">Nov 24 2006</td><td width="50" align="right">28 kb&nbsp;</td><td width="150"><a href="/usr/share/gnudatalanguage/mpfit/mpfitfun.pro"><font face="Courier">mpfitfun.pro</font></a></td><td width="150">Recommended&nbsp;&nbsp;</td></tr>
<tr bgcolor="#c0c0ff"><td width="90" align="right">Nov 24 2006</td><td width="50" align="right">30 kb&nbsp;</td><td width="150"><a href="/usr/share/gnudatalanguage/mpfit/mpfitexpr.pro"><font face="Courier">mpfitexpr.pro</font></a></td><td width="150">&nbsp;&nbsp;</td></tr>
<tr><td width="90" align="right">Aug 03 1998</td><td width="50" align="right">6 kb&nbsp;</td><td width="150"><a href="examples/fakedata.sav"><font face="Courier">fakedata.sav</font></a></td><td width="150">Sample Data&nbsp;&nbsp;</td></tr>
<tr bgcolor="#c0c0ff"><td width="90" align="right">Oct 13 2001</td><td width="50" align="right">2 kb&nbsp;</td><td width="150"><a href="examples/gauss1.pro"><font face="Courier">gauss1.pro</font></a></td><td width="150">Gaussian Model&nbsp;&nbsp;</td></tr>
</table>
</fchange>

<br><hr align="center" noshade>
<font size="-2"><i>Copyright &copy; 1997-2001 Craig B. Markwardt<br>
Last Modified on 2007-03-22 11:18:50 by Craig Markwardt<br>
</i></font><br>
</body>
</html>