Added dictionary.org file.
* Least Squares Fit
*** Outline
***** Introduction
******* What do we want to do? Why?
********* What's a least square fit?
********* Why is it useful?
******* How are we doing it?
******* Arsenal Required
********* working knowledge of arrays
********* plotting
********* file reading
***** Procedure
******* The equation (for a single point)
******* It's matrix form
******* Getting the required matrices
******* getting the solution
******* plotting
*** Script
Welcome.
In this tutorial we shall look at obtaining the least squares fit
of a given data-set. For this purpose, we shall use the same
pendulum data that we used in the tutorial on plotting from files.
To be able to follow this tutorial comfortably, you should have a
working knowledge of arrays, plotting and file reading.
A least squares fit curve is the curve for which the sum of the
squares of it's distance from the given set of points is
minimum.
Previously, when we plotted the data from pendulum.txt we got a
scatter plot of points as shown.
In our example, we know that the length of the pendulum is
proportional to the square of the time-period. But when we plot
the data using lines we get a distorted line as shown. What
we expect ideally, is something like the redline in this graph.
From the problem we know that L is directly proportional to T^2.
But experimental data invariably contains errors and hence does
not produce an ideal plot. The best fit curve for this data has
to be a linear curve and this can be obtained by performing least
square fit on the data set. We shall use the lstsq function to
obtain the least squares fit curve.
The equation of the line is of the form T^2 = mL+c. We have a set
of values for L and the corresponding T^2 values. Using this, we
wish to obtain the equation of the straight line.
In matrix form the equation is represented as shown,
Tsq = A.p where Tsq is an NX1 matrix, and A is an NX2 matrix as shown.
And p is a 2X1 matrix of the slope and Y-intercept. In order to
obtain the least square fit curve we need to find the matrix p
Let's get started. As you can see, the file pendulum.txt
is on our Desktop and hence we navigate to the Desktop by typing
cd Desktop. Let's now fire up IPython: ipython -pylab
We have already seen (in a previous tutorial), how to read a file
and obtain the data set using loadtxt(). Let's quickly get the required data
from our file.
l, t = loadtxt('pendulum.txt', unpack=True)
loadtxt() directly stores the values in the pendulum.txt into arrays l and t
Let's now calculate the values of square of the time-period.
tsq = t*t
Now we shall obtain A, in the desired form using some simple array
manipulation
A = array([l, ones_like(l)])
As we have seen in a previous tutorial, ones_like() gives an array similar
in shape to the given array, in this case l, with all the elements as 1.
Please note, this is how we create an array from an existing array.
Let's now look at the shape of A.
A.shape
This is an 2X90 matrix. But we need a 90X2 matrix, so we shall transpose it.
A = A.T
Type A, to confirm that we have obtained the desired array.
A
Also note the shape of A.
A.shape
We shall now use the lstsq function, to obtain the coefficients m
and c. lstsq returns a lot of things along with these
coefficients. We may look at the documentation of lstsq, for more
information by typing lstsq?
result = lstsq(A,tsq)
We extract the required coefficients, which is the first element
in the list of things that lstsq returns, and store them into the variable coef.
coef = result[0]
To obtain the plot of the line, we simply use the equation of the
line, we have noted before. T^2 = mL + c.
Tline = coef[0]*l + coef[1]
plot(l, Tline, 'r')
Also, it would be nice to have a plot of the points. So,
plot(l, tsq, 'o')
This brings us to the end of this tutorial. In this tutorial,
you've learnt how to obtain a least squares fit curve for a given
set of points using lstsq. There are other curve fitting functions
available in Pylab such as polyfit.
Hope you enjoyed it. Thanks.
*** Notes