least-squares.org
author asokan <asokan@fossee.in>
Tue, 18 May 2010 15:40:17 +0530
changeset 126 2eac725a5766
parent 81 2eff0ebac2dc
permissions -rw-r--r--
changes to array.txt

* Least Squares Fit
*** Outline
***** Introduction
******* What do we want to do? Why?
********* What's a least square fit?
********* Why is it useful?
******* How are we doing it?
******* Arsenal Required
********* working knowledge of arrays
********* plotting
********* file reading
***** Procedure
******* The equation (for a single point)
******* It's matrix form
******* Getting the required matrices
******* getting the solution
******* plotting
*** Script
    Welcome. 
    
    In this tutorial we shall look at obtaining the least squares fit
    of a given data-set. For this purpose, we shall use the same
    pendulum data that we used in the tutorial on plotting from files.

    To be able to follow this tutorial comfortably, you should have a
    working knowledge of arrays, plotting and file reading. 

    A least squares fit curve is the curve for which the sum of the
    squares of it's distance from the given set of points is
    minimum. 

    Previously, when we plotted the data from pendulum.txt we got a 
    scatter plot of points as shown. 

    In our example, we know that the length of the pendulum is
    proportional to the square of the time-period. But when we plot
    the data using lines we get a distorted line as shown. What
    we expect ideally, is something like the redline in this graph. 
    From the problem we know that L is directly proportional to T^2.
    But experimental data invariably contains errors and hence does
    not produce an ideal plot. The best fit curve for this data has 
    to be a linear curve and this can be obtained by performing least
    square fit on the data set. We shall use the lstsq function to
    obtain the least squares fit curve. 

    The equation of the line is of the form T^2 = mL+c. We have a set
    of values for L and the corresponding T^2 values. Using this, we
    wish to obtain the equation of the straight line. 

    In matrix form the equation is represented as shown, 
    Tsq = A.p where Tsq is an NX1 matrix, and A is an NX2 matrix as shown.
    And p is a 2X1 matrix of the slope and Y-intercept. In order to 
    obtain the least square fit curve we need to find the matrix p

    Let's get started. As you can see, the file pendulum.txt
    is on our Desktop and hence we navigate to the Desktop by typing 
    cd Desktop. Let's now fire up IPython: ipython -pylab

    We have already seen (in a previous tutorial), how to read a file
    and obtain the data set using loadtxt(). Let's quickly get the required data
    from our file. 

    l, t = loadtxt('pendulum.txt', unpack=True)

    loadtxt() directly stores the values in the pendulum.txt into arrays l and t
    Let's now calculate the values of square of the time-period. 

    tsq = t*t

    Now we shall obtain A, in the desired form using some simple array
    manipulation 

    A = array([l, ones_like(l)])

    As we have seen in a previous tutorial, ones_like() gives an array similar
    in shape to the given array, in this case l, with all the elements as 1. 
    Please note, this is how we create an array from an existing array.

    Let's now look at the shape of A. 
    A.shape
    This is an 2X90 matrix. But we need a 90X2 matrix, so we shall transpose it.

    A = A.T
    
    Type A, to confirm that we have obtained the desired array. 
    A
    Also note the shape of A. 
    A.shape

    We shall now use the lstsq function, to obtain the coefficients m
    and c. lstsq returns a lot of things along with these
    coefficients. We may look at the documentation of lstsq, for more
    information by typing lstsq? 
    result = lstsq(A,tsq)

    We extract the required coefficients, which is the first element
    in the list of things that lstsq returns, and store them into the variable coef. 
    coef = result[0]

    To obtain the plot of the line, we simply use the equation of the
    line, we have noted before. T^2 = mL + c. 

    Tline = coef[0]*l + coef[1]
    plot(l, Tline, 'r')

    Also, it would be nice to have a plot of the points. So, 
    plot(l, tsq, 'o')

    This brings us to the end of this tutorial. In this tutorial,
    you've learnt how to obtain a least squares fit curve for a given
    set of points using lstsq. There are other curve fitting functions
    available in Pylab such as polyfit.

    Hope you enjoyed it. Thanks. 

*** Notes