statistics-script
author Madhusudan.C.S <madhusudancs@gmail.com>
Mon, 13 Sep 2010 18:35:56 +0530

Hello friends and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."

This session is a continuation of the tutorial on Plotting Experimental data.

Here we shall look at plotting experimental data using slightly more advanced methods, and then look into some statistical operations.

In the previous tutorial we learnt how to read data from a file and plot it.
We used 'for' loops and lists to get the data into the desired format.
IPython -pylab also provides a function called 'loadtxt' that can give us the same data in the desired format without much hassle.

We shall use the same pendulum.txt file that we used in the previous session.
We know that pendulum.txt contains two columns, with the length in the first column and the time period in the second, so to get both columns into two separate variables we type

l, t = loadtxt('pendulum.txt', unpack=True)

Passing unpack=True puts all the data in the first column, which is the length, into l, and all the data in the second column, which is the time period, into t. Here both l and t are arrays. We shall look into what arrays are in subsequent tutorials.
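A minimal sketch of what unpack=True does, assuming NumPy is installed (the loadtxt that pylab provides is NumPy's numpy.loadtxt). Since pendulum.txt is not reproduced here, the two-column text below is a hypothetical stand-in, read from an in-memory io.StringIO object instead of a file on disk.

```python
import io

import numpy as np

# Hypothetical stand-in for pendulum.txt: length and time period on each line.
data = io.StringIO("""0.2 0.90
0.4 1.27
0.6 1.55
""")

# unpack=True transposes the result, so each column lands in its own array.
l, t = np.loadtxt(data, unpack=True)

print(l)  # first column: lengths
print(t)  # second column: time periods
```

Without unpack=True, loadtxt would return a single 3x2 array with one row per line of the file.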

To know more about loadtxt, type

loadtxt?

This is a really powerful tool for loading data directly from files that are well structured and formatted. It supports many features, such as reading only selected columns or skipping rows.
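As a hedged illustration of the two features just mentioned, the sketch below uses numpy.loadtxt's skiprows and usecols parameters on a hypothetical file that starts with a text header line (pendulum.txt itself has no header; the header here is only there to show skipping).

```python
import io

import numpy as np

# Hypothetical file contents: a header line followed by length/period pairs.
text = """length period
0.2 0.90
0.4 1.27
0.6 1.55
"""

# skiprows=1 skips the header line; usecols=(1,) keeps only the second column.
t_only = np.loadtxt(io.StringIO(text), skiprows=1, usecols=(1,))
print(t_only)
```

Selecting a single column this way gives back a one-dimensional array, just like each of the arrays produced by unpack=True.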

Let's get back to the problem; hit q to exit the help. Now, to get the squared values of t, we can simply do

tsq = t*t

Note that we don't have to use a 'for' loop anymore. This is the benefit of arrays. If we try to do something similar using lists, we won't be able to avoid the 'for' loop.
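To make the list-versus-array difference concrete, here is a small sketch, assuming NumPy; the time periods are hypothetical sample values. With a list we need a loop (written here as a list comprehension); with an array, * already works element by element.

```python
import numpy as np

t_list = [0.90, 1.27, 1.55]  # hypothetical time periods

# List: squaring each element needs an explicit loop.
tsq_list = [x * x for x in t_list]

# Array: * multiplies element by element, so no loop is needed.
t = np.array(t_list)
tsq = t * t

print(tsq_list)
print(tsq)
```

Writing t_list * t_list with plain lists raises a TypeError, and multiplying a list by an integer repeats the list rather than scaling its elements, which is why lists force us back to loops.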

Let's now plot l vs tsq just as we did in the previous session

plot(l, tsq, 'o')

Let's continue with the pendulum experiment to obtain the value of the acceleration due to gravity. The basic equation for the time period of a simple pendulum is:

T = 2*pi*sqrt(L/g)

Rearranging this equation, we obtain the value of g as
g = 4 pi squared into L by T squared.
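The rearrangement can be written out step by step, with T the time period, L the length and g the acceleration due to gravity: square both sides, then solve for g.

```latex
T = 2\pi\sqrt{\frac{L}{g}}
\quad\Longrightarrow\quad
T^2 = 4\pi^2\,\frac{L}{g}
\quad\Longrightarrow\quad
g = \frac{4\pi^2 L}{T^2}
```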

In this case we already have the values of t and l, so to find the g value for each element we can simply use:

g = 4*pi**2*l/t**2
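A self-contained sketch of this element-wise computation, assuming NumPy (pylab's pi comes from NumPy). Instead of reading pendulum.txt, it builds hypothetical synthetic periods from the pendulum formula with g = 9.81, so every recovered g value should come back as 9.81 up to floating-point rounding.

```python
import numpy as np

# Hypothetical synthetic data: periods generated from T = 2*pi*sqrt(L/g)
# with g = 9.81, so each recovered g should be very close to 9.81.
l = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
t = 2 * np.pi * np.sqrt(l / 9.81)

# ** is Python's power operator; ^ is bitwise XOR and raises a TypeError on floats.
g = 4 * np.pi**2 * l / t**2  # one g value per (l, t) pair

print(g)
```

Because l and t are arrays, g is also an array with one entry per measurement.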

Here g is an array; we can take the average of all these values to get the acceleration due to gravity ('g') by

print(mean(g))

mean, again, is provided by the pylab module; it calculates the average of the given set of values.
There are other handy statistical functions available, such as median and std (for standard deviation), among others.
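A short sketch of these statistical functions, assuming NumPy (pylab exposes the same names). The g values are hypothetical, standing in for values computed from noisy measurements.

```python
import numpy as np

# Hypothetical g values, e.g. computed from noisy measurements.
g = np.array([9.6, 9.8, 9.7, 10.1, 9.9])

print(np.mean(g))    # arithmetic mean
print(np.median(g))  # middle value once the data are sorted
print(np.std(g))     # population standard deviation (ddof=0 by default)
```

For the sample standard deviation, which divides by n - 1 instead of n, pass ddof=1 to std.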

In this small session we have covered a 'better' way of loading data from text files, why arrays are a better choice than lists in some cases, and how they make mathematical operations easier.

Hope it was useful to you. Thank you!
-----------------------------------------------------------------------------------------------------------
In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.

Let us start with the most basic need in statistics, the mean.

We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.

As we know, 'pendulum.txt' contains two values on each line: the first is the length of the pendulum and the second is the time period.
To calculate the acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g).
Re-arranging this equation, we get g = 4*pi**2*L/T**2.

We shall calculate the value of g for each pair of L and t, and then calculate the mean of all those g values.

## If we use loadtxt and NumPy arrays, this part will change.
	First we need something to store each value of g that we are going to compute.
	So we start by initialising an empty list called 'g_list'.

	Now we read each line from the file 'pendulum.txt', calculate the g value for that pair of L and t, and append the computed g to our 'g_list'.

	In []: g_list = []
	In []: for line in open('pendulum.txt'):
	 ....:     point = line.split()
	 ....:     L = float(point[0])
	 ....:     t = float(point[1])
	 ....:     g = 4 * pi * pi * L / (t * t)
	 ....:     g_list.append(g)
	 ....:

	The first few lines of the loop should be straightforward: we read each line of the file, split it, and store the two values as floats.
	The line where we do g equals 4 star pi star pi and so on calculates g for each pair of L and t values from the file. The last line simply stores the computed g value; in technical terms, it appends the computed value to g_list.

	Let us type this code in and see what g_list contains.
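The same loop, wrapped as a small self-contained function so it can be run outside IPython -pylab. The function name g_values_from_lines is hypothetical, pi is taken from Python's math module, and a short in-memory list of lines stands in for pendulum.txt; with the real file you would pass open('pendulum.txt') instead, since iterating over a file yields its lines.

```python
from math import pi


def g_values_from_lines(lines):
    """Return one g value per 'length period' line (hypothetical helper)."""
    g_list = []  # start with an empty list
    for line in lines:
        point = line.split()
        L = float(point[0])  # length of the pendulum
        t = float(point[1])  # time period
        g_list.append(4 * pi * pi * L / (t * t))
    return g_list


# Hypothetical sample lines in the same two-column format as pendulum.txt.
sample = ["0.2 0.90\n", "0.4 1.27\n", "0.6 1.55\n"]
g_list = g_values_from_lines(sample)
print(g_list)
```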
###############################

Each value in g_list is the g value computed from a pair of L and t values.

Now we have all the values of g. We must find the mean of these values, that is, the sum of all these values divided by the total number of values.

The number of values can be found using len(g_list).

So we are left with the problem of finding the sum.
We shall create a variable, loop over the list, and add each g value to that variable.
Let's call it total.

In []: total = 0 
In []: for g in g_list:
 ....:     total += g
 ....:

So at the end of this piece of code we will have the sum of all the g values in the variable total.

Now calculating the mean of g is as simple as dividing total by len(g_list).

In []: g_mean = total / len(g_list)
In []: print('Mean:', g_mean)

Notice that we had to write a loop to do something as simple as finding the sum of a list of values.
Python has a built-in function called sum to make this easier.

sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler:
we don't have to write any for loop.
We can directly use g_mean = sum(g_list) / len(g_list). (We avoid calling this variable mean, because that would hide pylab's mean function.)
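A quick sketch checking that the hand-written accumulation loop and the built-in sum agree, using a few hypothetical g values in place of the computed g_list. The totals are compared with math.isclose rather than ==, because recent Python versions (3.12 and later) use a more accurate summation for floats inside sum, so the last bits can differ from a plain loop.

```python
import math

g_list = [9.75, 9.79, 9.86]  # hypothetical g values

# Accumulating by hand, as in the loop with total above.
total = 0
for g in g_list:
    total += g

# The built-in sum gives the same total (up to float rounding).
print(math.isclose(total, sum(g_list)))

g_mean = sum(g_list) / len(g_list)
print('Mean:', g_mean)
```

In Python 2, dividing two integers truncates the result, but here the g values are floats, so sum(g_list) / len(g_list) gives the true mean in both versions.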

Still, calculating the mean needs writing an expression.
What if we had a function that calculates the mean directly?
We do, and it is available through the pylab library.

Now the job of calculating mean is just a function away.
Call mean(g_list) directly and it gives you the mean of values in g_list.
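A minimal sketch of this one-call approach, assuming NumPy (pylab's mean is NumPy's mean). It accepts a plain Python list as well as an array; the g values are hypothetical.

```python
import numpy as np

g_list = [9.75, 9.79, 9.86]  # hypothetical g values

# mean works directly on a list, converting it to an array internally.
g_mean = np.mean(g_list)
print('Mean:', g_mean)
```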

Isn't that sweet? Yes, and that is why I use Python.