statistics-script
author Shantanu <shantanu@fossee.in>
Sun, 11 Apr 2010 01:30:44 +0530
changeset 37 c2634d874e33
child 38 f248e91b1510
permissions -rw-r--r--
Added script for third session first part.
Hello friends and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."
In the previous tutorial we learnt how to read data from a file and plot it.
We used 'for' loops and lists to get the data into the desired format.
IPython with the -pylab option also provides a function, 'loadtxt', which can get us the data without much hassle.
We know that pendulum.txt contains two columns, with length in the first column and time period in the second. To get the two columns into two separate variables, we type

l, t = loadtxt('pendulum.txt', unpack=True)

unpack=True gives us all of the first column (length) in l and all of the second column (time period) in t.

To get more help, type

loadtxt?

This is a really powerful tool for loading data directly from files that are well structured and formatted. It supports many features, such as reading only particular columns.
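As a rough sketch of these two features, the snippet below loads both columns with unpack=True and then only the second column with usecols. It assumes NumPy's loadtxt (the function a pylab session exposes) and uses a small made-up sample in place of pendulum.txt, so the numbers are purely illustrative.

```python
from io import StringIO

import numpy as np

# Hypothetical sample standing in for pendulum.txt:
# first column is length, second column is time period.
sample = "0.1 0.63\n0.2 0.90\n0.3 1.10\n"

# unpack=True returns one array per column.
l, t = np.loadtxt(StringIO(sample), unpack=True)

# usecols picks particular columns; index 1 is the second column.
t_only = np.loadtxt(StringIO(sample), usecols=(1,))

print(l)
print(t)
print(t_only)
```

With a real file, the StringIO object would simply be replaced by the file name.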
Now, to get the squared values of t, we can simply do

tsq = t*t

and we don't have to use a 'for' loop anymore. This is the benefit of arrays. If we try to do something similar with lists, we can't escape a 'for' loop.
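To make the difference concrete, here is a small sketch that squares the same values once as an array and once as a list. The values in t_list are made up for illustration, and the array is built with NumPy's array function rather than loaded from a file.

```python
import numpy as np

t_list = [0.63, 0.90, 1.10]   # hypothetical time periods
t = np.array(t_list)

# With an array, the multiplication is applied element by element.
tsq = t * t

# With a plain list, each element has to be squared inside a loop.
tsq_list = []
for value in t_list:
    tsq_list.append(value * value)

print(tsq)
print(tsq_list)
```

Writing t_list * t_list instead would raise a TypeError, because lists do not support element-wise multiplication.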
Now, plotting l versus tsq is the same as what we did in the previous session:

plot(l, tsq, 'o')

In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.
Let us start with the most basic need in statistics, the mean.
We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.
As we know, 'pendulum.txt' contains two values on each line: the first is the length of the pendulum and the second is the time period.
To calculate the acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g).
Rearranging this equation, we get g = 4*pi**2*L/T**2.
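The intermediate step is squaring both sides, which gives T**2 = 4*pi**2*L/g, and then swapping g and T**2. As a quick sanity check, the sketch below computes T from an assumed g and a made-up length, then recovers g with the rearranged formula. Both g_true and L are illustrative values, not data from 'pendulum.txt'.

```python
from math import pi, sqrt

g_true = 9.8   # assumed value of g in m/s^2, for illustration only
L = 0.5        # hypothetical pendulum length in metres

# Forward formula: time period from length and g.
T = 2 * pi * sqrt(L / g_true)

# Rearranged formula: g recovered from length and time period.
g = 4 * pi**2 * L / T**2

print(T, g)
```

The recovered g matches g_true up to floating-point rounding, which is what we expect if the rearrangement is right.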
We shall calculate the value of g for each pair of L and t, and then calculate the mean of all those g values.
## if we do loadtxt and numpy arrays then this part will change
	First we need something to store each value of g that we are going to compute.
	So we start by initialising an empty list called 'g_list'.

	In []: g_list = []

	Now we read each line from the file 'pendulum.txt', calculate the value of g for that pair of L and t, and append the computed g to 'g_list'.

	In []: for line in open('pendulum.txt'):
	  ....     point = line.split()
	  ....     L = float(point[0])
	  ....     t = float(point[1])
	  ....     g = 4 * pi * pi * L / (t * t)
	  ....     g_list.append(g)

	The first four lines of the loop should be familiar: we read the file and store the values.
	The fifth line of the loop, where we set g equal to 4 star pi star pi and so on, calculates g for each pair of L and t values from the file. The last line simply stores the computed g value; in technical terms, it appends the computed value to g_list.

	Let us type this code in and see what g_list contains.
###############################
Each value in g_list is the g value computed from a pair of L and t values.
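As the note above hints, the same g values could also be computed without a loop if the data is loaded into arrays. The following is only a sketch of that alternative, assuming NumPy's loadtxt and a made-up sample in place of 'pendulum.txt'; the name g_array is ours. It also repeats the loop so the two results can be compared.

```python
from io import StringIO

import numpy as np

# Hypothetical sample standing in for pendulum.txt (length, time period).
sample = "0.1 0.63\n0.2 0.90\n0.3 1.10\n"

L, t = np.loadtxt(StringIO(sample), unpack=True)

# One expression computes g for every (L, t) pair at once.
g_array = 4 * np.pi**2 * L / t**2

# The loop version, for comparison.
g_list = []
for length, period in zip(L, t):
    g_list.append(4 * np.pi * np.pi * length / (period * period))

print(g_array)
print(g_list)
```

Both approaches give the same values; the array version just leaves the looping to NumPy.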
Now we have all the values of g. We must find the mean of these values, that is, the sum of all the values divided by the total number of values.

The number of values can be found using len(g_list).
So we are left with the problem of finding the sum.
We shall create a variable, loop over the list, and add each g value to that variable.
Let's call it total.
In []: total = 0 
In []: for g in g_list:
 ....:     total += g
 ....:
So at the end of this piece of code, we will have the sum of all the g values in the variable total.

Now calculating the mean of g is as simple as dividing total by len(g_list):
In []: g_mean = total / len(g_list)
In []: print 'Mean: ', g_mean
If we observe, we had to write a loop to do a very simple thing: finding the sum of a list of values.
Python has a built-in function called sum to ease things.

sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler.
We don't have to write any 'for' loop.
We can directly use g_mean = sum(g_list) / len(g_list)
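A small side-by-side sketch of the two approaches is given below. The values in g_list are made up for illustration, and the names mean_from_loop and mean_from_sum are ours.

```python
g_list = [9.7, 9.8, 9.9]   # hypothetical g values

# Mean using an explicit loop to build the total.
total = 0
for g in g_list:
    total += g
mean_from_loop = total / len(g_list)

# Mean using the built-in sum.
mean_from_sum = sum(g_list) / len(g_list)

print(mean_from_loop, mean_from_sum)
```

One thing to keep in mind: in Python 2, dividing one integer by another with / truncates the result. The g values here are floats, so the division behaves as expected.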
Still, calculating the mean needs writing an expression.
What if we had a built-in for calculating the mean directly?
We do have one, and it is available through the pylab library.
Now the job of calculating the mean is just a function call away.
Call mean(g_list) directly and it gives you the mean of the values in g_list.
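Outside an IPython -pylab session, the same function can be reached through NumPy. The sketch below assumes that pylab's mean is NumPy's mean, uses made-up g values, and compares the result with the sum-based expression; the name manual_mean is ours.

```python
import numpy as np

g_list = [9.7, 9.8, 9.9]   # hypothetical g values

# np.mean accepts a plain list as well as an array.
g_mean = np.mean(g_list)

# The expression we wrote by hand, for comparison.
manual_mean = sum(g_list) / len(g_list)

print(g_mean, manual_mean)
```

Note that the result is stored in g_mean and not in a variable called mean. Assigning to mean would hide the function of the same name for the rest of the session.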
Isn't that sweet? Yes, and that is why I use Python.