statistics-script
author Shantanu <shantanu@fossee.in>
Wed, 28 Apr 2010 16:33:18 +0530
changeset 115 d35eccbf206d
parent 46 34df59770550
permissions -rw-r--r--
Added dictionary.org file.
Hello friends and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."
This session is a continuation of the tutorial on Plotting Experimental data.
We shall look at plotting experimental data using slightly more advanced methods here, and then look into some statistical operations.
In the previous tutorial we learnt how to read data from a file and plot it.
We used 'for' loops and lists to get data in the desired format.
IPython's -pylab mode also provides a function called 'loadtxt' that can get us the same data in the desired format without much hassle.
We shall use the same pendulum.txt file that we used in the previous session.
We know that pendulum.txt contains two columns, with the length in the first column and the time period in the second. To get both columns into two separate variables, we type:
l, t = loadtxt('pendulum.txt', unpack=True)
unpack=True gives us all the data in the first column (the length) in l, and all the data in the second column (the time period) in t. Here both l and t are arrays. We shall look into what arrays are in subsequent tutorials.
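To see what unpack=True changes, here is a small self-contained sketch. It assumes a made-up three-row stand-in for pendulum.txt (the numbers are illustrative, not the real data) and imports loadtxt from numpy explicitly, since outside the -pylab mode the name is not available automatically.

```python
from io import StringIO
from numpy import loadtxt

# Hypothetical stand-in for pendulum.txt: length, then time period.
sample = "0.1 0.69\n0.2 0.90\n0.3 1.19\n"

# Without unpack, loadtxt returns one 2-D array with a row per line.
data = loadtxt(StringIO(sample))
print(data.shape)

# With unpack=True the result is transposed, so each column can be
# assigned to its own variable.
l, t = loadtxt(StringIO(sample), unpack=True)
print(l)    # the three lengths
print(t)    # the three time periods
```

With a real file, the file name is passed in place of the StringIO object.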
To know more about loadtxt, type:
loadtxt?
This is a really powerful tool to load data directly from files which are well structured and formatted. It supports many features like getting selected columns only, or skipping rows. 
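As a rough illustration of those two features, the sketch below uses loadtxt's skiprows and usecols parameters. The file contents (a header line and a third "trial" column) are hypothetical and only serve to give the parameters something to do.

```python
from io import StringIO
from numpy import loadtxt

# Hypothetical file contents: one header line and three columns.
sample = "length period trial\n0.1 0.69 1\n0.2 0.90 2\n0.3 1.19 3\n"

# skiprows=1 skips the header line; usecols selects columns by index,
# so the third column is never read into l or t.
l, t = loadtxt(StringIO(sample), skiprows=1, usecols=(0, 1), unpack=True)
print(l)
print(t)
```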
Let's get back to the problem; hit q to exit the help. Now, to get the squared values of t, we can simply do:
tsq = t*t
Note that we don't have to use the 'for' loop anymore. This is the benefit of arrays. If we try to do something similar using lists, we won't be able to escape the use of the 'for' loop.
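The difference can be sketched side by side. The time periods below are made-up values, and array is imported from numpy explicitly so the snippet runs outside the -pylab mode.

```python
from numpy import array

t_list = [0.69, 0.90, 1.19]      # hypothetical time periods
t = array(t_list)

# Arrays: arithmetic is applied to every element at once.
tsq = t * t

# Lists: t_list * t_list raises a TypeError, so we loop instead.
tsq_list = []
for value in t_list:
    tsq_list.append(value * value)

print(tsq)
print(tsq_list)
```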
Let's now plot l vs tsq, just as we did in the previous session:
plot(l, tsq, 'o')
Let's continue with the pendulum experiment to obtain the value of the acceleration due to gravity. The basic equation for the time period of a simple pendulum is:
T = 2*pi*sqrt(L/g)
Rearranging this equation, we obtain the value of g as:
g = 4 pi squared into l by t squared.
In this case we already have the values of t and l, so to find the value of g for each element we can simply use:
g = 4*pi**2*l/t**2
Here g is an array. We can take the average of all these values to get the acceleration due to gravity ('g') with:
print mean(g)
mean, again, is provided by the pylab module; it calculates the average of the given set of values.
There are other handy statistical functions available, such as median and std (for standard deviation). A mode function is not part of pylab itself; one is provided by scipy.stats.
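As a minimal sketch of these functions in use, the snippet below computes g from a few made-up length and time-period values (not the real contents of pendulum.txt) and then summarises it. The names are imported from numpy explicitly; in the -pylab mode they are already available.

```python
from numpy import array, pi, mean, median, std

# Hypothetical measurements, for illustration only.
l = array([0.1, 0.2, 0.3])       # lengths
t = array([0.69, 0.90, 1.19])    # time periods

g = 4 * pi**2 * l / t**2         # one g value per (l, t) pair

print(mean(g))      # average
print(median(g))    # middle value
print(std(g))       # standard deviation (population form by default)
```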
In this short session we have covered a better way of loading data from text files.
We also saw why arrays are a better choice than lists in some cases, and how they are more helpful with mathematical operations.
Hope it was useful to you. Thank you!
-----------------------------------------------------------------------------------------------------------
In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.
Let us start with the most basic need in statistics, the mean.
We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.
As we know, 'pendulum.txt' contains two values in each line: the first is the length of the pendulum and the second is the time period.
To calculate the acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g).
So, re-arranging this equation, we get g = 4*pi**2*L/T**2.
We shall calculate the value of g for each pair of L and t, and then calculate the mean of all those g values.
## if we do loadtxt and numpy arrays then this part will change
	First we need something to store each value of g that we are going to compute.
	So we start with initialising an empty list called `g_list':
	In []: g_list = []
	Now we read each line from the file 'pendulum.txt' and calculate g value for that pair of L and t and then append the computed g to our `g_list'.
	In []: for line in open('pendulum.txt'):
	  ....     point = line.split()
	  ....     L = float(point[0])
	  ....     t = float(point[1])
	  ....     g = 4 * pi * pi * L / (t * t)
	  ....     g_list.append(g)
	The first four lines of this code should be familiar: we read the file and store the values.
	The fifth line, where we set g equal to 4 star pi star and so on, is the line that calculates g for each pair of L and t values from the file. The last line simply stores the computed g value; in technical terms, it appends the computed value to g_list.
	Let us type this code in and see what g_list contains.
###############################
Each value in g_list is the g value computed from a pair of L and t values.
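For readers who want to try the loop outside an IPython -pylab session, here is a self-contained sketch. It writes a small, made-up data file first (the file name pendulum_sample.txt and its values are assumptions for illustration), imports pi from the math module, and uses a with block so the file is closed automatically.

```python
import os
import tempfile
from math import pi

# Write a small, made-up data file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'pendulum_sample.txt')
with open(path, 'w') as f:
    f.write("0.1 0.69\n0.2 0.90\n0.3 1.19\n")

g_list = []
with open(path) as f:                 # the file is closed automatically
    for line in f:
        point = line.split()
        L = float(point[0])           # length
        t = float(point[1])           # time period
        g_list.append(4 * pi * pi * L / (t * t))

print(g_list)
```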
Now we have all the values for g. We must find the mean of these values, that is, the sum of all these values divided by the total number of values.
The number of values can be found using len(g_list).
So we are left with the problem of finding the sum.
We shall create a variable and loop over the list and add each g value to that variable.
Let's call it total.
In []: total = 0
In []: for g in g_list:
 ....:     total += g
 ....:
So at the end of this piece of code we will have the sum of all the g values in the variable total.
Now calculating mean of g is as simple as doing total divided by len(g_list)
In []: g_mean = total / len(g_list)
In []: print 'Mean: ', g_mean
Notice that we had to write a loop to do something as simple as finding the sum of a list of values.
Python has a built-in function called sum to ease things.
sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler.
We don't have to write any for loop.
We can directly use g_mean = sum(g_list) / len(g_list)
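The two approaches can be compared in a short sketch. The g values below are made up for illustration; the point is only that the explicit loop and the built-in sum lead to the same mean, up to floating-point rounding.

```python
g_list = [8.29, 9.75, 8.36]      # hypothetical g values

# Explicit loop: accumulate the total one value at a time.
total = 0
for g in g_list:
    total += g
mean_loop = total / len(g_list)

# The built-in sum does the accumulation for us.
mean_sum = sum(g_list) / len(g_list)

print(mean_loop)
print(mean_sum)
```

The result is stored under a name other than mean so that the pylab function of that name is not overwritten in the same session.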
Still, calculating the mean needs us to write an expression.
What if we had a built-in function for calculating the mean directly?
We do have one, and it is available through the pylab library.
Now the job of calculating mean is just a function away.
Call mean(g_list) directly and it gives you the mean of values in g_list.
Isn't that sweet? Yes, and that is why I use Python.