statistics-script
author Shantanu <shantanu@fossee.in>
Sun, 11 Apr 2010 01:56:59 +0530
changeset 38 f248e91b1510
parent 37 c2634d874e33
child 41 513e6a26d618
permissions -rw-r--r--
Added changes to 3.1 session script.
Hello friends, and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."

In the previous tutorial we learnt how to read data from a file and plot it.
We used 'for' loops and lists to get the data into the desired format.
IPython with the -pylab option also provides a function, 'loadtxt', which can get us the data without much hassle.

We know that pendulum.txt contains two columns, with length in the first column and time period in the second. To get the two columns into two separate variables, we type

l, t = loadtxt('pendulum.txt', unpack=True)

unpack=True gives us all of the first column (length) in l and all of the second column (time) in t.

To get more help, type

loadtxt?
This is a really powerful tool for loading data directly from files that are well structured and formatted. It supports many features, such as reading only particular columns.
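As a minimal sketch of the two loadtxt features mentioned so far, the snippet below reads a small in-memory sample instead of pendulum.txt. The sample values are made up for illustration and are not the tutorial's data. In an IPython -pylab session loadtxt is available directly; here it is imported from numpy explicitly, and Python 3 print syntax is used, so that the snippet runs on its own.

```python
import io
import numpy as np

# Made-up sample in the same two-column layout as pendulum.txt
# (length, time period); these are NOT the tutorial's measurements.
sample = io.StringIO("0.10 0.63\n0.20 0.90\n0.30 1.10\n")

# unpack=True transposes the result, so each column becomes its own array.
l, t = np.loadtxt(sample, unpack=True)

# usecols selects particular columns; here only the second one (index 1).
sample.seek(0)
t_only = np.loadtxt(sample, usecols=(1,))

print(l)
print(t_only)
```

Here t_only holds the same values as t, because both come from the second column.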
Now, to get the squared values of t, we can simply do

tsq = t*t

and we don't have to use a 'for' loop anymore. This is a benefit of arrays. If we try to do something similar with lists, we can't escape a 'for' loop.
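The difference between lists and arrays can be sketched as follows. The values are hypothetical, and numpy is imported explicitly (in a -pylab session array is already available).

```python
import numpy as np

t_list = [0.5, 1.0, 1.5]        # hypothetical time periods
t_arr = np.array(t_list)

# With a list, squaring needs an explicit loop.
tsq_list = []
for value in t_list:
    tsq_list.append(value * value)

# With an array, the multiplication is applied element by element.
tsq_arr = t_arr * t_arr

print(tsq_list)
print(tsq_arr)
```

Trying t_list * t_list instead raises a TypeError, because Python lists do not define element-wise multiplication.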

Now, plotting l against tsq is the same as what we did in the previous session

plot(l, tsq, 'o')

Additionally, the time period of a simple pendulum is given by the equation:

T = 2*pi*sqrt(L/g)

In our case we already have t and l, so to find the value of g for each element we can simply use:

g = 4*pi**2*l/t**2

g is an array with 90 elements, so we take the average of all these values to get the acceleration due to gravity ('g') by

print mean(g)

mean, again, is provided by the pylab module; it calculates the average of a given set of values.
There are other handy statistical functions available, such as median and std (for standard deviation). mode is not part of pylab, but scipy.stats provides one.
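A small sketch of these statistical functions applied to an array of g values follows. The data here is synthetic: the time periods are generated from the pendulum formula with an assumed g of 9.8, so every computed value comes out near 9.8. With real measurements the values would scatter around the mean.

```python
import numpy as np

# Synthetic lengths (m); the time periods are generated from
# T = 2*pi*sqrt(L/g) with an assumed g of 9.8, so this is not measured data.
l = np.array([0.1, 0.2, 0.3, 0.4])
t = 2 * np.pi * np.sqrt(l / 9.8)

g = 4 * np.pi**2 * l / t**2   # one g value per (l, t) pair

print(np.mean(g))     # average
print(np.median(g))   # middle value
print(np.std(g))      # standard deviation
```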

So in this small session we have covered a 'better' way of loading data from text files.
We have also seen why arrays are a better choice than lists in some cases, and how they help with some mathematical operations.

Thank you!
-----------------------------------------------------------------------------------------------------------
In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.

Let us start with the most basic need in statistics, the mean.

We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.

As we know, 'pendulum.txt' contains two values in each line: the first is the length of the pendulum and the second is the time period.
To calculate the acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g).
Rearranging this equation, we get g = 4*pi**2*L/T**2.
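For reference, the rearrangement can be written out step by step: square both sides, then solve for g. The symbols are the same T, L and g used in the expression above.

```latex
\begin{align*}
T &= 2\pi\sqrt{\frac{L}{g}} \\
T^2 &= 4\pi^2\,\frac{L}{g} \\
g &= \frac{4\pi^2 L}{T^2}
\end{align*}
```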

We shall calculate the value of g for each pair of L and t, and then calculate the mean of all those g values.

## if we do loadtxt and numpy arrays then this part will change
	First we need something to store each value of g that we are going to compute.
	So we start by initialising an empty list called 'g_list'.

	In []: g_list = []

	Now we read each line from the file 'pendulum.txt', calculate the g value for that pair of L and t, and then append the computed g to our 'g_list'.

	In []: for line in open('pendulum.txt'):
	  ....:     point = line.split()
	  ....:     L = float(point[0])
	  ....:     t = float(point[1])
	  ....:     g = 4 * pi * pi * L / (t * t)
	  ....:     g_list.append(g)

	The first four lines of this code should be straightforward: we read each line of the file and store the values.
	The fifth line, where we set g equal to 4 star pi star pi and so on, calculates g for each pair of L and t values from the file. The last line simply stores the computed g value; in technical terms, it appends the computed value to g_list.

	Let us type this code in and see what g_list contains.
###############################

Each value in g_list is the g value computed from a pair of L and t values.
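The loop can be tried end to end without the original data file. The sketch below writes a small stand-in file first. Its name (pendulum_sample.txt) and its contents are assumptions made for illustration: the time periods are generated from the pendulum formula with an assumed g of 9.8. It uses only the standard library, with math.pi standing in for the pi that a -pylab session provides.

```python
import math
import os
import tempfile

# Write a small stand-in for pendulum.txt. The time periods are generated
# from T = 2*pi*sqrt(L/g) with an assumed g of 9.8; this is not real data.
lengths = [0.1, 0.2, 0.3]
path = os.path.join(tempfile.mkdtemp(), 'pendulum_sample.txt')
with open(path, 'w') as f:
    for length in lengths:
        period = 2 * math.pi * math.sqrt(length / 9.8)
        f.write('%r %r\n' % (length, period))

g_list = []                      # must exist before the loop appends to it
with open(path) as f:
    for line in f:
        point = line.split()     # two strings: length and time period
        L = float(point[0])
        t = float(point[1])
        g = 4 * math.pi * math.pi * L / (t * t)
        g_list.append(g)

print(g_list)
```

Because the sample was generated from the formula, every entry of g_list comes out close to 9.8.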

Now we have all the values for g. We must find the mean of these values, that is, the sum of all these values divided by the total number of values.

The number of values can be found using len(g_list).

So we are left with the problem of finding the sum.
We shall create a variable, loop over the list, and add each g value to that variable.
Let's call it total.

In []: total = 0
In []: for g in g_list:
 ....:     total += g
 ....:

So at the end of this piece of code, we will have the sum of all the g values in the variable total.

Now calculating the mean of g is as simple as dividing total by len(g_list)

In []: g_mean = total / len(g_list)
In []: print 'Mean: ', g_mean

If we observe, we had to write a loop to do a very simple thing: finding the sum of a list of values.
Python has a built-in function called sum to ease things.

sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler.
We don't have to write any 'for' loop.
We can directly use mean = sum(g_list) / len(g_list)
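The two ways of computing the mean can be compared side by side. The g values below are hypothetical and chosen only for illustration.

```python
g_list = [9.7, 9.8, 9.9]   # hypothetical g values, for illustration only

# Explicit loop
total = 0.0
for g in g_list:
    total += g
mean_loop = total / len(g_list)

# Built-in sum
mean_sum = sum(g_list) / len(g_list)

print(mean_loop, mean_sum)
```

Both approaches add the values in the same order, so they give the same result. The values here are floats; with a list of integers, Python 2 would perform integer division in total / len(g_list) and drop the fractional part.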

Still, calculating the mean requires writing an expression.
What if we had a built-in for calculating the mean directly?
We do, and it is available through the pylab library.

Now the job of calculating the mean is just a function call away.
Call mean(g_list) directly and it gives you the mean of the values in g_list.
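A minimal sketch of that call, using hypothetical values. In a -pylab session mean can be called directly; here it is taken from numpy so that the snippet runs on its own.

```python
import numpy as np

g_list = [9.7, 9.8, 9.9]   # hypothetical g values

g_mean = np.mean(g_list)   # accepts a plain list as well as an array
print(g_mean)
```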

Isn't that sweet? Yes, and that is why I use Python.