statistics-script
author Shantanu <shantanu@fossee.in>
Sun, 11 Apr 2010 03:07:55 +0530
changeset 45 9d61db7bf2f4
parent 41 513e6a26d618
child 46 34df59770550
permissions -rw-r--r--
Minor edits.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
     1
Hello friends and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
     2
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
     3
In the previous tutorial we learnt how to read data from a file and plot it.
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
     4
We used 'for' loops and lists to get data in the desired format.
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
     5
IPython -Pylab also provides a function called 'loadtxt' that can get us the same data in the desired format without much hustle.
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
     6
45
9d61db7bf2f4 Minor edits.
Shantanu <shantanu@fossee.in>
parents: 41
diff changeset
     7
We shall use the same pendulum.txt file that we used in the previous session.
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
     8
We know that, pendulum.txt contains two columns, with length being first and time period is second column, so to get both columns in two separate variables we type
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
     9
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    10
l, t = loadtxt('pendulum.txt', unpack=True)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    11
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    12
(unpack = True) will give us all of first column(length) in l and second column(time) in t
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    13
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    14
to know more about loadtxt type 
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    15
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    16
loadtxt?
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    17
This is a really powerful tool to load data directly from files which are well structured and formatted. It supports many features like getting selected columns only, or skipping rows. 
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    18
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    19
Getting back to the problem, now to get squared values of t we can simply do
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    20
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    21
tsq = t*t
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    22
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    23
Note we dont have to use the 'for' loop anymore. This is the benefit of arrays. If we try to do the something similar using lists we won't be able to escape the 'for' loop.
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    24
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    25
Let's now plot l vs tsq just as we did in the previous session
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    26
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    27
plot(l, tsq, 'o')
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    28
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    29
The basic equation for finding Time period of simple pendulum is:
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    30
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    31
T = 2*pi*sqrt(L/g)
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    32
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    33
In this case we have the values of t and l already, so to find g value for each element we can simply use:
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    34
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    35
g = 4*pi^2*L/T^2
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    36
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    37
g is array with 90 elements, we can take the average of all these values to get the acceleration due to gravity('g') by
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    39
print mean(g)
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    40
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    41
Mean again is provided by pylab module which calculates the average of the given set of values.
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    42
There are other handy statistical functions available, such as median, mode, std(for standard deviation) etc.
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    43
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    44
In this small session we have covered 'better' way of loading data from text files.
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    45
Why arrays are a better choice than lists in some cases, and how they are more helpful with mathematical operations.
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    46
41
513e6a26d618 Minor edits.
Santosh G. Vattam <vattam.santosh@gmail.com>
parents: 38
diff changeset
    47
Hope it was useful to you. Thank you!
38
f248e91b1510 Added changes to 3.1 session script.
Shantanu <shantanu@fossee.in>
parents: 37
diff changeset
    48
-----------------------------------------------------------------------------------------------------------
37
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    49
In this tutorial we shall learn how to compute statistics using python.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    50
We also shall learn how to represent data in the form of pie charts.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    51
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    52
Let us start with the most basic need in statistics, the mean.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    53
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    54
We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    55
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    56
As we know, 'pendulum.txt' contains two values in each line. The first being length of pendulum and second the time period.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    57
To calculate acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    58
So re-arranging this equation, we get g = 4*pi**2*L/T**2 .
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    59
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    60
We shall calculate the value of g for each pair of L and t and then calculate mean of all those g values.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    61
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    62
## if we do loadtxt and numpy arrays then this part will change
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    63
	First we need something to store each value of g that we are going to compute.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    64
	So we start with initialising an empty list called `g_list'.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    65
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    66
	Now we read each line from the file 'pendulum.txt' and calculate g value for that pair of L and t and then append the computed g to our `g_list'.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    67
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    68
	In []: for line in open('pendulum.txt'):
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    69
	  ....     point = line.split()
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    70
	  ....     L = float(point[0])
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    71
	  ....     t = float(point[1])
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    72
	  ....     g = 4 * pi * pi * L / (t * t)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    73
	  ....     g_list.append(g)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    74
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    75
	The first four lines of this code must be trivial. We read the file and store the values. 
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    76
	The fifth line where we do g equals to 4 star pi star and so on is the line which calculates g for each pair of L and t values from teh file. The last line simply stores the computed g value. In technical terms appends the computed value to g_list.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    77
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    78
	Let us type this code in and see what g_list contains.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    79
###############################
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    80
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    81
Each value in g_list is the g value computed from a pair of L and t values.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    82
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    83
Now we have all the values for g. We must find the mean of these values. That is the sum of all these values divided by the total no.of values.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    84
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    85
The no.of values can be found using len(g_list)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    86
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    87
So we are left with the problem of finding the sum.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    88
We shall create a variable and loop over the list and add each g value to that variable.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    89
lets call it total.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    90
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    91
In []: total = 0 
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    92
In []: for g in g_list:
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    93
 ....:     total += g
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    94
 ....:
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    95
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    96
So at of this piece of code we will have the sum of all the g values in the variable total.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    97
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    98
Now calculating mean of g is as simple as doing total divided by len(g_list)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
    99
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   100
In []: g_mean = total / len(g_list)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   101
In []: print 'Mean: ', g_mean
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   102
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   103
If we observe, we have to write a loop to do very simple thing such as finding sum of a list of values.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   104
Python has a built-in function called sum to ease things.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   105
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   106
sum takes a list of values and returns the sum of those values.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   107
now calculating mean is much simpler.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   108
we don't have to write any for loop.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   109
we can directly use mean = sum(g_list) / len(g_list)
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   110
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   111
Still calculating mean needs writing an expression.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   112
What if we had a built-in for calculating mean directly.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   113
We do have and it is available through the pylab library.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   114
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   115
Now the job of calculating mean is just a function away.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   116
Call mean(g_list) directly and it gives you the mean of values in g_list.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   117
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   118
Isn't that sweet. Ya and that is why I use python.
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   119
c2634d874e33 Added script for third session first part.
Shantanu <shantanu@fossee.in>
parents:
diff changeset
   120