statistics-script
author baali@shantanu
Sun, 11 Apr 2010 02:59:46 +0530
changeset 44 31173328496d
parent 41 513e6a26d618
child 45 9d61db7bf2f4
permissions -rw-r--r--
Minor edits.
Hello friends and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."

In the previous tutorial we learnt how to read data from a file and plot it.
We used 'for' loops and lists to get data in the desired format.
IPython -pylab also provides a function called 'loadtxt' that can get us the same data in the desired format without much hassle.

We know that pendulum.txt contains two columns, with length in the first column and time period in the second, so to get both columns in two separate variables we type

l, t = loadtxt('pendulum.txt', unpack=True)

(unpack=True) will give us all of the first column (length) in l and all of the second column (time period) in t.
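
To see what unpack=True changes, here is a minimal sketch. It assumes a plain Python 3 session with NumPy imported as np instead of IPython -pylab, and it uses a few made-up rows held in memory in place of the real pendulum.txt.

```python
import io
import numpy as np

# Made-up two-column data standing in for pendulum.txt (length, time period).
sample = io.StringIO("0.1 0.69\n0.2 0.90\n0.3 1.19\n")

# Without unpack, loadtxt returns one 2-D array with a row per line of the file.
data = np.loadtxt(sample)
print(data.shape)

# With unpack=True the array is transposed, so each column can be assigned
# to its own variable.
sample.seek(0)  # rewind the in-memory file before reading it again
l, t = np.loadtxt(sample, unpack=True)
print(l)
print(t)
```

Here l holds the three lengths and t holds the three time periods, each as a one-dimensional array.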

To know more about loadtxt, type

loadtxt?
This is a really powerful tool to load data directly from files which are well structured and formatted. It supports many features, like getting selected columns only, or skipping rows.
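
As a small illustration of those two features, the sketch below uses the skiprows and usecols arguments of loadtxt. The file contents are hypothetical (a header row and an extra third column that pendulum.txt does not have), and NumPy is imported explicitly as np in a plain Python 3 session.

```python
import io
import numpy as np

# Hypothetical file contents: one header row and a third column we do not need.
sample = io.StringIO(
    "length period trial\n"
    "0.1 0.69 1\n"
    "0.2 0.90 2\n"
    "0.3 1.19 3\n"
)

# skiprows=1 skips the header line; usecols=(0, 1) keeps only the first two columns.
l, t = np.loadtxt(sample, skiprows=1, usecols=(0, 1), unpack=True)
print(l)
print(t)
```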

Getting back to the problem, to get the squared values of t we can simply do

tsq = t*t

Note that we don't have to use the 'for' loop anymore. This is the benefit of arrays. If we try to do something similar using lists, we won't be able to escape the 'for' loop.
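
The difference can be sketched side by side. The values below are made up, and NumPy is imported explicitly as np (in an IPython -pylab session, array is already available).

```python
import numpy as np

t_list = [1.0, 2.0, 3.0]  # made-up time periods

# With a list, t_list * t_list raises a TypeError, so we square each
# element in an explicit loop.
tsq_list = []
for value in t_list:
    tsq_list.append(value * value)

# With an array, the multiplication is applied element by element.
t = np.array(t_list)
tsq = t * t

print(tsq_list)
print(tsq)
```

Both approaches give the same squared values; the array version simply does not need the loop.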

Let's now plot l vs tsq just as we did in the previous session

plot(l, tsq, 'o')

The basic equation for finding the time period of a simple pendulum is:

T = 2*pi*sqrt(L/g)

In this case we already have the values of t and l, so to find the value of g for each element we can simply use:

g = 4*pi**2*l/(t*t)

g is an array with 90 elements. We can take the average of all these values to get the acceleration due to gravity ('g') by

print mean(g)

mean is again provided by the pylab module; it calculates the average of the given set of values.
There are other handy statistical functions available, such as median and std (for standard deviation). A mode function is available separately in scipy.stats.
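
Putting these steps together, here is a minimal sketch in plain Python 3 with NumPy imported as np. Since the contents of pendulum.txt are not reproduced here, the lengths are made up and the time periods are generated from an assumed g of 9.8, so every computed value should come back as roughly 9.8.

```python
import numpy as np

# Synthetic data: made-up lengths, with time periods generated from an
# assumed g of 9.8 using T = 2*pi*sqrt(L/g).
assumed_g = 9.8
l = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
t = 2 * np.pi * np.sqrt(l / assumed_g)

# One g value per (l, t) pair, computed without any loop.
g = 4 * np.pi**2 * l / (t * t)

print(np.mean(g))    # average of the g values
print(np.median(g))  # middle value
print(np.std(g))     # spread; close to zero for this noise-free data
```

With real measurements the standard deviation would not be close to zero, and it gives a rough idea of how much the individual g values scatter around the mean.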

In this short session we have covered a 'better' way of loading data from text files.
We have also seen why arrays are a better choice than lists in some cases, and how they are more helpful with mathematical operations.

Hope it was useful to you. Thank you!
-----------------------------------------------------------------------------------------------------------
In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.

Let us start with the most basic need in statistics, the mean.

We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.

As we know, 'pendulum.txt' contains two values in each line, the first being the length of the pendulum and the second the time period.
To calculate acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g)
So re-arranging this equation, we get g = 4*pi**2*L/T**2 .
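
For reference, the re-arrangement can be written out step by step: square both sides, then solve for g.

```latex
\begin{align}
T   &= 2\pi\sqrt{\frac{L}{g}} \\
T^2 &= 4\pi^2\,\frac{L}{g} \\
g   &= \frac{4\pi^2 L}{T^2}
\end{align}
```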

We shall calculate the value of g for each pair of L and t and then calculate the mean of all those g values.

## if we do loadtxt and numpy arrays then this part will change
	First we need something to store each value of g that we are going to compute.
	So we start by initialising an empty list called 'g_list'.

	In []: g_list = []

	Now we read each line from the file 'pendulum.txt', calculate the g value for that pair of L and t, and then append the computed g to our 'g_list'.

	In []: for line in open('pendulum.txt'):
	  ....:     point = line.split()
	  ....:     L = float(point[0])
	  ....:     t = float(point[1])
	  ....:     g = 4 * pi * pi * L / (t * t)
	  ....:     g_list.append(g)

	The first four lines of this code should be straightforward. We read the file and store the values.
	The fifth line, where we do g equals 4 star pi star and so on, calculates g for each pair of L and t values from the file. The last line simply stores the computed g value. In technical terms, it appends the computed value to g_list.

	Let us type this code in and see what g_list contains.
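
	To try the loop without the data file at hand, here is a self-contained sketch in plain Python 3. The in-memory text is a synthetic stand-in for pendulum.txt: the lengths are made up and the time periods are generated from an assumed g of 9.8, so the printed values should all be close to 9.8.

```python
import io
from math import pi, sqrt

# Synthetic stand-in for pendulum.txt: made-up lengths, with time periods
# generated from an assumed g of 9.8 and written with six decimal places.
lengths = [0.1, 0.2, 0.3]
rows = ["%f %f" % (length, 2 * pi * sqrt(length / 9.8)) for length in lengths]
pendulum = io.StringIO("\n".join(rows))

g_list = []  # holds one g value per line of data
for line in pendulum:
    point = line.split()           # split the line into its two columns
    L = float(point[0])            # length
    t = float(point[1])            # time period
    g = 4 * pi * pi * L / (t * t)  # g for this pair of values
    g_list.append(g)

print(g_list)
```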
###############################

Each value in g_list is the g value computed from a pair of L and t values.

Now we have all the values for g. We must find the mean of these values, that is, the sum of all these values divided by the total number of values.

The number of values can be found using len(g_list)

So we are left with the problem of finding the sum.
We shall create a variable, loop over the list, and add each g value to that variable.
Let's call it total.

In []: total = 0
In []: for g in g_list:
 ....:     total += g
 ....:

So at the end of this piece of code we will have the sum of all the g values in the variable total.

Now calculating the mean of g is as simple as dividing total by len(g_list)

In []: g_mean = total / len(g_list)
In []: print 'Mean: ', g_mean

As we can see, we had to write a loop to do something as simple as finding the sum of a list of values.
Python has a built-in function called sum to ease things.

sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler.
We don't have to write any for loop.
We can directly use mean = sum(g_list) / len(g_list)
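
The two ways of finding the mean can be compared in a short sketch. The g values below are made up for illustration, and the code is written for Python 3.

```python
g_list = [9.7, 9.8, 9.9]  # made-up g values

# The long way: accumulate the sum in a loop.
total = 0
for g in g_list:
    total += g
loop_mean = total / len(g_list)

# The short way: let the built-in sum do the accumulation.
sum_mean = sum(g_list) / len(g_list)

print(loop_mean)
print(sum_mean)
```

Both results agree up to floating-point rounding. Because the values are floats, the division gives a fractional result in Python 2 as well as in Python 3.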

Still, calculating the mean needs writing an expression.
What if we had a built-in for calculating the mean directly?
We do, and it is available through the pylab library.

Now the job of calculating the mean is just a function call away.
Call mean(g_list) directly and it gives you the mean of the values in g_list.
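
Outside an IPython -pylab session the same function can be reached through NumPy, which is where pylab takes mean from. A minimal sketch with made-up values, assuming Python 3 with NumPy imported as np:

```python
import numpy as np

g_list = [9.7, 9.8, 9.9]  # made-up g values

# np.mean accepts a plain list as well as an array.
g_mean = np.mean(g_list)
print(g_mean)
```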

Isn't that sweet? Yes, and that is why I use Python.