statistics-script
changeset 37 c2634d874e33
       
Hello friends, and welcome to the third tutorial in the series of tutorials on "Python for scientific computing."

In the previous tutorial we learnt how to read data from a file and plot the data.
We used 'for' loops and lists to get the data in the desired format.
IPython, started with -pylab, also provides a function, 'loadtxt', which can get us the data without much hassle.

We know that pendulum.txt contains two columns, with length being the first column and time period the second, so to get both columns in two separate variables we type

l, t = loadtxt('pendulum.txt', unpack=True)

unpack=True gives us all of the first column (length) in l and all of the second column (time period) in t.

To get more help, type

loadtxt?

This is a really powerful tool for loading data directly from files that are well structured and formatted. It supports many features, such as reading only particular columns.
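As a minimal sketch of reading particular columns, the snippet below uses NumPy's loadtxt with its usecols argument. The sample values are made up for illustration and are read from an in-memory string, not from the real pendulum.txt.

```python
from io import StringIO

import numpy as np

# Made-up sample in the same two-column layout as pendulum.txt
# (length, time period); these are not the real measurements.
sample = """0.1 0.69
0.2 0.90
0.3 1.19
"""

# unpack=True returns one array per column.
l, t = np.loadtxt(StringIO(sample), unpack=True)

# usecols selects particular columns; here only the second one.
t_only = np.loadtxt(StringIO(sample), usecols=(1,))

print(l)
print(t_only)
```

With the real file, the first argument would simply be 'pendulum.txt'.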
       
Now, to get the squared values of t, we can simply do

tsq = t*t

and we don't have to use a 'for' loop anymore. This is a benefit of arrays. If we try to do something similar with lists, we can't escape a 'for' loop.
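To make the difference concrete, here is a small sketch that squares the same values once with a list and once with an array. The time periods are made up for illustration.

```python
import numpy as np

# Made-up time periods, for illustration only.
t_list = [0.69, 0.90, 1.19]

# With a plain list we need a loop to square each value.
tsq_list = []
for value in t_list:
    tsq_list.append(value * value)

# With an array the multiplication is applied element by element.
t = np.array(t_list)
tsq = t * t

print(tsq_list)
print(tsq)
```

Trying t_list * t_list directly raises a TypeError, which is why the list version needs the loop.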
       
Now plotting l vs tsq is the same as what we did in the previous session

plot(l, tsq, 'o')


In this tutorial we shall learn how to compute statistics using Python.
We shall also learn how to represent data in the form of pie charts.

Let us start with the most basic need in statistics, the mean.

We shall calculate the mean acceleration due to gravity using the same 'pendulum.txt' that we used in the previous session.

As we know, 'pendulum.txt' contains two values in each line, the first being the length of the pendulum and the second the time period.
To calculate the acceleration due to gravity from these values, we shall use the expression T = 2*pi*sqrt(L/g)
So, re-arranging this equation, we get g = 4*pi**2*L/T**2.
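A quick way to convince ourselves that the re-arranged expression is right is to go forward and back with numbers. This is only a sketch; the length of 1 m and the value 9.81 for g are assumed for illustration.

```python
from math import pi, sqrt

# Assumed values for illustration: a 1 m pendulum and g = 9.81 m/s^2.
L = 1.0
g_assumed = 9.81

# Forward formula: T = 2*pi*sqrt(L/g)
T = 2 * pi * sqrt(L / g_assumed)

# Re-arranged formula: g = 4*pi**2*L/T**2
g_recovered = 4 * pi**2 * L / T**2

print(T)
print(g_recovered)
```

The recovered value should match the assumed g up to floating point rounding.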
       
We shall calculate the value of g for each pair of L and t, and then calculate the mean of all those g values.

## if we do loadtxt and numpy arrays then this part will change
	First we need something to store each value of g that we are going to compute.
	So we start by initialising an empty list called 'g_list'.

	In []: g_list = []

	Now we read each line from the file 'pendulum.txt', calculate the g value for that pair of L and t, and then append the computed g to our 'g_list'.

	In []: for line in open('pendulum.txt'):
	  ....     point = line.split()
	  ....     L = float(point[0])
	  ....     t = float(point[1])
	  ....     g = 4 * pi * pi * L / (t * t)
	  ....     g_list.append(g)

	The first four lines of the loop should be familiar. We read the file and store the values.
	The fifth line, where we do g equals 4 star pi star and so on, is the line that calculates g for each pair of L and t values from the file. The last line simply stores the computed g value. In technical terms, it appends the computed value to g_list.

	Let us type this code in and see what g_list contains.
###############################
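The note above says this part would change with loadtxt and numpy arrays. One possible shape for that version is sketched below; it is an assumption about how it might look, not the script's final wording, and it uses made-up values in place of the real pendulum.txt.

```python
from io import StringIO

import numpy as np

# Made-up (length, time period) pairs standing in for pendulum.txt.
sample = """0.1 0.69
0.2 0.90
0.3 1.19
"""

# Read both columns as arrays; with the real file this would be
# np.loadtxt('pendulum.txt', unpack=True).
L, t = np.loadtxt(StringIO(sample), unpack=True)

# One expression computes g for every pair, with no explicit loop.
g_values = 4 * np.pi**2 * L / t**2

print(g_values)
```

Here g_values plays the role of g_list, holding one g per line of the file.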
       
Each value in g_list is the g value computed from a pair of L and t values.

Now we have all the values for g. We must find the mean of these values, that is, the sum of all these values divided by the total number of values.

The number of values can be found using len(g_list)

So we are left with the problem of finding the sum.
We shall create a variable, loop over the list, and add each g value to that variable.
Let's call it total.

In []: total = 0
In []: for g in g_list:
 ....:     total += g
 ....:

So at the end of this piece of code we will have the sum of all the g values in the variable total.

Now calculating the mean of g is as simple as dividing total by len(g_list)
       
In []: g_mean = total / len(g_list)
In []: print('Mean:', g_mean)
       
If we observe, we had to write a loop to do something as simple as finding the sum of a list of values.
Python has a built-in function called sum to ease things.

sum takes a list of values and returns the sum of those values.
Now calculating the mean is much simpler.
We don't have to write any 'for' loop.
We can directly use g_mean = sum(g_list) / len(g_list)

Still, calculating the mean requires writing an expression.
What if we had a built-in function for calculating the mean directly?
We do, and it is available through the pylab library.

Now the job of calculating the mean is just a function call away.
Call mean(g_list) directly and it gives you the mean of the values in g_list.
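The three approaches we just walked through can be put side by side as a small sketch. The g values are made up for illustration, and the sketch assumes that the mean provided by pylab is NumPy's mean, which is called here as np.mean.

```python
import numpy as np

# Made-up g values, for illustration only.
g_list = [9.7, 9.8, 9.9]

# 1. Explicit loop that accumulates the sum
total = 0
for g in g_list:
    total += g
mean_loop = total / len(g_list)

# 2. Built-in sum
mean_sum = sum(g_list) / len(g_list)

# 3. NumPy's mean (assumed to be what pylab exposes as mean)
mean_np = np.mean(g_list)

print(mean_loop, mean_sum, mean_np)
```

All three give the same mean up to floating point rounding; they differ only in how much we have to type.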
       
Isn't that sweet? Yes, and that is why I use Python.