# HG changeset patch
# User Puneeth Chaganti
# Date 1289051241 -19800
# Node ID aa8ea9119476c50dda949f9206634e747ccaa03a
# Parent 5415cb1bb4af642127ba9117dd12d5bc49473697
Reviewed statistics script.

diff -r 5415cb1bb4af -r aa8ea9119476 statistics/script.rst
--- a/statistics/script.rst	Fri Nov 05 21:42:20 2010 +0530
+++ b/statistics/script.rst	Sat Nov 06 19:17:21 2010 +0530
@@ -19,7 +19,7 @@
    External Reviewer :
    Checklist OK? : [2010-10-05]
 
-Hello friends and welcome to the tutorial on statistics using Python
+Hello friends and welcome to the tutorial on Statistics using Python
 
 {{{ Show the slide containing title }}}
 
@@ -29,57 +29,78 @@
 * Doing simple statistical operations in Python
 * Applying these to real world problems
 
-You will need Ipython with pylab running on your computer
-to use this tutorial.
+.. #[punch: the prerequisites part may be skipped in the tutorial. It
+.. will be provided separately.]
+
+You will need Ipython with pylab running on your computer to use this
+tutorial.
+
+Also you will need to know about loading data using loadtxt to be able
+to follow the real world application.
 
-Also you will need to know about loading data using loadtxt to be
-able to follow the real world application.
+.. #[punch: since loadtxt is anyway a pre-req, I would recommend you
+.. to use a data file and load data from that. that is good, since you
+.. would get to deal with arrays, instead of lists.
+
+.. Talking of rows and columns of 2-D lists etc is confusing. Also,
+.. converting to float can be avoided. The tutorial will feel more
+.. natural, is what I think.
 
-We will first start with the most necessary statistical
-operation i.e finding mean.
+.. The idea of separating the main problem and giving toy examples
+.. doesn't sound good. Use the same problem to explain stuff. Or use a
+.. smaller data-set or something. Using lists doesn't seem natural.]
+
+
+We will first start with the most necessary statistical operation i.e
+finding mean.
 
 We have a list of ages of a random group of people ::
 
-    age_list=[4,45,23,34,34,38,65,42,32,7]
+    age_list = [4,45,23,34,34,38,65,42,32,7]
 
-One way of getting the mean could be getting sum of
-all the elements and dividing by length of the list.::
+One way of getting the mean could be getting sum of all the ages and
+dividing by the number of people in the group. ::
 
-    sum_age_list =sum(age_list)
+    sum_age_list = sum(age_list)
 
-sum function gives us the sum of the elements.::
+sum function gives us the sum of the elements. Note that the
+``sum_age_list`` variable is an integer and the number of people or
+length of the list is also an integer. We will need to convert one of
+them to a float before carrying out the division. ::
 
-    mean_using_sum=float(sum_age_list)/len(age_list)
+    mean_using_sum = float(sum_age_list)/len(age_list)
 
-This obviously gives the mean age but python has another
-method for getting the mean. This is the mean function::
+This obviously gives the mean age but there is a simpler way to do
+this in Python - using the mean function::
 
     mean(age_list)
 
-Mean can be used in more ways in case of 2 dimensional lists.
-Take a two dimensional list ::
+Mean can be used in more ways in case of 2 dimensional lists. Take a
+two dimensional list ::
 
     two_dimension=[[1,5,6,8],[1,3,4,5]]
 
-the mean function used in default manner will give the mean of the
-flattened sequence. Flattened sequence means the two lists taken
-as if it was a single list of elements ::
+The mean function by default gives the mean of the flattened sequence.
+A Flattened sequence means a list obtained by concatenating all the
+smaller lists into a large long list. In this case, the list obtained
+by writing the two lists one after the other. ::
 
     mean(two_dimension)
     flattened_seq=[1,5,6,8,1,3,4,5]
     mean(flattened_seq)
 
-As you can see both the results are same. The other way is mean
-of each column.::
+As you can see both the results are same. ``mean`` function can also
+give us the mean of each column, or the mean of corresponding elements
+in the smaller lists. ::
 
-    mean(two_dimension,0)
+    mean(two_dimension, 0)
     array([ 1. , 4. , 5. , 6.5])
 
 we pass an extra argument 0 in that case.
 
-In case of getting mean along the rows the argument is 1::
+If we use an argument 1, we obtain the mean along the rows. ::
 
-    mean(two_dimension,1)
+    mean(two_dimension, 1)
     array([ 5. , 3.25])
 
 We can see more option of mean using ::
@@ -92,24 +113,26 @@
     median(age_list)
     std(age_list)
 
-Median and std can also be calculated for two dimensional arrays along columns and rows just like mean.
+Median and std can also be calculated for two dimensional arrays along
+columns and rows just like mean.
 
- For example ::
+For example ::
 
-    median(two_dimension,0)
-    std(two_dimension,1)
+    median(two_dimension, 0)
+    std(two_dimension, 1)
 
-This gives us the median along the colums and standard devition along the rows.
+This gives us the median along the colums and standard devition along
+the rows.
 
 Now lets apply this to a real world example
 
-We will a data file that is at the a path
-``/home/fossee/sslc2.txt``.It contains record of students and their
-performance in one of the State Secondary Board Examination. It has
-180, 000 lines of record. We are going to read it and process this
-data. We can see the content of file by double clicking on it. It
-might take some time to open since it is quite a large file. Please
-don't edit the data. This file has a particular structure.
+We will a data file that is at the a path ``/home/fossee/sslc2.txt``.
+It contains record of students and their performance in one of the
+State Secondary Board Examination. It has 180, 000 lines of record. We
+are going to read it and process this data. We can see the content of
+file by double clicking on it. It might take some time to open since
+it is quite a large file. Please don't edit the data. This file has
+a particular structure.
 
 We can do ::
 
@@ -128,7 +151,7 @@
 * Marks of 5 subjects: ** English 083 ** Hindi 042 ** Maths 47 **
   Science 35 ** Social 72
 * Total marks 244
-*
+
 
 Now lets try and find the mean of English marks of all students.
 
@@ -145,11 +168,11 @@
 
 To get the median marks. ::
 
-    median(L)
+    median(L)
 
 Standard deviation. ::
 
-    std(L)
+    std(L)
 
 Now lets try and and get the mean for all the subjects ::
 
@@ -187,10 +210,5 @@
 
 Hope you have enjoyed and found it useful.
 
-Thankyou
+Thank you!
 
-.. Author : Amit Sethi
-   Internal Reviewer 1 :
-   Internal Reviewer 2 :
-   External Reviewer :
-
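
Note on the toy examples in the second hunk: the outputs quoted in the script (``array([ 1. , 4. , 5. , 6.5])`` for argument 0 and ``array([ 5. , 3.25])`` for argument 1) can be cross-checked without pylab. The sketch below is only an illustration and is not part of the changeset. It swaps in Python's standard ``statistics`` module for pylab's ``mean``, ``median`` and ``std``, and assumes that pylab's ``std`` is the population standard deviation, which ``pstdev`` matches. The names ``flattened_mean``, ``column_means``, ``row_means``, ``column_medians`` and ``row_stds`` are hypothetical and do not appear in the script.

```python
from statistics import mean, median, pstdev

# Toy data copied from the script under review.
age_list = [4, 45, 23, 34, 34, 38, 65, 42, 32, 7]
two_dimension = [[1, 5, 6, 8], [1, 3, 4, 5]]

# Mean by hand, as the script does it: total of the ages divided by
# the number of people (float() avoids integer division on Python 2).
mean_using_sum = float(sum(age_list)) / len(age_list)

# "Flattened" mean: treat the two inner lists as one long list.
flattened_seq = [value for row in two_dimension for value in row]
flattened_mean = mean(flattened_seq)

# Argument 0 in the script: one result per column.
# zip(*rows) regroups the nested list column by column.
column_means = [mean(column) for column in zip(*two_dimension)]
column_medians = [median(column) for column in zip(*two_dimension)]

# Argument 1 in the script: one result per row.
row_means = [mean(row) for row in two_dimension]
row_stds = [pstdev(row) for row in two_dimension]

print(mean_using_sum, flattened_mean)
print(column_means, row_means)
```

The column and row means computed this way agree with the two arrays quoted in the script, which is a quick sanity check on the wording "along the columns" and "along the rows".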