Reviewed statistics script.
author Puneeth Chaganti <punchagan@fossee.in>
Sat, 06 Nov 2010 19:17:21 +0530
changeset 382 aa8ea9119476
parent 381 5415cb1bb4af
child 383 4a6d548d4369
statistics/script.rst
--- a/statistics/script.rst	Fri Nov 05 21:42:20 2010 +0530
+++ b/statistics/script.rst	Sat Nov 06 19:17:21 2010 +0530
@@ -19,7 +19,7 @@
    External Reviewer   :
    Checklist OK?       : <put date stamp here, if OK> [2010-10-05]
 
-Hello friends and welcome to the tutorial on statistics using Python
+Hello friends and welcome to the tutorial on Statistics using Python
 
 {{{ Show the slide containing title }}}
 
@@ -29,57 +29,78 @@
  * Doing simple statistical operations in Python  
  * Applying these to real world problems 
 
-You will need Ipython with pylab running on your computer
-to use this tutorial.
+.. #[punch: the prerequisites part may be skipped in the tutorial. It
+.. will be provided separately.]
+
+You will need Ipython with pylab running on your computer to use this
+tutorial.
+
+Also you will need to know about loading data using loadtxt to be able
+to follow the real world application.
 
-Also you will need to know about loading data using loadtxt to be 
-able to follow the real world application.
+.. #[punch: since loadtxt is anyway a pre-req, I would recommend you
+.. to use a data file and load data from that. that is good, since you
+.. would get to deal with arrays, instead of lists. 
+
+.. Talking of rows and columns of 2-D lists etc is confusing. Also,
+.. converting to float can be avoided. The tutorial will feel more
+.. natural, is what I think. 
 
-We will first start with the most necessary statistical 
-operation i.e finding mean.
+.. The idea of separating the main problem and giving toy examples
+.. doesn't sound good. Use the same problem to explain stuff. Or use a
+.. smaller data-set or something. Using lists doesn't seem natural.]
+
+
+We will first start with the most necessary statistical operation i.e
+finding mean.
 
 We have a list of ages of a random group of people ::
    
-   age_list=[4,45,23,34,34,38,65,42,32,7]
+   age_list = [4,45,23,34,34,38,65,42,32,7]
 
-One way of getting the mean could be getting sum of 
-all the elements and dividing by length of the list.::
+One way of getting the mean could be getting sum of all the ages and
+dividing by the number of people in the group. ::
 
-    sum_age_list =sum(age_list)
+    sum_age_list = sum(age_list)
 
-sum function gives us the sum of the elements.::
+sum function gives us the sum of the elements. Note that the
+``sum_age_list`` variable is an integer and the number of people or
+length of the list is also an integer. We will need to convert one of
+them to a float before carrying out the division. ::
 
-    mean_using_sum=float(sum_age_list)/len(age_list)
+    mean_using_sum = float(sum_age_list)/len(age_list)
 
-This obviously gives the mean age but python has another 
-method for getting the mean. This is the mean function::
+This obviously gives the mean age but there is a simpler way to do
+this in Python - using the mean function::
 
        mean(age_list)
 
-Mean can be used in more ways in case of 2 dimensional lists.
-Take a two dimensional list ::
+Mean can be used in more ways in case of 2 dimensional lists.  Take a
+two dimensional list ::
      
      two_dimension=[[1,5,6,8],[1,3,4,5]]
 
-the mean function used in default manner will give the mean of the 
-flattened sequence. Flattened sequence means the two lists taken 
-as if it was a single list of elements ::
+The mean function by default gives the mean of the flattened sequence.
+A Flattened sequence means a list obtained by concatenating all the
+smaller lists into a large long list. In this case, the list obtained
+by writing the two lists one after the other. ::
 
     mean(two_dimension)
     flattened_seq=[1,5,6,8,1,3,4,5]
     mean(flattened_seq)
 
-As you can see both the results are same. The other way is mean 
-of each column.::
+As you can see both the results are same. ``mean`` function can also
+give us the mean of each column, or the mean of corresponding elements
+in the smaller lists. ::
    
-   mean(two_dimension,0)
+   mean(two_dimension, 0)
    array([ 1. ,  4. ,  5. ,  6.5])
 
 we pass an extra argument 0 in that case.
 
-In case of getting mean along the rows the argument is 1::
+If we use an argument 1, we obtain the mean along the rows. ::
    
-   mean(two_dimension,1)
+   mean(two_dimension, 1)
    array([ 5.  ,  3.25])
 
 We can see more option of mean using ::
@@ -92,24 +113,26 @@
       median(age_list)
       std(age_list)
 
-Median and std can also be calculated for two dimensional arrays along columns and rows just like mean.
+Median and std can also be calculated for two dimensional arrays along
+columns and rows just like mean.
 
-       For example ::
+For example ::
        
-       median(two_dimension,0)
-       std(two_dimension,1)
+       median(two_dimension, 0)
+       std(two_dimension, 1)
 
-This gives us the median along the colums and standard devition along the rows.
+This gives us the median along the columns and standard deviation along
+the rows.
        
 Now lets apply this to a real world example 
     
-We will a data file that is at the a path
-``/home/fossee/sslc2.txt``.It contains record of students and their
-performance in one of the State Secondary Board Examination. It has
-180, 000 lines of record. We are going to read it and process this
-data.  We can see the content of file by double clicking on it. It
-might take some time to open since it is quite a large file.  Please
-don't edit the data.  This file has a particular structure.
+We will use a data file that is at the path ``/home/fossee/sslc2.txt``.
+It contains records of students and their performance in one of the
+State Secondary Board Examinations. It has 180,000 lines of records. We
+are going to read it and process this data.  We can see the content of
+file by double clicking on it. It might take some time to open since
+it is quite a large file.  Please don't edit the data.  This file has
+a particular structure.
 
 We can do ::
    
@@ -128,7 +151,7 @@
 * Marks of 5 subjects: ** English 083 ** Hindi 042 ** Maths 47 **
 Science 35 ** Social 72
 * Total marks 244
-*
+
 
 Now lets try and find the mean of English marks of all students.
 
@@ -145,11 +168,11 @@
 
 To get the median marks. ::
    
-   median(L)
+    median(L)
    
 Standard deviation. ::
 	
-	std(L)
+    std(L)
 
 
 Now lets try and and get the mean for all the subjects ::
@@ -187,10 +210,5 @@
 
 Hope you have enjoyed and found it useful.
 
-Thankyou
+Thank you!
 
-.. Author              : Amit Sethi
-   Internal Reviewer 1 : 
-   Internal Reviewer 2 : 
-   External Reviewer   :
-
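The toy examples in the script can be cross-checked outside of pylab. The sketch below is not part of the changeset. It assumes Python 3 and swaps in the standard library ``statistics`` module for pylab's ``mean``, ``median`` and ``std``. The names ``mean_age``, ``median_age``, ``std_age``, ``flattened_mean``, ``column_means`` and ``row_means`` are introduced here for illustration only.

```python
# Cross-check of the tutorial's toy examples using only the standard library.
# Assumption: Python 3 (so "/" is true division and no float() cast is needed).
# The tutorial itself uses pylab's mean, median and std instead.
import statistics

age_list = [4, 45, 23, 34, 34, 38, 65, 42, 32, 7]
two_dimension = [[1, 5, 6, 8], [1, 3, 4, 5]]

# Mean computed by hand: sum of the ages divided by the number of people.
mean_using_sum = sum(age_list) / len(age_list)

# The same statistics through library functions.
mean_age = statistics.mean(age_list)
median_age = statistics.median(age_list)
# pstdev is the population standard deviation (divides by N, not N - 1).
std_age = statistics.pstdev(age_list)

# Flatten the two-dimensional list by writing the rows one after the other.
flattened_seq = [value for row in two_dimension for value in row]
flattened_mean = statistics.mean(flattened_seq)

# zip(*rows) groups corresponding elements, giving one tuple per column.
column_means = [statistics.mean(column) for column in zip(*two_dimension)]
row_means = [statistics.mean(row) for row in two_dimension]

print(mean_using_sum, mean_age, median_age, std_age)
print(flattened_mean, column_means, row_means)
```

The column and row means computed this way should agree with the arrays shown for ``mean(two_dimension, 0)`` and ``mean(two_dimension, 1)`` in the script.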