--- a/statistics/script.rst Fri Nov 05 21:42:20 2010 +0530
+++ b/statistics/script.rst Sat Nov 06 19:17:21 2010 +0530
@@ -19,7 +19,7 @@
External Reviewer :
Checklist OK? : <put date stamp here, if OK> [2010-10-05]
-Hello friends and welcome to the tutorial on statistics using Python
+Hello friends and welcome to the tutorial on Statistics using Python
{{{ Show the slide containing title }}}
@@ -29,57 +29,78 @@
* Doing simple statistical operations in Python
* Applying these to real world problems
-You will need Ipython with pylab running on your computer
-to use this tutorial.
+.. #[punch: the prerequisites part may be skipped in the tutorial. It
+.. will be provided separately.]
+
+You will need IPython with pylab running on your computer to use this
+tutorial.
+
+You will also need to know how to load data using loadtxt to be able
+to follow the real world application.
-Also you will need to know about loading data using loadtxt to be
-able to follow the real world application.
+.. #[punch: since loadtxt is anyway a pre-req, I would recommend you
+.. to use a data file and load data from that. that is good, since you
+.. would get to deal with arrays, instead of lists.
+
+.. Talking of rows and columns of 2-D lists etc is confusing. Also,
+.. converting to float can be avoided. The tutorial will feel more
+.. natural, is what I think.
-We will first start with the most necessary statistical
-operation i.e finding mean.
+.. The idea of separating the main problem and giving toy examples
+.. doesn't sound good. Use the same problem to explain stuff. Or use a
+.. smaller data-set or something. Using lists doesn't seem natural.]
+
+
+We will start with the most basic statistical operation, i.e.
+finding the mean.
We have a list of ages of a random group of people ::
- age_list=[4,45,23,34,34,38,65,42,32,7]
+ age_list = [4,45,23,34,34,38,65,42,32,7]
-One way of getting the mean could be getting sum of
-all the elements and dividing by length of the list.::
+One way of getting the mean is to take the sum of all the ages and
+divide it by the number of people in the group. ::
- sum_age_list =sum(age_list)
+ sum_age_list = sum(age_list)
-sum function gives us the sum of the elements.::
+The ``sum`` function gives us the sum of the elements. Note that the
+``sum_age_list`` variable is an integer and the number of people, or
+length of the list, is also an integer. We need to convert one of
+them to a float before carrying out the division. ::
- mean_using_sum=float(sum_age_list)/len(age_list)
+ mean_using_sum = float(sum_age_list)/len(age_list)
-This obviously gives the mean age but python has another
-method for getting the mean. This is the mean function::
+This gives the mean age, but there is a simpler way to do this in
+Python with pylab: the ``mean`` function::
mean(age_list)
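The two approaches above can be compared in a short sketch. It assumes NumPy is installed and imports ``mean`` from it explicitly; that import is an assumption made so the snippet runs outside a pylab session, where the tutorial gets ``mean`` for free.

```python
from numpy import mean

age_list = [4, 45, 23, 34, 34, 38, 65, 42, 32, 7]

# Mean by hand: total of the ages divided by the number of people.
# float() matters on Python 2, where int / int truncates; it is
# harmless on Python 3.
mean_using_sum = float(sum(age_list)) / len(age_list)

# Mean using the library function.
mean_using_function = mean(age_list)

print(mean_using_sum, mean_using_function)
```

Both values should agree up to floating-point rounding.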
-Mean can be used in more ways in case of 2 dimensional lists.
-Take a two dimensional list ::
+The mean function can be used in more ways on two-dimensional lists.
+Take a two-dimensional list ::
two_dimension=[[1,5,6,8],[1,3,4,5]]
-the mean function used in default manner will give the mean of the
-flattened sequence. Flattened sequence means the two lists taken
-as if it was a single list of elements ::
+The mean function by default gives the mean of the flattened sequence.
+A flattened sequence is the list obtained by concatenating all the
+smaller lists into one long list. In this case, it is the list obtained
+by writing the two lists one after the other. ::
mean(two_dimension)
flattened_seq=[1,5,6,8,1,3,4,5]
mean(flattened_seq)
-As you can see both the results are same. The other way is mean
-of each column.::
+As you can see, both results are the same. The ``mean`` function can
+also give us the mean of each column, that is, the mean of the
+corresponding elements in the smaller lists. ::
- mean(two_dimension,0)
+ mean(two_dimension, 0)
array([ 1. , 4. , 5. , 6.5])
we pass an extra argument 0 in that case.
-In case of getting mean along the rows the argument is 1::
+If we use an argument 1, we obtain the mean along the rows. ::
- mean(two_dimension,1)
+ mean(two_dimension, 1)
array([ 5. , 3.25])
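The three ways of calling ``mean`` on a two-dimensional list can be put side by side. This is a minimal sketch that assumes NumPy is available and calls ``np.mean`` directly; the variable names other than ``two_dimension`` are illustrative only.

```python
import numpy as np

two_dimension = [[1, 5, 6, 8], [1, 3, 4, 5]]

# No axis argument: mean of all eight values taken together.
flat_mean = np.mean(two_dimension)

# Axis 0: one mean per column (corresponding elements of the rows).
column_means = np.mean(two_dimension, 0)

# Axis 1: one mean per row.
row_means = np.mean(two_dimension, 1)

print(flat_mean)
print(column_means)
print(row_means)
```

The second positional argument is the axis, so ``np.mean(two_dimension, 0)`` and ``np.mean(two_dimension, axis=0)`` are equivalent.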
We can see more option of mean using ::
@@ -92,24 +113,26 @@
median(age_list)
std(age_list)
-Median and std can also be calculated for two dimensional arrays along columns and rows just like mean.
+Median and std can also be calculated for two-dimensional arrays along
+columns and rows, just like mean.
- For example ::
+For example ::
- median(two_dimension,0)
- std(two_dimension,1)
+ median(two_dimension, 0)
+ std(two_dimension, 1)
-This gives us the median along the colums and standard devition along the rows.
+This gives us the median along the columns and the standard deviation
+along the rows.
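As a rough illustration of the median and standard deviation calls, the sketch below reuses the small lists from earlier. It assumes NumPy is available. Note that ``np.std`` computes the population standard deviation by default (it divides by N, not N - 1).

```python
import numpy as np

age_list = [4, 45, 23, 34, 34, 38, 65, 42, 32, 7]
two_dimension = [[1, 5, 6, 8], [1, 3, 4, 5]]

# Median and standard deviation of a flat list.
median_age = np.median(age_list)
std_age = np.std(age_list)

# Median of each column (axis 0) and standard deviation of each row (axis 1).
column_medians = np.median(two_dimension, 0)
row_stds = np.std(two_dimension, 1)

print(median_age, std_age)
print(column_medians)
print(row_stds)
```

With only two rows, the median of each column is the average of its two values, so it coincides with the column mean here.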
Now lets apply this to a real world example
-We will a data file that is at the a path
-``/home/fossee/sslc2.txt``.It contains record of students and their
-performance in one of the State Secondary Board Examination. It has
-180, 000 lines of record. We are going to read it and process this
-data. We can see the content of file by double clicking on it. It
-might take some time to open since it is quite a large file. Please
-don't edit the data. This file has a particular structure.
+We will use a data file located at ``/home/fossee/sslc2.txt``. It
+contains records of students and their performance in one of the State
+Secondary Board Examinations. It has 180,000 lines of records. We are
+going to read and process this data. We can view the contents of the
+file by double-clicking on it. It might take some time to open since
+it is quite a large file. Please don't edit the data. This file has
+a particular structure.
We can do ::
@@ -128,7 +151,7 @@
* Marks of 5 subjects: ** English 083 ** Hindi 042 ** Maths 47 **
Science 35 ** Social 72
* Total marks 244
-*
+
Now lets try and find the mean of English marks of all students.
@@ -145,11 +168,11 @@
To get the median marks. ::
- median(L)
+ median(L)
Standard deviation. ::
- std(L)
+ std(L)
Now lets try and and get the mean for all the subjects ::
@@ -187,10 +210,5 @@
Hope you have enjoyed and found it useful.
-Thankyou
+Thank you!
-.. Author : Amit Sethi
- Internal Reviewer 1 :
- Internal Reviewer 2 :
- External Reviewer :
-