statistics/script.rst
author Anoop Jacob Thomas<anoop@fossee.in>
Sat, 18 Dec 2010 12:54:49 +0530
changeset 524 b602b4dcc87d
parent 450 d49aee7ab1b9
permissions -rw-r--r--
Made some changes to the script embellishing a plot, but it still needs changes.
.. Objectives
.. ----------

.. By the end of this tutorial you will --

.. 1. Get to know simple statistics functions like mean, std etc. (Remembering)
.. #. Apply them on a real world example. (Applying)

.. Prerequisites
.. -------------

.. Getting started with IPython
.. Loading Data from files
.. Getting started with Lists
.. Accessing Pieces of Arrays

.. Author              : Amit Sethi
   Internal Reviewer   : Puneeth
   External Reviewer   :
   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]

.. #[punch; add slides, exercises!]

Hello friends and welcome to the tutorial on Statistics using Python.

{{{ Show the slide containing title }}}

{{{ Show the outline slide }}}

In this tutorial, we shall learn
 * Doing statistical operations in Python
   * Summing a set of numbers
   * Finding their mean
   * Finding their median
   * Finding their standard deviation

.. #[punch: since loadtxt is anyway a pre-req, I would recommend you
.. to use a data file and load data from that. that is good, since you
.. would get to deal with arrays, instead of lists.

.. Talking of rows and columns of 2-D lists etc is confusing. Also,
.. converting to float can be avoided. The tutorial will feel more
.. natural, is what I think.

.. The idea of separating the main problem and giving toy examples
.. doesn't sound good. Use the same problem to explain stuff. Or use a
.. smaller data-set or something. Using lists doesn't seem natural.]

For this tutorial we will use a data file located at
``/home/fossee/sslc2.txt``. It contains records of students and their
performance in one of the State Secondary Board Examinations. It has
over 180,000 lines of records. We are going to read it and process this
data. We can see the content of the file by double-clicking on it. It
might take some time to open since it is quite a large file. Please
don't edit the data. This file has a particular structure.

We can do ::

   cat /home/fossee/sslc2.txt

to check the contents of the file.

{{{ Show the data structure on a slide }}}

Each line in the file is a set of 11 fields separated
by semi-colons. Consider a sample line from this file ::

   A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;

The following are the fields in any given line.

* Region Code, which is 'A'
* Roll Number 015163
* Name JOSEPH RAJ S
* Marks of 5 subjects:

  * English 083
  * Hindi 042
  * Maths 47
  * Science 00
  * Social 72

* Total marks 244
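
As a rough illustration of this layout, the sample line can be pulled apart in plain Python by splitting on the semi-colon. This is only a sketch: the variable names are our own, and it handles just the one sample record shown above, not the whole file.

```python
# Sample record copied from the tutorial text.
line = "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;"

# Split on the semi-colon delimiter. The trailing ";;;" yields empty strings.
fields = line.split(";")

region, roll_number, name = fields[0], fields[1], fields[2]

# Fields 3 to 7 (counting from 0) hold the marks in the five subjects:
# English, Hindi, Maths, Science, Social.
marks = [int(m) for m in fields[3:8]]
total = int(fields[8])

print(region, roll_number, name)
print(marks)                # [83, 42, 47, 0, 72]
print(sum(marks) == total)  # True
```

The five marks add up to the total stored in the record, which is a quick way to confirm that we are reading the right fields.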

Let's try to load this data as an array and then run various functions on
it.

To get the data as an array we do ::

   L = loadtxt('/home/fossee/sslc2.txt', usecols=(3,4,5,6,7,), delimiter=';')
   L

The loadtxt function loads data from an external file. delimiter
specifies the character that separates the fields of data. usecols
specifies the columns to be used; columns are counted from 0, so
(3,4,5,6,7) loads the five columns of marks. usecols takes a sequence
of column numbers, and the trailing comma after the last number is
optional.
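
To see what ``usecols`` and ``delimiter`` do without opening the large file, here is a small sketch that feeds ``loadtxt`` two records from memory. The first record is the sample line shown earlier; the second is invented purely for illustration. We import numpy explicitly as ``np`` (an assumption, so that the snippet also runs outside the pylab environment) and use our own variable names.

```python
import io
import numpy as np

# Two records in the same layout as the data file.
# The first is the sample line from the tutorial; the second is made up.
sample = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;000001;HYPOTHETICAL STUDENT;050;060;70;80;90;350;;;\n"
)

# Only columns 3 to 7 (the five subject marks) are read and converted
# to floats; the text columns are skipped.
marks = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")

print(marks.shape)  # (2, 5): two students, five subjects
print(marks[0])     # marks of the sample student
```

Because the region code, roll number and name are never selected, it does not matter that they cannot be converted to numbers.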

As we can see, L is an array. We can get the shape of this array using ::

   L.shape
   (185667, 5)

Let's start applying statistical operations on these. We will start with
the most basic, summing. How do you find the sum of marks of all
subjects for the first student?

As we know from accessing pieces of arrays, to access
the first row we will do ::

   L[0,:]

Now to sum this we can say ::

   totalmarks = sum(L[0,:])
   totalmarks

To get the mean we can do ::

   totalmarks/len(L[0,:])

or simply ::

   mean(L[0,:])
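
As a quick check that the two approaches agree, here is a sketch that works on the marks of the sample student shown earlier. We type the row in by hand (an assumption, so that the snippet does not need the data file) and import numpy explicitly as ``np``.

```python
import numpy as np

# Marks of the sample student: English, Hindi, Maths, Science, Social.
first_row = np.array([83.0, 42.0, 47.0, 0.0, 72.0])

# Mean by hand: the sum divided by the number of subjects.
total_marks = np.sum(first_row)
mean_by_hand = total_marks / len(first_row)

# Mean using the library function.
mean_direct = np.mean(first_row)

print(total_marks)                           # 244.0
print(np.isclose(mean_by_hand, mean_direct)) # True
```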

But we have such a large data set that calculating the mean of
each student one by one is impractical. Is there a way to reduce the work?

For this we will look into the documentation of mean by doing ::

    mean?

As we know, L is a two-dimensional array. We can calculate the mean
along either axis of the array. The axis of rows is referred to by
number 0 and that of columns by 1. So to calculate the mean across all
columns, that is, the mean marks of each student, we will pass the extra
parameter 1 for the axis ::

    mean(L,1)

L here is the two-dimensional array.

Similarly, the average marks scored by all the students in each
subject can be calculated using ::

   mean(L,0)
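
The effect of the axis argument is easier to see on a tiny array. The sketch below uses two students: the first row is the sample student from earlier, the second row is invented for illustration. The names ``per_student`` and ``per_subject`` are our own.

```python
import numpy as np

# Rows are students, columns are subjects
# (English, Hindi, Maths, Science, Social).
marks = np.array([
    [83.0, 42.0, 47.0, 0.0, 72.0],   # sample student from the tutorial
    [50.0, 60.0, 70.0, 80.0, 90.0],  # made-up student
])

# axis=1 averages across the columns: one value per student.
per_student = np.mean(marks, axis=1)

# axis=0 averages across the rows: one value per subject.
per_subject = np.mean(marks, axis=0)

print(per_student.shape)  # (2,)
print(per_subject.shape)  # (5,)
print(per_student)
print(per_subject)
```

The shape of the result tells us which axis was collapsed: two values when we average per student, five when we average per subject.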

Next, let's calculate the median of English marks for all the students.
We can access the English marks of all students using ::

   L[:,0]

To get the median we will do ::

   median(L[:,0])

For all the subjects we can use the same syntax as mean and calculate
the median across all rows using ::

   median(L,0)
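
Here is a small sketch of both median calls on three students. Only the first row comes from the sample line shown earlier; the other two rows are invented so that each column has a clear middle value.

```python
import numpy as np

# Rows are students, columns are subjects
# (English, Hindi, Maths, Science, Social).
marks = np.array([
    [83.0, 42.0, 47.0, 0.0, 72.0],   # sample student from the tutorial
    [50.0, 60.0, 70.0, 80.0, 90.0],  # made-up student
    [65.0, 55.0, 45.0, 35.0, 25.0],  # made-up student
])

# Median of one subject: the middle value of 83, 50 and 65.
english = marks[:, 0]
print(np.median(english))  # 65.0

# Median of every subject at once, taken down the rows.
subject_medians = np.median(marks, axis=0)
print(subject_medians)
```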

Similarly, to calculate the standard deviation for English we can do ::

   std(L[:,0])

and for each subject, across all rows ::

   std(L,0)
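
To make clear what ``std`` computes, the sketch below works it out by hand for three English marks. The first value is from the sample student; the other two are invented. By default numpy's ``std`` divides by the number of values (the population standard deviation); passing ``ddof=1`` divides by one less, which gives the sample standard deviation.

```python
import numpy as np

# Three English marks: one from the sample student, two made up.
english = np.array([83.0, 50.0, 65.0])

# Standard deviation by hand: the square root of the mean of the
# squared deviations from the mean.
deviations = english - np.mean(english)
std_by_hand = np.sqrt(np.mean(deviations ** 2))

print(np.isclose(std_by_hand, np.std(english)))  # True

# Sample standard deviation, dividing by (n - 1) instead of n.
sample_std = np.std(english, ddof=1)
print(sample_std > std_by_hand)                  # True
```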

Following is an exercise that you must do.

%% %% In the given file football.txt at path /home/fossee/football.txt, the first column is the player name, the second is goals at home and the third is goals away.
   1. Find the total goals for each player
   2. Find the mean of home and away goals
   3. Find the standard deviation of home and away goals

{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

 * How to do the standard statistical operations sum, mean,
   median and standard deviation in Python.
 * How to combine text loading and the statistical operations to solve
   real world problems.

{{{ Show the "sponsored by FOSSEE" slide }}}

This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD India.

Hope you have enjoyed it and found it useful.

Thank you!