statistics.txt
author asokan <asokan@fossee.in>
Tue, 18 May 2010 15:40:17 +0530
changeset 126 2eac725a5766
parent 59 b62177acce71

Hello and welcome to this tutorial on handling and processing large data files.

Up until now, we have covered:
* How to create plots.
* How to read data from files and process it.

In this tutorial, we shall use these concepts, along with some new ones, to solve an exercise.

We have a file named sslc.txt on our desktop.
It contains the records of students and their performance in a State Secondary Board examination. It has 180,000 lines of records. We are going to read this data and process it.
We can view the contents of the file by double-clicking on it. It may take some time to open, since it is quite a large file.
Please don't edit the data.
This file has a particular structure. Each line in the file is a set of 11 fields separated by semicolons.
Consider a sample line from this file.
A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;
The following are the fields in any given line.
* Region Code, which is 'A'
* Roll Number 015163
* Name JOSEPH RAJ S
* Marks in 5 subjects:
  ** English 083
  ** Hindi 042
  ** Maths 47
  ** Science AA (Absent)
  ** Social 72
* Total marks 244
* Pass/Fail (P/F) - This field is blank here because this candidate was absent for an exam; otherwise it would have been P or F.
* Withheld (W) - Again blank in this case.
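
As a rough sketch of how the field list above maps onto the sample line, we can pair each piece of the line with a label. The labels in `names` are our own choice for this illustration (the file has no header line), and the code assumes Python 3. It uses split(';'), which is explained later in this tutorial.

```python
# Sample record copied from the tutorial text.
sample = 'A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;'

# These labels are our own; the data file has no header line.
names = ['region code', 'roll number', 'name', 'english', 'hindi',
         'maths', 'science', 'social', 'total', 'pass/fail', 'withheld']

# zip() pairs each label with the piece at the same position and stops
# after the 11 labels, ignoring the empty piece after the final ';'.
pairs = list(zip(names, sample.split(';')))

for label, value in pairs:
    print(label, ':', repr(value))
```

The blank Pass/Fail and Withheld fields show up as empty strings.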

Let us now look at the problem we wish to solve:
Draw a pie chart showing how the students who scored more than 90% in Science are distributed across the regions.

This is the result we expect:
#slide of result.

In order to solve this problem, we need the following machinery:
File reading - which we have already looked at.
Parsing - which we have looked at partially.
Dictionaries - which we shall introduce here.
And finally plotting - which we have been doing all along.

Since this file is on our Desktop, let's navigate there by typing

cd Desktop

Let's get started by opening the IPython prompt. Type

ipython -pylab

(On recent versions of IPython this option is spelled --pylab.)

Let's first start off with dictionaries.

We used lists briefly earlier. Back then we just created lists and appended items to them.
x = [1, 4, 2, 7, 6]
To access any element of a list, we use its index number. Indexing starts from 0.
For example, x[0] gives 1 and x[3] gives 7.

But using integer indexes isn't always convenient. For example, consider a telephone directory: we give it a name and it should return the corresponding number. A list is not well suited to such problems; Python's dictionaries are better. A dictionary is simply a collection of key-value pairs. For example:

d = {'png' : 'image',
     'txt' : 'text',
     'py' : 'python'}

And that is how we create a dictionary. Dictionaries are created by typing the key-value pairs within curly braces.

d

d is a dictionary. The first element in each pair is called the `key' and the second is called the `value'. A key must be of an immutable (hashable) type, such as a string or a number, while the value can be of any type.
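
Keys are most often strings, but they do not have to be. Here is a small sketch of what can and cannot serve as a key. The dictionaries `by_roll` and `by_point` are hypothetical examples, not part of the sslc.txt exercise.

```python
# Hypothetical examples, not taken from sslc.txt.
by_roll = {15163: 'JOSEPH RAJ S'}     # an integer key
by_point = {(0, 0): 'origin'}         # a tuple key

# A list is mutable, so Python refuses to use it as a key.
try:
    bad = {['a', 'list']: 'value'}
except TypeError as err:
    message = str(err)

print(by_roll[15163])
print(by_point[(0, 0)])
print(message)
```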

Lists are indexed by integers, while dictionaries are indexed by their keys. We look up a value using its key, as shown:
In []: d['txt']
Out[]: 'text'

In []: d['png']
Out[]: 'image'
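
Coming back to the telephone directory mentioned earlier, a minimal sketch of it as a dictionary could look like this. The names and numbers are invented for illustration.

```python
# Names and numbers are invented for illustration.
directory = {'asha': '555-0101', 'ravi': '555-0102'}

number = directory['ravi']          # look up a value by its key
directory['meena'] = '555-0103'     # add a new key-value pair

print(number)
print(len(directory))
```

Assigning to a key that does not exist yet adds a new entry. We shall use the same idea later to add regions to a dictionary.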

A dictionary can be searched for the presence of a certain key by typing
In []: 'py' in d
Out[]: True

In []: 'jpg' in d
Out[]: False

Please note that keys, and not values, are searched.
This is like a telephone directory, where one can search for a number based on a name, but not for a name based on a number.
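
To see the difference between searching keys and searching values, we can try the following on the same dictionary d. This is only a sketch; the variable names on the left are ours.

```python
d = {'png': 'image', 'txt': 'text', 'py': 'python'}

key_found = 'py' in d                   # 'py' is a key
value_as_key = 'python' in d            # 'python' is a value, not a key
value_found = 'python' in d.values()    # search the values explicitly

print(key_found, value_as_key, value_found)
```

This prints True False True.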

To obtain the list of all keys in a dictionary, type
In []: d.keys()
Out[]: ['py', 'txt', 'png']

Similarly, to obtain the list of all values in a dictionary, type
In []: d.values()
Out[]: ['python', 'text', 'image']

One more thing to note about dictionaries, in this case for d,

d

is that, in the version of Python used here, dictionaries do not preserve the order in which the items were entered (dictionaries keep insertion order only from Python 3.7 onwards). The order of the elements in a dictionary should therefore not be relied upon.
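
When a predictable order is needed, one option is to sort the keys ourselves rather than depend on the dictionary. A minimal sketch using the built-in sorted():

```python
d = {'png': 'image', 'txt': 'text', 'py': 'python'}

# sorted() returns the keys in alphabetical order, whatever order the
# dictionary itself happens to use.
ordered_keys = sorted(d)

for key in ordered_keys:
    print(key, d[key])
```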

------------------------------------------------------------------------------------------------------------------

Parsing and string processing

As we saw previously, we shall be dealing with lines of the form
A;015162;JENIL T P;081;060;77;41;74;333;P;;
Here ';' is the delimiter, that is, ';' is used to separate the fields.

We shall create a string variable to see how we can process it to get the desired output.

line = 'A;015162;JENIL T P;081;060;77;41;74;333;P;;'

Previously we saw how to split on spaces when we processed the pendulum.txt file. Let us now look at how to split a string into a list of fields based on a delimiter other than a space.
a = line.split(';')

Let's now check what 'a' contains.

a

'a' is a list containing all the fields separately.

a[0] is the region code, a[1] the roll number, a[2] the name and so on.
Similarly, a[6] gives us the science marks of that particular student.

So we shall create a dictionary whose keys are the region codes and whose values are the number of students in that region with more than 90 marks in science.
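
The splitting and indexing steps above can be tried out as follows. This is a small sketch on the same sample line; `region_code` and `science_marks` are names we picked for the illustration.

```python
line = 'A;015162;JENIL T P;081;060;77;41;74;333;P;;'
a = line.split(';')

region_code = a[0]          # 'A'
science_marks = int(a[6])   # the fields are strings, so convert before comparing

print(a)
print(region_code, science_marks, science_marks > 90)
```

Note that split() returns strings, which is why the marks are passed through int() before being compared with 90.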

------------------------------------------------------------------------------------------------------------------

Let's now start off with the code.

We first create an empty dictionary

science = {}
Now we read the records, one by one, from the file sslc.txt

for record in open('sslc.txt'):

    We split each record on ';' and store the result in a list:
    fields = record.split(';')

    Now we get the region code of that entry:
    region_code = fields[0].strip()
The strip() method removes all leading and trailing whitespace from a given string.

    Now we check whether the region code is already in the dictionary by typing
    if region_code not in science:
        When this condition is true, we add a new entry to the dictionary, with the region code as the key and an initial value of 0.
        science[region_code] = 0

    Note that this if statement is inside the for loop, so the if block needs an additional level of indentation.

    We come back to the indentation level of the 'for' loop and get the science marks by
    score_str = fields[6].strip()

    We check that the student was not absent
    if score_str != 'AA':
        Then we check whether the marks are above 90
        if int(score_str) > 90:
            If so, we add 1 to the dictionary value for that region by
            science[region_code] += 1

    Hit return twice to exit the for loop.
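
Put together without the narration, the loop described above might look like the sketch below. So that it runs anywhere, it first writes a tiny stand-in file instead of using the real sslc.txt: the first two records come from this tutorial, while the other three (and the name sslc_sample.txt) are made up for illustration. It assumes Python 3.

```python
import os
import tempfile

# A tiny stand-in for sslc.txt. The last three records are invented.
sample_records = [
    'A;015162;JENIL T P;081;060;77;41;74;333;P;;',
    'A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;',
    'A;000001;STUDENT ONE;090;091;92;95;93;461;P;;',
    'B;000002;STUDENT TWO;080;081;82;91;83;417;P;;',
    'B;000003;STUDENT THREE;070;071;72;90;73;376;P;;',
]
path = os.path.join(tempfile.mkdtemp(), 'sslc_sample.txt')
with open(path, 'w') as f:
    f.write('\n'.join(sample_records) + '\n')

science = {}
with open(path) as f:
    for record in f:
        fields = record.split(';')
        region_code = fields[0].strip()
        # Make sure every region has an entry, even if its count stays 0.
        if region_code not in science:
            science[region_code] = 0
        score_str = fields[6].strip()
        # 'AA' marks an absent student and cannot be converted to a number.
        if score_str != 'AA':
            if int(score_str) > 90:
                science[region_code] += 1

print(science)
```

With this sample data each region ends up with a count of 1: a score of exactly 90 is not counted, and the absent student is skipped.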

By the end of this loop, we shall have our desired output in the dictionary 'science'.
We can check the values by typing
science

Now, to create a pie chart, we use

pie(list(science.values()), labels=list(science.keys()))

The first argument to the pie function is the sequence of values to be plotted. The second is an optional argument, which is used to label the regions.

title('Students scoring more than 90% in science by region')
savefig('science.png')
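
The pie function sizes each wedge by its value divided by the sum of all the values. As a rough check on what the chart should show, we can work out those shares by hand. The counts below are made up and are not the real results from sslc.txt.

```python
# Made-up counts, for illustration only.
science = {'A': 12, 'B': 30, 'C': 18}

total = sum(science.values())

# Each region's share of the pie, as a percentage.
shares = {}
for region, count in science.items():
    shares[region] = 100.0 * count / total

for region in sorted(shares):
    print(region, shares[region])
```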

That brings us to the end of this tutorial. We have learnt about dictionaries, some basic string parsing and plotting pie charts. We hope you have enjoyed it. Thank you.
#slide of summary.