statistics.txt
author Santosh G. Vattam <vattam.santosh@gmail.com>
Tue, 13 Apr 2010 01:19:12 +0530
changeset 50 9d60720b16b0
parent 47 501e3fb21e3c
child 51 32d854e62be9
permissions -rw-r--r--
Minor edits.
Hello and welcome to the tutorial on handling large data files and processing them to get desired results.

Till now we have covered:
* How to create plots.
* How to read data from a file and process it.

In this session, we will use them and some new concepts to solve a problem/exercise.

We have a file named sslc1.txt.
It contains the records of students and their performance in one of the State Secondary Board Examinations. It has 180,000 lines of records. We are going to read it and process this data.
We can see the contents of the file by opening it with any text editor.
Please don't edit the data.
This file has a particular structure. Each line in the file is a set of 11 fields:
A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;
The following are the fields in any given line.
* Region Code which is 'A'
* Roll Number 015163
* Name JOSEPH RAJ S
* Marks of 5 subjects:
  ** English 083
  ** Hindi 042
  ** Maths 47
  ** Science AA (Absent)
  ** Social 72
* Total marks 244
* Pass/Fail - This field is blank here because this candidate was absent for an exam; otherwise it would have been one of P or F.
* Withheld - Again blank in this case; otherwise it would have been W.

Let us now look at the problem we wish to solve:
Draw a pie chart representing the proportion of students who scored more than 90% in Science in each region.

This is the result we expect:
#slide of result.

In order to solve this problem, we need the following machinery:
File reading - which we have already looked at.
Parsing - which we have looked at partially.
Dictionaries - we shall be introducing the concept of dictionaries here.
And finally plotting - which we have been doing all along.

Let's first start off with dictionaries.

We earlier used lists briefly. Back then we just created lists and appended items to them.
x = [1, 4, 2, 7, 6]
In order to access any element in a list, we use its index number. The index starts from 0.
For example, x[0] will give 1 and x[3] will give 7.
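
As a quick check of list indexing, the lines below reuse the list x from above. The negative index in the last lookup is an extra illustration and is not part of the session.

```python
# The list used in the session above.
x = [1, 4, 2, 7, 6]

first = x[0]    # index 0 is the first element
fourth = x[3]   # index 3 is the fourth element
last = x[-1]    # extra: a negative index counts from the end

print(first, fourth, last)
```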

There are times when we can't access things through integer indexes. For example, consider a telephone directory: we give it a name and it should return the corresponding number. A list is not the best kind of data structure for such problems, and hence Python provides support for dictionaries. Dictionaries are collections of key-value pairs. Lists are indexed by integers, while dictionaries are indexed by keys, which are often strings. For example:

d = {'png' : 'image',
     'txt' : 'text',
     'py' : 'python'}

d

d is a dictionary. The first element in each pair is called the `key' and the second is called the `value'. The key has to be an immutable object such as a string or a number, while the value can be of any type.

Dictionaries are indexed using their keys, as shown:
In []: d['txt']
Out[]: 'text'

In []: d['png']
Out[]: 'image'

A dictionary can be searched for the presence of a certain key by typing
'py' in d
True

'jpg' in d
False
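
The lookups and key checks above can be put together as a short sketch. The step that adds a new 'jpg' entry is an assumption made for illustration and is not part of the session.

```python
# The dictionary from the session: file extension -> kind of file.
d = {'png': 'image',
     'txt': 'text',
     'py': 'python'}

# Look up a value by its key.
kind = d['txt']

# Test for the presence of a key before using it.
has_py = 'py' in d
has_jpg = 'jpg' in d

# Assumption for illustration: add a new key 'jpg' (not in the session).
d['jpg'] = 'image'

print(kind, has_py, has_jpg, 'jpg' in d)
```

Checking with 'in' first avoids the KeyError that d['jpg'] would raise while the key is still missing.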

Please note that the 'in' check looks only at the keys; values cannot be searched for directly in a dictionary.
'In a telephone directory one can search for a number based on a name, but not for a name based on a number'

To obtain the list of all keys in a dictionary, type
d.keys()
['py', 'txt', 'png']

Similarly,
d.values()
['python', 'text', 'image']
is used to obtain the list of all values in a dictionary.

Let's now see what the dictionary contains
d

Please observe that the items here are not shown in the order in which they were entered. Versions of Python before 3.7 do not guarantee that dictionaries preserve insertion order, so in such versions the order of the elements in a dictionary should not be relied upon.
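
A minimal sketch of the same calls, assuming Python 3, where keys() and values() return view objects rather than lists, so the sketch wraps them in list(). Sorting the keys is one way to avoid depending on any particular order. The last line shows that values can still be scanned through values(), although this walks over every value instead of using a key lookup.

```python
d = {'png': 'image', 'txt': 'text', 'py': 'python'}

# In Python 3, keys() and values() return views; list() makes real lists.
keys = list(d.keys())
values = list(d.values())

# sorted() gives a predictable order regardless of how the dictionary
# stores its items.
for key in sorted(d):
    print(key, '->', d[key])

# Scanning the values is possible, but it checks them one by one.
has_image = 'image' in d.values()
print(has_image)
```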

------------------------------------------------------------------------------------------------------------------

Parsing and string processing

As we saw previously, we will be dealing with lines with content of the form
A;015162;JENIL T P;081;060;77;41;74;333;P;;
Here ';' is the delimiter, that is, ';' is used to separate the fields.

We shall create a string variable to see how we can process it to get the desired output.

line = 'A;015162;JENIL T P;081;060;77;41;74;333;P;;'

Previously we saw how to split on spaces when we processed the pendulum.txt file. Let us now look at how to split a string into a list of fields based on a delimiter other than space.
a = line.split(';')

Let's now check what 'a' contains.

a

a is a list containing all the fields separately.

a[0] is the region code, a[1] the roll number, a[2] the name and so on.
Similarly, a[6] will give us the science marks of that particular student.
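
The splitting described above can be tried out as follows. The variable names other than line and a are chosen here for illustration.

```python
line = 'A;015162;JENIL T P;081;060;77;41;74;333;P;;'

# Split the record on the ';' delimiter.
a = line.split(';')

region_code = a[0]
roll_number = a[1]
name = a[2]
science_marks = int(a[6])   # marks are stored as text, so convert

# The line ends with ';', so split() yields one trailing empty string.
print(len(a), region_code, name, science_marks)
```

Because every field, including the last, is followed by ';', the list has one more element than the 11 fields.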

So we create a dictionary that maps each region to the number of its students having more than 90 marks in science.

------------------------------------------------------------------------------------------------------------------

Let's now start off with the code.

We first create an empty dictionary

science = {}
Now we read the records one by one from the file sslc1.txt

for record in open('sslc1.txt'):

    we split the record on ';' and store the fields in a list by: fields = record.split(';')

    now we get the region code of a particular entry by: region_code = fields[0].strip()
    strip() is used to remove all leading and trailing whitespace from a given string.
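
A small sketch of what strip() does. The strings used here are made up for illustration and do not come from sslc1.txt.

```python
# Hypothetical field value with stray whitespace and a newline.
raw = '  A \n'
region_code = raw.strip()

# strip() returns a new string; the original is unchanged.
print(repr(raw), repr(region_code))

# The last piece of a line read from a file keeps its newline.
# This record is invented for the example.
fields = 'B;000001;SAMPLE NAME;50;60;70;95;80;355;P;;\n'.split(';')
print(repr(fields[-1]), repr(fields[-1].strip()))
```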

    now we check if the region code is already there in the dictionary by typing
    if region_code not in science:
        when this statement is true, we add a new entry to the dictionary with the region code as the key and 0 as the initial value.
        science[region_code] = 0

    Note that this if statement is inside the for loop, so the if block needs additional indentation.

    we again come back to the older 'for' loop indentation and strip the string to get the science marks by
    score_str = fields[6].strip()

    we check if the student was not absent
    if score_str != 'AA':
        then we check if the marks are above 90 or not
        if int(score_str) > 90:
            if yes, we add 1 to the value in the dictionary for that region by
            science[region_code] += 1

    Hit return twice to exit the for loop

By the end of this loop we will have our desired output in the dictionary 'science'.
We can check the values by
science
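
The whole loop can be sketched in one piece as below. So that it runs without the data file, it iterates over a short list of sample records: the first two come from the text above and the other three are made up for illustration. With the real data, the loop would iterate over the opened file instead of this list.

```python
# Sample records: the first two appear in the text above, the last three
# are invented for this sketch.
records = [
    'A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;\n',
    'A;015162;JENIL T P;081;060;77;41;74;333;P;;\n',
    'A;000001;SAMPLE ONE;090;080;95;96;91;452;P;;\n',
    'B;000002;SAMPLE TWO;070;065;88;93;79;395;P;;\n',
    'B;000003;SAMPLE THREE;060;055;58;90;61;324;P;;\n',
]

science = {}
for record in records:
    fields = record.split(';')
    region_code = fields[0].strip()

    # Make sure every region has an entry, starting at 0.
    if region_code not in science:
        science[region_code] = 0

    score_str = fields[6].strip()
    # 'AA' marks an absent student; skip those.
    if score_str != 'AA':
        if int(score_str) > 90:
            science[region_code] += 1

print(science)
```

Note that a score of exactly 90 is not counted, since the comparison is strictly greater than 90.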

Now to create a pie chart we use

pie(list(science.values()), labels=list(science.keys()))

The first argument to the pie function is the values to be plotted. The second is an optional argument which is used to label the regions.
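
pie draws each value as a fraction of the total of all values. A small sketch of that arithmetic in plain Python, assuming Python 3 and using made-up counts rather than results from the actual file:

```python
# Hypothetical counts per region; not results from sslc1.txt.
science = {'A': 30, 'B': 50, 'C': 20}

total = sum(science.values())

# Fraction of the pie that each region's wedge would occupy.
shares = {region: count / total for region, count in science.items()}

for region in sorted(shares):
    print(region, '{:.0%}'.format(shares[region]))
```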

title('Students scoring above 90% in science by region')
savefig('science.png')

That brings us to the end of this tutorial. We have learnt about dictionaries, some basic string parsing and plotting a pie chart in this tutorial. Hope you have enjoyed it. Thank you.