Minor edits.
author Santosh G. Vattam <vattam.santosh@gmail.com>
Tue, 13 Apr 2010 01:19:12 +0530
changeset 50 9d60720b16b0
parent 49 90c2d777fb0e
child 51 32d854e62be9
ipython-tut.txt
statistics.txt
--- a/ipython-tut.txt	Tue Apr 13 00:15:25 2010 +0530
+++ b/ipython-tut.txt	Tue Apr 13 01:19:12 2010 +0530
@@ -3,3 +3,5 @@
 features that the vanilla Python interpreter does not. We have shown some of these
 features in the previous tutorials. In this tutorial we shall look at a few more
 of them.
+
+
--- a/statistics.txt	Tue Apr 13 00:15:25 2010 +0530
+++ b/statistics.txt	Tue Apr 13 01:19:12 2010 +0530
@@ -10,10 +10,9 @@
 It contains record of students and their performance in one of the State Secondary Board Examination. It has 180, 000 lines of record. We are going to read it and process this data.
 We can see the content of file by opening with any text editor.
 Please don't edit the data.
-It is arranged in a particular format.
-One particular line being:
+This file has a particular structure. Each line in the file is a set of 11 fields:
 A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;
-It has following fields:
+The following are the fields in any given line.
 * Region Code which is 'A'
 * Roll Number 015163
 * Name JOSEPH RAJ S
@@ -24,33 +23,29 @@
   ** Science AA (Absent)
   ** Social 72
 * Total marks 244
-* Pass/Fail Blank cause he was absent in one exam or else it will be(P/F)
-* Withheld Blank in this case(W)
+* Pass/Fail - This field is blank here because this candidate was absent for an exam; otherwise it would have been one of (P/F)
+* Withheld - Again blank in this case(W)
 
-So problem we are going to solve is:
-Draw a pie chart representing proportion of students who scored more than 90% in each region in Science.
+Let us now look at the problem we wish to solve:
+Draw a pie chart representing the proportion of students who scored more than 90% in each region in Science.
 
-The result would be something like this:
-slide of result.
+This is the result we expect:
+#slide of result.
 
-We would be using following machinery:
-File Reading(done already)
-parsing (done partly)
-Dictionaries (new)
-Arrays
-Plot (done already)
-
-Dictionaries
+In order to solve this problem, we need the following machinery:
+File Reading - which we have already looked at.
+Parsing - which we have looked at partially.
+Dictionaries - we shall be introducing the concept of dictionaries here.
+And finally plotting - which we have been doing all along.
 
-We earlier used lists, back then we just created them and appended items to list. 
+Let's first start off with dictionaries.
+
+We earlier used lists briefly. Back then we just created lists and appended items to them.
 x = [1, 4, 2, 7, 6]
-to access the first element we use index number, and it starts from 0 so
-x[0] will give
-1 and
-x[3] will
-7
+In order to access any element in a list, we use its index number. Indexing starts from 0.
+For example, x[0] will give 1 and x[3] will give 7.
 
-At times we don't have index to relate things. For example consider a telephone directory, we give it a name and it should return back corresponding number. List is not the best kind of data structure for such problems, and hence Python provides support for dictionaries. Dictionaries are key value pairs. Lists are indexed by integers while dictionaries are indexed by strings. For example:
+There are times when we can't access things through integer indexes. For example, consider a telephone directory: we give it a name and it should return the corresponding number. A list is not the best kind of data structure for such problems, and hence Python provides support for dictionaries. Dictionaries are collections of key-value pairs. Lists are indexed by integers while dictionaries are indexed by keys, which are often strings. For example:
 
 d = {'png' : 'image',
       'txt' : 'text', 
@@ -71,20 +66,23 @@
 'py' in d
 True
 
-Please note the values cannot be searched in a dictionaries.
 'jpg' in d
 False
-'In telephone directory searching number is not a option'
 
-to obtain the list of all keys in a dictionary
+Please note that values cannot be searched for in a dictionary this way; 'in' checks only the keys.
+'In a telephone directory one can search for a number based on a name, but not for a name based on a number'
+
+To obtain the list of all keys in a dictionary, type
 d.keys()
 ['py', 'txt', 'png']
 
+Similarly,
 d.values()
 ['python', 'text', 'image']
 is used to obtain the list of all values in a dictionary
 
-d
+Let's now see what the dictionary contains
+d 
 
 Please observe that dictionaries do not preserve the order in which the items were entered. The order of the elements in a dictionary should not be relied upon.
 
@@ -92,62 +90,62 @@
 
 Parsing and string processing
 
-As we saw previously we will be dealing with lines with such content
+As we saw previously we will be dealing with lines with content of the form
 A;015162;JENIL T P;081;060;77;41;74;333;P;;
-so ';' is delimiter we have to look for.
+Here ';' is the delimiter, that is, ';' is used to separate the fields.
 
-We will create one string variable to see how can we process it get the desired output.
+We shall create one string variable to see how we can process it to get the desired output.
 
 line = 'A;015162;JENIL T P;081;060;77;41;74;333;P;;'
+
+Previously we saw how to split on spaces when we processed the pendulum.txt file. Let us now look at how to split a string into a list of fields based on a delimiter other than space.
 a = line.split(';')
-we have used split earlier to split on empty spaces, but in this case we will split line for each ';'
 
-a 
+Let's now check what 'a' contains.
+
+a
 
 is list containing all the fields separately.
 
-a[0] is the region code.
-and a[6] will give us the science marks of that particular region.
+a[0] is the region code, a[1] the roll no., a[2] the name and so on.
+Similarly, a[6] will give us the science marks of that particular student.
 
-So we create a dictionary of all the regions with number of students having more then 90 marks.
-# Something like 
-# d = {'A': 729, 'C': 764, 'B': 1120,'E': 414, 'D': 603, 'F': 500}
+So we create a dictionary that maps each region to the number of its students having more than 90 marks in science.
 
 ------------------------------------------------------------------------------------------------------------------
 
-code
+Let's now start off with the code
 
 We first create an empty dictionary
 
 science = {}
-now we read the record data one by one
+now we read the record data one by one from the file sslc.txt
 
 for record in open('sslc.txt'):
 
-    we split the record on ';' and store the list as fields equals record.split(';')
-#    fields = record.split(';')
+    we split the record on ';' and store them in a list by: fields equals record.split(';')
 
-    now get region code of particular entry by region_code equal to fields[0].strip. strip with remove all leading and trailing white spaces from the string
-#    region_code = fields[0].strip()
+    now we get the region code of a particular entry by region_code equal to fields[0].strip().
+strip() is used to remove all leading and trailing white space from a given string
 
-    now we check if the region code is always there in dictionary by writing 'if' statement, 
+    now we check if the region code is already there in the dictionary by typing
     if region_code not in science:    
        when this statement is true, we add new entry to dictionary with initial value 0 and key being the region code.
        science[region_code] = 0
        
-    Note that this if statement is inside the for loop so for if block we will have to give additional indentation.
+    Note that this if statement is inside the for loop so for the if block we will have to give additional indentation.
 
-    we again come back to older for loop indentation and we again strip(ing is good) the string and get science marks by
+    we again come back to the older 'for' loop indentation and again strip the string to get the science marks by
     score_str = fields[6].strip()
 
     we check if student was not absent
     if score_str != 'AA':
        then we check if his marks are above 90 or not
        if int(score_str) > 90:
-       	  if true we add it to the value of dictionary for that region by
+       	  if yes we add 1 to the value in the dictionary for that region by
        	  science[region_code] += 1
 
-    Hit return twice
+    Hit return twice to exit the for loop
 
 by end of this loop we will have our desired output in the dictionary 'science'
 we can check the values by
@@ -156,5 +154,10 @@
 now to create a pie chart we use
 
 pie(science.values(),labels = science.keys())
+
+The first argument to the pie function is the list of values to be plotted. The second is an optional argument which is used to label the regions.
+
 title('Students scoring 90% and above in science by region')
 savefig('science.png')
+
+That brings us to the end of this tutorial. We have learnt about dictionaries, some basic string parsing and plotting a pie chart in this tutorial. Hope you have enjoyed it. Thank you.
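
Note on the statistics.txt hunk: this changeset drops the commented code lines for fields = record.split(';') and region_code = fields[0].strip(), so the counting loop now exists only as narration. The following is a minimal sketch of the loop the script describes, not the author's own code. It is wrapped in a function so that it can be tried without sslc.txt. The function name count_high_scorers, its parameters subject_index and cutoff, and the third sample record are assumptions made for illustration; the first two sample lines are the ones quoted in the script. It also assumes every score is either 'AA' or an integer string.

```python
def count_high_scorers(records, subject_index=6, cutoff=90):
    """Count, per region, the students scoring above `cutoff` in one subject."""
    counts = {}
    for record in records:
        # Split the record on ';' to get the list of fields.
        fields = record.split(';')
        if len(fields) <= subject_index:
            continue  # skip blank or malformed lines (an added safeguard)
        # The region code is the first field.
        region_code = fields[0].strip()
        if region_code not in counts:
            counts[region_code] = 0
        # 'AA' marks an absent student; anything else is assumed numeric.
        score_str = fields[subject_index].strip()
        if score_str != 'AA' and int(score_str) > cutoff:
            counts[region_code] += 1
    return counts


sample = [
    'A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;',
    'A;015162;JENIL T P;081;060;77;41;74;333;P;;',
    # Hypothetical record, added only so that one student crosses the cutoff.
    'B;000001;HYPOTHETICAL STUDENT;090;085;95;97;88;455;P;;',
]
science = count_high_scorers(sample)
print(science)  # {'A': 0, 'B': 1}
```

With the real data one would pass open('sslc.txt') instead of sample. Under Python 3, keys() and values() return views rather than lists, so the plotting call would probably need pie(list(science.values()), labels=list(science.keys())).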