parsing_data/script.rst
author Nishanth <nishanth@fossee.in>
Thu, 07 Oct 2010 14:40:21 +0530
changeset 238 c507e9c413c6
child 332 b702c10e5919
permissions -rw-r--r--
Converted the parsing_data into new template form
.. Objectives
.. ----------

.. A - Students and teachers from Science and engineering backgrounds
   B - 
   C - 
   D - 

.. Prerequisites
.. -------------

..   1. Getting started with lists

.. Author              : Nishanth Amuluru
   Internal Reviewer   : 
   External Reviewer   :
   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]

Script
------

Hello friends and welcome to the tutorial on Parsing Data.

{{{ Show the slide containing title }}}

{{{ Show the slide containing the outline slide }}}

In this tutorial, we shall learn

 * what we mean by parsing data
 * the string operations required for parsing data
 * datatype conversion

#[Puneeth]: Changed a few things, here.  

#[Puneeth]: I don't like the way the term "parsing data" has been used, all
through the script. See if that can be changed.

Let us have a look at the problem.

{{{ Show the slide containing problem statement. }}}

There is an input file containing a large number of records. Each record
corresponds to a student.

{{{ show the slide explaining record structure }}}
As you can see, each record consists of fields separated by a ";". The first
field is the region code, then the roll number, then the name, the marks in
second language, first language, maths, science and social, the total marks,
pass/fail indicated by P or F, and finally W if the result is withheld and
empty otherwise.

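As a rough sketch of the record layout described above, the field positions can be written down as a list. The labels are our own illustrative names (an assumption), not something stored in the file; only their order comes from the description.

```python
# Field labels are hypothetical names chosen for illustration; the order
# follows the description of the record structure.
FIELD_NAMES = [
    "region_code", "roll_number", "name",
    "second_language", "first_language", "maths", "science", "social",
    "total", "pass_fail", "withheld",
]

# Python counts from 0, so the sixth field (maths) sits at index 5.
maths_index = FIELD_NAMES.index("maths")
print(maths_index)   # 5
```

Knowing that the region code is at index 0 and the maths marks at index 5 is all the problem needs.
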
Our job is to calculate the mean of all the maths marks in the region "B".

#[Nishanth]: Please note that I am not telling anything about AA since they do
             not know about any if/else yet.

#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
 simple and leave out all the columns after total marks. 

Now, what is parsing data?

From the input file, we can see that the data we have is in the form of
text. Parsing this data is all about reading it and converting it into a form
which can be used for computations. In our case, that form is a sequence of
numbers.

#[Puneeth]: should the word tokenizing, be used? Should it be defined before
 using it?

We can clearly see that the problem involves reading files and tokenizing,
that is, breaking text into separate pieces (tokens).

#[Puneeth]: the sentence above seems kinda redundant. 

Let us learn about tokenizing strings. Let us define a string first. Type
::

    line = "parse this           string"

We are now going to split this string on whitespace.
::

    line.split()

As you can see, we get a list of strings. This means that when ``split`` is
called without any arguments, it splits on whitespace. In simple words, any run
of consecutive whitespace is treated as one big space.

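For reference, this is roughly what splitting on whitespace gives. The second string is our own extra example, added to show that tabs and newlines count as whitespace too.

```python
line = "parse this           string"

# With no argument, split() breaks the string on runs of whitespace.
tokens = line.split()
print(tokens)        # ['parse', 'this', 'string']
print(len(tokens))   # 3

# Tabs and newlines are whitespace as well (illustrative extra example).
mixed = "a\tb\nc".split()
print(mixed)         # ['a', 'b', 'c']
```
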
``split`` can also split on a string of our choice. This is achieved by passing
that string as an argument. But first, let us define a sample record from the
file.
::

    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
    record.split(';')

We can see that the string is split on ';' and we get each field separately.
We can also observe that empty strings appear in the list wherever there are
two semicolons without anything in between, and after the final semicolon.

To recap, ``split`` splits on whitespace if called without an argument and
splits on the given argument if it is called with an argument.

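A small sketch of what the split record looks like, using the sample record from this script. The variable name ``fields`` is our own choice.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"

fields = record.split(";")
print(fields)
# ['A', '015163', 'JOSEPH RAJ S', '083', '042', '47', 'AA', '72', '244', '', '', '']

# The three ';' at the end give three empty strings: two between the
# semicolons and one after the last semicolon.
print(len(fields))   # 12
print(fields[5])     # 47 (the maths marks, still a string)
```
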
{{{ Pause here and try out the following exercises }}}

%% 1 %% Split the variable line using a space as the argument. Is it the same
        as splitting without an argument?

{{{ continue from paused state }}}

We see that when we split on a space, multiple spaces are not clubbed as one
and there is an empty string every time there are two consecutive spaces.

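The difference between the two calls can be checked with a short sketch like the one below. The names ``on_space``, ``on_whitespace`` and ``non_empty`` are only illustrative.

```python
line = "parse this           string"

on_space = line.split(" ")      # every single space is a separator
on_whitespace = line.split()    # a run of whitespace is one separator

print(on_whitespace)            # ['parse', 'this', 'string']
print("" in on_space)           # True: empty strings between adjacent spaces

# Dropping the empty strings gives back the whitespace-split result.
non_empty = [token for token in on_space if token != ""]
print(non_empty == on_whitespace)   # True

# A smaller case that is easy to check by hand.
print("a  b".split(" "))        # ['a', '', 'b']
```
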
Now that we know how to split a string, we can split the record and retrieve
each field separately. But there is one problem. The region code "B" and a "B"
surrounded by whitespace are treated as two different regions. We must find a
way to remove all the whitespace around a string so that "B" and a "B" with
whitespace around it are treated as the same.

This is possible by using the ``strip`` method of strings. Let us define a
string by typing
::

    unstripped = "     B    "
    unstripped.strip()

We can see that ``strip`` removes all the whitespace around the string.

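To see why stripping matters for comparing region codes, here is a minimal sketch. Note that ``strip`` returns a new string and leaves the original untouched.

```python
unstripped = "     B    "
stripped = unstripped.strip()

print(unstripped == "B")   # False: the surrounding spaces make it differ
print(stripped == "B")     # True

# The original string is unchanged, since strings cannot be modified.
print(repr(unstripped))    # '     B    '
```
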
{{{ Pause here and try out the following exercises }}}

%% 2 %% What happens to the whitespace inside the string when it is stripped?

{{{ continue from paused state }}}

Type
::

    a_str = "         white      space            "
    a_str.strip()

We see that only the whitespace around the string is removed and the
whitespace inside it remains unaffected.

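As an aside, strings also have ``lstrip`` and ``rstrip``, which strip only one side. The sketch below uses a shorter string of our own so the results are easy to read, and also shows one possible way to squeeze the inner whitespace by combining ``split`` and ``join``.

```python
a_str = "   white   space   "

print(repr(a_str.strip()))    # 'white   space'
print(repr(a_str.lstrip()))   # 'white   space   '
print(repr(a_str.rstrip()))   # '   white   space'

# To squeeze the inner whitespace as well, split and join again.
squeezed = " ".join(a_str.split())
print(repr(squeezed))         # 'white space'
```
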
By now we know enough to separate fields from the record and to strip out any
whitespace. The only road block we now have is the conversion of strings to
floats.

The splitting and stripping operations are done on a string and the results
are also strings. Hence the marks that we have are still strings and we cannot
do arithmetic with them. We must convert them into numbers (integers or
floats) before we can perform mathematical operations on them.

We shall look at converting strings into floats. We define a float string
first. Type
::

    mark_str = "1.25"
    mark = float(mark_str)
    type(mark_str)
    type(mark)

We can see that the string is converted to a float. We can perform mathematical
operations on it now.

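A short sketch of why the conversion is needed. The values are taken from the sample record, while the variable names are our own.

```python
maths_str = "47"
social_str = "72"

# '+' on strings joins them instead of adding the numbers.
joined = maths_str + social_str
print(joined)    # 4772

# After conversion, '+' performs ordinary arithmetic.
total = float(maths_str) + float(social_str)
print(total)     # 119.0

# float() tolerates surrounding whitespace and leading zeros.
padded = float(" 083 ")
print(padded)    # 83.0
```
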
{{{ Pause here and try out the following exercises }}}

%% 3 %% What happens if you do int("1.25")?

{{{ continue from paused state }}}

It raises an error since converting a float string into an integer directly is
not possible. It involves an intermediate step of converting to float.
::

    dcml_str = "1.25"
    flt = float(dcml_str)
    flt
    number = int(flt)
    number

Using ``int``, it is also possible to convert floats into integers. Note that
the fractional part is discarded, so ``number`` is 1.

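The error and the two-step conversion can be sketched as follows. The helper ``to_int`` is a hypothetical name introduced only for this illustration, and ``try``/``except`` goes slightly beyond what this tutorial covers.

```python
def to_int(text):
    """Convert a numeric string to an int, going through float first."""
    return int(float(text))

# int() cannot read a string that contains a decimal point.
try:
    int("1.25")
    raised = False
except ValueError as error:
    raised = True
    print("int() refused:", error)

print(to_int("1.25"))    # 1
print(to_int("-1.75"))   # -1, the fractional part is dropped towards zero
```
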
Now that we have all the machinery required to parse the file, let us solve the
problem. We first read the file line by line and parse each record. We check
whether the region code is B and store the marks accordingly.
::

    math_marks_B = [] # an empty list to store the marks
    for line in open("/home/fossee/sslc1.txt"):
        # split the record into its fields
        fields = line.split(";")

        # the region code is the first field; remove surrounding whitespace
        region_code = fields[0]
        region_code_stripped = region_code.strip()

        # the maths marks are the sixth field; convert them to a number
        math_mark_str = fields[5]
        math_mark = float(math_mark_str)

        # keep the marks only if the record belongs to region "B"
        if region_code_stripped == "B":
            math_marks_B.append(math_mark)


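The same steps can be tried without the input file. The sketch below is only an illustration under stated assumptions: apart from the first record, the sample lines are made up, and it assumes that marks are whole numbers so that ``isdigit`` can be used to skip non-numeric entries such as "AA".

```python
# Hypothetical sample records in the tutorial's format; only the first one
# comes from this script, the rest are made up for illustration.
sample_lines = [
    "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;\n",
    " B ;015164;STUDENT ONE;070;065;80;75;60;350;P;;\n",
    "B;015165;STUDENT TWO;055;060;AA;50;45;210;;;\n",
    "B;015166;STUDENT THREE;090;085;60;88;79;402;P;;\n",
]

math_marks_B = []
for line in sample_lines:
    fields = line.split(";")
    region_code = fields[0].strip()
    math_mark_str = fields[5].strip()

    # Skip other regions, and skip marks such as "AA" that are not numbers.
    if region_code != "B" or not math_mark_str.isdigit():
        continue

    math_marks_B.append(float(math_mark_str))

print(math_marks_B)   # [80.0, 60.0]
```

Checking the string before calling ``float`` avoids the error that a non-numeric entry would otherwise raise.
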
Now we have all the maths marks of region "B" in the list ``math_marks_B``.
To get the mean, we just have to sum the marks and divide by the length.
::

    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
    math_marks_mean

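If the list could be empty, or could hold integers, the mean can be wrapped in a small function. This is a sketch only, and ``mean`` is a hypothetical helper name that does not appear elsewhere in this script.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if len(values) == 0:
        raise ValueError("cannot take the mean of an empty list")
    # float() keeps the fractional part even if all values are integers
    # and the code runs under Python 2, where int / int drops it.
    return float(sum(values)) / len(values)

print(mean([80.0, 60.0]))   # 70.0
print(mean([47, 72]))       # 59.5
```
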
{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

 * how to tokenize a string using various delimiters
 * how to get rid of extra whitespace around a string
 * how to convert from one type to another
 * how to parse input data and perform computations on it

{{{ Show the "sponsored by FOSSEE" slide }}}

#[Nishanth]: Will add this line after all of us fix on one.
This tutorial was created as a part of FOSSEE project, NME ICT, MHRD India

Hope you have enjoyed it and found it useful.
Thank you