parsing_data/script.rst
author Amit Sethi
Thu, 11 Nov 2010 17:28:23 +0530
changeset 468 ac1198488c0e
parent 332 b702c10e5919
child 497 5cc7bcce8de4
permissions -rw-r--r--
Merging heads

.. Objectives
.. ----------

.. By the end of this tutorial you will be able to

..  * Split a string using a delimiter
..  * Remove the whitespace around a string
..  * Convert variables from one type to another

.. Prerequisites
.. -------------

..   1. Getting started with lists

.. Author              : Nishanth Amuluru
   Internal Reviewer   :
   External Reviewer   :
   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]

Script
------

Hello friends and welcome to the tutorial on Parsing Data.

{{{ Show the slide containing title }}}

{{{ Show the slide containing the outline slide }}}

In this tutorial, we shall learn

 * what we mean by parsing data
 * the string operations required for parsing data
 * data type conversion

#[Puneeth]: Changed a few things, here.

#[Puneeth]: I don't like the way the term "parsing data" has been used, all
through the script. See if that can be changed.

Let us have a look at the problem.

{{{ Show the slide containing problem statement. }}}

There is an input file containing a huge number of records. Each record
corresponds to a student.

{{{ Show the slide explaining record structure }}}

As you can see, each record consists of fields separated by a ";". The first
field is the region code, followed by the roll number, the name, the marks in
second language, first language, maths, science and social, the total marks,
pass/fail indicated by P or F, and finally W if the result is withheld and
empty otherwise.

Our job is to calculate the mean of all the maths marks in the region "B".

#[Nishanth]: Please note that I am not telling anything about AA since they do
             not know about any if/else yet.

#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
 simple and leave out all the columns after total marks.

Now, what is parsing data?

From the input file, we can see that the data we have is in the form of
text. Parsing this data is all about reading it and converting it into a form
which can be used for computations -- in our case, a sequence of numbers.

#[Puneeth]: should the word tokenizing, be used? Should it be defined before
 using it?

We can clearly see that the problem involves reading files and tokenizing,
that is, splitting each line into its individual fields.

#[Puneeth]: the sentence above seems kinda redundant.

Let us learn about tokenizing strings. Let us define a string first. Type
::

    line = "parse this           string"

We are now going to split this string on whitespace.
::

    line.split()

As you can see, we get a list of strings. This means that when ``split`` is
called without any arguments, it splits on whitespace. In simple words, a run
of consecutive spaces is treated as one big space.
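
As a quick check of this behaviour, here is a small sketch. The second string
and the variable names ``messy``, ``words`` and ``tokens`` are made up for
illustration. It suggests that ``split`` without arguments treats spaces, tabs
and newlines alike, and ignores whitespace at either end.

```python
# The first string is the one from the tutorial; the second is a made-up example.
line = "parse this           string"
words = line.split()            # a run of spaces acts as a single separator
print(words)                    # ['parse', 'this', 'string']

messy = "  tabs\tand\nnewlines  "
tokens = messy.split()          # tabs, newlines and edge whitespace are handled too
print(tokens)                   # ['tabs', 'and', 'newlines']
```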

``split`` can also split on a string of our choice. This is achieved by passing
that string as an argument. But first, let us define a sample record from the
file.
::

    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
    record.split(';')

We can see that the string is split on ';' and we get each field separately.
We can also observe that empty strings appear in the list wherever there are
two semicolons without anything in between.

To recap, ``split`` splits on whitespace if called without an argument and
splits on the given argument if it is called with an argument.
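
To make the field positions concrete, here is a small sketch using the same
sample record. The name ``fields`` is our own choice, and the index of each
field is read off the record layout described earlier.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
fields = record.split(';')

print(len(fields))     # 12 fields, because the record has 11 semicolons
print(fields[0])       # 'A', the region code
print(fields[5])       # '47', the maths marks, still a string
print(fields[-3:])     # ['', '', ''], from the semicolons with nothing between them
```

Note that the spaces inside the name field are left alone, since we asked
``split`` to split only on ';'.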

{{{ Pause here and try out the following exercises }}}

%% 1 %% Split the variable line using a space as the argument. Is it the same
        as splitting without an argument?

{{{ continue from paused state }}}

We see that when we split on a space, multiple spaces are not clubbed as one
and there is an empty string every time there are two consecutive spaces.
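
The difference is easier to see on a short string. The string ``spaced`` below
is a made-up example with two spaces after "a" and three after "b".

```python
spaced = "a  b   c"

no_arg = spaced.split()         # whitespace runs collapse into one separator
with_arg = spaced.split(" ")    # every single space is a separator

print(no_arg)      # ['a', 'b', 'c']
print(with_arg)    # ['a', '', 'b', '', '', 'c']
```

A run of n spaces gives n - 1 empty strings when we split on a single space.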

Now that we know how to split a string, we can split the record and retrieve
each field separately. But there is one problem. The region code "B" and a "B"
surrounded by whitespace are treated as two different regions. We must find a
way to remove all the whitespace around a string so that "B" and a "B" with
whitespace around it are treated as the same.

This is possible by using the ``strip`` method of strings. Let us define a
string by typing
::

    unstripped = "     B    "
    unstripped.strip()

We can see that ``strip`` removes all the whitespace around the sentence.

{{{ Pause here and try out the following exercises }}}

%% 2 %% What happens to the whitespace inside the sentence when it is stripped?

{{{ continue from paused state }}}

Type
::

    a_str = "         white      space            "
    a_str.strip()

We see that only the whitespace around the sentence is removed and the
whitespace inside remains unaffected.
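
Strings also have ``lstrip`` and ``rstrip`` methods, which remove whitespace
from the left end or the right end only. The sketch below uses a made-up
string and our own variable names to compare the three, and then checks the
region code case that motivated ``strip``.

```python
a_str = "   white   space   "

both = a_str.strip()     # whitespace removed from both ends
left = a_str.lstrip()    # whitespace removed from the left end only
right = a_str.rstrip()   # whitespace removed from the right end only

print(repr(both))        # 'white   space'
print(repr(left))        # 'white   space   '
print(repr(right))       # '   white   space'

# After stripping, a padded region code compares equal to the plain one.
same_region = "     B    ".strip() == "B"
print(same_region)       # True
```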

By now we know enough to separate fields from the record and to strip out any
whitespace. The only road block we now have is the conversion of a string to a
float.

The splitting and stripping operations are done on a string and their result is
also a string. Hence the marks that we have are still strings and mathematical
operations are not possible on them. We must convert them into numbers
(integers or floats) before we can perform mathematical operations on them.

We shall look at converting strings into floats. We define a float string
first. Type
::

    mark_str = "1.25"
    mark = float(mark_str)
    type(mark_str)
    type(mark)

We can see that the string is converted to a float. We can now perform
mathematical operations on it.
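
To see why the conversion matters, here is a small sketch with two mark
strings; the variable names are only illustrative. Adding the strings joins
them, while adding the converted values gives the sum we want.

```python
first = "47"
second = "72"

joined = first + second                  # string concatenation
total = float(first) + float(second)     # numeric addition

print(joined)    # 4772
print(total)     # 119.0

# float also accepts whitespace around the number.
padded = float("  47 ")
print(padded)    # 47.0
```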

{{{ Pause here and try out the following exercises }}}

%% 3 %% What happens if you do int("1.25")?

{{{ continue from paused state }}}

It raises an error since converting a float string into an integer directly is
not possible. It involves an intermediate step of converting to float.
::

    dcml_str = "1.25"
    flt = float(dcml_str)
    flt
    number = int(flt)
    number

Using ``int`` it is also possible to convert floats into integers. The
fractional part is simply discarded.
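
The sketch below illustrates both points. It uses ``try`` and ``except``,
which this tutorial has not covered, only to catch the error so that the
example runs to the end; the name ``converted`` is our own.

```python
flt = float("1.25")
print(int(flt))      # 1

# int drops the fractional part; it does not round to the nearest integer.
print(int(1.99))     # 1
print(int(-1.99))    # -1

# Converting a float string directly raises a ValueError.
try:
    int("1.25")
    converted = True
except ValueError:
    converted = False
print(converted)     # False
```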

Now that we have all the machinery required to parse the file, let us solve the
problem. We first read the file line by line and parse each record. We check
whether the region code is "B" and store the marks accordingly.
::

    math_marks_B = [] # an empty list to store the marks
    for line in open("/home/fossee/sslc1.txt"):
        fields = line.split(";")

        region_code = fields[0]
        region_code_stripped = region_code.strip()

        math_mark_str = fields[5]
        math_mark = float(math_mark_str)

        if region_code_stripped == "B":
            math_marks_B.append(math_mark)

Now we have all the maths marks of region "B" in the list ``math_marks_B``.
To get the mean, we just have to sum the marks and divide by the length.
::

    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
    math_marks_mean
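
For readers who already know functions and ``if``, the whole computation can
be sketched as below. This is only one possible arrangement and not the
tutorial's own solution. The function name ``mean_math_marks`` is hypothetical,
the records after the first are made up, and we assume that "AA" in the maths
field means there is no mark to count.

```python
def mean_math_marks(lines, region):
    """Return the mean maths mark for one region, or None if there are no marks."""
    marks = []
    for line in lines:
        fields = line.split(";")
        if fields[0].strip() != region:
            continue                      # record belongs to another region
        mark_str = fields[5].strip()
        if mark_str == "AA":              # assumption: no mark available
            continue
        marks.append(float(mark_str))
    if not marks:
        return None                       # avoid dividing by zero
    return sum(marks) / len(marks)


# The first record is the tutorial's sample; the rest are invented test data.
sample = [
    "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;",
    "B;015164;STUDENT ONE;050;060;80;70;65;325;P;;",
    " B ;015165;STUDENT TWO;055;045;AA;60;50;210;F;;",
    "B;015166;STUDENT THREE;070;075;90;85;80;400;P;;",
]
print(mean_math_marks(sample, "B"))       # 85.0
```

Since looping over an open file also gives one line at a time, the same
function could be called with ``open("/home/fossee/sslc1.txt")`` in place of
``sample``.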

{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

 * how to tokenize a string using various delimiters
 * how to get rid of extra whitespace around a string
 * how to convert from one type to another
 * how to parse input data and perform computations on it

{{{ Show the "sponsored by FOSSEE" slide }}}

#[Nishanth]: Will add this line after all of us fix on one.
This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD India.

Hope you have enjoyed it and found it useful.
Thank you.