parsing_data/script.rst
author bhanu
Mon, 15 Nov 2010 14:30:45 +0530
changeset 497 5cc7bcce8de4
parent 332 b702c10e5919
child 498 4255f995a40c
permissions -rw-r--r--
Language check done for `data parsing`

.. Objectives
.. ----------

.. By the end of this tutorial you will be able to

..  * split a string using a delimiter
..  * remove the whitespace around a string
..  * convert variables from one type to another

.. Prerequisites
.. -------------

..   1. Getting started with lists

.. Author              : Nishanth Amuluru
   Internal Reviewer   : Amit
   External Reviewer   :
   Language Reviewer   : Bhanukiran
   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]

Script
------

Hello friends, and welcome to the tutorial on "Parsing Data".

{{{ Show the slide containing the title }}}

{{{ Show the slide containing the outline }}}

In this tutorial, we shall learn

 * what we mean by parsing data
 * the string operations required for parsing data
 * datatype conversion

#[Puneeth]: Changed a few things here.

#[Puneeth]: I don't like the way the term "parsing data" has been used all
through the script. See if that can be changed.

Let us have a look at the problem.

{{{ Show the slide containing the problem statement }}}

There is an input file containing a huge number of records. Each record
corresponds to a student.

{{{ Show the slide explaining the record structure }}}
As you can see, each record consists of fields separated by a ";". The first
field is the region code, followed by the roll number, the name, the marks in
second language, first language, maths, science and social, the total marks,
pass/fail indicated by P or F, and finally W if the result was withheld (empty
otherwise).

Our job is to calculate the arithmetic mean of all the maths marks in the
region "B".

#[Nishanth]: Please note that I am not saying anything about AA since they do
             not know about if/else yet.

#[Puneeth]: Should we talk about pass/fail etc.? I think we should make the
 problem simple and leave out all the columns after total marks.

Now, what do we mean by parsing data?

From the input file, we can see that the data we have is in the form of
text. Parsing this data is all about reading it and converting it into a form
that can be used for computations; in our case, a sequence of numbers.

#[Puneeth]: Should the word "tokenizing" be used? Should it be defined before
 using it?

The problem clearly involves reading a file and tokenizing each line, that is,
breaking the line up into its individual fields.

#[Puneeth]: the sentence above seems kinda redundant.

Let us learn about tokenizing strings. Let us define a string first. Type
::

    line = "parse this           string"

We are now going to split this string on whitespace.
::

    line.split()

As you can see, we get a list of strings. This means that when ``split`` is
called without any arguments, it splits on whitespace. In simple words, a run
of several spaces is treated as one big space.
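
To check this behaviour in a script, here is a small sketch (Python 3 syntax assumed). It reuses ``line`` from above and adds two illustrative strings, ``padded`` and ``mixed``, which are not part of the tutorial: they show that whitespace at the ends is discarded too, and that tabs and newlines count as whitespace.

```python
line = "parse this           string"

# With no argument, split() treats any run of whitespace as a single separator.
words = line.split()
print(words)  # ['parse', 'this', 'string']

# Whitespace at the start and end produces no empty strings either.
padded = "   parse this   string   "
print(padded.split())  # ['parse', 'this', 'string']

# Tabs and newlines are whitespace as well.
mixed = "parse\tthis\nstring"
print(mixed.split())  # ['parse', 'this', 'string']
```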

``split`` can also split on a string of our choice. This is achieved by passing
that string as an argument. But first, let us define a sample record from the
file.
::

    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
    record.split(';')

We can see that the string is split on ';' and we get each field separately.
We can also observe that an empty string appears in the list wherever there
are two semicolons with nothing in between.
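
Once the record is split, each field can be picked out by its position in the list. The sketch below assumes the field order described on the record-structure slide, so index 0 is the region code, index 2 the name and index 5 the maths mark.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
fields = record.split(';')

print(len(fields))  # 12
print(fields[0])    # A             (region code)
print(fields[2])    # JOSEPH RAJ S  (name)
print(fields[5])    # 47            (maths marks)
print(fields[9:])   # ['', '', '']  (the empty fields at the end)
```

Notice that ``fields[5]`` is still the string ``'47'``, not a number.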

To recap, ``split`` splits on whitespace if it is called without an argument,
and splits on the given argument if it is called with one.

{{{ Pause here and try out the following exercises }}}

%% 1 %% Split the variable ``line`` using a space as the argument. Is it the
        same as splitting without an argument?

{{{ continue from paused state }}}

We see that when we split on a space, multiple spaces are not clubbed into one,
and there is an empty string every time there are two consecutive spaces.
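
A side-by-side comparison makes the answer to the exercise concrete. The short string ``"a  b"`` (two spaces between the letters) is an extra, made-up example whose output is easy to verify by hand.

```python
line = "parse this           string"

# Splitting on a single space: every extra space leaves an empty string behind.
on_space = line.split(' ')
print(on_space)

# Splitting with no argument: any run of whitespace counts as one separator.
on_whitespace = line.split()
print(on_whitespace)      # ['parse', 'this', 'string']

# A smaller case that is easy to check by hand.
print("a  b".split(' '))  # ['a', '', 'b']
print("a  b".split())     # ['a', 'b']
```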

Now that we know how to split a string, we can split the record and retrieve
each field separately. But there is one problem: the region code "B" and a
"B" surrounded by whitespace are treated as two different regions. We must
find a way to remove all the whitespace around a string, so that "B" and a
"B" with whitespace around it are treated as the same.

This is possible by using the ``strip`` method of strings. Let us define a
string by typing
::

    unstripped = "     B    "
    unstripped.strip()

We can see that ``strip`` removes all the whitespace around the string.
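
The point of stripping is that the comparison with "B" succeeds afterwards. A minimal sketch; ``stripped`` is an illustrative name, not one used elsewhere in the tutorial.

```python
unstripped = "     B    "
stripped = unstripped.strip()

print(unstripped == "B")  # False: the surrounding spaces make the strings differ
print(stripped == "B")    # True

# strip() returns a new string; the original still has its spaces.
print(unstripped.startswith(" "))  # True
```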

{{{ Pause here and try out the following exercises }}}

%% 2 %% What happens to the whitespace inside the sentence when it is stripped?

{{{ continue from paused state }}}

Type
::

    a_str = "         white      space            "
    a_str.strip()

We see that only the whitespace around the sentence is removed, and the
whitespace inside it remains unaffected.
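
The sketch below checks this with ``a_str`` and with a name field padded with spaces. The padded ``name_field`` is a made-up variation on the sample record, used only to show that the spaces inside "JOSEPH RAJ S" survive stripping.

```python
a_str = "         white      space            "
trimmed = a_str.strip()
print(repr(trimmed))

# Only the ends were trimmed: the gap between the two words is still there.
print(trimmed.startswith("white"), trimmed.endswith("space"))  # True True
print("  " in trimmed)  # True

# Likewise, stripping a padded name field keeps the spaces inside the name.
name_field = "  JOSEPH RAJ S "
print(name_field.strip())  # JOSEPH RAJ S
```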

By now we know enough to separate the fields of a record and to strip out any
surrounding whitespace. The only roadblock left is the conversion of strings to
floats.

The splitting and stripping operations are done on a string, and their result
is also a string. Hence the marks that we have are still strings, and we cannot
do arithmetic with them. We must convert them into numbers (integers or floats)
before we can perform mathematical operations on them.
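
To see why the conversion matters, the sketch below takes the second-language and first-language marks from the sample record. On strings, ``+`` only joins the text; after conversion it adds the numbers.

```python
# Second- and first-language marks as they appear in the sample record.
second_lang = "083"
first_lang = "042"

# On strings, + simply joins the text.
print(second_lang + first_lang)                # 083042

# After converting to float, + performs arithmetic.
print(float(second_lang) + float(first_lang))  # 125.0
```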

We shall look at converting strings into floats. We define a float string
first. Type
::

    mark_str = "1.25"
    mark = float(mark_str)
    type(mark_str)
    type(mark)

We can see that the string is converted to a float. We can now perform
mathematical operations on it.
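
The sketch below repeats the conversion and shows two things that are handy when parsing our file: ``float`` accepts leading zeros, as in "083", and ignores surrounding whitespace, including a trailing newline. The printed type names assume Python 3.

```python
mark_str = "1.25"
mark = float(mark_str)
print(type(mark_str), type(mark))  # <class 'str'> <class 'float'>
print(mark * 2)                    # 2.5

# Leading zeros are fine, as in the marks of the sample record.
print(float("083"))    # 83.0
# Surrounding whitespace, including a trailing newline, is ignored.
print(float(" 47\n"))  # 47.0
```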

{{{ Pause here and try out the following exercises }}}

%% 3 %% What happens if you do int("1.25")?

{{{ continue from paused state }}}

It raises a ``ValueError``, since a float string cannot be converted directly
into an integer. It involves an intermediate step of converting to a float.
::

    dcml_str = "1.25"
    flt = float(dcml_str)
    flt
    number = int(flt)
    number

As we can see, ``int`` can also convert a float into an integer; the fractional
part is discarded.
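
The sketch below puts the exercise and its fix together. The ``try``/``except`` is there only so the error can be printed without stopping the script; it also shows that ``int`` truncates towards zero rather than rounding.

```python
# int() refuses a string that is written as a float...
try:
    int("1.25")
except ValueError as err:
    print("ValueError:", err)

# ...so convert to float first, then to int.
number = int(float("1.25"))
print(number)      # 1

# int() discards the fractional part: it truncates towards zero, it does not round.
print(int(2.99))   # 2
print(int(-2.99))  # -2
```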

Now that we have all the machinery required to parse the file, let us solve the
problem. We first read the file line by line and parse each record. We check
whether the region code is "B" and, if it is, store the maths mark.
::

    math_marks_B = [] # an empty list to store the marks
    for line in open("/home/fossee/sslc1.txt"):
        fields = line.split(";")

        region_code = fields[0]
        region_code_stripped = region_code.strip()

        math_mark_str = fields[5]
        math_mark = float(math_mark_str)

        # compare the stripped code, so that " B " also counts as region "B"
        if region_code_stripped == "B":
            math_marks_B.append(math_mark)

Now we have all the maths marks of region "B" in the list ``math_marks_B``.
To get the mean, we just have to sum the marks and divide by the number of
marks.
::

    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
    math_marks_mean
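
To try the whole procedure without the data file, the sketch below runs the same steps on a few made-up records held in a string; ``io.StringIO`` lets us loop over the text line by line, just as we loop over an open file. The sample records, the function name ``mean_maths_marks`` and its ``region`` parameter are hypothetical and only for illustration; with the real data you would pass ``open("/home/fossee/sslc1.txt")`` instead.

```python
import io

# Hypothetical sample records in the same ';'-separated layout as the data file.
SAMPLE = (
    "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;\n"
    " B ;015164;ANITHA K;070;065;80;75;68;358;P;;\n"
    "B;015165;RAVI M;055;060;60;58;62;295;P;;\n"
)


def mean_maths_marks(lines, region="B"):
    """Return the mean maths mark (field 5) of all records in `region`.

    Assumes at least one record matches; otherwise the division fails.
    """
    marks = []
    for line in lines:
        fields = line.split(";")
        # strip the region code so that " B " and "B" are the same region
        if fields[0].strip() == region:
            marks.append(float(fields[5]))
    return sum(marks) / len(marks)


mean_b = mean_maths_marks(io.StringIO(SAMPLE))
print(mean_b)  # 70.0
```

Here the mark is converted only after the region check, so marks from other regions are never passed to ``float``. If no record matches, ``marks`` stays empty and the division raises ``ZeroDivisionError``; the sketch does not guard against that.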

{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

 * how to tokenize a string using various delimiters
 * how to get rid of extra whitespace around a string
 * how to convert from one type to another
 * how to parse input data and perform computations on it

{{{ Show the "sponsored by FOSSEE" slide }}}

#[Nishanth]: Will add this line after all of us fix on one.
This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD, India.

We hope you have enjoyed it and found it useful.
Thank you!