parsing_data.rst
author Nishanth <nishanth@fossee.in>
Wed, 06 Oct 2010 16:15:42 +0530
changeset 221 7cd975ff5f0d
parent 219 901b78003917
permissions -rw-r--r--
Added questions
.. Author              : Nishanth
   Internal Reviewer 1 : 
   Internal Reviewer 2 : 
   External Reviewer   :

Hello friends and welcome to the tutorial on Parsing Data.

{{{ Show the slide containing title }}}

{{{ Show the slide containing the outline slide }}}

In this tutorial, we shall learn

 * what we mean by parsing data
 * the string operations required for parsing data
 * datatype conversion

#[Puneeth]: Changed a few things, here.

#[Puneeth]: I don't like the way the term "parsing data" has been used, all
through the script. See if that can be changed.

Let us have a look at the problem.

{{{ Show the slide containing problem statement. }}}

There is an input file containing a huge number of records. Each record
corresponds to a student.

{{{ show the slide explaining record structure }}}

As you can see, each record consists of fields separated by a ";". The first
field is the region code, then the roll number, then the name, the marks in
second language, first language, maths, science and social, the total marks,
pass/fail indicated by P or F, and finally W if the result is withheld and
empty otherwise.

Our job is to calculate the mean of all the maths marks in the region "B".

#[Nishanth]: Please note that I am not telling anything about AA since they do
             not know about any if/else yet.

#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
 simple and leave out all the columns after total marks.

Now, what is parsing data?

From the input file, we can see that the data we have is in the form of
text. Parsing this data is all about reading it and converting it into a form
which can be used for computations -- in our case, a sequence of numbers.

#[Puneeth]: should the word tokenizing, be used? Should it be defined before
 using it?

We can clearly see that the problem involves reading files and tokenizing.

#[Puneeth]: the sentence above seems kinda redundant.

Let us learn about tokenizing strings. Let us define a string first. Type
::

    line = "parse this           string"

We are now going to split this string on whitespace.
::

    line.split()

As you can see, we get a list of strings. This means that when ``split`` is
called without any arguments, it splits on whitespace. In simple words, all the
consecutive spaces are treated as one big space.

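The behaviour of ``split`` without arguments can be sketched as follows. The names ``tokens`` and ``mixed`` are only illustrative and are not part of the script; the second string is an assumed example showing that tabs and newlines also count as whitespace.

```python
line = "parse this           string"

# With no argument, split() treats any run of whitespace as one separator.
tokens = line.split()
print(tokens)       # ['parse', 'this', 'string']
print(len(tokens))  # 3

# Tabs and newlines are whitespace too (illustrative string).
mixed = "a\tb\nc"
print(mixed.split())  # ['a', 'b', 'c']
```

The long run of spaces between "this" and "string" leaves no trace in the result.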
``split`` can also split on a string of our choice. This is achieved by passing
that string as an argument. But first let us define a sample record from the
file.
::

    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
    record.split(';')

We can see that the string is split on ';' and we get each field separately.
We can also observe that empty strings appear in the list wherever there are
two semicolons without anything in between.

To recap, ``split`` splits on whitespace if called without an argument and
splits on the given argument if it is called with an argument.

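As a small sketch of splitting on a delimiter, here is the same sample record with the result stored in a list. The name ``fields`` is chosen only for illustration.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"

# Splitting on ';' keeps empty fields as empty strings.
fields = record.split(';')
print(fields)
# ['A', '015163', 'JOSEPH RAJ S', '083', '042', '47', 'AA', '72', '244', '', '', '']

print(len(fields))  # 12
print(fields[0])    # A   (region code)
print(fields[5])    # 47  (maths marks, still a string)
```

The three trailing semicolons produce three empty strings at the end of the list, and every field is still a string at this point.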
{{{ Pause here and try out the following exercises }}}

%% 1 %% Split the variable line using a space as the argument. Is it the same as
        splitting without an argument?

{{{ continue from paused state }}}

We see that when we split on space, multiple whitespaces are not clubbed as one
and there is an empty string every time there are two consecutive spaces.

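The difference between the two calls is easier to see on a short string. The string ``sample`` below is an assumed example, not taken from the input file.

```python
sample = "a  b c"  # two spaces between a and b, one between b and c

on_space = sample.split(' ')   # splits at every single space
on_whitespace = sample.split() # splits at runs of whitespace

print(on_space)       # ['a', '', 'b', 'c']
print(on_whitespace)  # ['a', 'b', 'c']

# One possible way to drop the empty strings afterwards.
non_empty = [token for token in on_space if token]
print(non_empty)      # ['a', 'b', 'c']
```

Each pair of consecutive spaces contributes one empty string when the separator is given explicitly.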
Now that we know how to split a string, we can split the record and retrieve
each field separately. But there is one problem. The region code "B" and a "B"
surrounded by whitespace are treated as two different regions. We must find a
way to remove all the whitespace around a string so that "B" and a "B" with
white space around it are treated as the same.

This is possible by using the ``strip`` method of strings. Let us define a
string by typing
::

    unstripped = "     B    "
    unstripped.strip()

We can see that strip removes all the whitespace around the string.

{{{ Pause here and try out the following exercises }}}

%% 2 %% What happens to the white space inside the sentence when it is stripped?

{{{ continue from paused state }}}

Type
::

    a_str = "         white      space            "
    a_str.strip()

We see that only the whitespace around the sentence is removed and anything
inside remains unaffected.

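A minimal sketch of why stripping matters when comparing region codes. The string ``padded`` is an assumed, shortened example used so that the result is easy to read.

```python
unstripped = "     B    "

# Without stripping, the comparison fails because of the surrounding spaces.
print(unstripped == "B")          # False
print(unstripped.strip() == "B")  # True

# Only the leading and trailing whitespace goes; the inner spaces stay.
padded = "  white   space  "
print(repr(padded.strip()))       # 'white   space'
```

``strip`` returns a new string, so ``unstripped`` itself is left unchanged.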
By now we know enough to separate fields from the record and to strip out any
white space. The only road block we now have is conversion of string to float.

The splitting and stripping operations are done on strings and their results
are also strings. Hence the marks that we have are still strings and
mathematical operations are not possible on them. We must convert them into
numbers (integers or floats), before we can perform mathematical operations on
them.

We shall look at converting strings into floats. We define a float string
first. Type
::

    mark_str = "1.25"
    mark = float(mark_str)
    type(mark_str)
    type(mark)

We can see that the string is converted to a float. We can perform mathematical
operations on it now.

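The following sketch contrasts the string with the converted number. It only uses the built-in ``float`` and ``type``. The arithmetic is an illustration and is not part of the script.

```python
mark_str = "1.25"
mark = float(mark_str)

print(type(mark_str).__name__)  # str
print(type(mark).__name__)      # float

# On the float, + is addition; on the string, + is concatenation.
print(mark + 1)        # 2.25
print(mark_str + "1")  # 1.251
```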
{{{ Pause here and try out the following exercises }}}

%% 3 %% What happens if you do int("1.25")

{{{ continue from paused state }}}

It raises an error since converting a float string into an integer directly is
not possible. It involves an intermediate step of converting to float.
::

    dcml_str = "1.25"
    flt = float(dcml_str)
    flt
    number = int(flt)
    number

Using ``int`` it is also possible to convert floats into integers; the
fractional part is discarded.

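A short sketch of the two-step conversion, with the error caught so that the snippet runs to the end. The variable ``failed`` and the negative example are additions for illustration only.

```python
dcml_str = "1.25"

# int() refuses a string that contains a decimal point.
failed = False
try:
    int(dcml_str)
except ValueError:
    failed = True
print(failed)  # True

# Going through float first works; int() then drops the fractional part.
number = int(float(dcml_str))
print(number)      # 1

# The value is truncated toward zero, not rounded down.
print(int(-1.25))  # -1
```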
Now that we have all the machinery required to parse the file, let us solve the
problem. We first read the file line by line and parse each record. We see if
the region code is B and store the marks accordingly.
::

    math_marks_B = [] # an empty list to store the marks
    for line in open("/home/fossee/sslc1.txt"):
        fields = line.split(";")

        # the region code is the first field; strip the whitespace around it
        region_code = fields[0]
        region_code_stripped = region_code.strip()

        # the maths mark is the sixth field; convert it to a float
        math_mark_str = fields[5]
        math_mark = float(math_mark_str)

        # keep the mark only if the record belongs to region "B"
        if region_code_stripped == "B":
            math_marks_B.append(math_mark)

Now we have all the maths marks of region "B" in the list math_marks_B.
To get the mean, we just have to sum the marks and divide by the length.
::

    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
    math_marks_mean

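The whole procedure can be tried without the input file. The sketch below runs on a few records held in a list. Only the first record comes from this tutorial; the other three, including the student names, are made up for illustration. Skipping non-numeric entries such as "AA" with ``isdigit`` is an assumption of this sketch and presumes the marks are whole numbers.

```python
# Sample records in the same ';'-separated layout (last three are invented).
records = [
    "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;",
    " B ;015164;SAMPLE STUDENT ONE;070;065;80;75;60;350;P;;",
    "B;015165;SAMPLE STUDENT TWO;055;060;AA;50;45;210;F;;",
    "B;015166;SAMPLE STUDENT THREE;090;085;95;88;92;450;P;;",
]

math_marks_B = []
for line in records:
    fields = line.split(";")
    region_code = fields[0].strip()
    math_mark_str = fields[5].strip()

    # Assumption: entries that are not plain digits (e.g. "AA") are skipped.
    if region_code == "B" and math_mark_str.isdigit():
        math_marks_B.append(float(math_mark_str))

math_marks_mean = sum(math_marks_B) / len(math_marks_B)
print(math_marks_B)     # [80.0, 95.0]
print(math_marks_mean)  # 87.5
```

Because the marks are stored as floats, the division gives a fractional mean. If no record matched region "B", the list would be empty and the division would fail, so a real script may want to check for that first.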
{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

 * how to tokenize a string using various delimiters
 * how to get rid of extra white space around a string
 * how to convert from one type to another
 * how to parse input data and perform computations on it

{{{ Show the "sponsored by FOSSEE" slide }}}

#[Nishanth]: Will add this line after all of us fix on one.
This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD India.

Hope you have enjoyed it and found it useful.
Thank you

Questions
=========

 1. How do you split the string "Guido;Rossum;Python" to get the words?

   Answer: line.split(';')

 2. line.split() and line.split(' ') are the same

   a. True
   #. False

   Answer: False

 3. What is the output of the following code::

      line = "Hello;;;World;;"
      sub_strs = line.split()
      print len(sub_strs)

    Answer: 1

 4. What is the output of "      Hello    World    ".strip()

   a. "Hello World"
   #. "Hello    World"
   #. "      Hello World"
   #. "Hello World     "

   Answer: "Hello    World"

 5. What does "It is a cold night".strip("It") produce?
    Hint: Read the documentation of strip

   a. "is a cold night"
   #. " is a cold nigh"
   #. "It is a cold nigh"
   #. "is a cold nigh"

   Answer: " is a cold nigh"

 6. What does int("20") produce?

   a. "20"
   #. 20.0
   #. 20
   #. Error

   Answer: 20

 7. What does int("20.0") produce?

   a. 20
   #. 20.0
   #. Error
   #. "20"

   Answer: Error

 8. What is the value of float(3/2)?

   a. 1.0
   #. 1.5
   #. 1
   #. Error

   Answer: 1.0 (3/2 is integer division here and evaluates to 1)

 9. What does float("3/2") produce?

   a. 1.0
   #. 1.5
   #. 1
   #. Error

   Answer: Error

 10. See if there is a function available in pylab to calculate the mean.
     Hint: Use tab completion

   287