author | Puneeth Chaganti <punchagan@fossee.in> |
Wed, 10 Nov 2010 10:24:03 +0530 | |
changeset 435 | 975677bf1b8a |
parent 332 | b702c10e5919 |
child 497 | 5cc7bcce8de4 |
permissions | -rw-r--r-- |
238
c507e9c413c6
Converted the parsing_data into new template form
Nishanth <nishanth@fossee.in>
.. Objectives
.. ----------

.. By the end of this tutorial you will be able to

.. * split a string using a delimiter
.. * remove the whitespace around a string
.. * convert variables from one type to another

.. Prerequisites
.. -------------

..   1. Getting started with lists

.. Author              : Nishanth Amuluru
   Internal Reviewer   :
   External Reviewer   :
   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]

Script
------

Hello friends and welcome to the tutorial on Parsing Data.

{{{ Show the slide containing title }}}

{{{ Show the outline slide }}}

In this tutorial, we shall learn

* what we mean by parsing data
* the string operations required for parsing data
* datatype conversion

#[Puneeth]: Changed a few things, here.

#[Puneeth]: I don't like the way the term "parsing data" has been used, all
through the script. See if that can be changed.

Let us have a look at the problem.

{{{ Show the slide containing problem statement. }}}

There is an input file containing a huge number of records. Each record
corresponds to a student.

{{{ show the slide explaining record structure }}}

As you can see, each record consists of fields separated by a ";". The first
field is the region code, then the roll number, then the name, the marks in
second language, first language, maths, science and social, the total marks,
pass/fail indicated by P or F, and finally W if the result is withheld and
empty otherwise.

Our job is to calculate the mean of all the maths marks in the region "B".

#[Nishanth]: Please note that I am not telling anything about AA since they do
not know about any if/else yet.

#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
simple and leave out all the columns after total marks.

Now, what is parsing data?

From the input file, we can see that the data we have is in the form of
text. Parsing this data is all about reading it and converting it into a form
which can be used for computations. In our case, that form is a sequence of
numbers.

#[Puneeth]: should the word tokenizing, be used? Should it be defined before
using it?

We can clearly see that the problem involves reading files and tokenizing,
that is, splitting each line of text into its individual fields.

#[Puneeth]: the sentence above seems kinda redundant.

Let us learn about tokenizing strings. Let us define a string first. Type
::

    line = "parse this           string"

We are now going to split this string on whitespace.
::

    line.split()

As you can see, we get a list of strings. This means that when ``split`` is
called without any arguments, it splits on whitespace. In simple words, each
run of consecutive spaces is treated as one big space.

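As a quick check of the behaviour just described, here is a minimal sketch that splits a string mixing several spaces, a tab and a newline. The string ``sample`` is made up for illustration and is not part of the tutorial's data.

```python
# A made-up string mixing several spaces, a tab and a trailing newline.
sample = "parse   this\tstring\n"

# With no argument, split() breaks the string on any run of whitespace
# and ignores whitespace at the two ends.
tokens = sample.split()
print(tokens)
```

Tabs and newlines count as whitespace too, so the list should contain just the three words.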
``split`` can also split on a string of our choice. This is achieved by passing
that string as an argument. But first let us define a sample record from the
file.
::

    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
    record.split(';')

We can see that the string is split on ';' and we get each field separately.
We can also observe that empty strings appear in the list wherever there are
two semicolons without anything in between.

To recap, ``split`` splits on whitespace if called without an argument and
splits on the given argument if it is called with an argument.

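The sketch below applies ``split(";")`` to the sample record and picks out a few fields by position. The positions follow the record layout described earlier (region code first, maths marks sixth); the variable name ``fields`` is our own choice.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"

fields = record.split(";")

# 11 semicolons give 12 fields; the trailing ones are empty strings.
print(len(fields))
print(fields[0])    # region code
print(fields[5])    # maths marks, still a string at this point
print(fields[-3:])  # the empty fields at the end of the record
```

Note that every field, including the marks, is still a string after splitting.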
{{{ Pause here and try out the following exercises }}}

%% 1 %% Split the variable ``line`` using a space as the argument. Is it the
same as splitting without an argument?

{{{ continue from paused state }}}

We see that when we split on a space, multiple whitespaces are not clubbed as
one and there is an empty string every time there are two consecutive spaces.

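To make the difference concrete, the following sketch runs both forms of ``split`` on the same string. The string used here is an assumption: any string with more than one space between two words shows the same effect.

```python
line = "parse this   string"  # three spaces before "string"

on_whitespace = line.split()       # runs of spaces act as one separator
on_single_space = line.split(" ")  # every single space is a separator

print(on_whitespace)
print(on_single_space)
```

Splitting on ``" "`` yields an empty string for each extra space, so the second list is longer than the first.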
Now that we know how to split a string, we can split the record and retrieve
each field separately. But there is one problem. The region code "B" and a "B"
surrounded by whitespace are treated as two different regions. We must find a
way to remove all the whitespace around a string so that "B" and a "B" with
whitespace around it are treated as the same.

This is possible by using the ``strip`` method of strings. Let us define a
string by typing
::

    unstripped = " B "
    unstripped.strip()

We can see that ``strip`` removes all the whitespace around the string.

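Here is a small sketch of why stripping matters when comparing region codes. The value ``raw_code`` is a hypothetical field with stray spaces, not one taken from the input file.

```python
raw_code = "  B "  # hypothetical region code with stray spaces

# The comparison fails while the spaces are still there.
print(raw_code == "B")

# After strip(), the comparison succeeds.
print(raw_code.strip() == "B")
```

``strip`` returns a new string and leaves ``raw_code`` itself unchanged, so the stripped value has to be stored or used directly.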
{{{ Pause here and try out the following exercises }}}

%% 2 %% What happens to the whitespace inside the sentence when it is stripped?

{{{ continue from paused state }}}

Type
::

    a_str = " white space "
    a_str.strip()

We see that only the whitespace around the sentence is removed and anything
inside remains unaffected.

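Splitting and stripping can be combined, as in the sketch below. The record is made up for illustration (the name and numbers are not from the input file), and the list ``cleaned`` is our own helper variable.

```python
# Made-up record with stray spaces around each field.
record = " B ;  015164 ; ANITA RAO  ;47 "

cleaned = []
for field in record.split(";"):
    cleaned.append(field.strip())  # trims the two ends only

print(cleaned)
```

The space inside the name survives, since ``strip`` only touches the two ends of each field.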
By now we know enough to separate fields from the record and to strip out any
whitespace. The only roadblock we now have is the conversion of strings to
floats.

The splitting and stripping operations are done on strings and their results
are also strings (a list of strings, in the case of ``split``). Hence the marks
that we have are still strings and mathematical operations are not possible on
them. We must convert them into numbers (integers or floats) before we can
perform mathematical operations on them.

We shall look at converting strings into floats. We define a float string
first. Type
::

    mark_str = "1.25"
    mark = float(mark_str)
    type(mark_str)
    type(mark)

We can see that the string is converted to a float. We can now perform
mathematical operations on it.

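The sketch below contrasts arithmetic on the string with arithmetic on the converted float. The padded string ``" 083 "`` is an assumed example of how a marks field might look before any cleaning.

```python
mark_str = "1.25"

# Adding two strings joins them; no arithmetic happens.
print(mark_str + mark_str)

# After conversion, + adds the numbers.
mark = float(mark_str)
print(mark + mark)

# float() also accepts surrounding whitespace and leading zeros.
padded_mark = float(" 083 ")
print(padded_mark)
```

Because ``float`` tolerates surrounding whitespace, a marks field does not strictly need ``strip`` before conversion, though stripping does no harm.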
{{{ Pause here and try out the following exercises }}}

%% 3 %% What happens if you do ``int("1.25")``?

{{{ continue from paused state }}}

It raises an error since converting a float string into an integer directly is
not possible. It involves an intermediate step of converting to float.
::

    dcml_str = "1.25"
    flt = float(dcml_str)
    flt
    number = int(flt)
    number

Using ``int``, it is also possible to convert floats into integers. The
fractional part is discarded.

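For reference, here is a sketch of the error and of the two-step conversion. It uses ``try``/``except``, which this tutorial has not introduced, only so that the script keeps running after the error; the flag ``failed`` is our own name.

```python
dcml_str = "1.25"

failed = False
try:
    int(dcml_str)  # a string with a decimal point is rejected
except ValueError as err:
    failed = True
    print("int() failed:", err)

# Going through float first works; int() then drops the fraction.
number = int(float(dcml_str))
print(number)

# The fraction is dropped towards zero, also for negative numbers.
print(int(-1.25))
```

Note that ``int`` truncates rather than rounds, so ``int(1.99)`` gives 1 as well.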
Now that we have all the machinery required to parse the file, let us solve the
problem. We first read the file line by line and parse each record. We see if
the region code is B and store the marks accordingly.
::

    math_marks_B = [] # an empty list to store the marks
    for line in open("/home/fossee/sslc1.txt"):
        fields = line.split(";")

        region_code = fields[0]
        region_code_stripped = region_code.strip()

        math_mark_str = fields[5]
        math_mark = float(math_mark_str)

        if region_code_stripped == "B":
            math_marks_B.append(math_mark)


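The loop can be tried without the input file, as in the sketch below. It rests on several assumptions: ``io.StringIO`` stands in for the file, all records except the first are made up and are NOT taken from ``sslc1.txt``, and marks that are not plain digits (such as "AA") are simply skipped, which is our own choice since the script does not say how to treat them.

```python
import io

# Stand-in for the input file. Only the first record comes from the
# tutorial; the other three are invented for this example.
sample_file = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;\n"
    " B ;015164;STUDENT ONE;070;065;80;75;60;350;P;;\n"
    "B;015165;STUDENT TWO;055;060;AA;50;45;210;F;;\n"
    "B;015166;STUDENT THREE;090;085;60;70;80;385;P;;\n"
)

math_marks_B = []  # an empty list to store the marks
for line in sample_file:
    fields = line.split(";")
    region_code = fields[0].strip()
    math_mark_str = fields[5].strip()

    # Keep region "B" only, and skip marks that are not plain digits.
    if region_code == "B" and math_mark_str.isdigit():
        math_marks_B.append(float(math_mark_str))

print(math_marks_B)
```

Without the ``isdigit`` check, ``float("AA")`` would raise a ``ValueError`` and stop the loop at the third record.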
Now we have all the maths marks of region "B" in the list ``math_marks_B``.
To get the mean, we just have to sum the marks and divide by the length.
::

    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
    math_marks_mean

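A minimal sketch of the mean calculation follows. The marks are made up, and the guard against an empty list is our own addition: dividing by a length of zero would otherwise raise a ``ZeroDivisionError``.

```python
math_marks_B = [80.0, 60.0, 40.0]  # made-up marks for illustration

if len(math_marks_B) > 0:
    math_marks_mean = sum(math_marks_B) / len(math_marks_B)
else:
    math_marks_mean = None  # no records were found for the region

print(math_marks_mean)
```

Because the marks were converted with ``float``, the division keeps the fractional part even on Python 2, where dividing two integers with ``/`` would discard it.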
{{{ Show summary slide }}}

This brings us to the end of the tutorial.
We have learnt

* how to tokenize a string using various delimiters
* how to get rid of extra whitespace around a string
* how to convert from one type to another
* how to parse input data and perform computations on it

{{{ Show the "sponsored by FOSSEE" slide }}}

#[Nishanth]: Will add this line after all of us fix on one.
This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD India.

Hope you have enjoyed and found it useful.
Thank you