Converted the parsing_data into new template form
authorNishanth <nishanth@fossee.in>
Thu, 07 Oct 2010 14:40:21 +0530
changeset 238 c507e9c413c6
parent 237 6c203780bfbe
child 252 0ff3f1a97068
child 255 75fd106303dc
Converted the parsing_data into new template form
parsing_data.rst
parsing_data/questions.rst
parsing_data/quickref.tex
parsing_data/script.rst
parsing_data/slides.tex
--- a/parsing_data.rst	Thu Oct 07 14:33:45 2010 +0530
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,287 +0,0 @@
-.. Author              : Nishanth
-   Internal Reviewer 1 : 
-   Internal Reviewer 2 : 
-   External Reviewer   :
-
-Hello friends and welcome to the tutorial on Parsing Data
-
-{{{ Show the slide containing title }}}
-
-{{{ Show the slide containing the outline slide }}}
-
-In this tutorial, we shall learn
-
- * What we mean by parsing data
- * the string operations required for parsing data
- * datatype conversion
-
-#[Puneeth]: Changed a few things, here.  
-
-#[Puneeth]: I don't like the way the term "parsing data" has been used, all
-through the script. See if that can be changed.
-
- Lets us have a look at the problem
-
-{{{ Show the slide containing problem statement. }}}
-
-There is an input file containing huge no. of records. Each record corresponds
-to a student.
-
-{{{ show the slide explaining record structure }}}
-As you can see, each record consists of fields seperated by a ";". The first
-record is region code, then roll number, then name, marks of second language,
-first language, maths, science and social, total marks, pass/fail indicatd by P
-or F and finally W if with held and empty otherwise.
-
-Our job is to calculate the mean of all the maths marks in the region "B".
-
-#[Nishanth]: Please note that I am not telling anything about AA since they do
-             not know about any if/else yet.
-
-#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
- simple and leave out all the columns after total marks. 
-
-Now what is parsing data.
-
-From the input file, we can see that the data we have is in the form of
-text. Parsing this data is all about reading it and converting it into a form
-which can be used for computations -- in our case, sequence of numbers.
-
-#[Puneeth]: should the word tokenizing, be used? Should it be defined before
- using it?
-
-We can clearly see that the problem involves reading files and tokenizing.
-
-#[Puneeth]: the sentence above seems kinda redundant. 
-
-Let us learn about tokenizing strings. Let us define a string first. Type
-::
-
-    line = "parse this           string"
-
-We are now going to split this string on whitespace.
-::
-
-    line.split()
-
-As you can see, we get a list of strings. Which means, when ``split`` is called
-without any arguments, it splits on whitespace. In simple words, all the spaces
-are treated as one big space.
-
-``split`` also can split on a string of our choice. This is acheived by passing
-that as an argument. But first lets define a sample record from the file.
-::
-
-    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
-    record.split(';')
-
-We can see that the string is split on ';' and we get each field seperately.
-We can also observe that an empty string appears in the list since there are
-two semi colons without anything in between.
-
-To recap, ``split`` splits on whitespace if called without an argument and
-splits on the given argument if it is called with an argument.
-
-{{{ Pause here and try out the following exercises }}}
-
-%% 1 %% split the variable line using a space as argument. Is it same as
-        splitting without an argument ?
-
-{{{ continue from paused state }}}
-
-We see that when we split on space, multiple whitespaces are not clubbed as one
-and there is an empty string everytime there are two consecutive spaces.
-
-Now that we know how to split a string, we can split the record and retrieve
-each field seperately. But there is one problem. The region code "B" and a "B"
-surrounded by whitespace are treated as two different regions. We must find a
-way to remove all the whitespace around a string so that "B" and a "B" with
-white spaces are dealt as same.
-
-This is possible by using the ``strip`` method of strings. Let us define a
-string by typing
-::
-
-    unstripped = "     B    "
-    unstripped.strip()
-
-We can see that strip removes all the whitespace around the sentence
-
-{{{ Pause here and try out the following exercises }}}
-
-%% 2 %% What happens to the white space inside the sentence when it is stripped
-
-{{{ continue from paused state }}}
-
-Type
-::
-
-    a_str = "         white      space            "
-    a_str.strip()
-
-We see that the whitespace inside the sentence is only removed and anything
-inside remains unaffected.
-
-By now we know enough to seperate fields from the record and to strip out any
-white space. The only road block we now have is conversion of string to float.
-
-The splitting and stripping operations are done on a string and their result is
-also a string. hence the marks that we have are still strings and mathematical
-operations are not possible on them. We must convert them into numbers
-(integers or floats), before we can perform mathematical operations on them. 
-
-We shall look at converting strings into floats. We define a float string
-first. Type 
-::
-
-    mark_str = "1.25"
-    mark = int(mark_str)
-    type(mark_str)
-    type(mark)
-
-We can see that string is converted to float. We can perform mathematical
-operations on them now.
-
-{{{ Pause here and try out the following exercises }}}
-
-%% 3 %% What happens if you do int("1.25")
-
-{{{ continue from paused state }}}
-
-It raises an error since converting a float string into integer directly is
-not possible. It involves an intermediate step of converting to float.
-::
-
-    dcml_str = "1.25"
-    flt = float(dcml_str)
-    flt
-    number = int(flt)
-    number
-
-Using ``int`` it is also possible to convert float into integers.
-
-Now that we have all the machinery required to parse the file, let us solve the
-problem. We first read the file line by line and parse each record. We see if
-the region code is B and store the marks accordingly.
-::
-
-    math_marks_B = [] # an empty list to store the marks
-    for line in open("/home/fossee/sslc1.txt"):
-        fields = line.split(";")
-
-        region_code = fields[0]
-        region_code_stripped = region_code.strip()
-
-        math_mark_str = fields[5]
-        math_mark = float(math_mark_str)
-
-        if region_code == "AA":
-            math_marks_B.append(math_mark)
-
-
-Now we have all the maths marks of region "B" in the list math_marks_B.
-To get the mean, we just have to sum the marks and divide by the length.
-::
-
-        math_marks_mean = sum(math_marks_B) / len(math_marks_B)
-        math_marks_mean
-
-{{{ Show summary slide }}}
-
-This brings us to the end of the tutorial.
-we have learnt
-
- * how to tokenize a string using various delimiters
- * how to get rid of extra white space around
- * how to convert from one type to another
- * how to parse input data and perform computations on it
-
-{{{ Show the "sponsored by FOSSEE" slide }}}
-
-#[Nishanth]: Will add this line after all of us fix on one.
-This tutorial was created as a part of FOSSEE project, NME ICT, MHRD India
-
-Hope you have enjoyed and found it useful.
-Thank you
- 
-Questions
-=========
-
- 1. How do you split the string "Guido;Rossum;Python" to get the words
-
-   Answer: line.split(';')
-
- 2. line.split() and line.split(' ') are same
-
-   a. True
-   #. False
-
-   Answer: False
-
- 3. What is the output of the following code::
-
-      line = "Hello;;;World;;"
-      sub_strs = line.split()
-      print len(sub_strs)
-
-    Answer: 5
-
- 4. What is the output of "      Hello    World    ".strip()
-
-   a. "Hello World"
-   #. "Hello     World"
-   #. "      Hello World"
-   #. "Hello World     "
-   
-   Answer: "Hello    World"
-
- 5. What does "It is a cold night".strip("It") produce
-    Hint: Read the documentation of strip
-
-   a. "is a cold night"
-   #. " is a cold nigh" 
-   #. "It is a cold nigh"
-   #. "is a cold nigh"
-
-   Answer: " is a cold nigh"
-
- 6. What does int("20") produce
-
-   a. "20"
-   #. 20.0
-   #. 20
-   #. Error
-
-   Answer: 20
-
- 7. What does int("20.0") produce
-
-   a. 20
-   #. 20.0
-   #. Error
-   #. "20"
-
-   Answer: Error
-
- 8. What is the value of float(3/2)
-
-   a. 1.0
-   #. 1.5
-   #. 1
-   #. Error
-
-   Answer: 1.0
-
- 9. what doess float("3/2") produce
-
-   a. 1.0
-   #. 1.5
-   #. 1
-   #. Error
-
-   Answer: Error
-   
- 10. See if there is a function available in pylab to calculate the mean
-     Hint: Use tab completion
-
-
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parsing_data/questions.rst	Thu Oct 07 14:40:21 2010 +0530
@@ -0,0 +1,102 @@
+Objective Questions
+-------------------
+
+ 1. How do you split the string "Guido;Rossum;Python" to get the words
+
+   Answer: line.split(';')
+
+ 2. line.split() and line.split(' ') are same
+
+   a. True
+   #. False
+
+   Answer: False
+
+ 3. What is the output of the following code::
+
+      line = "Hello;;;World;;"
+      sub_strs = line.split()
+      print len(sub_strs)
+
+    Answer: 5
+
+ 4. What is the output of "      Hello    World    ".strip()
+
+   a. "Hello World"
+   #. "Hello     World"
+   #. "      Hello World"
+   #. "Hello World     "
+   
+   Answer: "Hello    World"
+
+ 5. What does "It is a cold night".strip("It") produce
+    Hint: Read the documentation of strip
+
+   a. "is a cold night"
+   #. " is a cold nigh" 
+   #. "It is a cold nigh"
+   #. "is a cold nigh"
+
+   Answer: " is a cold nigh"
+
+ 6. What does int("20") produce
+
+   a. "20"
+   #. 20.0
+   #. 20
+   #. Error
+
+   Answer: 20
+
+ 7. What does int("20.0") produce
+
+   a. 20
+   #. 20.0
+   #. Error
+   #. "20"
+
+   Answer: Error
+
+ 8. What is the value of float(3/2)
+
+   a. 1.0
+   #. 1.5
+   #. 1
+   #. Error
+
+   Answer: 1.0
+
+ 9. what doess float("3/2") produce
+
+   a. 1.0
+   #. 1.5
+   #. 1
+   #. Error
+
+   Answer: Error
+   
+ 10. See if there is a function available in pylab to calculate the mean
+     Hint: Use tab completion
+
+Larger Questions
+================
+
+ 1. The file ``pos.txt`` contains two columns of data. The first and second
+    columns are the x and y co-ordiantes of a particle in motion, respectively.
+    Plot the trajectory of the particle.
+
+   Answer::
+
+     x_values = []
+     y_values = []
+
+     for line in open("/home/fossee/pos.txt");
+         x_str, y_str = line.split()
+         x = int(x_str)
+         y = int(y_str)
+
+         x_values.append(x)
+         y_values.append(y)
+
+         plot(x, y, 'b.')
+   
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parsing_data/quickref.tex	Thu Oct 07 14:40:21 2010 +0530
@@ -0,0 +1,11 @@
+Creating a tuple:\\
+{\ex \lstinline|    t = (1, "hello", 2.5)|}
+
+Accessing elements of tuples:\\
+{\ex \lstinline|    t[index] Ex: t[2]|}
+
+Accessing slices of tuples:\\
+{\ex \lstinline|    t[start:stop:step]|}
+
+Swapping values:\\
+{\ex \lstinline|    a, b = b, a|}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parsing_data/script.rst	Thu Oct 07 14:40:21 2010 +0530
@@ -0,0 +1,223 @@
+.. Objectives
+.. ----------
+
+.. A - Students and teachers from Science and engineering backgrounds
+   B - 
+   C - 
+   D - 
+
+.. Prerequisites
+.. -------------
+
+..   1. Getting started with lists
+     
+.. Author              : Nishanth Amuluru
+   Internal Reviewer   : 
+   External Reviewer   :
+   Checklist OK?       : <put date stamp here, if OK> [2010-10-05]
+
+Script
+------
+
+Hello friends and welcome to the tutorial on Parsing Data
+
+{{{ Show the slide containing title }}}
+
+{{{ Show the slide containing the outline slide }}}
+
+In this tutorial, we shall learn
+
+ * What we mean by parsing data
+ * the string operations required for parsing data
+ * datatype conversion
+
+#[Puneeth]: Changed a few things, here.  
+
+#[Puneeth]: I don't like the way the term "parsing data" has been used, all
+through the script. See if that can be changed.
+
+ Lets us have a look at the problem
+
+{{{ Show the slide containing problem statement. }}}
+
+There is an input file containing huge no. of records. Each record corresponds
+to a student.
+
+{{{ show the slide explaining record structure }}}
+As you can see, each record consists of fields seperated by a ";". The first
+record is region code, then roll number, then name, marks of second language,
+first language, maths, science and social, total marks, pass/fail indicatd by P
+or F and finally W if with held and empty otherwise.
+
+Our job is to calculate the mean of all the maths marks in the region "B".
+
+#[Nishanth]: Please note that I am not telling anything about AA since they do
+             not know about any if/else yet.
+
+#[Puneeth]: Should we talk pass/fail etc? I think we should make the problem
+ simple and leave out all the columns after total marks. 
+
+Now what is parsing data.
+
+From the input file, we can see that the data we have is in the form of
+text. Parsing this data is all about reading it and converting it into a form
+which can be used for computations -- in our case, sequence of numbers.
+
+#[Puneeth]: should the word tokenizing, be used? Should it be defined before
+ using it?
+
+We can clearly see that the problem involves reading files and tokenizing.
+
+#[Puneeth]: the sentence above seems kinda redundant. 
+
+Let us learn about tokenizing strings. Let us define a string first. Type
+::
+
+    line = "parse this           string"
+
+We are now going to split this string on whitespace.
+::
+
+    line.split()
+
+As you can see, we get a list of strings. Which means, when ``split`` is called
+without any arguments, it splits on whitespace. In simple words, all the spaces
+are treated as one big space.
+
+``split`` also can split on a string of our choice. This is acheived by passing
+that as an argument. But first lets define a sample record from the file.
+::
+
+    record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
+    record.split(';')
+
+We can see that the string is split on ';' and we get each field seperately.
+We can also observe that an empty string appears in the list since there are
+two semi colons without anything in between.
+
+To recap, ``split`` splits on whitespace if called without an argument and
+splits on the given argument if it is called with an argument.
+
+{{{ Pause here and try out the following exercises }}}
+
+%% 1 %% split the variable line using a space as argument. Is it same as
+        splitting without an argument ?
+
+{{{ continue from paused state }}}
+
+We see that when we split on space, multiple whitespaces are not clubbed as one
+and there is an empty string everytime there are two consecutive spaces.
+
+Now that we know how to split a string, we can split the record and retrieve
+each field seperately. But there is one problem. The region code "B" and a "B"
+surrounded by whitespace are treated as two different regions. We must find a
+way to remove all the whitespace around a string so that "B" and a "B" with
+white spaces are dealt as same.
+
+This is possible by using the ``strip`` method of strings. Let us define a
+string by typing
+::
+
+    unstripped = "     B    "
+    unstripped.strip()
+
+We can see that strip removes all the whitespace around the sentence
+
+{{{ Pause here and try out the following exercises }}}
+
+%% 2 %% What happens to the white space inside the sentence when it is stripped
+
+{{{ continue from paused state }}}
+
+Type
+::
+
+    a_str = "         white      space            "
+    a_str.strip()
+
+We see that the whitespace inside the sentence is only removed and anything
+inside remains unaffected.
+
+By now we know enough to seperate fields from the record and to strip out any
+white space. The only road block we now have is conversion of string to float.
+
+The splitting and stripping operations are done on a string and their result is
+also a string. hence the marks that we have are still strings and mathematical
+operations are not possible on them. We must convert them into numbers
+(integers or floats), before we can perform mathematical operations on them. 
+
+We shall look at converting strings into floats. We define a float string
+first. Type 
+::
+
+    mark_str = "1.25"
+    mark = int(mark_str)
+    type(mark_str)
+    type(mark)
+
+We can see that string is converted to float. We can perform mathematical
+operations on them now.
+
+{{{ Pause here and try out the following exercises }}}
+
+%% 3 %% What happens if you do int("1.25")
+
+{{{ continue from paused state }}}
+
+It raises an error since converting a float string into integer directly is
+not possible. It involves an intermediate step of converting to float.
+::
+
+    dcml_str = "1.25"
+    flt = float(dcml_str)
+    flt
+    number = int(flt)
+    number
+
+Using ``int`` it is also possible to convert float into integers.
+
+Now that we have all the machinery required to parse the file, let us solve the
+problem. We first read the file line by line and parse each record. We see if
+the region code is B and store the marks accordingly.
+::
+
+    math_marks_B = [] # an empty list to store the marks
+    for line in open("/home/fossee/sslc1.txt"):
+        fields = line.split(";")
+
+        region_code = fields[0]
+        region_code_stripped = region_code.strip()
+
+        math_mark_str = fields[5]
+        math_mark = float(math_mark_str)
+
+        if region_code == "AA":
+            math_marks_B.append(math_mark)
+
+
+Now we have all the maths marks of region "B" in the list math_marks_B.
+To get the mean, we just have to sum the marks and divide by the length.
+::
+
+        math_marks_mean = sum(math_marks_B) / len(math_marks_B)
+        math_marks_mean
+
+{{{ Show summary slide }}}
+
+This brings us to the end of the tutorial.
+we have learnt
+
+ * how to tokenize a string using various delimiters
+ * how to get rid of extra white space around
+ * how to convert from one type to another
+ * how to parse input data and perform computations on it
+
+{{{ Show the "sponsored by FOSSEE" slide }}}
+
+#[Nishanth]: Will add this line after all of us fix on one.
+This tutorial was created as a part of FOSSEE project, NME ICT, MHRD India
+
+Hope you have enjoyed and found it useful.
+Thank you
+ 
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parsing_data/slides.tex	Thu Oct 07 14:40:21 2010 +0530
@@ -0,0 +1,106 @@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%Tutorial slides on Python.
+%
+% Author: FOSSEE 
+% Copyright (c) 2009, FOSSEE, IIT Bombay
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\documentclass[14pt,compress]{beamer}
+%\documentclass[draft]{beamer}
+%\documentclass[compress,handout]{beamer}
+%\usepackage{pgfpages} 
+%\pgfpagesuselayout{2 on 1}[a4paper,border shrink=5mm]
+
+% Modified from: generic-ornate-15min-45min.de.tex
+\mode<presentation>
+{
+  \usetheme{Warsaw}
+  \useoutertheme{infolines}
+  \setbeamercovered{transparent}
+}
+
+\usepackage[english]{babel}
+\usepackage[latin1]{inputenc}
+%\usepackage{times}
+\usepackage[T1]{fontenc}
+
+\usepackage{ae,aecompl}
+\usepackage{mathpazo,courier,euler}
+\usepackage[scaled=.95]{helvet}
+
+\definecolor{darkgreen}{rgb}{0,0.5,0}
+
+\usepackage{listings}
+\lstset{language=Python,
+    basicstyle=\ttfamily\bfseries,
+    commentstyle=\color{red}\itshape,
+  stringstyle=\color{darkgreen},
+  showstringspaces=false,
+  keywordstyle=\color{blue}\bfseries}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Macros
+\setbeamercolor{emphbar}{bg=blue!20, fg=black}
+\newcommand{\emphbar}[1]
+{\begin{beamercolorbox}[rounded=true]{emphbar} 
+      {#1}
+ \end{beamercolorbox}
+}
+\newcounter{time}
+\setcounter{time}{0}
+\newcommand{\inctime}[1]{\addtocounter{time}{#1}{\tiny \thetime\ m}}
+
+\newcommand{\typ}[1]{\lstinline{#1}}
+
+\newcommand{\kwrd}[1]{ \texttt{\textbf{\color{blue}{#1}}}  }
+
+% Title page
+\title{Your Title Here}
+
+\author[FOSSEE] {FOSSEE}
+
+\institute[IIT Bombay] {Department of Aerospace Engineering\\IIT Bombay}
+\date{}
+
+% DOCUMENT STARTS
+\begin{document}
+
+\begin{frame}
+  \maketitle
+\end{frame}
+
+\begin{frame}[fragile]
+  \frametitle{Outline}
+  \begin{itemize}
+    \item 
+  \end{itemize}
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%              All other slides here.                  %%
+%% The same slides will be used in a classroom setting. %% 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}[fragile]
+  \frametitle{Summary}
+  \begin{itemize}
+    \item 
+  \end{itemize}
+\end{frame}
+
+\begin{frame}
+  \frametitle{Thank you!}  
+  \begin{block}{}
+  \begin{center}
+  This spoken tutorial has been produced by the
+  \textcolor{blue}{FOSSEE} team, which is funded by the 
+  \end{center}
+  \begin{center}
+    \textcolor{blue}{National Mission on Education through \\
+      Information \& Communication Technology \\ 
+      MHRD, Govt. of India}.
+  \end{center}  
+  \end{block}
+\end{frame}
+
+\end{document}