basic_python/strings_dicts.rst
author Shantanu <shantanu@fossee.in>
Thu, 10 Jun 2010 18:54:04 +0530
changeset 97 597ee2fcb795
parent 76 814fb2d7c1d0
permissions -rw-r--r--
Merged the branches.

=======
Strings
=======

Strings were briefly introduced previously in the introduction document. In this
section strings will be presented in greater detail. All the standard operations 
that can be performed on sequences such as indexing, slicing, multiplication, length
minimum and maximum can be performed on string variables as well. One thing to
be noted is that strings are immutable, which means that string variables are
unchangeable. Hence, all item and slice assignments on strings are illegal.
Let us look at a few example.

::

  >>> name = 'PythonFreak'
  >>> print name[3]
  h
  >>> print name[-1]
  k
  >>> print name[6:]
  Freak
  >>> name[6:0] = 'Maniac'
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' object does not support item assignment

This is quite expected, since string objects are immutable as already mentioned.
The error message is clear in mentioning that 'str' object does not support item
assignment.

String Formatting
=================

String formatting can be performed using the string formatting operator represented
as the percent (%) sign. The string placed before the % sign is formatted with 
the value placed to the right of it. Let us look at a simple example.

::

  >>> format = 'Hello %s, from PythonFreak'
  >>> str1 = 'world!'
  >>> print format % str1
  Hello world!, from PythonFreak

The %s parts of the format string are called the coversion specifiers. The coversion
specifiers mark the places where the formatting has to be performed in a string. 
In the example the %s is replaced by the value of str1. More than one value can 
also be formatted at a time by specifying the values to be formatted using tuples
and dictionaries (explained in later sections). Let us look at an example.

::

  >>> format = 'Hello %s, from %s'
  >>> values = ('world!', 'PythonFreak')
  >>> print format % values
  Hello world!, from PythonFreak

In this example it can be observed that the format string contains two conversion 
specifiers and they are formatted using the tuple of values as shown.

The s in %s specifies that the value to be replaced is of type string. Values of 
other types can be specified as well such as integers and floats. Integers are 
specified as %d and floats as %f. The precision with which the integer or the 
float values are to be represented can also be specified using a **.** (**dot**)
followed by the precision value.

String Methods
==============

Similar to list methods, strings also have a rich set of methods to perform various
operations on strings. Some of the most important and popular ones are presented
in this section.

**find**
~~~~~~~~

The **find** method is used to search for a substring within a given string. It 
returns the left most index of the first occurence of the substring. If the 
substring is not found in the string then it returns -1. Let us look at a few 
examples.

::

  >>> longstring = 'Hello world!, from PythonFreak'
  >>> longstring.find('Python')
  19
  >>> longstring.find('Perl')
  -1

**join**
~~~~~~~~

The **join** method is used to join the elements of a sequence. The sequence 
elements that are to be join ed should all be strings. Let us look at a few 
examples.

::
  
  >>> seq = ['With', 'great', 'power', 'comes', 'great', 'responsibility']
  >>> sep = ' '
  >>> sep.join(seq)
  'With great power comes great responsibility'
  >>> sep = ',!'
  >>> sep.join(seq)
  'With,!great,!power,!comes,!great,!responsibility'

*Try this yourself*

::

  >>> seq = [12,34,56,78]
  >>> sep.join(seq)

**lower**
~~~~~~~~~

The **lower** method, as the name indicates, converts the entire text of a string
to lower case. It is specially useful in cases where the programmers deal with case
insensitive data. Let us look at a few examples.

::

  >>> sometext = 'Hello world!, from PythonFreak'
  >>> sometext.lower()
  'hello world!, from pythonfreak'

**replace**
~~~~~~~~~~~

The **replace** method replaces a substring with another substring within
a given string and returns the new string. Let us look at an example.

::

  >>> sometext = 'Concise, precise and criticise is some of the words that end with ise'
  >>> sometext.replace('is', 'are')
  'Concaree, precaree and criticaree are some of the words that end with aree'

Observe here that all the occurences of the substring *is* have been replaced,
even the *is* in *concise*, *precise* and *criticise* have been replaced.

**split**
~~~~~~~~~

The **split** is one of the very important string methods. split is the opposite of the 
**join** method. It is used to split a string based on the argument passed as the
delimiter. It returns a list of strings. By default when no argument is passed it
splits with *space* (' ') as the delimiter. Let us look at an example.

::

  >>> grocerylist = 'butter, cucumber, beer(a grocery item??), wheatbread'
  >>> grocerylist.split(',')
  ['butter', ' cucumber', ' beer(a grocery item??)', ' wheatbread']
  >>> grocerylist.split()
  ['butter,', 'cucumber,', 'beer(a', 'grocery', 'item??),', 'wheatbread']

Observe here that in the second case when the delimiter argument was not set 
**split** was done with *space* as the delimiter.

**strip**
~~~~~~~~~

The **strip** method is used to remove or **strip** off any whitespaces that exist
to the left and right of a string, but not the whitespaces within a string. Let 
us look at an example.

::

  >>> spacedtext = "               Where's the text??                 "
  >>> spacedtext.strip()
  "Where's the text??"

Observe that the whitespaces between the words have not been removed.

::

  Note: Very important thing to note is that all the methods shown above do not
        transform the source string. The source string still remains the same.
	Remember that **strings are immutable**.

Introduction to the standard library
====================================

Python is often referred to as a "Batteries included!" language, mainly because 
of the Python Standard Library. The Python Standard Library provides an extensive
set of features some of which are available directly for use while some require to
import a few **modules**. The Standard Library provides various built-in functions
like:

    * **abs()**
    * **dict()**
    * **enumerate()**

The built-in constants like **True** and **False** are provided by the Standard Library.
More information about the Python Standard Library is available http://docs.python.org/library/


I/O: Reading and Writing Files
==============================

Files are very important aspects when it comes to computing and programming.
Up until now the focus has been on small programs that interacted with users
through **input()** and **raw_input()**. Generally, for computational purposes
it becomes necessary to handle files, which are usually large in size as well.
This section focuses on basics of file handling.

Opening Files
~~~~~~~~~~~~~

Files can be opened using the **open()** method. **open()** accepts 3 arguments
out of which 2 are optional. Let us look at the syntax of **open()**:

*f = open( filename, mode, buffering)*

The *filename* is a compulsory argument while the *mode* and *buffering* are 
optional. The *filename* should be a string and it should be the complete path
to the file to be opened (The path can be absolute or relative). Let us look at
an example.

::

  >>> f = open ('basic_python/interim_assessment.rst')
  
The *mode* argument specifies the mode in which the file has to be opened.
The following are the valid mode arguments:

**r** - Read mode
**w** - Write mode
**a** - Append mode
**b** - Binary mode
**+** - Read/Write mode

The read mode opens the file as a read-only document. The write mode opens the
file in the Write only mode. In the write mode, if the file existed prior to the
opening, the previous contents of the file are erased. The append mode opens the
file in the write mode but the previous contents of the file are not erased and
the current data is appended onto the file.
The binary and the read/write modes are special in the sense that they are added
onto other modes. The read/write mode opens the file in the reading and writing
mode combined. The binary mode can be used to open a files that do not contain 
text. Binary files such as images should be opened in the binary mode. Let us look
at a few examples.

::

  >>> f = open ('basic_python/interim_assessment.rst', 'r')
  >>> f = open ('armstrong.py', 'r+')

The third argument to the **open()** method is the *buffering* argument. This takes
a boolean value, *True* or *1* indicates that buffering has to be enabled on the file,
that is the file is loaded on to the main memory and the changes made to the file are 
not immediately written to the disk. If the *buffering* argument is *0* or *False* the 
changes are directly written on to the disk immediately.

Reading and Writing files
~~~~~~~~~~~~~~~~~~~~~~~~~

**write()**
-----------

**write()**, evidently, is used to write data onto a file. It takes the data to 
be written as the argument. The data can be a string, an integer, a float or any
other datatype. In order to be able to write data onto a file, the file has to
be opened in one of **w**, **a** or **+** modes.

**read()**
----------

**read()** is used to read data from a file. It takes the number of bytes of data
to be read as the argument. If nothing is specified by default it reads the entire 
contents from the current position to the end of file.

Let us look at a few examples:

::

  >>> f = open ('randomtextfile', 'w')
  >>> f.write('Hello all, this is PythonFreak. This is a random text file.')
  >>> f = open ('../randomtextfile', 'r')
  >>> f = open ('../randomtextfile', 'r')
  >>> f.read(5)
  'Hello'
  >>> f.read()
  ' all, this is PythonFreak. This is a random text file.'
  >>> f.close()

**readline()**
--------------

**readline()** is used to read a file line by line. **readline()** reads a line
of a file at a time. When an argument is passed to **readline()** it reads that
many bytes from the current line.

One other method to read a file line by line is using the **read()** and the 
**for** construct. Let us look at this block of code as an example.

::

  >>> f = open('../randomtextfile', 'r')
  >>> for line in f:
  ...     print line
  ... 
  Hello all!
  
  This is PythonFreak on the second line.
  
  This is a random text file on line 3

**close()**
-----------

One must always close all the files that have been opened. Although, files opened
will be closed automatically when the program ends. When files opened in read mode
are not closed it might lead to uselessly locked sometimes. In case of files
opened in the write mode it is more important to close the files. This is because,
Python maybe using the file in the buffering mode and when the file is not closed
the buffer maybe lost completely and the changes made to the file are lost forever.


Dictionaries
============

A dictionary in general, are designed to be able to look up meanings of words. 
Similarly, the Python dictionaries are also designed to look up for a specific
key and retrieve the corresponding value. Dictionaries are data structures that
provide key-value mappings. Dictionaries are similar to lists except that instead
of the values having integer indexes, dictionaries have keys or strings as indexes.
Let us look at an example of how to define dictionaries.

::

  >>> dct = { 'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}

The dictionary consists of pairs of strings, which are called *keys* and their
corresponding *values* separated by *:* and each of these *key-value* pairs are
comma(',') separated and the entire structure wrapped in a pair curly braces *{}*.

::

  Note: The data inside a dictionary is not ordered. The order in which you enter
  the key-value pairs is not the order in which they are stored in the dictionary.
  Python has an internal storage mechanism for that which is out of the purview 
  of this document.

**dict()**
~~~~~~~~~~

The **dict()** function is used to create dictionaries from other mappings or other
dictionaries. Let us look at an example.

::

  >>> diction = dict(mat = 133, avg = 52.53)

**String Formatting with Dictionaries:**

String formatting was discussed in the previous section and it was mentioned that
dictionaries can also be used for formatting more than one value. This section 
focuses on the formatting of strings using dictionaries. String formatting using
dictionaries is more appealing than doing the same with tuples. Here the *keyword*
can be used as a place holder and the *value* corresponding to it is replaced in
the formatted string. Let us look at an example.

::

  >>> player = { 'Name':'Rahul Dravid', 'Matches':133, 'Avg':52.53, '100s':26 }
  >>> strng = '%(Name)s has played %(Matches)d with an average of %(Avg).2f and has %(100s)d hundreds to his name.'
  >>> print strng % player
  Rahul Dravid has played 133 with an average of 52.53 and has 26 hundreds to his name.

Dictionary Methods
~~~~~~~~~~~~~~~~~~

**clear()**
-----------

The **clear()** method removes all the existing *key-value* pairs from a dictionary.
It returns *None* or rather does not return anything. It is a method that changes
the object. It has to be noted here that dictionaries are not immutable. Let us 
look at an example.

::
  
  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dct.clear()
  >>> dct
  {}

**copy()**
----------

The **copy()** returns a copy of a given dictionary. Let us look at an example.

::

  >>> dct = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy = dct.copy()
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}


**get()**
---------

**get()** returns the *value* for the *key* passed as the argument and if the
*key* does not exist in the dictionary, it returns *None*. Let us look at an
example.

::

  >>> print dctcopy.get('Saurav')
  None
  >>> print dctcopy.get('Anil')
  Kumble

**has_key()**
-------------

This method returns *True* if the given *key* is in the dictionary, else it returns 
*False*.

::

  >>> dctcopy.has_key('Saurav')
  False
  >>> dctcopy.has_key('Sachin')
  True

**pop()**
---------

This method is used to retrieve the *value* of a given *key* and subsequently 
remove the *key-value* pair from the dictionary. Let us look at an example.

::

  >>> print dctcopy.pop('Sachin')
  Tendulkar
  >>> dctcopy
  {'Anil': 'Kumble', 'Rahul': 'Dravid'}

**popitem()**
-------------

This method randomly pops a *key-value* pair from a dictionary and returns it.
The *key-value* pair returned is removed from the dictionary. Let us look at an
example.

::

  >>> print dctcopy.popitem()
  ('Anil', 'Kumble')
  >>> dctcopy
  {'Rahul': 'Dravid'}

  Note that the item chosen is completely random since dictionaries are unordered
  as mentioned earlier.

**update()**
------------

The **update()** method updates the contents of one dictionary with the contents
of another dictionary. For items with existing *keys* their *values* are updated,
and the rest of the items are added. Let us look at an example.

::

  >>> dctcopy.update(dct)
  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}