sttp/basic_python/strings_dicts.rst
changeset 0 27e1f5bd2774
-1:000000000000 0:27e1f5bd2774
       
     1 =======
       
     2 Strings
       
     3 =======
       
     4 
       
     5 Strings were briefly introduced previously in the introduction document. In this
       
     6 section strings are presented in greater detail. All the standard operations
       
     7 that can be performed on sequences, such as indexing, slicing, multiplication, length,
       
     8 minimum and maximum, can be performed on strings as well. One thing to
       
     9 note is that strings are immutable, which means that a string object cannot be
       
    10 changed in place. Hence, all item and slice assignments on strings are illegal.
       
    11 Let us look at a few examples.
       
    12 
       
    13 ::
       
    14 
       
    15   >>> name = 'PythonFreak'
       
    16   >>> print name[3]
       
    17   h
       
    18   >>> print name[-1]
       
    19   k
       
    20   >>> print name[6:]
       
    21   Freak
       
    22   >>> name[6:0] = 'Maniac'
       
    23   Traceback (most recent call last):
       
    24     File "<stdin>", line 1, in <module>
       
    25   TypeError: 'str' object does not support item assignment
       
    26 
       
    27 This is quite expected, since string objects are immutable as already mentioned.
       
    28 The error message is clear in mentioning that 'str' object does not support item
       
    29 assignment.
       
    30 
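Since a string cannot be changed in place, the usual approach is to build a new string from pieces of the old one. The following is a minimal sketch of that idea. It reuses the **name** variable from the example above; **new_name** is a name introduced here for illustration. It uses print(...) with a single argument so that it should run unchanged on both Python 2 and Python 3.

```python
name = 'PythonFreak'

# Item assignment fails because strings are immutable.
try:
    name[0] = 'J'
except TypeError as err:
    print('TypeError: %s' % err)

# Build a new string from a slice of the old one instead.
new_name = name[:6] + 'Maniac'
print(new_name)  # PythonManiac
print(name)      # PythonFreak (the original string is unchanged)
```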
       
    31 String Formatting
       
    32 =================
       
    33 
       
    34 String formatting can be performed using the string formatting operator represented
       
    35 as the percent (%) sign. The string placed before the % sign is formatted with 
       
    36 the value placed to the right of it. Let us look at a simple example.
       
    37 
       
    38 ::
       
    39 
       
    40   >>> format = 'Hello %s, from PythonFreak'
       
    41   >>> str1 = 'world!'
       
    42   >>> print format % str1
       
    43   Hello world!, from PythonFreak
       
    44 
       
    45 The %s parts of the format string are called conversion specifiers. The conversion
       
    46 specifiers mark the places where the formatting has to be performed in a string. 
       
    47 In the example the %s is replaced by the value of str1. More than one value can 
       
    48 also be formatted at a time by specifying the values to be formatted using tuples
       
    49 and dictionaries (explained in later sections). Let us look at an example.
       
    50 
       
    51 ::
       
    52 
       
    53   >>> format = 'Hello %s, from %s'
       
    54   >>> values = ('world!', 'PythonFreak')
       
    55   >>> print format % values
       
    56   Hello world!, from PythonFreak
       
    57 
       
    58 In this example it can be observed that the format string contains two conversion 
       
    59 specifiers and they are formatted using the tuple of values as shown.
       
    60 
       
    61 The s in %s specifies that the value to be replaced is of type string. Values of 
       
    62 other types can be specified as well such as integers and floats. Integers are 
       
    63 specified as %d and floats as %f. The precision with which the integer or the 
       
    64 float values are to be represented can also be specified using a **.** (**dot**)
       
    65 followed by the precision value.
       
    66 
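The integer and float specifiers described above can be sketched as follows. The variable names and values are made up for illustration. Note that for %d the precision sets a minimum number of digits, while for %f it sets the number of digits after the decimal point.

```python
matches = 133
average = 52.5312

as_integer = '%d' % matches       # '133'
as_float = '%f' % average         # '52.531200' (six decimals by default)
two_decimals = '%.2f' % average   # '52.53'
padded = '%.5d' % matches         # '00133' (at least five digits)

print(as_integer)
print(as_float)
print(two_decimals)
print(padded)
```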
       
    67 String Methods
       
    68 ==============
       
    69 
       
    70 Similar to list methods, strings also have a rich set of methods to perform various
       
    71 operations on strings. Some of the most important and popular ones are presented
       
    72 in this section.
       
    73 
       
    74 **find**
       
    75 ~~~~~~~~
       
    76 
       
    77 The **find** method is used to search for a substring within a given string. It 
       
    78 returns the leftmost index of the first occurrence of the substring. If the
       
    79 substring is not found in the string then it returns -1. Let us look at a few 
       
    80 examples.
       
    81 
       
    82 ::
       
    83 
       
    84   >>> longstring = 'Hello world!, from PythonFreak'
       
    85   >>> longstring.find('Python')
       
    86   19
       
    87   >>> longstring.find('Perl')
       
    88   -1
       
    89 
       
    90 **join**
       
    91 ~~~~~~~~
       
    92 
       
    93 The **join** method is used to join the elements of a sequence, using the string
       
    94 it is called on as the separator. The sequence elements to be joined should all
       
    95 be strings. Let us look at a few examples.
       
    96 
       
    97 ::
       
    98   
       
    99   >>> seq = ['With', 'great', 'power', 'comes', 'great', 'responsibility']
       
   100   >>> sep = ' '
       
   101   >>> sep.join(seq)
       
   102   'With great power comes great responsibility'
       
   103   >>> sep = ',!'
       
   104   >>> sep.join(seq)
       
   105   'With,!great,!power,!comes,!great,!responsibility'
       
   106 
       
   107 *Try this yourself*
       
   108 
       
   109 ::
       
   110 
       
   111   >>> seq = [12,34,56,78]
       
   112   >>> sep.join(seq)
       
   113 
       
   114 **lower**
       
   115 ~~~~~~~~~
       
   116 
       
   117 The **lower** method, as the name indicates, converts the entire text of a string
       
   118 to lower case. It is especially useful when a program has to deal with
       
   119 case-insensitive data. Let us look at an example.
       
   120 
       
   121 ::
       
   122 
       
   123   >>> sometext = 'Hello world!, from PythonFreak'
       
   124   >>> sometext.lower()
       
   125   'hello world!, from pythonfreak'
       
   126 
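One common use of **lower** is comparing two strings while ignoring case. The sketch below assumes two hypothetical values, **typed** and **stored**, that differ only in capitalisation.

```python
typed = 'PYTHONfreak'    # hypothetical user input
stored = 'PythonFreak'   # hypothetical stored value

same_exact = (typed == stored)
same_ignoring_case = (typed.lower() == stored.lower())

print(same_exact)          # False
print(same_ignoring_case)  # True
```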
       
   127 **replace**
       
   128 ~~~~~~~~~~~
       
   129 
       
   130 The **replace** method replaces a substring with another substring within
       
   131 a given string and returns the new string. Let us look at an example.
       
   132 
       
   133 ::
       
   134 
       
   135   >>> sometext = 'Concise, precise and criticise is some of the words that end with ise'
       
   136   >>> sometext.replace('is', 'are')
       
   137   'Concaree, precaree and criticaree are some of the words that end with aree'
       
   138 
       
   139 Observe here that all the occurrences of the substring *is* have been replaced,
       
   140 including the *is* in *concise*, *precise* and *criticise*.
       
   141 
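When only some occurrences should change, two simple options are to include the surrounding spaces in the substring, or to pass the optional third argument of **replace**, which limits the number of replacements. The following sketch shows both; the variable names **word_only** and **first_only** are chosen here for illustration.

```python
sometext = 'Concise, precise and criticise is some of the words that end with ise'

# Replace only the stand-alone word by matching the spaces around it.
word_only = sometext.replace(' is ', ' are ')
print(word_only)

# Replace at most one occurrence, counting from the left.
first_only = sometext.replace('is', 'are', 1)
print(first_only)
```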
       
   142 **split**
       
   143 ~~~~~~~~~
       
   144 
       
   145 **split** is one of the most important string methods and is the opposite of the
       
   146 **join** method. It splits a string using the argument passed as the delimiter
       
   147 and returns a list of strings. When no argument is passed, it splits on runs of
       
   148 whitespace (spaces, tabs and newlines). Let us look at an example.
       
   149 
       
   150 ::
       
   151 
       
   152   >>> grocerylist = 'butter, cucumber, beer(a grocery item??), wheatbread'
       
   153   >>> grocerylist.split(',')
       
   154   ['butter', ' cucumber', ' beer(a grocery item??)', ' wheatbread']
       
   155   >>> grocerylist.split()
       
   156   ['butter,', 'cucumber,', 'beer(a', 'grocery', 'item??),', 'wheatbread']
       
   157 
       
   158 Observe here that in the second case, when the delimiter argument was not given,
       
   159 the **split** was done on whitespace.
       
   160 
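Two details of **split** are easy to miss: splitting on a comma leaves the leading spaces in the pieces, and calling it with no argument is not quite the same as passing a single space. The sketch below illustrates both with made-up strings.

```python
grocerylist = 'butter, cucumber, beer, wheatbread'

# Split on commas, then strip the leftover spaces from each piece.
items = [item.strip() for item in grocerylist.split(',')]
print(items)  # ['butter', 'cucumber', 'beer', 'wheatbread']

spaced = 'a  b\tc'   # two spaces, then a tab

# No argument: any run of whitespace counts as one delimiter.
print(spaced.split())     # ['a', 'b', 'c']

# A single space as delimiter: every space splits, the tab does not.
print(spaced.split(' '))  # ['a', '', 'b\tc']
```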
       
   161 **strip**
       
   162 ~~~~~~~~~
       
   163 
       
   164 The **strip** method is used to remove or **strip** off any whitespace that exists
       
   165 to the left and right of a string, but not the whitespace within a string. Let
       
   166 us look at an example.
       
   167 
       
   168 ::
       
   169 
       
   170   >>> spacedtext = "               Where's the text??                 "
       
   171   >>> spacedtext.strip()
       
   172   "Where's the text??"
       
   173 
       
   174 Observe that the whitespace between the words has not been removed.
       
   175 
       
   176 ::
       
   177 
       
   178   Note: It is very important to note that none of the methods shown above
       
   179         modify the source string; each of them returns a new string.
       
   180         Remember that **strings are immutable**.
       
   181 
       
   182 Introduction to the standard library
       
   183 ====================================
       
   184 
       
   185 Python is often referred to as a "Batteries included!" language, mainly because 
       
   186 of the Python Standard Library. The Python Standard Library provides an extensive
       
   187 set of features, some of which are available directly for use while others require
       
   188 importing **modules**. The Standard Library provides various built-in functions
       
   189 like:
       
   190 
       
   191     * **abs()**
       
   192     * **dict()**
       
   193     * **enumerate()**
       
   194 
       
   195 Built-in constants like **True** and **False** are also provided by the Standard Library.
       
   196 More information about the Python Standard Library is available at http://docs.python.org/library/
       
   197 
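The difference between features that are available directly and those that require an import can be sketched as follows. The built-in functions listed above are used without any import, while **math** (chosen here only as a familiar example of a standard module) has to be imported first.

```python
import math  # a standard library module has to be imported before use

# Built-in functions need no import.
magnitude = abs(-7.5)
mapping = dict(one=1, two=2)
numbered = list(enumerate(['a', 'b', 'c']))

# Functions inside a module are reached through the module name.
root = math.sqrt(16)

print(magnitude)  # 7.5
print(numbered)   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(root)       # 4.0
```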
       
   198 
       
   199 I/O: Reading and Writing Files
       
   200 ==============================
       
   201 
       
   202 Files are an important part of computing and programming.
       
   203 Up until now the focus has been on small programs that interacted with users
       
   204 through **input()** and **raw_input()**. Computational work, however, usually
       
   205 requires handling files, which are often large.
       
   206 This section focuses on the basics of file handling.
       
   207 
       
   208 Opening Files
       
   209 ~~~~~~~~~~~~~
       
   210 
       
   211 Files can be opened using the built-in **open()** function. **open()** accepts three
       
   212 arguments, two of which are optional. Let us look at the syntax of **open()**:
       
   213 
       
   214 *f = open( filename, mode, buffering)*
       
   215 
       
   216 The *filename* is a compulsory argument while the *mode* and *buffering* are
       
   217 optional. The *filename* should be a string giving the path
       
   218 to the file to be opened (the path can be absolute or relative). Let us look at
       
   219 an example.
       
   220 
       
   221 ::
       
   222 
       
   223   >>> f = open ('basic_python/interim_assessment.rst')
       
   224   
       
   225 The *mode* argument specifies the mode in which the file has to be opened.
       
   226 The following are the valid mode arguments:
       
   227 
       
   228     * **r** - Read mode
       
   229     * **w** - Write mode
       
   230     * **a** - Append mode
       
   231     * **b** - Binary mode
       
   232     * **+** - Read/Write mode
       
   233 
       
   234 The read mode opens the file as a read-only document. The write mode opens the
       
   235 file for writing only. In the write mode, if the file existed prior to
       
   236 opening, its previous contents are erased. The append mode opens the
       
   237 file for writing, but the previous contents of the file are not erased and
       
   238 new data is appended to the end of the file.
       
   239 The binary and the read/write modes are special in the sense that they are added
       
   240 onto other modes. The read/write mode opens the file for both reading and
       
   241 writing. The binary mode is used to open files that do not contain
       
   242 text. Binary files such as images should be opened in the binary mode. Let us look
       
   243 at a few examples.
       
   244 
       
   245 ::
       
   246 
       
   247   >>> f = open ('basic_python/interim_assessment.rst', 'r')
       
   248   >>> f = open ('armstrong.py', 'r+')
       
   249 
       
   250 The third argument to **open()** is the *buffering* argument. This takes
       
   251 an integer. A value of *0* means the file is unbuffered, so changes are written to the
       
   252 disk immediately. A value of *1* means line buffered, and a larger value is used as the
       
   253 approximate buffer size in bytes. With buffering enabled, changes are held in main
       
   254 memory for a while and are not written to the disk immediately.
       
   255 
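The effect of the write, append and read modes can be seen in a small experiment. The sketch below writes to a scratch file in a temporary directory; the path and file name are hypothetical and exist only so that the example does not touch any real files.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')        # write mode creates the file
f.write('first line\n')
f.close()

f = open(path, 'w')        # write mode again erases the old contents
f.write('second line\n')
f.close()

f = open(path, 'a')        # append mode keeps the old contents
f.write('third line\n')
f.close()

f = open(path, 'r')        # read mode
contents = f.read()
f.close()

print(contents)
```

Only the second and third lines survive, because reopening the file in the write mode discarded the first one.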
       
   256 Reading and Writing files
       
   257 ~~~~~~~~~~~~~~~~~~~~~~~~~
       
   258 
       
   259 **write()**
       
   260 -----------
       
   261 
       
   262 **write()**, evidently, is used to write data onto a file. It takes the data to
       
   263 be written as the argument. The data must be a string; values of other types, such
       
   264 as integers and floats, have to be converted with **str()** first. To write to a file,
       
   265 it has to be opened in one of the **w**, **a** or **+** modes.
       
   266 
       
   267 **read()**
       
   268 ----------
       
   269 
       
   270 **read()** is used to read data from a file. It takes the number of bytes of data
       
   271 to be read as the argument. If no argument is given, it reads the entire
       
   272 contents from the current position to the end of the file.
       
   273 
       
   274 Let us look at a few examples:
       
   275 
       
   276 ::
       
   277 
       
   278   >>> f = open ('randomtextfile', 'w')
       
   279   >>> f.write('Hello all, this is PythonFreak. This is a random text file.')
       
   280   >>> f.close()
       
   281   >>> f = open ('randomtextfile', 'r')
       
   282   >>> f.read(5)
       
   283   'Hello'
       
   284   >>> f.read()
       
   285   ' all, this is PythonFreak. This is a random text file.'
       
   286   >>> f.close()
       
   287 
       
   288 **readline()**
       
   289 --------------
       
   290 
       
   291 **readline()** is used to read a file line by line. **readline()** reads one line
       
   292 of a file at a time. When an argument is passed to **readline()** it reads at most
       
   293 that many bytes from the current line.
       
   294 
       
   295 Another way to read a file line by line is to iterate over the file object with the
       
   296 **for** construct. Let us look at this block of code as an example.
       
   297 
       
   298 ::
       
   299 
       
   300   >>> f = open('../randomtextfile', 'r')
       
   301   >>> for line in f:
       
   302   ...     print line
       
   303   ... 
       
   304   Hello all!
       
   305   
       
   306   This is PythonFreak on the second line.
       
   307   
       
   308   This is a random text file on line 3
       
   309 
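The blank lines in the output above appear because every line read from a file keeps its trailing newline and print adds another one. The sketch below shows **readline()** with and without an argument, and one way of dropping the trailing newline. It writes its own two-line scratch file first; the path is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file with two lines of text.
path = os.path.join(tempfile.mkdtemp(), 'readline_demo.txt')
f = open(path, 'w')
f.write('Hello all!\nThis is PythonFreak on the second line.\n')
f.close()

f = open(path, 'r')
first = f.readline()     # a whole line, including the newline
partial = f.readline(4)  # at most 4 characters of the next line
rest = f.readline()      # the remainder of that line
end = f.readline()       # an empty string signals the end of the file
f.close()

# Iterate over the file and strip the newline from each line.
f = open(path, 'r')
lines = [line.rstrip('\n') for line in f]
f.close()

print(lines)
```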
       
   310 **close()**
       
   311 -----------
       
   312 
       
   313 One must always close all the files that have been opened, even though open files
       
   314 are closed automatically when the program ends. Files opened in read mode that
       
   315 are not closed may sometimes remain locked needlessly. For files
       
   316 opened in the write mode it is even more important to close them. This is because
       
   317 Python may be buffering the writes, and when the file is not closed
       
   318 the buffer may be lost and the changes made to the file lost with it.
       
   319 
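Two common ways of making sure a file is closed even when an error occurs are a try/finally block and the **with** statement (available without a special import from Python 2.6 onwards). The following is a sketch of both; the scratch file path is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), 'close_demo.txt')

# try/finally: close() runs whether or not write() raises an error.
f = open(path, 'w')
try:
    f.write('This text is flushed to disk when the file is closed.\n')
finally:
    f.close()
print(f.closed)  # True

# with: the file is closed automatically at the end of the block.
with open(path, 'r') as g:
    data = g.read()
print(g.closed)  # True
```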
       
   320 
       
   321 Dictionaries
       
   322 ============
       
   323 
       
   324 A dictionary, in general, is designed for looking up the meanings of words.
       
   325 Similarly, Python dictionaries are designed to look up a specific
       
   326 key and retrieve the corresponding value. Dictionaries are data structures that
       
   327 provide key-value mappings. Dictionaries are similar to lists except that instead
       
   328 of integer indexes, the values are indexed by keys, which are often strings.
       
   329 Let us look at an example of how to define dictionaries.
       
   330 
       
   331 ::
       
   332 
       
   333   >>> dct = { 'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}
       
   334 
       
   335 The dictionary consists of pairs of *keys* and their corresponding *values*
       
   336 (both strings in this example), separated by *:*. The *key-value* pairs are
       
   337 separated by commas (',') and the entire structure is wrapped in a pair of curly braces *{}*.
       
   338 
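Once a dictionary is defined, a value is looked up by placing its key in square brackets, much like indexing a list. The sketch below reuses the **dct** dictionary from above; the added entry and the missing key are made up for illustration.

```python
dct = {'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}

# Look up a value by its key.
surname = dct['Sachin']
print(surname)  # Tendulkar

# Assigning to a new key adds a key-value pair.
dct['Saurav'] = 'Ganguly'
print(len(dct))  # 4

# Looking up a key that does not exist raises a KeyError.
key_missing = False
try:
    dct['Virender']
except KeyError:
    key_missing = True
print(key_missing)  # True
```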
       
   339 ::
       
   340 
       
   341   Note: The data inside a dictionary is not ordered. The order in which you enter
       
   342   the key-value pairs is not the order in which they are stored in the dictionary.
       
   343   Python has an internal storage mechanism for that which is out of the purview 
       
   344   of this document.
       
   345 
       
   346 **dict()**
       
   347 ~~~~~~~~~~
       
   348 
       
   349 The **dict()** function is used to create dictionaries from other mappings, from
       
   350 sequences of key-value pairs, or from keyword arguments. Let us look at an example.
       
   351 
       
   352 ::
       
   353 
       
   354   >>> diction = dict(mat = 133, avg = 52.53)
       
   355 
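The different ways of calling **dict()** can be compared side by side. In this sketch the three variable names are chosen for illustration; all three calls build an equal dictionary.

```python
# From keyword arguments (the keys become strings).
from_keywords = dict(mat=133, avg=52.53)

# From a sequence of (key, value) pairs.
from_pairs = dict([('mat', 133), ('avg', 52.53)])

# From another dictionary (this makes a new dictionary object).
from_mapping = dict(from_keywords)

print(from_keywords == from_pairs == from_mapping)  # True
print(from_mapping is from_keywords)                # False
```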
       
   356 **String Formatting with Dictionaries:**
       
   357 
       
   358 String formatting was discussed in the previous section and it was mentioned that
       
   359 dictionaries can also be used for formatting more than one value. This section 
       
   360 focuses on the formatting of strings using dictionaries. String formatting using
       
   361 dictionaries is often more readable than doing the same with tuples. Here the *key*
       
   362 is used as a placeholder, which is replaced by the corresponding *value* in
       
   363 the formatted string. Let us look at an example.
       
   364 
       
   365 ::
       
   366 
       
   367   >>> player = { 'Name':'Rahul Dravid', 'Matches':133, 'Avg':52.53, '100s':26 }
       
   368   >>> strng = '%(Name)s has played %(Matches)d with an average of %(Avg).2f and has %(100s)d hundreds to his name.'
       
   369   >>> print strng % player
       
   370   Rahul Dravid has played 133 with an average of 52.53 and has 26 hundreds to his name.
       
   371 
       
   372 Dictionary Methods
       
   373 ~~~~~~~~~~~~~~~~~~
       
   374 
       
   375 **clear()**
       
   376 -----------
       
   377 
       
   378 The **clear()** method removes all the existing *key-value* pairs from a dictionary.
       
   379 It returns *None*; it is a method that changes the object in place.
       
   380 Note that dictionaries, unlike strings, are mutable. Let us
       
   381 look at an example.
       
   382 
       
   383 ::
       
   384   
       
   385   >>> dct
       
   386   {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
       
   387   >>> dct.clear()
       
   388   >>> dct
       
   389   {}
       
   390 
       
   391 **copy()**
       
   392 ----------
       
   393 
       
   394 The **copy()** method returns a shallow copy of a given dictionary. Let us look at an example.
       
   395 
       
   396 ::
       
   397 
       
   398   >>> dct = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
       
   399   >>> dctcopy = dct.copy()
       
   400   >>> dctcopy
       
   401   {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
       
   402 
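The reason for using **copy()** rather than plain assignment is that assignment only creates a second name for the same dictionary. The sketch below contrasts the two; **alias** is a name introduced here for illustration.

```python
dct = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}

alias = dct            # another name for the same dictionary
dctcopy = dct.copy()   # a new, independent dictionary

# A change made through dct is visible through alias but not in the copy.
dct['Saurav'] = 'Ganguly'

print('Saurav' in alias)    # True
print('Saurav' in dctcopy)  # False
```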
       
   403 
       
   404 **get()**
       
   405 ---------
       
   406 
       
   407 **get()** returns the *value* for the *key* passed as the argument and if the
       
   408 *key* does not exist in the dictionary, it returns *None*. Let us look at an
       
   409 example.
       
   410 
       
   411 ::
       
   412 
       
   413   >>> print dctcopy.get('Saurav')
       
   414   None
       
   415   >>> print dctcopy.get('Anil')
       
   416   Kumble
       
   417 
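**get()** also accepts an optional second argument, which is returned instead of *None* when the key is missing. A minimal sketch, with 'unknown' as a made-up default value:

```python
dctcopy = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}

missing = dctcopy.get('Saurav')               # no default given
fallback = dctcopy.get('Saurav', 'unknown')   # default used for a missing key
found = dctcopy.get('Anil', 'unknown')        # default ignored for an existing key

print(missing)   # None
print(fallback)  # unknown
print(found)     # Kumble
```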
       
   418 **has_key()**
       
   419 -------------
       
   420 
       
   421 This method returns *True* if the given *key* is in the dictionary, else it returns 
       
   422 *False*.
       
   423 
       
   424 ::
       
   425 
       
   426   >>> dctcopy.has_key('Saurav')
       
   427   False
       
   428   >>> dctcopy.has_key('Sachin')
       
   429   True
       
   430 
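The same membership test can be written with the **in** operator, which works in Python 2 as well and is the only form available in Python 3, where **has_key()** was removed. A short sketch:

```python
dctcopy = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}

# The in operator tests whether a key is present in the dictionary.
has_saurav = 'Saurav' in dctcopy
has_sachin = 'Sachin' in dctcopy

print(has_saurav)  # False
print(has_sachin)  # True
```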
       
   431 **pop()**
       
   432 ---------
       
   433 
       
   434 This method is used to retrieve the *value* of a given *key* and subsequently 
       
   435 remove the *key-value* pair from the dictionary. Let us look at an example.
       
   436 
       
   437 ::
       
   438 
       
   439   >>> print dctcopy.pop('Sachin')
       
   440   Tendulkar
       
   441   >>> dctcopy
       
   442   {'Anil': 'Kumble', 'Rahul': 'Dravid'}
       
   443 
       
   444 **popitem()**
       
   445 -------------
       
   446 
       
   447 This method pops an arbitrary *key-value* pair from a dictionary and returns it.
       
   448 The *key-value* pair returned is removed from the dictionary. Let us look at an
       
   449 example.
       
   450 
       
   451 ::
       
   452 
       
   453   >>> print dctcopy.popitem()
       
   454   ('Anil', 'Kumble')
       
   455   >>> dctcopy
       
   456   {'Rahul': 'Dravid'}
       
   457 
       
   458   Note that the item chosen is arbitrary, since dictionaries are unordered
       
   459   as mentioned earlier.
       
   460 
       
   461 **update()**
       
   462 ------------
       
   463 
       
   464 The **update()** method updates the contents of one dictionary with the contents
       
   465 of another dictionary. For items with existing *keys* their *values* are updated,
       
   466 and the rest of the items are added. Let us look at an example.
       
   467 
       
   468 ::
       
   469 
       
   470   >>> dctcopy.update(dct)
       
   471   >>> dct
       
   472   {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
       
   473   >>> dctcopy
       
   474   {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
       
   475
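The example above only adds missing items. The case where a key already exists can be sketched as follows; the **scores** dictionary and its values are made up for illustration.

```python
scores = {'Sachin': 100, 'Rahul': 50}

# 'Rahul' already exists, so its value is overwritten; 'Anil' is new, so it is added.
result = scores.update({'Rahul': 75, 'Anil': 10})

print(scores == {'Sachin': 100, 'Rahul': 75, 'Anil': 10})  # True
print(result)  # None, because update() changes the dictionary in place
```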