=======
Strings
=======

Strings were briefly introduced in the introduction document. This section
presents them in greater detail. All the standard operations that can be
performed on sequences, such as indexing, slicing, multiplication, length,
minimum and maximum, can be performed on strings as well. Note that strings are
immutable: once created, a string object cannot be changed in place. Hence, all
item and slice assignments on strings are illegal. Let us look at a few examples.

::

  >>> name = 'PythonFreak'
  >>> print name[3]
  h
  >>> print name[-1]
  k
  >>> print name[6:]
  Freak
  >>> name[6:] = 'Maniac'
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' object does not support item assignment

This is expected, since string objects are immutable. The error message states
clearly that a 'str' object does not support item assignment.

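The remaining sequence operations listed at the start of this section behave in
the same way. The following is a minimal sketch rather than part of the original
session: the variable names are illustrative, and the code is written so that it
runs on both Python 2 and Python 3. It also shows the usual way around
immutability, which is to build a new string instead of changing the old one.

```python
name = 'PythonFreak'

length = len(name)        # number of characters: 11
repeated = name[:6] * 2   # multiplication repeats a string: 'PythonPython'
smallest = min(name)      # 'F' (upper-case letters sort before lower-case ones)
largest = max(name)       # 'y'

# Strings cannot be modified in place, so build a new string instead.
new_name = name[:6] + 'Maniac'
print(new_name)           # PythonManiac
print(name)               # PythonFreak (the original is unchanged)
```
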
String Formatting
=================

String formatting is performed with the string formatting operator, the percent
(%) sign. The string placed to the left of the % sign is formatted with the
value placed to its right. Let us look at a simple example.

::

  >>> format = 'Hello %s, from PythonFreak'
  >>> str1 = 'world!'
  >>> print format % str1
  Hello world!, from PythonFreak

The %s parts of the format string are called conversion specifiers. They mark
the places in the string where values are to be inserted. In the example, %s is
replaced by the value of str1. More than one value can be formatted at a time by
supplying the values as a tuple or a dictionary (both are explained in later
sections). Let us look at an example.

::

  >>> format = 'Hello %s, from %s'
  >>> values = ('world!', 'PythonFreak')
  >>> print format % values
  Hello world!, from PythonFreak

In this example the format string contains two conversion specifiers, which are
filled, in order, from the tuple of values.

The s in %s specifies that the value is to be formatted as a string. Other
types can be specified as well: integers are specified as %d and floats as %f.
The precision with which an integer or float is represented can also be
specified, using a **.** (**dot**) followed by the precision value; for
example, %.2f formats a float with two digits after the decimal point.

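As a rough illustration of the %d and %f specifiers and of the precision
value, consider the sketch below. The variable names and numbers are made up
for the example, and the code is written to run on both Python 2 and Python 3.

```python
matches = 133
average = 52.5312

as_int = '%d matches' % matches   # '133 matches'
as_float = '%f' % average         # '52.531200' (six decimal places by default)
two_places = '%.2f' % average     # '52.53'
padded = '%.5d' % matches         # '00133' (for integers, precision pads with zeros)

# Several values of different types, supplied as a tuple.
combined = '%s: %d matches, average %.2f' % ('Dravid', matches, average)
print(combined)                   # Dravid: 133 matches, average 52.53
```
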
String Methods
==============

Like lists, strings have a rich set of methods for performing various
operations. Some of the most important and commonly used ones are presented in
this section.

**find**
~~~~~~~~

The **find** method searches for a substring within a given string. It returns
the lowest index at which the substring occurs. If the substring is not found,
it returns -1. Let us look at a few examples.

::

  >>> longstring = 'Hello world!, from PythonFreak'
  >>> longstring.find('Python')
  19
  >>> longstring.find('Perl')
  -1

**join**
~~~~~~~~

The **join** method joins the elements of a sequence, using the string on which
it is called as the separator. The elements to be joined must all be strings.
Let us look at a few examples.

::

  >>> seq = ['With', 'great', 'power', 'comes', 'great', 'responsibility']
  >>> sep = ' '
  >>> sep.join(seq)
  'With great power comes great responsibility'
  >>> sep = ',!'
  >>> sep.join(seq)
  'With,!great,!power,!comes,!great,!responsibility'

*Try this yourself*

::

  >>> seq = [12, 34, 56, 78]
  >>> sep.join(seq)

**lower**
~~~~~~~~~

The **lower** method, as the name indicates, converts the entire text of a
string to lower case. It is especially useful when dealing with
case-insensitive data. Let us look at an example.

::

  >>> sometext = 'Hello world!, from PythonFreak'
  >>> sometext.lower()
  'hello world!, from pythonfreak'

**replace**
~~~~~~~~~~~

The **replace** method replaces every occurrence of one substring with another
substring within a given string and returns the new string. Let us look at an
example.

::

  >>> sometext = 'Concise, precise and criticise is some of the words that end with ise'
  >>> sometext.replace('is', 'are')
  'Concaree, precaree and criticaree are some of the words that end with aree'

Observe that all the occurrences of the substring *is* have been replaced,
including the *is* inside *Concise*, *precise* and *criticise*.

**split**
~~~~~~~~~

**split** is one of the most important string methods and is the opposite of
the **join** method. It splits a string using the argument passed as the
delimiter and returns a list of strings. When no argument is passed, it splits
on whitespace, treating any run of spaces, tabs or newlines as a single
delimiter. Let us look at an example.

::

  >>> grocerylist = 'butter, cucumber, beer(a grocery item??), wheatbread'
  >>> grocerylist.split(',')
  ['butter', ' cucumber', ' beer(a grocery item??)', ' wheatbread']
  >>> grocerylist.split()
  ['butter,', 'cucumber,', 'beer(a', 'grocery', 'item??),', 'wheatbread']

Observe that in the second case, where no delimiter argument was given,
**split** used whitespace as the delimiter.

**strip**
~~~~~~~~~

The **strip** method removes, or **strips** off, any whitespace at the left and
right ends of a string, but not the whitespace within it. Let us look at an
example.

::

  >>> spacedtext = " Where's the text?? "
  >>> spacedtext.strip()
  "Where's the text??"

Observe that the whitespace between the words has not been removed.

::

  Note: It is important to note that none of the methods shown above
  transform the source string. The source string still remains the same.
  Remember that **strings are immutable**.

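The note above can be checked with a short sketch that combines **split**,
**strip**, **join** and **replace**. The grocery list is a simplified version
of the earlier one and the variable names are illustrative; the code is written
to run on both Python 2 and Python 3.

```python
grocerylist = 'butter, cucumber, beer, wheatbread'

# split returns a new list; strip returns a new string for each item.
items = [item.strip() for item in grocerylist.split(',')]
print(items)                     # ['butter', 'cucumber', 'beer', 'wheatbread']

# join with the matching delimiter reverses the split.
rejoined = ', '.join(items)
print(rejoined == grocerylist)   # True

# replace also returns a new string; bind the result to a name to keep it.
shouting = grocerylist.replace('beer', 'BEER')
print(shouting)                  # butter, cucumber, BEER, wheatbread
print(grocerylist)               # butter, cucumber, beer, wheatbread (unchanged)
```
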
Introduction to the standard library
====================================

Python is often referred to as a "Batteries included!" language, mainly because
of the Python Standard Library. The Standard Library provides an extensive set
of features, some of which are available directly while others require one or
more **modules** to be imported first. It provides various built-in functions
such as:

* **abs()**
* **dict()**
* **enumerate()**

Built-in constants such as **True** and **False** are also provided by the
Standard Library. More information about the Python Standard Library is
available at http://docs.python.org/library/


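The difference between features that are available directly and features that
need an import can be sketched as follows. The **math** module is used here
only as a convenient example of a standard library module; the values are
arbitrary.

```python
import math   # a standard library module has to be imported before use

# Built-in functions are available without any import.
distance = abs(-7.5)                       # 7.5
pairs = list(enumerate(['a', 'b', 'c']))   # [(0, 'a'), (1, 'b'), (2, 'c')]
lookup = dict(pairs)                       # {0: 'a', 1: 'b', 2: 'c'}

# Functions inside a module are reached through the module name.
root = math.sqrt(16)
print(root)                                # 4.0
```
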
I/O: Reading and Writing Files
==============================

Files are an important part of computing and programming. Up until now the
focus has been on small programs that interact with users through **input()**
and **raw_input()**. For computational purposes it is usually necessary to
handle files as well, and these are often large. This section focuses on the
basics of file handling.

Opening Files
~~~~~~~~~~~~~

Files are opened using the built-in **open()** function. **open()** accepts
three arguments, two of which are optional. Its syntax is:

*f = open(filename, mode, buffering)*

The *filename* is compulsory, while *mode* and *buffering* are optional. The
*filename* must be a string containing the path to the file to be opened; the
path can be absolute or relative. Let us look at an example.

::

  >>> f = open('basic_python/interim_assessment.rst')

The *mode* argument specifies the mode in which the file is to be opened. If it
is omitted, the file is opened in read mode. The following are the valid mode
arguments:

* **r** - Read mode
* **w** - Write mode
* **a** - Append mode
* **b** - Binary mode
* **+** - Read/Write mode

Read mode opens the file as read-only. Write mode opens the file as write-only;
if the file already exists, its previous contents are erased. Append mode also
opens the file for writing, but the previous contents are kept and new data is
added at the end of the file.

The binary and read/write modes are special in that they are added onto one of
the other modes, as in 'rb' or 'r+'. The read/write mode opens the file for both
reading and writing. The binary mode is used to open files that do not contain
text; binary files such as images should be opened in binary mode. Let us look
at a few examples.

::

  >>> f = open('basic_python/interim_assessment.rst', 'r')
  >>> f = open('armstrong.py', 'r+')

The third argument to **open()** is the *buffering* argument. It takes an
integer. *0* (or *False*) disables buffering, so changes are written to the
disk immediately. *1* (or *True*) selects line buffering, and a larger value
requests a buffer of approximately that many bytes; in both cases data written
to the file is collected in main memory and is not written to the disk
immediately. If the argument is omitted, the system default is used.

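The mode and buffering arguments can be tried out safely on a scratch file. The
sketch below is only an illustration: the file name is hypothetical and is
created inside a temporary directory, and the code is written to run on both
Python 2 and Python 3 (where unbuffered access is only allowed in binary mode).

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')       # write mode: creates the file or erases an existing one
f.write('first line\n')
f.close()

f = open(path, 'a')       # append mode: keeps the existing contents
f.write('second line\n')
f.close()

f = open(path, 'ab', 0)   # binary append with buffering 0: writes are unbuffered
f.write(b'third line\n')
f.close()

f = open(path, 'r+')      # read/write mode: the file must already exist
contents = f.read()
f.close()

f = open(path, 'rb')      # binary mode is always combined with another mode
raw = f.read()
f.close()

print(contents)
```
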
Reading and Writing files
~~~~~~~~~~~~~~~~~~~~~~~~~

**write()**
-----------

**write()** is used to write data to a file. It takes the data to be written as
its argument. The data must be a string; values of other types, such as
integers or floats, have to be converted first, for example with **str()**. To
write to a file, the file must have been opened in one of the **w**, **a** or
**+** modes.

**read()**
----------

**read()** is used to read data from a file. It takes the number of bytes to be
read as its argument. If no argument is given, it reads everything from the
current position to the end of the file.

Let us look at a few examples:

::

  >>> f = open('randomtextfile', 'w')
  >>> f.write('Hello all, this is PythonFreak. This is a random text file.')
  >>> f.close()
  >>> f = open('randomtextfile', 'r')
  >>> f.read(5)
  'Hello'
  >>> f.read()
  ' all, this is PythonFreak. This is a random text file.'
  >>> f.close()

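Data that is not already a string is typically converted before it is written.
The following is a minimal sketch under that assumption; the file name and the
numbers are made up, the file lives in a temporary directory, and the code is
written to run on both Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'scores.txt')

scores = [133, 52.53]

f = open(path, 'w')
for score in scores:
    f.write(str(score) + '\n')   # convert each number to a string first
f.close()

f = open(path, 'r')
first_three = f.read(3)          # reads exactly three characters
rest = f.read()                  # reads from the current position to the end
f.close()

print(repr(first_three))         # '133'
print(repr(rest))                # '\n52.53\n'
```
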
**readline()**
--------------

**readline()** reads a file one line at a time: each call returns the next
line. When an argument is passed to **readline()**, it reads at most that many
bytes from the current line.

Another way to read a file line by line is to iterate over the file object
with the **for** construct. Let us look at this block of code as an example,
assuming the file now contains three lines of text.

::

  >>> f = open('../randomtextfile', 'r')
  >>> for line in f:
  ...     print line
  ...
  Hello all!

  This is PythonFreak on the second line.

  This is a random text file on line 3

Each line read from the file keeps its trailing newline and **print** adds
another one, which is why a blank line appears after each line of output.

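The behaviour of **readline()** itself, with and without an argument, can be
sketched as follows. The three-line file is created by the example in a
temporary directory, so its name and contents are hypothetical; the code is
written to run on both Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical three-line file in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
f = open(path, 'w')
f.write('Hello all!\nThis is the second line.\nThis is line 3\n')
f.close()

f = open(path, 'r')
first = f.readline()       # 'Hello all!\n' (the newline is kept)
partial = f.readline(4)    # 'This' (at most 4 characters of the next line)
remainder = f.readline()   # ' is the second line.\n' (the rest of that line)

# Iterating over the file continues from the current position.
# rstrip('\n') drops the newline so that print does not emit a blank line.
for line in f:
    print(line.rstrip('\n'))   # This is line 3
f.close()
```
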
**close()**
-----------

All files that have been opened should be closed explicitly, even though open
files are closed automatically when the program ends. A file opened in read
mode that is not closed may remain locked needlessly. For files opened in write
mode, closing is even more important: Python may be buffering the output, and
if the file is never closed the contents of the buffer, and with them the
changes made to the file, may be lost.


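One way to make sure that **close()** is always called is sketched below. It
goes slightly beyond the text above: it uses **try/finally**, and the **with**
statement that is available from Python 2.6 onwards and closes the file
automatically when the block ends. The file name is hypothetical and lives in a
temporary directory.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'closing_demo.txt')

# Explicit close(): try/finally guarantees the call even if write() fails.
f = open(path, 'w')
try:
    f.write('buffered data\n')
finally:
    f.close()

# The with statement does the same bookkeeping automatically.
with open(path, 'r') as f:
    data = f.read()

print(f.closed)   # True
print(data)       # buffered data
```
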
Dictionaries
============

A dictionary, in the general sense, is designed for looking up the meanings of
words. Similarly, Python dictionaries are designed for looking up a specific
key and retrieving the corresponding value. Dictionaries are data structures
that provide key-value mappings. They are similar to lists, except that values
are indexed by keys, which are often strings, rather than by integer positions.
Let us look at an example of how to define a dictionary.

::

  >>> dct = {'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}

The dictionary consists of pairs of *keys* and their corresponding *values*
(strings, in this example). Each key is separated from its value by a colon
(*:*), the *key-value* pairs are separated by commas (','), and the entire
structure is wrapped in a pair of curly braces (*{}*).

::

  Note: The data inside a dictionary is not ordered. The order in which you enter
  the key-value pairs is not the order in which they are stored in the dictionary.
  Python has an internal storage mechanism for that which is out of the purview
  of this document.

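Looking up, adding, replacing and removing entries can be sketched as follows.
The added key-value pair is illustrative only, and the code is written to run
on both Python 2 and Python 3. Because the stored order is not something to
rely on, the keys are sorted before they are displayed.

```python
dct = {'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}

surname = dct['Rahul']        # look up a value by its key: 'Dravid'
dct['Saurav'] = 'Ganguly'     # assigning to a new key adds a pair
dct['Anil'] = 'KUMBLE'        # assigning to an existing key replaces the value
del dct['Sachin']             # del removes a pair

# sorted() gives a predictable order for display.
print(sorted(dct.keys()))     # ['Anil', 'Rahul', 'Saurav']
print(len(dct))               # 3
```
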
**dict()**
~~~~~~~~~~

The **dict()** function creates dictionaries from other mappings (such as
existing dictionaries), from sequences of key-value pairs, or from keyword
arguments. Let us look at an example that uses keyword arguments.

::

  >>> diction = dict(mat=133, avg=52.53)

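The different ways of calling **dict()** can be compared side by side. This is
a small sketch with illustrative variable names, written to run on both
Python 2 and Python 3.

```python
# From keyword arguments (the keyword names become string keys).
from_keywords = dict(mat=133, avg=52.53)

# From a sequence of (key, value) pairs.
from_pairs = dict([('mat', 133), ('avg', 52.53)])

# From an existing dictionary: the result is a new, independent dictionary.
from_mapping = dict(from_keywords)
from_mapping['mat'] = 134

print(from_keywords == from_pairs)   # True
print(from_keywords['mat'])          # 133 (unchanged by the edit to the copy)
```
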
**String Formatting with Dictionaries:**

String formatting was discussed in an earlier section, where it was mentioned
that dictionaries can also be used to format more than one value. Formatting
with dictionaries is often more readable than doing the same with tuples: each
conversion specifier names a *key* in parentheses, and the *value*
corresponding to that key is substituted into the formatted string. Let us look
at an example.

::

  >>> player = {'Name': 'Rahul Dravid', 'Matches': 133, 'Avg': 52.53, '100s': 26}
  >>> strng = '%(Name)s has played %(Matches)d matches with an average of %(Avg).2f and has %(100s)d hundreds to his name.'
  >>> print strng % player
  Rahul Dravid has played 133 matches with an average of 52.53 and has 26 hundreds to his name.

Dictionary Methods
~~~~~~~~~~~~~~~~~~

**clear()**
-----------

The **clear()** method removes all the existing *key-value* pairs from a
dictionary. It modifies the dictionary in place and returns *None*. This is
possible because dictionaries, unlike strings, are mutable. Let us look at an
example.

::

  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dct.clear()
  >>> dct
  {}

**copy()**
----------

The **copy()** method returns a (shallow) copy of a given dictionary. Let us
look at an example.

::

  >>> dct = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy = dct.copy()
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}


**get()**
---------

**get()** returns the *value* for the *key* passed as the argument. If the
*key* does not exist in the dictionary, it returns *None*. Let us look at an
example.

::

  >>> print dctcopy.get('Saurav')
  None
  >>> print dctcopy.get('Anil')
  Kumble

**has_key()**
-------------

This method returns *True* if the given *key* is in the dictionary, else it
returns *False*. The same test can be written with the **in** operator (for
example, *'Sachin' in dctcopy*), which is the preferred form; **has_key()** was
removed in Python 3.

::

  >>> dctcopy.has_key('Saurav')
  False
  >>> dctcopy.has_key('Sachin')
  True

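Membership tests and safe lookups can be sketched together. The example below
uses the **in** operator, which performs the same test as **has_key()**, and
the optional second argument of **get()**; the fallback text is made up for
illustration, and the code is written to run on both Python 2 and Python 3.

```python
dctcopy = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}

has_sachin = 'Sachin' in dctcopy   # True
has_saurav = 'Saurav' in dctcopy   # False

# get() accepts an optional second argument that is returned instead of None.
missing = dctcopy.get('Saurav', 'not in the team')
print(missing)                     # not in the team

# Indexing with a missing key raises KeyError, unlike get().
try:
    dctcopy['Saurav']
    raised = False
except KeyError:
    raised = True
print(raised)                      # True
```
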
**pop()**
---------

This method retrieves the *value* of a given *key* and then removes the
*key-value* pair from the dictionary. Let us look at an example.

::

  >>> print dctcopy.pop('Sachin')
  Tendulkar
  >>> dctcopy
  {'Anil': 'Kumble', 'Rahul': 'Dravid'}

**popitem()**
-------------

This method removes an arbitrary *key-value* pair from a dictionary and returns
it as a tuple. Let us look at an example.

::

  >>> print dctcopy.popitem()
  ('Anil', 'Kumble')
  >>> dctcopy
  {'Rahul': 'Dravid'}

Note that the choice of item is arbitrary: since dictionaries are unordered, as
mentioned earlier, there is no way to tell in advance which pair will be
returned.

**update()**
------------

The **update()** method updates the contents of one dictionary with the
contents of another dictionary. For items with existing *keys* their *values*
are updated, and the rest of the items are added. Let us look at an example.

::

  >>> dctcopy.update(dct)
  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}

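In the session above every key already carries the same value in both
dictionaries, so the replacement of existing values is not visible. The sketch
below uses two made-up dictionaries to show both effects; it is written to run
on both Python 2 and Python 3.

```python
team = {'Anil': 'Kumble', 'Rahul': 'Dravid'}
changes = {'Rahul': 'DRAVID', 'Sachin': 'Tendulkar'}

result = team.update(changes)   # modifies team in place and returns None

print(result)                   # None
print(team['Rahul'])            # DRAVID (existing key: value replaced)
print(team['Sachin'])           # Tendulkar (new key: pair added)
print(len(changes))             # 2 (the argument is left untouched)
```
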