statistics.txt
changeset 52 53700ad0e71e
parent 51 32d854e62be9
child 53 3d2c2c0bc3e2
51:32d854e62be9 52:53700ad0e71e
     1 Hello and welcome to the tutorial on handling large data files and processing them to get desired results.
     1 Hello and welcome to the tutorial on handling large data files and processing them.
     2 
     2 
     3 Till now we have covered:
     3 Till now we have covered:
     4 * How to create plots.
     4 * How to create plots.
     5 * How to read data from file and process it.
     5 * How to read data from files and process it.
     6 
     6 
     7 In this session, we will use them and some new concepts to solve a problem/exercise. 
     7 In this session, we will use these concepts and some new ones to solve a problem/exercise.
     8 
     8 
     9 We have a file named sslc.txt. 
     9 We have a file named sslc.txt. 
    10 It contains records of students and their performance in one of the State Secondary Board Examinations. It has 180,000 lines of records. We are going to read and process this data.
    10 It contains records of students and their performance in one of the State Secondary Board Examinations. It has 180,000 lines of records. We are going to read and process this data.
    11 We can see the contents of the file by opening it with any text editor.
    11 We can see the contents of the file by opening it with any text editor.
    12 Please don't edit the data.
    12 Please don't edit the data.
    13 This file has a particular structure. Each line in the file is a set of 11 fields:
    13 This file has a particular structure. Each line in the file is a set of 11 fields separated by semicolons:
    14 A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;
    14 A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;
    15 The following are the fields in any given line.
    15 The following are the fields in any given line.
    16 * Region Code which is 'A'
    16 * Region Code which is 'A'
    17 * Roll Number 015163
    17 * Roll Number 015163
    18 * Name JOSEPH RAJ S
    18 * Name JOSEPH RAJ S
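As a minimal sketch of how one such record could be broken into its fields in Python, the sample line above can be split on the semicolon. The variable names are illustrative and not from the tutorial, and only the first three fields listed above are given names.

```python
# Sample record copied from the tutorial text
record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"

# Split the line at every semicolon to get a list of strings
fields = record.split(";")

# Illustrative names for the first three fields
region_code = fields[0]
roll_number = fields[1]
name = fields[2]

print(region_code, roll_number, name)

# The line ends with a semicolon, so split() also returns
# one extra empty string after the last field.
print(len(fields))
```

Every field comes back as a string, so a mark such as '083' would need int() before any arithmetic is done on it.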
    41 Let's first start off with dictionaries.
    41 Let's first start off with dictionaries.
    42 
    42 
    43 We earlier used lists briefly. Back then we just created lists and appended items into them. 
    43 We earlier used lists briefly. Back then we just created lists and appended items into them. 
    44 x = [1, 4, 2, 7, 6]
    44 x = [1, 4, 2, 7, 6]
    45 In order to access any element in a list, we use its index number. Index starts from 0.
    45 In order to access any element in a list, we use its index number. Index starts from 0.
    46 For eg. x[0] will give 1 and x[3] will 7.
    46 For example, x[0] will give 1 and x[3] will give 7.
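The list indexing described above can be tried out with a short sketch like the following. The appended value 9 is an arbitrary choice for illustration.

```python
x = [1, 4, 2, 7, 6]

# Indexing starts from 0, so x[0] is the first element
first = x[0]
fourth = x[3]
print(first, fourth)

# Appending adds a new item at the end of the list
x.append(9)
print(x[5])
```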
    47 
    47 
    48 There are times when we can't access things through integer indexes. For example consider a telephone directory, we give it a name and it should return back corresponding number. List is not the best kind of data structure for such problems, and hence Python provides support for dictionaries. Dictionaries are key value pairs. Lists are indexed by integers while dictionaries are indexed by strings. For example:
    48 Integer indexes aren't always convenient, though. For example, consider a telephone directory: we give it a name and it should return the corresponding number. A list is not well suited to such problems; Python's dictionaries are a better fit. Dictionaries are collections of key-value pairs. For example:
    49 
    49 
    50 d = {'png' : 'image',
    50 d = {'png' : 'image',
    51       'txt' : 'text', 
    51       'txt' : 'text', 
    52       'py' : 'python'} 
    52       'py' : 'python'} 
    53 
    53 
    54 d
    54 d
    55 
    55 
    56 d is a dictionary. The first element in each pair is called the `key' and the second is called the `value'. The keys here are strings, but a key can be any immutable (hashable) object, such as a number or a tuple; the value can be of any type.
    56 d is a dictionary. The first element in each pair is called the `key' and the second is called the `value'. The keys here are strings, but a key can be any immutable (hashable) object, such as a number or a tuple; the value can be of any type.
    57 
    57 
    58 Dictionaries are indexed using their keys as shown
    58 Lists are indexed by integers, while dictionaries are indexed by their keys, as shown:
    59 In []: d['txt']
    59 In []: d['txt']
    60 Out[]: 'text'
    60 Out[]: 'text'
    61 
    61 
    62 In []: d['png']
    62 In []: d['png']
    63 Out[]: 'image'
    63 Out[]: 'image'
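The telephone directory mentioned earlier could be sketched the same way. The names and numbers below are made up purely for illustration.

```python
# Hypothetical directory: names are keys, numbers are values
directory = {'asha': '555-0101',
             'ravi': '555-0102'}

# Look up a number by giving the name as the key
number = directory['ravi']
print(number)

# Assigning to a new key adds another entry
directory['meena'] = '555-0103'
print(len(directory))
```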
    67 True
    67 True
    68 
    68 
    69 'jpg' in d
    69 'jpg' in d
    70 False
    70 False
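One common use of the membership test, shown here as a sketch, is to check for a key before indexing, since indexing with a missing key raises a KeyError. The fallback string 'unknown' is an arbitrary choice for this example.

```python
d = {'png': 'image',
     'txt': 'text',
     'py': 'python'}

# Check for the key first to avoid a KeyError
if 'jpg' in d:
    kind = d['jpg']
else:
    kind = 'unknown'
print(kind)

# dict.get() does the same in one step: it returns the
# second argument when the key is missing
same_kind = d.get('jpg', 'unknown')
print(same_kind)
```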
    71 
    71 
    72 Please note the values cannot be searched in a dictionaries.
    72 Please note that keys, and not values, are searched. 
    73 'In a telephone directory one can search for a number based on a name, but not for a name based on a number'
    73 'In a telephone directory one can search for a number based on a name, but not for a name based on a number'
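If a lookup by value is really needed, one possible approach is to walk through every pair and compare the values by hand. This is only a sketch; it is slower than a key lookup, and more than one key may share the same value.

```python
d = {'png': 'image',
     'txt': 'text',
     'py': 'python'}

# 'text' is a value, not a key, so the membership test fails
print('text' in d)

# Scan all key-value pairs and collect the keys whose value matches
matches = [key for key, value in d.items() if value == 'text']
print(matches)
```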
    74 
    74 
    75 to obtain the list of all keys in a dictionary type
    75 To obtain the list of all keys in a dictionary, type
    76 d.keys()
    76 d.keys()
    77 ['py', 'txt', 'png']
    77 ['py', 'txt', 'png']
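The output above is from an older Python, where keys() returns a list in no particular order. As a hedged sketch for Python 3.7 or later, where keys() returns a view object and dictionaries remember insertion order, the keys can be turned into a list or sorted explicitly:

```python
d = {'png': 'image',
     'txt': 'text',
     'py': 'python'}

# Wrap the view in list() to get a real list of keys
keys = list(d.keys())
print(keys)

# sorted() gives the keys in alphabetical order instead
sorted_keys = sorted(d)
print(sorted_keys)
```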
    78 
    78 
    79 Similarly,
    79 Similarly,
    80 d.values()
    80 d.values()