Exercise: Chapter 3 (10-13)
10.
The original loop is:
>>> sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
>>> result = []
>>> for word in sent:
...     word_len = (word, len(word))
...     result.append(word_len)
...
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]
Convert to a list comprehension:
>>> result = [(word, len(word)) for word in sent]
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]
11.
>>> raw = "I don't like sports because I could not be an hero when I was playing childhood."
re.split() can split raw on a character other than a space, such as 's', or on any of several characters, such as 'n' or 's'.
>>> import re
>>> re.split(r's', raw)
["I don't like ", 'port', ' becau', 'e I could not be an hero when I wa', ' playing childhood.']
>>> re.split(r'[ns]', raw)
['I do', "'t like ", 'port', ' becau', 'e I could ', 'ot be a', ' hero whe', ' I wa', ' playi', 'g childhood.']
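For a single literal character, the built-in str.split gives the same result without a regular expression; re.split only becomes necessary once the separator is a pattern such as the character class [ns]. A minimal sketch of that comparison (the names by_str, by_re and by_class are only illustrative):

```python
import re

raw = "I don't like sports because I could not be an hero when I was playing childhood."

# Splitting on one literal character: str.split and re.split agree.
by_str = raw.split('s')
by_re = re.split(r's', raw)
print(by_str == by_re)

# Splitting on either 'n' or 's' needs a character class,
# which plain str.split cannot express.
by_class = re.split(r'[ns]', raw)
print(by_class[:3])
```

Note that str.split('ns') would look for the two-character substring "ns", not for either character.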
12.
Let's use the same raw data.
>>> raw
"I don't like sports because I could not be an hero when I was playing childhood."
>>> for char in raw:
...     print(char)
...
I
 
d
o
n
'
t
(and so on, one character per line, down to the final '.')
13.
>>> raw.split()
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']
>>> raw.split(' ')
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']
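There is no difference for this raw data. To see one, change it slightly (raw2): put four spaces after 'like', add a single tab before 'I' (after 'because'), and add two tabs before 'when' (after 'hero'). As the split(' ') output shows, raw2 also ended up with two spaces between 'could' and 'not'.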
>>> raw2 = "I don't like    sports because \tI could  not be an hero \t\twhen I was playing childhood."
>>> raw2.split()
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']
>>> raw2.split(' ')
['I', "don't", 'like', '', '', '', 'sports', 'because', '\tI', 'could', '', 'not', 'be', 'an', 'hero', '\t\twhen', 'I', 'was', 'playing', 'childhood.']
According to this result, raw2.split() treats any run of spaces and tabs as a single separator. With split(' '), only a single space counts as a separator: tabs stay inside the tokens and are displayed as "\t", and each extra consecutive space produces an empty string ''.
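The relationship between the two calls can be checked directly. The sketch below uses a short hypothetical string, sample, with the same kinds of whitespace as raw2, and shows that split() with no argument matches a regular-expression split on runs of whitespace, while filtering the empty strings out of split(' ') still leaves the tabs attached:

```python
import re

# Hypothetical sample: four spaces, then space + tab, then space + two tabs.
sample = "like    sports because \tI hero \t\twhen"

no_arg = sample.split()        # any run of whitespace is one separator
one_space = sample.split(' ')  # only a single space is a separator
print(no_arg)
print(one_space)

# Splitting the stripped string on runs of whitespace gives the same
# tokens as split() with no argument.
by_regex = re.split(r'\s+', sample.strip())
print(by_regex == no_arg)

# Dropping the empty strings from split(' ') does not remove the tabs.
non_empty = [w for w in one_space if w]
print(non_empty)
```

So split(' ') is the right choice only when every single space is meaningful; for ordinary tokenizing on whitespace, split() with no argument is usually what is wanted.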