Exercise: Chapter 3 (10-13)

10.

The original loop is:

>>> sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
>>> result = []
>>> for word in sent:
...     word_len = (word, len(word))
...     result.append(word_len)
...
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]

>>>

Convert to a list comprehension:

>>> result = [(word, len(word)) for word in sent]
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]

>>>
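As a quick check, here is a small Python 3 sketch that confirms the loop and the list comprehension build the same list. The function names lengths_loop and lengths_comprehension are hypothetical, chosen only for this illustration:

```python
sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']

def lengths_loop(words):
    # Accumulate (word, length) pairs with an explicit for loop.
    result = []
    for word in words:
        result.append((word, len(word)))
    return result

def lengths_comprehension(words):
    # The same pairs built by a single list comprehension.
    return [(word, len(word)) for word in words]

print(lengths_comprehension(sent))
```

The comprehension expresses "one (word, length) pair per word" in one line, without the empty-list-then-append pattern.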

11.

>>> raw = "I don't like sports because I could not be an hero when I was playing childhood."

To split raw on a character other than space, re.split() can be used (after importing the re module).

>>> import re
>>> re.split(r's', raw)
["I don't like ", 'port', ' becau', 'e I could not be an hero when I wa', ' playing childhood.']
>>> re.split(r'[ns]', raw)
['I do', "'t like ", 'port', ' becau', 'e I could ', 'ot be a', ' hero whe', ' I wa', ' playi', 'g childhood.']
>>>
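For a single literal character, the ordinary string method str.split() gives the same pieces as re.split(). A regular expression only becomes necessary for a set of characters such as [ns]. The following minimal Python 3 sketch reuses the same sentence. The capturing-group variant at the end is an extra illustration: it keeps the separators in the result list.

```python
import re

raw = "I don't like sports because I could not be an hero when I was playing childhood."

# A single literal character: str.split() and re.split() agree.
by_s = raw.split('s')
print(by_s == re.split(r's', raw))  # True

# A set of characters ('n' or 's') needs a regex character class.
by_n_or_s = re.split(r'[ns]', raw)

# Wrapping the pattern in a capturing group keeps the separators as list items.
with_separators = re.split(r'([ns])', raw)
print(with_separators[:4])  # ['I do', 'n', "'t like ", 's']
```

Because the separators are kept, joining with_separators back together reproduces raw exactly.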

12.

>>> raw
"I don't like sports because I could not be an hero when I was playing childhood."

Let's use the same raw data.

>>> for char in raw:
...     print(char)
...
I

d
o
n
'
t

l
i
k
e

s
p
o
r
t
s

b
e
c
a
u
s
e

I

c
o
u
l
d

n
o
t

b
e

a
n

h
e
r
o

w
h
e
n

I

w
a
s

p
l
a
y
i
n
g

c
h
i
l
d
h
o
o
d
.
>>>
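In the output above, each space shows up as a seemingly empty line (the line actually holds one space character). Below is a short Python 3 sketch of two variations, given as illustrations rather than the exercise's required answer. The first builds the same one-character-per-line text as a single string with '\n'.join(). The second prints repr() of each character so that spaces become visible.

```python
raw = "I don't like sports because I could not be an hero when I was playing childhood."

# Build the same one-character-per-line text as the for loop, as one string.
one_per_line = '\n'.join(raw)
print(one_per_line)

# repr() makes spaces visible (' ') instead of printing them as blank-looking lines.
for char in raw[:8]:
    print(repr(char))
```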

13.

>>> raw.split()
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']
>>> raw.split(' ')
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']

There is no difference for this raw data, so let's change it a little (raw2). Put four spaces after 'like', a space followed by a tab between 'because' and 'I', two spaces between 'could' and 'not', and a space followed by two tabs between 'hero' and 'when'.

>>> raw2 = "I don't like    sports because \tI could  not be an hero \t\twhen I was playing childhood."
>>> raw2.split()
['I', "don't", 'like', 'sports', 'because', 'I', 'could', 'not', 'be', 'an', 'hero', 'when', 'I', 'was', 'playing', 'childhood.']
>>> raw2.split(' ')
['I', "don't", 'like', '', '', '', 'sports', 'because', '\tI', 'could', '', 'not', 'be', 'an', 'hero', '\t\twhen', 'I', 'was', 'playing', 'childhood.']
>>>

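These results show that raw2.split(), with no argument, treats any run of whitespace (one or more spaces, tabs, or a mix of both) as a single separator. The second form, raw2.split(' '), splits only at single space characters. Tabs therefore stay inside the tokens (displayed as "\t"), and consecutive spaces produce empty strings '' between the neighbouring words, one fewer than the number of spaces.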
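The behaviour described above can be checked with a short Python 3 sketch. Here raw2 is written with \t escapes instead of literal tabs, which yields the same string. The comparison with re.split(r'\s+', ...) is an extra illustration, not part of the exercise. It only matches split() when the text has no leading or trailing whitespace, as the last two lines show.

```python
import re

raw2 = "I don't like    sports because \tI could  not be an hero \t\twhen I was playing childhood."

# split() with no argument: any run of whitespace is one separator.
words = raw2.split()
# split(' '): every single space is a separator; tabs are ordinary characters.
pieces = raw2.split(' ')

# Four consecutive spaces produce three empty strings between the two words.
print(pieces[2:7])  # ['like', '', '', '', 'sports']

# With no leading/trailing whitespace, split() matches a regex split on \s+.
print(words == re.split(r'\s+', raw2))  # True

# Leading whitespace is where the two forms also differ.
print(' a b'.split())     # ['a', 'b']
print(' a b'.split(' '))  # ['', 'a', 'b']
```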