Exercise: Chapter 3 (1 - 6)

1.

>>> s = 'colorless'
>>> print(s[:4] + 'u' + s[4:])
colourless

2.

>>> 'dogs'[:-1]
'dog'
>>> 'dishes'[:-2]
'dish'
>>> 'running'[:-4]
'run'
>>> 'nationality'[:-5]
'nation'
>>> 'undo'[:-2]
'un'
>>> 'undo'[2:]
'do'
>>> 'preheat'[3:]
'heat'

3.

>>> strg = 'abcdefghijklmnopqrstuvwxyz'
>>> strg[26]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> strg[-2]
'y'

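I am not sure I understand this question correctly. To use an index smaller than the starting point [0] (going too far to the left), the value has to be negative, and a negative index means "counting from the end". In my example, the second character from the end ('y') was selected. Counting from the end only reaches back to strg[-26] ('a'); a smaller index such as strg[-27] goes past the start and raises the same IndexError as strg[26].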
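
A quick Python 3 check of where negative indexing runs out. It also shows that slices behave differently from single indices: out-of-range slice bounds are clipped instead of raising an error.

```python
strg = 'abcdefghijklmnopqrstuvwxyz'

# -1 is the last character, -len(strg) is the first one.
print(strg[-1])          # z
print(strg[-len(strg)])  # a  (same as strg[0])

# One step further left is "before the start" of the string.
try:
    strg[-27]
except IndexError as err:
    print('IndexError:', err)  # IndexError: string index out of range

# Slice bounds are clipped to the string, so no error here.
print(strg[-30:3])       # abc
```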

4.

>>> monty = 'Monty Python'
>>> monty[6:11:2]
'Pto'
>>> monty[2:10:3]
'n t'
>>> monty[::4]
'Myt'
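
One way to see what the step does: a slice s[start:stop:step] picks the characters at the positions produced by range(start, stop, step). The sketch below uses the built-in slice object and its indices() method, which fills in omitted or out-of-range bounds for a string of the given length.

```python
monty = 'Monty Python'

for sl in (slice(6, 11, 2), slice(2, 10, 3), slice(None, None, 4)):
    # sl.indices(n) returns the normalized (start, stop, step) for length n.
    positions = list(range(*sl.indices(len(monty))))
    print(positions, repr(monty[sl]))
# [6, 8, 10] 'Pto'
# [2, 5, 8] 'n t'
# [0, 4, 8] 'Myt'
```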

5.

>>> monty[::-1]
'nohtyP ytnoM'

It was reversed!
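
A step of -1 walks through the string from right to left, and leaving out start and stop means "from the last character back to the first". A short Python 3 sketch comparing it with the built-in reversed(), and showing a negative step with explicit bounds:

```python
monty = 'Monty Python'

print(monty[::-1])               # nohtyP ytnoM
print(''.join(reversed(monty)))  # nohtyP ytnoM (same result)

# With explicit bounds, start must be to the right of stop.
print(monty[-1:-7:-1])           # nohtyP  (just 'Python', reversed)
print(monty[10:5:-2])            # otP     (positions 10, 8, 6)
```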

6.

a. [a-zA-Z]+
One or more ASCII letters (upper or lower case) in a row.

>>> nltk.re_show(r'[a-zA-Z]+', 'a abc aBcd ABcd ABCD a1234 12A34 aB1234')
{a} {abc} {aBcd} {ABcd} {ABCD} {a}1234 12{A}34 {aB}1234
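
nltk.re_show is handy for checking a pattern by eye; to get the matches as a list, the standard re.findall can be used instead. This Python 3 sketch also shows a limitation of [a-zA-Z]: it only covers ASCII letters, so accented letters split a word.

```python
import re

text = 'a abc aBcd ABcd ABCD a1234 12A34 aB1234'
print(re.findall(r'[a-zA-Z]+', text))
# ['a', 'abc', 'aBcd', 'ABcd', 'ABCD', 'a', 'A', 'aB']

# Non-ASCII letters are not in the class.
print(re.findall(r'[a-zA-Z]+', 'café naïve'))
# ['caf', 'na', 've']
```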

b. [A-Z][a-z]*
An uppercase letter followed by zero or more lowercase letters; because of the '*', the lowercase part can be omitted.

>>> nltk.re_show(r'[A-Z][a-z]*', 'a abc aBcd ABcd ABCD a1234 12A34 aB1234')
a abc a{Bcd} {A}{Bcd} {A}{B}{C}{D} a1234 12{A}34 a{B}1234
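
re_show marks every match found anywhere in the string, which is why 'ABCD' turns into four one-letter matches. To test whether a whole word has this shape (a capitalized word), Python 3's re.fullmatch can be used; this is a sketch of that alternative, not something the exercise asks for.

```python
import re

pattern = re.compile(r'[A-Z][a-z]*')

# fullmatch succeeds only if the entire string fits the pattern.
for word in ['Python', 'P', 'ABCD', 'python', 'aBcd']:
    print(word, bool(pattern.fullmatch(word)))
# Python True
# P True
# ABCD False
# python False
# aBcd False

print(pattern.findall('ABCD'))  # ['A', 'B', 'C', 'D']
```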

c. p[aeiou]{,2}t
'p', then zero to two vowels (a, e, i, o, u), then 't'.

>>> nltk.re_show(r'p[aeiou]{,2}t', 'pit pet peat pool good puuut pt')
{pit} {pet} {peat} pool good puuut {pt}
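
In Python's re module, {,2} is shorthand for {0,2}. A small Python 3 check of that, plus a reminder that the pattern is not anchored, so it also matches inside longer words:

```python
import re

words = 'pit pet peat pool good puuut pt'.split()
short = [w for w in words if re.search(r'p[aeiou]{,2}t', w)]
explicit = [w for w in words if re.search(r'p[aeiou]{0,2}t', w)]
print(short)              # ['pit', 'pet', 'peat', 'pt']
print(short == explicit)  # True

# Matches inside 'spate' and 'input' as well.
print(re.findall(r'p[aeiou]{,2}t', 'spate input'))  # ['pat', 'put']
```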

d. \d+(\.\d+)?
'\d' matches a single digit, so this is one or more digits, optionally followed by a decimal point and more digits: the fractional part is optional.

>>> nltk.re_show(r'\d+(\.\d+)?', '0 0.2 12.0003 -4 -5.2 5,000')
{0} {0.2} {12.0003} -{4} -{5.2} {5},{000}
>>> nltk.re_show(r'-?\d+(\.\d+)?', '0 0.2 12.0003 -4 -5.2 5,000')
{0} {0.2} {12.0003} {-4} {-5.2} {5},{000}

In the second example I added an optional minus sign ('-?') so that negative numbers are matched as well.
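
A pitfall when moving from re_show to re.findall with this pattern: if the pattern has a capturing group, findall returns only the group's text, not the whole match. A Python 3 sketch of the problem and of the usual fix, a non-capturing group (?:...):

```python
import re

text = '0 0.2 12.0003 -4 -5.2 5,000'

# Capturing group: only the optional fractional part comes back.
print(re.findall(r'-?\d+(\.\d+)?', text))
# ['', '.2', '.0003', '', '.2', '', '']

# Non-capturing group: findall returns the full numbers.
print(re.findall(r'-?\d+(?:\.\d+)?', text))
# ['0', '0.2', '12.0003', '-4', '-5.2', '5', '000']
```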

e. ([^aeiou][aeiou][^aeiou])*
Zero or more repetitions of non-vowel + vowel + non-vowel. Because of the final '*', the whole pattern is optional, so it also matches the empty string; that is why re_show marks so many empty matches ({}).

>>> nltk.re_show(r'([^aeiou][aeiou][^aeiou])*', 'appeal push pool shose neck 1234 gogleb')
{}a{}p{}p{}e{}a{}l{} {pus}h{} {}p{}o{}o{}l{} {}s{hos}e{} {nec}k{} {}1{}2{}3{}4{} {gogleb}
>>> nltk.re_show(r'([^aeiou][aeiou][^aeiou])+', 'appeal push pool shose neck 1234 gogleb')
appeal {pus}h pool s{hos}e {nec}k 1234 {gogleb}

In my view, using '+' as in the second example is more natural: it requires at least one repetition, so the empty matches disappear.
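
A Python 3 sketch comparing the two versions with re.finditer. With '*' the non-empty matches are the same as with '+', but empty matches appear in between. Note also that re.findall on the '+' version would return only the last repetition of the group ('leb' for 'gogleb'), so m.group() is used to get the whole match.

```python
import re

text = 'appeal push pool shose neck 1234 gogleb'

star = [m.group() for m in re.finditer(r'([^aeiou][aeiou][^aeiou])*', text)]
print('' in star)               # True: empty matches are present
print([s for s in star if s])   # ['pus', 'hos', 'nec', 'gogleb']

plus = [m.group() for m in re.finditer(r'([^aeiou][aeiou][^aeiou])+', text)]
print(plus)                     # ['pus', 'hos', 'nec', 'gogleb']

# findall returns the captured group, i.e. only its last repetition.
print(re.findall(r'([^aeiou][aeiou][^aeiou])+', text))
# ['pus', 'hos', 'nec', 'leb']
```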

f. \w+|[^\w\s]+
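Either a run of one or more word characters (letters, digits, underscore) or a run of one or more characters that are neither word characters nor whitespace, such as punctuation. Whitespace itself is never matched.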

>>> nltk.re_show(r'\w+|[^\w\s]+', '123 abc 1_2_3_A_B_C .....        ')
{123} {abc} {1_2_3_A_B_C} {.....}
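
This pattern works as a simple tokenizer: it separates words from punctuation, which a plain "run of non-space characters" pattern (\S+) does not. A Python 3 comparison on a made-up string:

```python
import re

sample = 'abc...def, ghi!'

# Words and punctuation runs become separate tokens.
print(re.findall(r'\w+|[^\w\s]+', sample))
# ['abc', '...', 'def', ',', 'ghi', '!']

# Runs of non-space characters keep punctuation attached.
print(re.findall(r'\S+', sample))
# ['abc...def,', 'ghi!']
```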