Exercise: Chapter 3 (1 - 6)
1.
>>> s = 'colorless'
>>> print(s[:4] + 'u' + s[4:])
colourless
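Python strings are immutable, so the slice-and-concatenate trick above always builds a new string rather than changing `s`. The sketch below wraps the trick in a small helper. `insert_at` is a hypothetical name I made up for illustration; it is not an NLTK or standard-library function.

```python
# Strings cannot be modified in place, so "changing" s means building a
# new string from slices. insert_at is a hypothetical helper.

def insert_at(text: str, index: int, piece: str) -> str:
    """Return a new string with `piece` inserted before position `index`."""
    return text[:index] + piece + text[index:]

s = 'colorless'
t = insert_at(s, 4, 'u')
print(t)  # colourless
print(s)  # colorless (the original string is unchanged)

# Item assignment on a string raises TypeError.
try:
    s[4] = 'u'
except TypeError as err:
    print('TypeError:', err)
```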
2.
>>> 'dogs'[:-1]
'dog'
>>> 'dishes'[:-2]
'dish'
>>> 'running'[:-4]
'run'
>>> 'nationality'[:-5]
'nation'
>>> 'undo'[:-2]
'un'
>>> 'undo'[2:]
'do'
>>> 'preheat'[3:]
'heat'
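The slices above hard-code the length of each affix. A minimal sketch that derives the slice from the affix itself, assuming we already know which affix to remove; `strip_suffix` and `strip_prefix` are hypothetical helpers (Python 3.9+ also ships the built-in `str.removesuffix` and `str.removeprefix`).

```python
def strip_suffix(word: str, suffix: str) -> str:
    """Remove `suffix` from the end of `word` if it is there."""
    # Guard against an empty suffix: word[:-0] is word[:0], i.e. ''.
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word

def strip_prefix(word: str, prefix: str) -> str:
    """Remove `prefix` from the start of `word` if it is there."""
    if word.startswith(prefix):
        return word[len(prefix):]
    return word

for word, suffix in [('dishes', 'es'), ('running', 'ning'), ('nationality', 'ality')]:
    print(word, '->', strip_suffix(word, suffix))
print(strip_prefix('undo', 'un'), strip_prefix('preheat', 'pre'))  # do heat
```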
3.
>>> strg = 'abcdefghijklmnopqrstuvwxyz'
>>> strg[26]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> strg[-2]
'y'
I am not sure I understood this question correctly. To go too far to the left, i.e. below the starting point [0], the index has to be negative, and a negative index means "counting from the end". In my example, strg[-2] selected the second character from the end ('y').
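If the exercise is asking whether an index can fall before the start of the string, the answer seems to be yes: for this 26-character string, -26 is the first character and -27 is already out of range. Slices behave differently and quietly clamp out-of-range bounds. A quick check:

```python
import string

strg = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz', 26 characters

# Negative indices count from the end: -1 is the last character,
# and -len(strg) is the first one.
print(strg[-1])    # z
print(strg[-26])   # a

# One step further to the left is before the start of the string.
try:
    strg[-27]
except IndexError as err:
    error_message = str(err)
    print('IndexError:', error_message)  # string index out of range

# Slicing does not raise; bounds outside the string are clamped.
print(strg[-100:3])  # abc
```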
4.
>>> monty = 'Monty Python'
>>> monty[6:11:2]
'Pto'
>>> monty[2:10:3]
'n t'
>>> monty[::4]
'Myt'
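The step argument is easier to see if we list the indices each slice visits. For a positive step, `range(start, stop, step)` follows the same rule as the slice, so this sketch compares the two:

```python
monty = 'Monty Python'

# A slice [start:stop:step] takes start, start+step, start+2*step, ...
# and stops before reaching stop; range() uses the same rule.
for start, stop, step in [(6, 11, 2), (2, 10, 3), (0, len(monty), 4)]:
    indices = list(range(start, stop, step))
    picked = ''.join(monty[i] for i in indices)
    print(indices, repr(picked), picked == monty[start:stop:step])
# [6, 8, 10] 'Pto' True
# [2, 5, 8] 'n t' True
# [0, 4, 8] 'Myt' True
```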
5.
>>> monty[::-1]
'nohtyP ytnoM'
It was reversed! A step of -1 walks through the string from the last character back to the first.
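For comparison, the built-in `reversed()` gives the same result, and the [::-1] idiom makes a one-line palindrome check. `is_palindrome` is a hypothetical helper, assuming a case-insensitive comparison is what we want:

```python
monty = 'Monty Python'

backwards = monty[::-1]
print(backwards)                              # nohtyP ytnoM
print(backwards == ''.join(reversed(monty)))  # True

def is_palindrome(word: str) -> bool:
    """True if the word reads the same in both directions (ignoring case)."""
    w = word.lower()
    return w == w[::-1]

print(is_palindrome('Level'), is_palindrome('Python'))  # True False
```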
6.
a. [a-zA-Z]+
One or more ASCII letters, upper- or lower-case.
>>> import nltk
>>> nltk.re_show(r'[a-zA-Z]+', 'a abc aBcd ABcd ABCD a1234 12A34 aB1234')
{a} {abc} {aBcd} {ABcd} {ABCD} {a}1234 12{A}34 {aB}1234
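The same pattern with `re.findall` returns the matches as a list instead of printing them in braces. Note that [a-zA-Z] only covers the 26 ASCII letters, so accented letters are not matched:

```python
import re

text = 'a abc aBcd ABcd ABCD a1234 12A34 aB1234'
letters = re.findall(r'[a-zA-Z]+', text)
print(letters)  # ['a', 'abc', 'aBcd', 'ABcd', 'ABCD', 'a', 'A', 'aB']

# Non-ASCII letters fall outside the character class.
accented = re.findall(r'[a-zA-Z]+', 'café')
print(accented)  # ['caf']
```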
b. [A-Z][a-z]*
Starts with one upper-case letter, followed by lower-case letters; because of the *, the lower-case part can be omitted.
>>> nltk.re_show(r'[A-Z][a-z]*', 'a abc aBcd ABcd ABCD a1234 12A34 aB1234')
a abc a{Bcd} {A}{Bcd} {A}{B}{C}{D} a1234 12{A}34 a{B}1234
c. p[aeiou]{,2}t
Starts with 'p' and ends with 't', with zero to two vowels (a, e, i, o, u) in between.
>>> nltk.re_show(r'p[aeiou]{,2}t', 'pit pet peat pool good puuut pt')
{pit} {pet} {peat} pool good puuut {pt}
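According to the Python `re` documentation, omitting m in {m,n} gives a lower bound of zero, so {,2} should behave exactly like {0,2}. A quick check with `re.findall`, plus a wider upper bound for comparison:

```python
import re

words = 'pit pet peat pool good puuut pt'

# {,2} is shorthand for {0,2}: between zero and two repetitions.
short = re.findall(r'p[aeiou]{,2}t', words)
explicit = re.findall(r'p[aeiou]{0,2}t', words)
print(short)              # ['pit', 'pet', 'peat', 'pt']
print(short == explicit)  # True

# Raising the upper bound to 3 also accepts 'puuut'.
wider = re.findall(r'p[aeiou]{0,3}t', words)
print(wider)  # ['pit', 'pet', 'peat', 'puuut', 'pt']
```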
d. \d+(\.\d+)?
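'\d' matches a single digit, so the pattern matches an integer part (one or more digits), optionally followed by a decimal point and more digits; the (...)? makes the fractional part optional.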
>>> nltk.re_show(r'\d+(\.\d+)?', '0 0.2 12.0003 -4 -5.2 5,000')
{0} {0.2} {12.0003} -{4} -{5.2} {5},{000}
>>> nltk.re_show(r'-?\d+(\.\d+)?', '0 0.2 12.0003 -4 -5.2 5,000')
{0} {0.2} {12.0003} {-4} {-5.2} {5},{000}
In the second example, I added an optional minus sign (-?) so that negative numbers are matched as a whole.
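One thing to watch out for when switching from re_show to `re.findall`: if the pattern contains a capturing group such as (\.\d+), findall returns only the text of that group, not the whole match. A non-capturing group (?:...) avoids this. A minimal sketch:

```python
import re

text = '0 0.2 12.0003 -4 -5.2 5,000'

# With a capturing group, findall returns the group's text
# ('' when the optional fractional part is absent).
grouped = re.findall(r'-?\d+(\.\d+)?', text)
print(grouped)  # ['', '.2', '.0003', '', '.2', '', '']

# With a non-capturing group, findall returns the full numbers.
numbers = re.findall(r'-?\d+(?:\.\d+)?', text)
print(numbers)  # ['0', '0.2', '12.0003', '-4', '-5.2', '5', '000']
print([float(n) for n in numbers])
```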
e. ([^aeiou][aeiou][^aeiou])*
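A sequence of non-vowel + vowel + non-vowel, repeated. Because of the final *, zero repetitions are also allowed, so the pattern matches the empty string at every position where no such sequence starts (the {} markers in the output).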
>>> nltk.re_show(r'([^aeiou][aeiou][^aeiou])*', 'appeal push pool shose neck 1234 gogleb')
{}a{}p{}p{}e{}a{}l{} {pus}h{} {}p{}o{}o{}l{} {}s{hos}e{} {nec}k{} {}1{}2{}3{}4{} {gogleb}
>>> nltk.re_show(r'([^aeiou][aeiou][^aeiou])+', 'appeal push pool shose neck 1234 gogleb')
appeal {pus}h pool s{hos}e {nec}k 1234 {gogleb}
In my view, using '+' as in the second example is more natural: it requires at least one such sequence, so the empty matches disappear.
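To look at the matches directly, `re.finditer` can be used. In this sketch I wrote the group as non-capturing (?:...), which is my change, not the exercise's. Also note that [^aeiou] matches any character except a lower-case vowel, including spaces and digits.

```python
import re

text = 'appeal push pool shose neck 1234 gogleb'
star = r'(?:[^aeiou][aeiou][^aeiou])*'
plus = r'(?:[^aeiou][aeiou][^aeiou])+'

# '*' also matches the empty string, so finditer yields many zero-length
# matches; keeping only the non-empty ones gives the '+' result.
non_empty = [m.group() for m in re.finditer(star, text) if m.group()]
print(non_empty)                            # ['pus', 'hos', 'nec', 'gogleb']
print(non_empty == re.findall(plus, text))  # True

# Caution: with a capturing group, findall returns only the last
# repetition of the group, so 'gogleb' comes back as 'leb'.
last_group = re.findall(r'([^aeiou][aeiou][^aeiou])+', text)
print(last_group)  # ['pus', 'hos', 'nec', 'leb']
```

The exact placement of the empty {} markers in the re_show output above may differ between Python versions: since Python 3.7, an empty match directly after a non-empty one is allowed, so a newer interpreter may print extra markers such as {pus}{}h.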
f. \w+|[^\w\s]+
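Either a run of one or more word characters (letters, digits and underscore) or a run of one or more characters that are neither word characters nor whitespace (e.g. punctuation). Together, the two alternatives cover every non-space character, but words and punctuation end up in separate matches.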
>>> nltk.re_show(r'\w+|[^\w\s]+', '123 abc 1_2_3_A_B_C ..... ')
{123} {abc} {1_2_3_A_B_C} {.....}
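Applied to ordinary text, this pattern splits punctuation off from words, which is roughly what a simple word/punctuation tokenizer does (as far as I know, NLTK's `WordPunctTokenizer` is built on this same pattern). Whitespace is skipped because neither alternative matches it. A quick check with `re.findall`:

```python
import re

# A run of word characters, or a run of characters that are neither
# word characters nor whitespace.
pattern = r'\w+|[^\w\s]+'
tokens = re.findall(pattern, "Hello, world!! It's 3.14.")
print(tokens)
# ['Hello', ',', 'world', '!!', 'It', "'", 's', '3', '.', '14', '.']
```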