Slicing text (3.2.3-3.2.6)
Continuing from yesterday, starting at section 3.2.3 of the whale book.
Indexing and slicing can be used not only on lists but also on strings. This was already covered in chapter 1.
>>> print monty
Monty Python
>>> monty[0]
'M'
>>> monty[3]
't'
>>> monty[5]
' '
>>> monty[20]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>>
Negative indices are also accepted; they count backwards from the end of the string.
>>> monty[-1]
'n'
>>> monty[5]
' '
>>> monty[-7]
' '
>>> monty[-6:]
'Python'
>>>
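A negative index -i points at the same character as index len(s) - i, which is why monty[-7] and monty[5] both give the space. A minimal sketch of that equivalence, assuming monty holds 'Monty Python' as in the session above; normalize_index is a hypothetical helper written only for this illustration:

```python
# Assumption: monty has the same value as in the session above.
monty = 'Monty Python'

def normalize_index(s, i):
    """Turn a negative index into the equivalent non-negative one (hypothetical helper)."""
    return i + len(s) if i < 0 else i

last = monty[-1]                         # last character of the string
space_pos = normalize_index(monty, -7)   # position of -7 counted from the front
tail = monty[-6:]                        # the last six characters
```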
This example processes the characters in a string one by one. It also compares the output with and without a trailing comma (,) at the end of the print statement.
>>> sent = 'colorless green ideas sleep furiously'
>>> for char in sent:
...     print char,
...
c o l o r l e s s   g r e e n   i d e a s   s l e e p   f u r i o u s l y
>>> for char in sent:
...     print char
...
c
o
l
o
r
l
e
s
s

g
r
e
e
n

i
d
e
a
s

s
l
e
e
p

f
u
r
i
o
u
s
l
y
>>>
With the trailing comma (,), print does not start a new line. Each character is written on the same line, right after the previous one and separated by a space.
Here is another example of processing characters one by one: counting the frequency of each letter of the alphabet.
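The trailing comma belongs to the Python 2 print statement used in these sessions. As a rough sketch of the same effect in Python 3 (an assumption on my side, since everything here was run on Python 2), print is a function and takes an end argument instead:

```python
# Python 3 only: `end` plays the role of the trailing comma.
sent = 'colorless green ideas sleep furiously'

for char in sent:
    print(char, end=' ')   # stay on the same line, separated by a space
print()                    # finish the line

# The same one-line text can also be built without print at all.
joined = ' '.join(sent)
```

Building the text with ' '.join(sent) avoids the trailing space that the loop leaves after the last character.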
>>> import nltk
>>> from nltk.corpus import gutenberg
>>> raw = gutenberg.raw('melville-moby_dick.txt')
>>> fdist = nltk.FreqDist(ch.lower() for ch in raw if ch.isalpha())
>>> fdist.keys()
['e', 't', 'a', 'o', 'n', 'i', 's', 'h', 'r', 'l', 'd', 'u', 'm', 'c', 'w', 'f', 'g', 'p', 'b', 'y', 'v', 'k', 'q', 'j', 'x', 'z']
>>> fdist.plot()
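The counting step can be tried without NLTK or the Gutenberg corpus. The sketch below swaps in collections.Counter from the standard library for nltk.FreqDist, and a short sample string for the Moby Dick text; both are assumptions for illustration, not what the session above ran.

```python
from collections import Counter

# Hypothetical sample text standing in for the Moby Dick raw text.
raw = 'Call me Ishmael. Some years ago, never mind how long precisely.'

# Same generator expression as above: keep letters only, lowercased.
counts = Counter(ch.lower() for ch in raw if ch.isalpha())

# most_common() lists (letter, count) pairs from most to least frequent.
top = counts.most_common(3)
```

Punctuation and spaces never reach the counter because isalpha() filters them out first, and lower() merges 'C' and 'c' into one entry.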
Slices take a range of indices. One tricky point is that characters are extracted from the start of the range up to, but not including, the end of the range.
>>> monty[6:10]
'Pyth'
>>> monty[-12:-7]
'Monty'
>>> monty[:5]
'Monty'
>>> monty[6:]
'Python'
For example, monty[6:10] extracts the characters from monty[6] to monty[9]; monty[10] is not included. The output is therefore 'Pyth', not 'Pytho', as shown above. If the start of the range is omitted, the slice begins at the first character (index 0). If the end of the range is omitted, the slice runs to the end of the string.
The in operator is useful for checking whether a string contains specific characters (or words).
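Because the end index is excluded, a few handy properties follow. A small sketch, again assuming monty is 'Monty Python':

```python
monty = 'Monty Python'  # assumed to match the session above

# The end index is excluded, so a slice [a:b] holds b - a characters.
part = monty[6:10]
width = len(part)

# Splitting at any position n and rejoining gives the original back,
# because [:n] stops just before n and [n:] starts at n.
n = 5
rejoined = monty[:n] + monty[n:]

# Unlike indexing, slicing past the end does not raise IndexError.
clipped = monty[6:100]
```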
>>> phrase = 'And now for something completely different'
>>> if 'thing' in phrase:
...     print 'found "thing"'
...
found "thing"
The find() method returns the position (index) at which the given substring starts.
>>> monty.find('Python')
6
>>> monty[6:12]
'Python'
>>>
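One thing worth knowing is that find() returns -1 instead of raising an error when the substring is absent, so the result should be checked before it is used as the start of a slice. A minimal sketch; extract is a hypothetical helper written for this note:

```python
phrase = 'And now for something completely different'

def extract(text, sub):
    """Return the part of text matching sub, or None if absent (hypothetical helper)."""
    start = text.find(sub)
    if start == -1:          # find() returns -1 when sub is not present
        return None
    return text[start:start + len(sub)]

found = extract(phrase, 'thing')
missing = extract(phrase, 'Python')
pos = phrase.find('thing')
```

Without the -1 check, a slice such as text[-1:-1 + len(sub)] would quietly return the wrong characters rather than fail.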
To see the help documentation for strings:
>>> help(str)
Differences between lists and strings (3.2.6):
>>> query
'Who knows?'
>>> beatles
['john', 'paul', 'george', 'ringo']
>>> query[2]
'o'
>>> beatles[2]
'george'
>>> query[:2]
'Wh'
>>> beatles[:2]
['john', 'paul']
>>> query + "I don't"
"Who knows?I don't"
>>> beatles + 'brian'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "str") to list
>>> beatles + ['brian']
['john', 'paul', 'george', 'ringo', 'brian']
>>>
>>> beatles[0] = "John Lennon"
>>> del beatles[-1]
>>> beatles
['John Lennon', 'paul', 'george']
>>> query[0] = 'F'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>>
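The last error comes from strings being immutable while lists are mutable. A common workaround, sketched here with the same query and beatles values as above, is to build a new string from slices instead of assigning to a position:

```python
query = 'Who knows?'
beatles = ['john', 'paul', 'george', 'ringo']

# Strings are immutable, so "changing" the first character means
# building a new string from a replacement plus the rest of the old one.
new_query = 'F' + query[1:]

# Lists are mutable: item assignment changes the list in place.
beatles[0] = 'John Lennon'

# Attempting the same assignment on a string raises TypeError.
try:
    query[0] = 'F'
    assigned = True
except TypeError:
    assigned = False
```

The original query is left untouched; new_query is a separate string object.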