Entries from 2013-05-17 (1 day)

2013-05-17

Unicode text processing (3.3)

NLTK

In my case, I will be working with double-byte languages such as Japanese and Chinese, so Unicode handling is mandatory. Chapter 3.3 of the whale book covers Unicode handling.

>>> path = nltk.data.find('corpora/unicode_samples/poli…
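As a rough illustration of why encodings matter for Japanese text, here is a minimal sketch that does not use NLTK or the sample file from the book. It assumes Python 3 (the whale book session uses Python 2), and the sample string and variable names are my own choices for illustration.

```python
# Minimal sketch (Python 3 assumed): one string, two byte encodings.
# The sample text is written with escapes so the file itself stays ASCII.
text = "\u65e5\u672c\u8a9e"  # the three characters of "Japanese" in Japanese

# Encoding turns characters into bytes; the byte count depends on the codec.
utf8_bytes = text.encode("utf-8")      # 3 bytes per character here
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character here

print(len(text))        # 3 characters
print(len(utf8_bytes))  # 9 bytes
print(len(sjis_bytes))  # 6 bytes

# Decoding with the same codec restores the original string.
round_trip = sjis_bytes.decode("shift_jis")
print(round_trip == text)  # True
```

The point is that the number of characters and the number of bytes differ, so text should be decoded to Unicode once on input and only encoded again on output.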

2013-05-17

Slicing text (3.2.3-3.2.6)

NLTK

Continuing from yesterday, starting at chapter 3.2.3 of the whale book. Slicing can be used not only on lists but also on strings. This was already covered in chapter 1 as well.

>>> print monty
Monty Python
>>> monty[0]
'M'
>>> monty[3]
't'
>>> monty[5]
…
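The same indexing and slicing ideas can be sketched as a small standalone script. This assumes Python 3, so print is a function rather than the Python 2 statement used in the session above. The variable names other than monty are my own.

```python
# Minimal sketch (Python 3 assumed): indexing and slicing a string.
monty = "Monty Python"

first = monty[0]       # index 0 is the first character
last = monty[-1]       # negative indices count from the end
head = monty[:5]       # from the start up to, but not including, index 5
tail = monty[6:]       # from index 6 to the end
middle = monty[6:10]   # indices 6, 7, 8 and 9

print(first)   # M
print(last)    # n
print(head)    # Monty
print(tail)    # Python
print(middle)  # Pyth

# Strings and lists share the same slice syntax.
words = monty.split()
print(words[:1])  # ['Monty']
```

A slice never includes its end index, which is why monty[:5] and monty[5:] together always rebuild the whole string.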