Entries from 2013-06-02 (1 day)

Japanese corpus (12.1.1)

>>> import nltk
>>> from nltk.corpus.reader import *
>>> from nltk.corpus.reader.util import *
>>> from nltk.text import Text
>>>
>>> jp_sent_tokenizer = nltk.RegexpTokenizer(u'[^ 「」!?。]*[!?。]')
>>> jp_chartype_tokenizer = nltk.Regex…
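To see what the sentence pattern does on its own, here is a minimal sketch using only the standard `re` module. It assumes the pattern is meant to be the negated character class `[^ 「」!?。]*[!?。]` (a run of non-terminator characters followed by one terminator). The names `JP_SENT_PATTERN`, `split_sentences`, and the sample text are made up for illustration and are not from the original post.

```python
import re

# Assumed pattern: zero or more characters that are not a space, a corner
# bracket, or a sentence terminator, followed by exactly one terminator.
JP_SENT_PATTERN = re.compile('[^ 「」!?。]*[!?。]')


def split_sentences(text):
    """Return the sentences found in text, each including its terminator."""
    return JP_SENT_PATTERN.findall(text)


# Hypothetical sample text, for illustration only.
sample = '今日は晴れです。明日は雨ですか?はい!'
sentences = split_sentences(sample)
print(sentences)
```

Text that does not end in one of the terminators is not matched by this pattern, so a trailing fragment without `。`, `!`, or `?` is silently dropped.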

Combining Different Sequence Types (4.2.2-4.2.3)

Let's continue.

>>> words = 'I turned off the spectroroute'.split()
>>> wordlens = [(len(word), word) for word in words]
>>> wordlens.sort()
>>> ' '.join(w for (_, w) in wordlens)
'I off the turned spectroroute'

The first line is to spl…
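As a side note, the same ordering can be obtained without building the intermediate list of tuples by hand. The sketch below is one possible alternative, not the approach taken in the post: it passes a `key` function to `sorted`, and the helper name `sort_by_length` is hypothetical.

```python
words = 'I turned off the spectroroute'.split()


def sort_by_length(words):
    """Sort words by length, breaking ties by the word itself."""
    # The key returns the same (length, word) pair that the tuple-based
    # version sorts on, so both approaches yield the same order.
    return sorted(words, key=lambda w: (len(w), w))


# Tuple-based version, as in the session above: decorate, sort, undecorate.
decorated = sorted((len(w), w) for w in words)
via_tuples = [w for (_, w) in decorated]

# Key-based version.
via_key = sort_by_length(words)

print(' '.join(via_key))
```

Using `key=len` alone would also sort by length, but ties would then keep their original order instead of being ordered by the word, so the pair is used here to mirror the tuple sort exactly.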