Entries for 2013-06-02 (1 day)

2013-06-02

Japanese corpus (12.1.1)

NLTK

>>> import nltk
>>> from nltk.corpus.reader import *
>>> from nltk.corpus.reader.util import *
>>> from nltk.text import Text
>>>
>>> jp_sent_tokenizer = nltk.RegexpTokenizer(u'[^ 「」！？。]*[！？。]')
>>> jp_chartype_tokenizer = nltk.Regex…
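To see what a sentence pattern like the one above does, here is a minimal sketch that uses only the standard `re` module rather than NLTK. It assumes the pattern is meant as a negated character class (anything that is not a space, a bracket, or a sentence-ending mark) followed by one sentence-ending mark. The function name `split_jp_sentences` and the sample text are made up for illustration.

```python
import re

# Assumed pattern: a run of characters that are not spaces (ASCII or
# full-width), corner brackets, or sentence-ending marks, followed by
# exactly one sentence-ending mark.
JP_SENT_PATTERN = re.compile('[^ \u3000「」！？。]*[！？。]')


def split_jp_sentences(text):
    """Return the sentence-like chunks of text (hypothetical helper)."""
    return JP_SENT_PATTERN.findall(text)


sample = '今日は晴れです。明日は雨ですか？はい！'
for sentence in split_jp_sentences(sample):
    print(sentence)
```

Each match ends at the first 。, ？ or ！ it meets, so the sample is cut into three pieces. A tokenizer built on the same pattern should cut the text in the same places.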

2013-06-02

Combining Different Sequence Types (4.2.2-4.2.3)

NLTK

Let's continue.

>>> words = 'I turned off the spectroroute'.split()
>>> wordlens = [(len(word), word) for word in words]
>>> wordlens.sort()
>>> ' '.join(w for (_, w) in wordlens)
'I off the turned spectroroute'

The first line is to spl…
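As a side note, the same ordering can be had without building the (length, word) tuples by hand. The sketch below, written for Python 3, compares the tuple approach with `sorted(words, key=len)`. The variable names `by_tuple`, `by_key` and `tie_words` are made up for this comparison.

```python
words = 'I turned off the spectroroute'.split()

# Tuple approach: pair each word with its length, sort the pairs,
# then throw the lengths away again.
by_tuple = [w for (_, w) in sorted((len(w), w) for w in words)]

# Key approach: let sorted() compute the length itself.
by_key = sorted(words, key=len)

print(' '.join(by_tuple))
print(' '.join(by_key))

# The two differ only when lengths tie: tuples fall back to comparing
# the words alphabetically, while key=len keeps the original order.
tie_words = ['the', 'off']
tie_by_tuple = [w for (_, w) in sorted((len(w), w) for w in tie_words)]
tie_by_key = sorted(tie_words, key=len)
```

For this particular sentence both versions give the same result, because "off" already comes before "the" in the input.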

2013-06-02

Deutschina's Tech Diary
