Entries for one month from 2013-05-01

WordNet

O'Reilly's textbook (hereafter the "Whale book"), chapter 2.5: These two sentences have the same meaning even though the last word (motorcar, automobile) is different. a. Benz is credited with the invention of the motorcar. b. Benz is credited wi…

Comparing Vocabulary Lists

Going through chapter 2.4.3 of O'Reilly's textbook: the title of this post might not match the English version of the textbook, as I am using the Japanese version. Swadesh Vocabulary List:
>>> from nltk.corpus import swadesh
>>> swadesh.fileids()…

Vocabulary resources 2

Continuing with chapter 2.4.2 of O'Reilly's textbook:
>>> entries = nltk.corpus.cmudict.entries()
>>> len(entries)
133737
>>> for entry in entries[39943:39951]:
...     print entry
...
('explorer', ['IH0', 'K', 'S', 'P', 'L', 'AO1', 'R', 'ER0'])
('exp…

Vocabulary resources

Continuing with chapter 2.4.1 of O'Reilly's textbook: I wrote this code and saved it as unusual_words.py.
def unusual_words(text):
    import nltk
    from nltk.corpus import words
    text_vocab = set(w.lower() for w in text if w.isalpha())
    english_v…
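The excerpt above is cut off, so the following is only a rough sketch of the same idea, not the post's exact code: collect the lowercase alphabetic words of a text and remove everything found in a reference vocabulary. The `reference_vocab` parameter and the toy data are assumptions made for illustration; they stand in for a full word list such as the one in `nltk.corpus.words`, so the example runs without any NLTK data.

```python
def unusual_words(text, reference_vocab):
    """Return the sorted words in `text` that are missing from `reference_vocab`."""
    # Keep only alphabetic tokens, lowercased, so punctuation and numbers are ignored.
    text_vocab = set(w.lower() for w in text if w.isalpha())
    known = set(w.lower() for w in reference_vocab)
    return sorted(text_vocab - known)


# Hypothetical toy data; a real run would pass a tokenized corpus and a full word list.
tokens = ["The", "quick", "brown", "fox", "jumpz", ",", "over", "2", "dogs"]
reference_vocab = ["the", "quick", "brown", "fox", "over", "dogs"]
print(unusual_words(tokens, reference_vocab))
```

Passing the vocabulary as an argument, rather than loading it inside the function, keeps the sketch easy to try with any word list.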

Reusable code

O'Reilly's textbook, chapter 2.3: Source code can be written in IDLE via the menu File -> New Window and saved as a file with the extension ".py". It can then be executed via the menu Run -> Run Module. Defining Functions: Already experienced in previo…

Conditional frequency distribution

Let's go through chapter 2.2 of O'Reilly's textbook. I already used ConditionalFreqDist in the previous chapters. ConditionalFreqDist takes a list of pairs, each made up of a condition and an object. This example builds pairs of each genre and the words in Brow…
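To make the "list of pairs" idea concrete, here is a minimal plain-Python sketch of what a conditional frequency distribution counts. It is not NLTK's implementation: the helper `conditional_freq_dist` and the (genre, word) pairs are hypothetical, chosen only so the example runs without a corpus.

```python
from collections import Counter, defaultdict


def conditional_freq_dist(pairs):
    """Count how often each object occurs under each condition."""
    cfd = defaultdict(Counter)
    for condition, obj in pairs:
        cfd[condition][obj] += 1
    return cfd


# Hypothetical (genre, word) pairs standing in for a real corpus.
pairs = [
    ("news", "the"), ("news", "market"), ("news", "the"),
    ("romance", "the"), ("romance", "heart"),
]
cfd = conditional_freq_dist(pairs)
print(sorted(cfd))          # the conditions that were seen
print(cfd["news"]["the"])   # how often "the" occurred under "news"
```

Each condition gets its own counter, which is why the same word can have different counts under different genres.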

Other corpora

Resuming my study at chapter 2.1.6 of O'Reilly's textbook. The list of corpora is available at http://nltk.org/nltk_data/. The textbook also introduces the Corpus HOWTO (http://www.nltk.org/howto), but I could not access this link. Corpo…