Entries from 2013-04-01 to 1 month
O'Reilly's textbook, Chapter 2.1.4-2.1.5. The Reuters corpus should have plenty of news documents.
>>> import nltk
>>> import sys
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', '…
Coming back to the O'Reilly textbook (Chapter 2.1.2-2.1.3). Looking at Webtext:
>>> from nltk.corpus import webtext
>>> for fileid in webtext.fileids():
...     print fileid, webtext.raw(fileid)[:65], '...'
...
firefox.txt Cookie Manager: "Don…
It took me a long time to get past this because of my lack of knowledge. First I created replacer.py and put it under the root of Python27. However, I got the following error.
>>> from replacer import *
Traceback (most recent call last):
  File "<stdin>", l…
As mentioned before, the Babelfish translation service seems to be no longer available.
>>> from nltk.misc import babelfish
>>> babelfish.translate('cookbook', 'english', 'spanish')
Traceback (most recent call last):
  File "<stdin>", line 1, …
Stemming is a technique for removing affixes from a word, ending up with the stem. I did not know the meaning of the words "affix" and "stem", but there is an example in the textbook: the stem of "cooking" is "cook", and "ing" is the suffix. P…
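To make the "cooking" → "cook" example concrete, here is a minimal sketch of suffix stripping. It is only an illustration: the suffix list and the name naive_stem are my own assumptions, and this is much cruder than a real stemmer such as the Porter algorithm. It is written for Python 3.

```python
# Naive suffix stripping: an illustration only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny suffix list


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word  # no suffix matched, so the word is its own stem


print(naive_stem("cooking"))  # cook
print(naive_stem("cook"))     # cook
```

A real stemmer applies many more rules, so this sketch will give odd results on most words outside the example.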
Now I start reading Chapter 2.1.1.
>>> nltk.corpus.gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.t…
23.
>>> for w in [w for w in text6 if w.isupper()]:
...     print w
...
....
ARTHUR
GALAHAD
KNIGHTS
TIM
ROBIN
ARTHUR
ROBIN
GALAHAD
ARTHUR
ROBIN
KNIGHTS
ARTHUR
TIM
I
KNIGHTS
....
24. At the beginning, I thought the requirement…
Continuing the Chapter 1 exercises...
17.
>>> text9.index('sunset')
629
>>> text9[620:630]
['PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset']
>>> text9[620:635]
['PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'la…
Let's start with a sample: calculating the similarity of two words, 'cookbook' and 'instruction_book'.
>>> cb = wordnet.synset('cookbook.n.01')
>>> ib = wordnet.synset('instruction_book.n.01')
>>> cb.wup_similarity(ib)
0.9166666666666666
Alth…
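To see where a score like this comes from, here is a minimal sketch of the Wu-Palmer formula, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor the two nodes share. The taxonomy below is a made-up toy tree, not WordNet data, and the helper names are my own assumptions. I count the root as depth 1, so the numbers will not match what NLTK returns for the real synsets.

```python
# Toy taxonomy as child -> parent links (hypothetical, NOT real WordNet data).
PARENT = {
    "book": "publication",
    "reference_book": "book",
    "cookbook": "reference_book",
    "instruction_book": "reference_book",
    "novel": "book",
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity on the toy tree."""
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first node also above b is the deepest common one.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(wup("cookbook", "instruction_book"))  # 0.75 (siblings under reference_book)
print(wup("cookbook", "cookbook"))          # 1.0
```

The closer the shared ancestor is to the two nodes, the closer the score gets to 1.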
According to the textbook, collocations are two or more words that tend to appear together frequently. They were also introduced in Chapter 1 of the O'Reilly text.
>>> from nltk.corpus import webtext
>>> from nltk.collocations import Big…
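As a rough illustration of "words that appear together frequently", the following sketch just counts adjacent word pairs with the standard library. The sample sentence and the name top_bigrams are my own assumptions, and raw counting is a simplification: it is not the scoring that the NLTK collocation finders apply. Written for Python 3.

```python
from collections import Counter


def top_bigrams(words, n=3):
    """Return the n most frequent adjacent word pairs with their counts."""
    pairs = zip(words, words[1:])  # (w0, w1), (w1, w2), ...
    return Counter(pairs).most_common(n)


# Made-up sample text, only for illustration.
words = "the holy grail and the holy hand grenade of the holy grail".split()
print(top_bigrams(words, 2))
# [(('the', 'holy'), 3), (('holy', 'grail'), 2)]
```

Raw counts tend to be dominated by pairs of very common words, which is one reason to filter or score the pairs rather than only count them.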
I read through Chapter 1 of the O'Reilly textbook. Here I quickly go through the exercises. These are just my results and might not be the best answers to the questions.
1.
>>> 12 / (4 + 1)
2
>>> from __future__ import division
>>> 12 / (4 + 1…
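For comparison, a small sketch of the same calculation in Python 3, where `/` is always true division and `//` gives the floor division that Python 2 applies to two integers by default. The variable names are only for illustration.

```python
# Python 3: no __future__ import is needed for true division.
floor_result = 12 // (4 + 1)  # floor division, like Python 2's int / int
true_result = 12 / (4 + 1)    # true division

print(floor_result)  # 2
print(true_result)   # 2.4
```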
To find synonyms of a word, lemmas can be used. This example finds synonyms of "cookbook".
>>> syn = wordnet.synsets('cookbook')[0]
>>> lemmas = syn.lemmas
>>> len(lemmas)
2
>>> lemmas
[Lemma('cookbook.n.01.cookbook'), Lemma('cookbook.…
WordNet is a lexical database, a kind of dictionary.
Japanese: http://ja.wikipedia.org/wiki/WordNet
English: http://en.wikipedia.org/wiki/WordNet
NLTK has a simple interface to WordNet. A synset is a group of words with similar meanings. A word belon…
Stopwords are common words that generally do not contribute to the meaning of a sentence. As usual, import nltk.book.
>>> import nltk
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and …
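The idea of filtering stopwords can be sketched without any corpus at all. The stopword set below is a tiny hypothetical list of my own, not the list shipped with NLTK, and remove_stopwords is an illustrative name. Written for Python 3.

```python
# Hypothetical, deliberately tiny stopword list (NOT the NLTK stopwords corpus).
STOPWORDS = {"a", "the", "is", "of", "to", "and"}


def remove_stopwords(words):
    """Keep only the words that are not in STOPWORDS (case-insensitive)."""
    return [w for w in words if w.lower() not in STOPWORDS]


print(remove_stopwords(["The", "cat", "is", "on", "a", "mat"]))
# ['cat', 'on', 'mat']
```

Using a set keeps each lookup fast, which matters once the word list is a whole text rather than one sentence.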
chatbots() is also introduced in the O'Reilly book.
>>> nltk.chat.chatbots()
Which chatbot would you like to talk to?
1: Eliza (psycho-babble)
2: Iesha (teen anime junky)
3: Rude (abusive bot)
4: Suntsu (Chinese sayings)
5: Zen (gems of …
O'Reilly's "Natural Language Processing with Python" is my main textbook for learning NLTK. When I am using my MacBook environment, I refer to this textbook. I am still reading Chapter 1 and always start from this.
>>> import nltk
>>> from nl…
Still continuing with tokenization. word_tokenize does not handle some cases as I expected. For example:
>>> word_tokenize("can't")
['ca', "n't"]
In my textbook, other tools were introduced. For example, PunktWordTokenizer:
>>> from nltk.to…
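If the goal is simply to keep a contraction such as "can't" as one token, a regular expression is one possible approach. This is a minimal sketch under my own assumptions: simple_tokenize is a hypothetical name, it only handles ASCII letters, and it is not how word_tokenize or PunktWordTokenizer work internally. Written for Python 3.

```python
import re

# A word is letters, optionally followed by an apostrophe and more letters.
# Any other non-space, non-letter character becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]")


def simple_tokenize(text):
    """Split text into word and punctuation tokens, keeping contractions whole."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("can't"))        # ["can't"]
print(simple_tokenize("I can't go."))  # ['I', "can't", 'go', '.']
```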
As my NLTK learning environment seems ready, let's move forward. I set one variable (para) to hold 3 sentences.
>>> import nltk
>>> para = "Hello World. It's good to see you. Thanks for buying this book."
>>> para
"Hello World. It's goo…
In addition to O'Reilly's "Natural Language Processing with Python", I have now downloaded this book to my Kindle. I will use it mainly in my Windows environment.
Python Text Processing With NLTK 2.0 Cookbook: Over 80 Practical Re…
I got the following error when trying the plot() function in NLTK.
>>> fdist.plot(cumulative=True)
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkin…
It was not convenient to move to the directory C:\Python27\Scripts every time I start to use Python and NLTK in my Windows 7 environment. Now I have learnt how to set the PATH environment variable.
1. Right-click the My Computer icon th…
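After changing PATH, it can be checked from Python itself. This is a minimal sketch, assuming the directory to look for is C:\Python27\Scripts; the function names are my own, and the case-insensitive comparison is an assumption that suits Windows paths. Written for Python 3.

```python
import os


def path_entries(path_value=None, sep=os.pathsep):
    """Split a PATH-style string into its non-empty directory entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [p for p in path_value.split(sep) if p]


def on_path(directory, path_value=None, sep=os.pathsep):
    """Check whether directory is listed, ignoring case and trailing slashes."""
    target = directory.rstrip("\\/").lower()
    return any(
        p.rstrip("\\/").lower() == target
        for p in path_entries(path_value, sep)
    )


# Checks the PATH of the current process; the result depends on the machine.
print(on_path("C:\\Python27\\Scripts"))
```

A command prompt that was already open before the change keeps the old PATH, so the check should be run from a newly opened one.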
Another memo on problems when installing NLTK in my Windows 7 64-bit environment. As in the previous topic, I only mention where I got stuck.
1. Are the installers of some tools missing?
I got some error messages when I tried to install some to…
Just leaving a memo, as I got stuck installing NLTK on my MacBook Pro (Mountain Lion). Please note this is not a complete guide; it only mentions the points where I got stuck.
1. Python must be re-installed even if Mac-python is available
In…