Entries for 1 month from 2013-04-01

Reuters corpus & Inaugural corpus

O'Reilly's textbook, Chapter 2.1.4-2.1.5. The Reuters corpus should have plenty of news documents.
>>> import nltk
>>> import sys
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', '…

Various corpora

Coming back to the O'Reilly textbook (Chapter 2.1.2-2.1.3). Looking at Webtext:
>>> from nltk.corpus import webtext
>>> for fileid in webtext.fileids():
...     print fileid, webtext.raw(fileid)[:65], '...'
...
firefox.txt Cookie Manager: "Don…

Replacing words matching regular expressions

It took a long time to get through this because of my lack of knowledge. First I created replacer.py and put it under the root of Python27. However, I got the following error.
>>> from replacer import *
Traceback (most recent call last):
  File "<stdin>", l…
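
The error above is cut off, so this note does not try to explain it. As a minimal sketch of the idea in the title (replacing words that match regular expressions), the following uses only Python's standard `re` module. The pattern list and the function name are my own illustrative assumptions, not the contents of replacer.py.

```python
import re

# Hypothetical (pattern, replacement) pairs, chosen only for illustration.
REPLACEMENT_PATTERNS = [
    (r"won't", "will not"),
    (r"can't", "cannot"),
    (r"(\w+)n't", r"\1 not"),
    (r"(\w+)'re", r"\1 are"),
]

def replace_patterns(text, patterns=REPLACEMENT_PATTERNS):
    """Apply each regex substitution in order and return the new string."""
    for pattern, repl in patterns:
        text = re.sub(pattern, repl, text)
    return text

print(replace_patterns("I can't say they're wrong, but I won't agree."))
```

The order of the pairs matters: the specific patterns ("won't", "can't") come before the generic "n't" rule so that they are not turned into "wo not" or "ca not".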

Babelfish Translation from NLTK

As mentioned before, the Babelfish translation service seems to be no longer available.
>>> from nltk.misc import babelfish
>>> babelfish.translate('cookbook', 'english', 'spanish')
Traceback (most recent call last):
  File "<stdin>", line 1, …

Stemming and lemmatization

Stemming is a technique for removing affixes from a word, ending up with the stem. I don't know the meaning of the words "affixes" and "stem", but there is an example in the textbook: the stem of "cooking" is "cook", and "ing" is the suffix. P…
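
To make the "cooking" example concrete, here is a minimal sketch of suffix stripping in plain Python 3. It is not the Porter stemmer or any other NLTK stemmer; the suffix list and the minimum stem length of three characters are assumptions made only for illustration.

```python
# Hypothetical suffix list, chosen only for illustration.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

for w in ["cooking", "cooked", "cooks", "cook"]:
    print(w, "->", naive_stem(w))
```

A real stemmer applies many more rules; this sketch only shows the basic idea of removing a suffix to reach the stem.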

Access to text corpus

Now I start reading Chapter 2.1.1.
>>> nltk.corpus.gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.t…

O'Reilly: Chapter 1 Exercise 23-29

23.
>>> for w in [w for w in text6 if w.isupper()]:
...     print w
...
....
ARTHUR
GALAHAD
KNIGHTS
TIM
ROBIN
ARTHUR
ROBIN
GALAHAD
ARTHUR
ROBIN
KNIGHTS
ARTHUR
TIM
I
KNIGHTS
....
24. At the beginning, I thought the requirement…

O'Reilly: Chapter 1 Exercise 17-22

Continuing the Chapter 1 exercises.
17.
>>> text9.index('sunset')
629
>>> text9[620:630]
['PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset']
>>> text9[620:635]
['PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'la…

Similarity of words

Let's start with a sample: calculating the similarity of two words, 'cookbook' and 'instruction_book'.
>>> cb = wordnet.synset('cookbook.n.01')
>>> ib = wordnet.synset('instruction_book.n.01')
>>> cb.wup_similarity(ib)
0.9166666666666666
Alth…

Word collocations

According to the textbook, collocations are two or more words that tend to appear together frequently. This was also introduced in Chapter 1 of the O'Reilly text.
>>> from nltk.corpus import webtext
>>> from nltk.collocations import Big…
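
As a rough illustration of "words that tend to appear together", here is a minimal sketch that counts adjacent word pairs with the standard library only. Raw pair counts are a simplification and not the scoring that NLTK's collocation tools use; the sample sentence and the function name are made up for this example.

```python
from collections import Counter

def top_bigrams(words, n=3):
    """Return the n most frequent adjacent word pairs with their counts."""
    pairs = zip(words, words[1:])
    return Counter(pairs).most_common(n)

# Made-up sample text, only for illustration.
sample = "the holy grail is the holy grail of the knights".split()
print(top_bigrams(sample, 2))
```

On real text, pairs such as "of the" dominate raw counts, which is why stopword filtering and association measures are normally applied before ranking.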

O'Reilly: Chapter 1 Exercise 1 - 16

I read through Chapter 1 of the O'Reilly textbook. Here I quickly go through the exercises. These are just my results and might not be the best answers to the questions.
1.
>>> 12 / (4 + 1)
2
>>> from __future__ import division
>>> 12 / (4 + 1…
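
For comparison with the Python 2 session above, a minimal sketch assuming Python 3, where `/` is always true division and `//` gives the floor result without any `__future__` import. The variable names are only for illustration.

```python
# Assumes Python 3, where "/" is always true division.
floor_result = 12 // (4 + 1)   # floor (integer) division
true_result = 12 / (4 + 1)     # true division
print(floor_result, true_result)
```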

Synonyms and Antonyms

To find synonyms of a word, lemmas can be used. This example finds synonyms of "cookbook".
>>> syn = wordnet.synsets('cookbook')[0]
>>> lemmas = syn.lemmas
>>> len(lemmas)
2
>>> lemmas
[Lemma('cookbook.n.01.cookbook'), Lemma('cookbook.…

WordNet and Hypernyms

WordNet is a lexical database, a kind of dictionary.
Japanese: http://ja.wikipedia.org/wiki/WordNet
English: http://en.wikipedia.org/wiki/WordNet
NLTK has a simple interface to WordNet. A synset is a group of words with similar meanings. A word belon…

Stopwords

Stopwords are common words that generally do not contribute to the meaning of sentences. As usual, import nltk.book.
>>> import nltk
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and …
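
The idea of dropping stopwords can be sketched without any corpus download. The stopword set below is a tiny hand-picked assumption for illustration only, not NLTK's English stopword list, and the function name is my own.

```python
# A tiny, hand-picked stopword set for illustration only; a real list is much longer.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "it"}

def remove_stopwords(words, stopwords=STOPWORDS):
    """Keep only the words whose lowercase form is not a stopword."""
    return [w for w in words if w.lower() not in stopwords]

print(remove_stopwords("It is good to see the book".split()))
```

Lowercasing before the lookup means that "It" at the start of a sentence is filtered the same way as "it".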

chatbots()

chatbots() is also introduced in the O'Reilly book.
>>> nltk.chat.chatbots()
Which chatbot would you like to talk to?
1: Eliza (psycho-babble)
2: Iesha (teen anime junky)
3: Rude (abusive bot)
4: Suntsu (Chinese sayings)
5: Zen (gems of …

More familiar with Python

O'Reilly's "Natural Language Processing with Python" is my main textbook for learning NLTK. When I am using my MacBook environment, I refer to this textbook. I am still reading Chapter 1 and always start from this.
>>> import nltk
>>> from nl…

What is Tokenize? Part 2

Still continuing with tokenizing. word_tokenize does not handle some cases as I expected. For example:
>>> word_tokenize("can't")
['ca', "n't"]
In my textbook, other tools were introduced. For example, PunktWordTokenizer:
>>> from nltk.to…
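
If the goal is simply to keep a contraction such as "can't" as one token, a regular expression is one possible alternative. This is a minimal sketch with the standard `re` module, not one of the NLTK tokenizers; the pattern and the function name are my own assumptions.

```python
import re

def simple_tokenize(text):
    """Split text into words (keeping internal apostrophes) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

print(simple_tokenize("I can't wait, really."))
```

The first alternative matches a word optionally followed by apostrophe parts, and the second matches any single punctuation character, so commas and periods become separate tokens.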

What is Tokenize?

As my NLTK learning environment seems ready, let's move forward. I set one variable (para) to hold 3 sentences.
>>> import nltk
>>> para = "Hello World. It's good to see you. Thansk for buying this book."
>>> para
"Hello World. It's goo…
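
To show what splitting such a paragraph into sentences means, here is a minimal sketch using only the standard `re` module. It is a naive rule (split after '.', '!' or '?' followed by whitespace), not NLTK's sentence tokenizer, and it will split wrongly on abbreviations such as "Dr.". The sample text is adapted from the one above and the function name is hypothetical.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

para = "Hello World. It's good to see you. Thanks for buying this book."
for sentence in split_sentences(para):
    print(sentence)
```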

My NLTK textbook

In addition to O'Reilly's "Natural Language Processing with Python", I have now downloaded this book to my Kindle. I will use this book mainly in my Windows environment.
Python Text Processing With NLTK 2.0 Cookbook: Over 80 Practical Re…

plot() function error in matplotlib (Mac)

I got the following error when trying the plot() function in NLTK.
>>> fdist.plot(cumulative=True)
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkin…

How to set the PATH (Windows)

It was not convenient to move to the directory C:\Python27\Scripts every time I start using Python and NLTK in my Windows 7 environment. Now I have learnt how to set the PATH as an environment variable.
1. Right click My Computer icon th…
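
After changing PATH, one way to confirm the result is to check it from Python itself. This is a minimal sketch: the helper name and the example PATH string are assumptions for illustration, and the comparison ignores case and trailing slashes, which I assume is acceptable for Windows paths.

```python
import os

def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if directory appears as an entry in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [e.strip().rstrip("\\/").lower() for e in path_value.split(sep) if e.strip()]
    return directory.rstrip("\\/").lower() in entries

# Made-up PATH value, only for illustration.
example = r"C:\Windows\system32;C:\Python27;C:\Python27\Scripts"
print(is_on_path(r"C:\Python27\Scripts", example, sep=";"))
```

Called with only the directory argument, the helper reads the real PATH of the current process, so it should be run in a new command prompt opened after the setting was changed.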

Installing NLTK under Windows 7 & proxy environment

Another memo about the problems I had when installing NLTK into my Windows 7 64-bit environment. As in the previous topic, I just mention where I got stuck.
1. Installers of some tools are missing?
I got some error messages when I tried to install some to…

Installing NLTK on Mountain Lion

Just leaving some notes, as I got stuck installing NLTK on my MacBook Pro (Mountain Lion). Please note this is not a complete guide; it only mentions the points where I got stuck.
1. Python must be re-installed even if Mac-python is available
In…