WordNet and Hypernyms

WordNet is a lexical database, a kind of dictionary.

Japanese: http://ja.wikipedia.org/wiki/WordNet
English: http://en.wikipedia.org/wiki/WordNet

NLTK has a simple interface to WordNet. A synset is a group of words with similar meanings. A word belongs to at least one synset, and to several when it has more than one meaning.

How to import WordNet:

>>> from nltk.corpus import wordnet

As an example, take the synset for "cookbook".

>>> syn = wordnet.synsets('cookbook')[0]
>>> syn.name
'cookbook.n.01'
>>> syn.definition
'a book of recipes and cooking directions'
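
The session above reads name and definition as plain attributes, which matches older NLTK releases. From NLTK 3.0 onward they are methods (syn.name(), syn.definition()). Below is a minimal sketch of a helper that tolerates both styles. The helper synset_field and the two stand-in classes are hypothetical names used only for illustration; they are not part of NLTK.

```python
def synset_field(synset, field):
    """Return synset.<field>, calling it first if it is a method (NLTK 3 style)."""
    value = getattr(synset, field)
    return value() if callable(value) else value


# Stand-in objects so the sketch runs without WordNet data installed.
class OldStyleSynset:
    # Plain attributes, as in the session above.
    name = 'cookbook.n.01'
    definition = 'a book of recipes and cooking directions'


class NewStyleSynset:
    # Methods, as in NLTK 3.
    def name(self):
        return 'cookbook.n.01'

    def definition(self):
        return 'a book of recipes and cooking directions'


for syn in (OldStyleSynset(), NewStyleSynset()):
    print(synset_field(syn, 'name'), '-', synset_field(syn, 'definition'))
```

With a real installation, the same helper could be called as synset_field(wordnet.synsets('cookbook')[0], 'name').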

This is a little unclear, especially the meaning of "[0]", so I tried some different words.

>>> wordnet.synsets('sugar')
[Synset('sugar.n.01'), Synset('carbohydrate.n.01'), Synset('boodle.n.01'),
 Synset('sugar.v.01')]
>>> wordnet.synsets('salt')
[Synset('salt.n.01'), Synset('salt.n.02'), Synset('strategic_arms_limitation_talks.n.01'),
 Synset('salt.n.04'), Synset('salt.v.01'), Synset('salt.v.02'), Synset('salt.v.03'),
 Synset('salt.v.04'), Synset('salt.s.01')]
>>>

The word "sugar" belongs to 4 synsets: sugar.n.01, carbohydrate.n.01, boodle.n.01 and sugar.v.01. For "salt", there are 9 synsets. In Python, [0] means the first element of a list. Therefore wordnet.synsets('cookbook')[0] is the first of the synsets that 'cookbook' belongs to.

>>> wordnet.synsets('cookbook')
[Synset('cookbook.n.01')]
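
Since synsets() returns an ordinary Python list, [0] fails with IndexError when a word has no synsets at all (the list is then empty). Here is a small sketch of a safer lookup. It uses a plain dict filled with synset names from the output above instead of the real WordNet data, and first_synset is a hypothetical helper name.

```python
# Synset names copied from the sessions above; a dict stands in for WordNet.
synsets_by_word = {
    'cookbook': ['cookbook.n.01'],
    'sugar': ['sugar.n.01', 'carbohydrate.n.01', 'boodle.n.01', 'sugar.v.01'],
}


def first_synset(word):
    """Return the first synset name for word, or None if there is none."""
    names = synsets_by_word.get(word, [])
    return names[0] if names else None


print(first_synset('cookbook'))   # cookbook.n.01
print(first_synset('sugar'))      # sugar.n.01
print(first_synset('pepper'))     # None (not in the toy dict)
```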

Anyway, only one synset exists for this word. Let's move on. Synsets have some example sentences.

>>> wordnet.synsets('cooking')[0].examples
['cooking can be a great art', 'people are needed who have experience in cookery',
 'he left the preparation of meals to his wife']

As in the "sugar" and "salt" examples, "cooking" belongs to several synsets, and [0] selects the first one.

>>> wordnet.synsets('cooking')
[Synset('cooking.n.01'), Synset('cook.v.01'), Synset('cook.v.02'), Synset('cook.v.03'),
 Synset('fudge.v.01'), Synset('cook.v.05')]
>>> wordnet.synsets('cooking')[0]
Synset('cooking.n.01')

A hypernym is a word that is more generic than a given word. This command displays the hypernym(s) of "cookbook".

>>> syn.hypernyms()
[Synset('reference_book.n.01')]

This one displays the hyponyms of that hypernym. A hyponym is a word that is more specific than a given word.

>>> syn.hypernyms()[0].hyponyms()
[Synset('encyclopedia.n.01'), Synset('directory.n.01'), Synset('source_book.n.01'),
 Synset('handbook.n.01'), Synset('instruction_book.n.01'), Synset('cookbook.n.01'),
 Synset('annual.n.02'), Synset('atlas.n.02'), Synset('wordbook.n.01')]

"cookbook" has "reference_book" as its hypernym, so this example displays the hyponyms of "reference_book". Naturally, "cookbook" itself is among them. The others can be seen as 'neighbors' of "cookbook".

root_hypernyms() returns the synset(s) at the highest level. In this case, that is "entity".

>>> syn.root_hypernyms()
[Synset('entity.n.01')]

This command shows the path from the highest level down to the bottom level ("cookbook").

>>> syn.hypernym_paths()
[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'),
  Synset('whole.n.02'), Synset('artifact.n.01'), Synset('creation.n.02'),
  Synset('product.n.02'), Synset('work.n.02'), Synset('publication.n.01'),
  Synset('book.n.01'), Synset('reference_book.n.01'), Synset('cookbook.n.01')]]
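
To see how hypernyms, hyponyms and the path to the root fit together, here is a toy model in plain Python. It is only a sketch: the data is copied from the outputs above, every synset is assumed to have exactly one hypernym (real WordNet synsets can have several, which is why hypernym_paths() returns a list of lists), and path_from_root and siblings are hypothetical helper names, not NLTK functions.

```python
# Hypernym chain for cookbook.n.01, copied from the hypernym_paths() output.
PATH = [
    'entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02',
    'artifact.n.01', 'creation.n.02', 'product.n.02', 'work.n.02',
    'publication.n.01', 'book.n.01', 'reference_book.n.01', 'cookbook.n.01',
]
# Hyponyms of reference_book.n.01, copied from the hyponyms() output.
REFERENCE_BOOKS = [
    'encyclopedia.n.01', 'directory.n.01', 'source_book.n.01',
    'handbook.n.01', 'instruction_book.n.01', 'cookbook.n.01',
    'annual.n.02', 'atlas.n.02', 'wordbook.n.01',
]

# Toy child -> parent map (assumption: one hypernym per synset).
parent_of = {child: parent for parent, child in zip(PATH, PATH[1:])}
for child in REFERENCE_BOOKS:
    parent_of[child] = 'reference_book.n.01'


def path_from_root(name):
    """Walk up the parent links, then reverse to get root -> name."""
    chain = [name]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    chain.reverse()
    return chain


def siblings(name):
    """Return the other hyponyms of name's hypernym (its 'neighbors')."""
    parent = parent_of.get(name)
    if parent is None:
        return []
    return [c for c, p in parent_of.items() if p == parent and c != name]


print(path_from_root('handbook.n.01'))
print(sorted(siblings('cookbook.n.01')))
```

In this toy model the root is simply the one name that never appears as a child, which mirrors what root_hypernyms() reported for "cookbook".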

The next command answers the question of what the "n" after the word and the period (.) means.

>>> syn.pos
'n'

Now it is clear. That's part-of-speech (POS).

n: Noun
a: Adjective
r: Adverb
v: Verb
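
A synset name such as 'cookbook.n.01' can also be taken apart by hand, which makes the three pieces (word, POS, sense number) explicit. The sketch below is plain Python and does not call NLTK; parse_synset_name and POS_LABELS are hypothetical names. The table also includes 's', WordNet's tag for an adjective satellite, which is not in the list above but shows up in names such as 'salt.s.01'.

```python
POS_LABELS = {
    'n': 'noun',
    'v': 'verb',
    'a': 'adjective',
    'r': 'adverb',
    's': 'adjective satellite',
}


def parse_synset_name(name):
    """Split 'word.pos.nn' into (word, pos, sense number)."""
    # Split from the right so the word part is kept whole.
    lemma, pos, sense = name.rsplit('.', 2)
    return lemma, pos, int(sense)


for name in ('cookbook.n.01', 'salt.s.01',
             'strategic_arms_limitation_talks.n.01'):
    lemma, pos, sense = parse_synset_name(name)
    print(lemma, '|', POS_LABELS.get(pos, 'unknown'), '|', sense)
```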

Using len, we can find out how many synsets a word has, optionally restricted to one POS.

>>> len(wordnet.synsets('great'))
7
>>> len(wordnet.synsets('great', pos='n'))
1
>>> len(wordnet.synsets('great', pos='a'))
6
>>> wordnet.synsets('great')
[Synset('great.n.01'), Synset('great.s.01'), Synset('great.s.02'), Synset('great.s.03'),
 Synset('bang-up.s.01'), Synset('capital.s.03'), Synset('big.s.13')]
>>>

In this example, the list contains 1 noun and 6 adjectives. However, I got an interesting error when I tried the following.

>>> len(wordnet.synsets('great', pos='s'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1200, in synsets
    for form in self._morphy(lemma, p)
  File "C:\Python27\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1380, in _morphy
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: 's'
>>>

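As you can see, 6 elements have 's' in their names. My first guess was that 's' stands for synonym, but in WordNet it marks an adjective satellite: an adjective synset that is grouped around a more central adjective synset. That is why pos='a' counted these 6 synsets. The KeyError shows that this NLTK version has no entry for 's' in MORPHOLOGICAL_SUBSTITUTIONS, so 's' cannot be passed as pos here.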
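
The counts reported by len can be cross-checked by tallying the POS letter inside each synset name. This sketch works on the names from the synsets('great') output above rather than on live WordNet data, and pos_counts is a hypothetical helper name.

```python
from collections import Counter

# Synset names copied from the wordnet.synsets('great') output above.
GREAT = [
    'great.n.01', 'great.s.01', 'great.s.02', 'great.s.03',
    'bang-up.s.01', 'capital.s.03', 'big.s.13',
]


def pos_counts(names):
    """Count synset names by the POS letter between the last two dots."""
    return Counter(name.rsplit('.', 2)[1] for name in names)


counts = pos_counts(GREAT)
print(counts['n'], counts['s'], counts['a'])   # 1 6 0
```

None of the names carries a plain 'a', so the 6 synsets returned for pos='a' in the session above are all tagged 's'.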