Exercises, Chapter 2 (12-15)

12.

>>> import nltk
>>> entries = nltk.corpus.cmudict.entries()
>>> len(entries)
133737
>>> words = [word for word, pron in entries]
>>> len(words)
133737
>>> len(set(words))
123455
>>> from __future__ import division
>>> 1 - (len(set(words)) / len(words))
0.07688223902136282
>>> len(words) - len(set(words))
10282

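Distinct words: 123455
Fraction of entries that are extra pronunciations of an already-listed word: 7.69% (10282 of 133737)
This is not quite the fraction of words with more than one pronunciation: a word with three pronunciations adds two extra entries, so that fraction is at most 10282 / 123455 = 8.33%.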
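
The ratio computed above, 1 - distinct/total, is the share of entries that repeat a word. Counting per word gives the fraction the exercise asks about. Below is a minimal sketch, assuming the entries are (word, pronunciation) pairs shaped like those from cmudict.entries(). The function name multi_pron_fraction and the toy entries are hypothetical and only there so the snippet runs without the corpus.

```python
from collections import Counter

def multi_pron_fraction(entries):
    """Return the fraction of distinct words listed with more than one pronunciation."""
    # Count how many entries each word has
    counts = Counter(word for word, pron in entries)
    if not counts:
        return 0.0
    # A word has several pronunciations if it appears in more than one entry
    multi = sum(1 for n in counts.values() if n > 1)
    return float(multi) / len(counts)

# Toy data (hypothetical), shaped like cmudict entries
sample = [
    ("fire", ["F", "AY1", "ER0"]),
    ("fire", ["F", "AY1", "R"]),
    ("cat", ["K", "AE1", "T"]),
    ("dog", ["D", "AO1", "G"]),
]
print(multi_pron_fraction(sample))
```

With the real corpus, the same function could presumably be called as multi_pron_fraction(nltk.corpus.cmudict.entries()).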

13.

It took me a long time to arrive at this approach because I am still not very familiar with Python syntax.

>>> from nltk.corpus import wordnet as wn
>>> len(list(wn.all_synsets('n')))
82115
>>> len([synset for synset in wn.all_synsets('n') if not synset.hyponyms()])
65422

Noun synsets with no hyponyms: 65422 of 82115 (79.67%)
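
The two counts can also be taken in one helper that returns the fraction directly. This is a sketch on a toy hierarchy, assuming each node maps to the list of its hyponyms. The dictionary and the name leaf_fraction are hypothetical stand-ins for the WordNet calls, so the snippet runs without the corpus.

```python
def leaf_fraction(hyponyms_of):
    """Return (leaves, total, fraction) for a mapping node -> list of hyponyms."""
    total = len(hyponyms_of)
    # A leaf is a node whose hyponym list is empty
    leaves = sum(1 for children in hyponyms_of.values() if not children)
    fraction = float(leaves) / total if total else 0.0
    return leaves, total, fraction

# Toy hierarchy (hypothetical), not real WordNet data
toy = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "coupe"],
    "bicycle": [],
    "sedan": [],
    "coupe": [],
}
leaves, total, fraction = leaf_fraction(toy)
print("%d of %d nodes have no hyponyms (%.2f%%)" % (leaves, total, 100 * fraction))
```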

14.

>>> def supergloss(s):
...     # Look up the synset once and reuse it
...     synset = wn.synset(s)
...     print 'Original:'
...     print synset.definition
...     print 'Hypernyms'
...     for hypernym in set(synset.hypernyms()):
...         print hypernym, hypernym.definition
...     print 'Hyponyms'
...     for hyponym in set(synset.hyponyms()):
...         print hyponym, hyponym.definition
...
>>> supergloss('car.n.01')
Original:
a motor vehicle with four wheels; usually propelled by an internal combustion engine
Hypernyms
Synset('motor_vehicle.n.01') a self-propelled wheeled vehicle that does not run on rails
Hyponyms
Synset('beach_wagon.n.01') a car that has a long body and rear door with space behind rear seat
Synset('minicar.n.01') a car that is even smaller than a subcompact car
Synset('gas_guzzler.n.01') a car with relatively low fuel efficiency
Synset('loaner.n.02') a car that is lent as a replacement for one that is under repair
Synset('model_t.n.01') the first widely available automobile powered by a gasoline engine; mass-produced by Henry Ford from 1908 to 1927
Synset('horseless_carriage.n.01') an early term for an automobile
Synset('electric.n.01') a car that is powered by electricity
Synset('bus.n.04') a car that is old and unreliable
Synset('cab.n.03') a car driven by a person whose job is to take passengers where they want to go in exchange for money
Synset('convertible.n.01') a car that has top that can be folded or removed
Synset('compact.n.03') a small and economical car
Synset('minivan.n.01') a small box-shaped passenger van; usually has removable seats; used as a family car
Synset('racer.n.02') a fast car that competes in races
Synset('hatchback.n.01') a car having a hatchback door
Synset('roadster.n.01') an open automobile having a front seat and a rumble seat
Synset('sport_utility.n.01') a high-performance four-wheel drive car built on a truck chassis
Synset('sedan.n.01') a car that is closed and that has front and rear seats and two or four doors
Synset('jeep.n.01') a car suitable for traveling over rough terrain
Synset('limousine.n.01') large luxurious car; usually driven by a chauffeur
Synset('cruiser.n.01') a car in which policemen cruise the streets; equipped with radiotelephonic communications to headquarters
Synset('hardtop.n.01') a car that resembles a convertible but has a fixed rigid top
Synset('ambulance.n.01') a vehicle that takes people to and from hospitals
Synset('stanley_steamer.n.01') a steam-powered automobile
Synset('touring_car.n.01') large open car seating four with folding top
Synset('used-car.n.01') a car that has been previously owned; not a new car
Synset('stock_car.n.01') a car kept in dealers' stock for regular sales
Synset('subcompact.n.01') a car smaller than a compact car
Synset('pace_car.n.01') a high-performance car that leads a parade of competing cars through the pace lap and then pulls off the course
Synset('hot_rod.n.01') a car modified to increase its speed and acceleration
Synset('sports_car.n.01') a small low car with a high-powered engine; usually seats two persons
Synset('coupe.n.01') a car with two doors and front seats and a luggage compartment
>>> 

Let's try a few more examples:

>>> supergloss('computer.n.01')
Original:
a machine for performing calculations automatically
Hypernyms
Synset('machine.n.01') any mechanical or electrical device that transmits or modifies energy to perform or assist in the performance of human tasks
Hyponyms
Synset('turing_machine.n.01') a hypothetical computer with an infinitely long memory tape
Synset('home_computer.n.01') a computer intended for use in the home
Synset('node.n.08') (computer science) any computer that is hooked up to a computer network
Synset('number_cruncher.n.02') a computer capable of performing a large number of mathematical operations per second
Synset('server.n.03') (computer science) a computer that provides client stations with access to files and printers as shared resources to a computer network
Synset('pari-mutuel_machine.n.01') computer that registers bets and divides the total amount bet among those who won
Synset('digital_computer.n.01') a computer that represents information by numerical (binary) digits
Synset('predictor.n.03') a computer for controlling antiaircraft fire that computes the position of an aircraft at the instant of a shell's arrival
Synset('web_site.n.01') a computer connected to the internet that maintains a series of web pages on the World Wide Web
Synset('analog_computer.n.01') a computer that represents information by variable quantities (e.g., positions or voltages)
>>> 
>>> supergloss('laptop.n.01')
Original:
a portable computer small enough to use in your lap
Hypernyms
Synset('portable_computer.n.01') a personal computer that can easily be carried by hand
Hyponyms
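
A variant that returns one concatenated string instead of printing may be easier to reuse. The sketch below assumes an NLTK 3 style synset, where definition(), hypernyms() and hyponyms() are methods (in the session above, definition is an attribute). FakeSynset is a hypothetical stand-in so the snippet runs without the WordNet data.

```python
def supergloss(synset):
    """Join the definition of a synset with those of its hypernyms and hyponyms."""
    parts = [synset.definition()]
    # Hypernym glosses come first, then hyponym glosses
    for related in synset.hypernyms() + synset.hyponyms():
        parts.append(related.definition())
    return "; ".join(parts)


class FakeSynset(object):
    """Hypothetical stand-in that mimics the three methods used above."""

    def __init__(self, gloss, hypernyms=(), hyponyms=()):
        self._gloss = gloss
        self._hypernyms = list(hypernyms)
        self._hyponyms = list(hyponyms)

    def definition(self):
        return self._gloss

    def hypernyms(self):
        return self._hypernyms

    def hyponyms(self):
        return self._hyponyms


machine = FakeSynset("a device that performs tasks")
laptop = FakeSynset("a portable computer")
computer = FakeSynset("a machine for calculating", [machine], [laptop])
print(supergloss(computer))
```

With NLTK 3 and the WordNet corpus installed, the call would presumably be supergloss(wn.synset('car.n.01')).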

15.

>>> from nltk import FreqDist
>>> from nltk.corpus import brown
>>> fdist = FreqDist(brown.words())
>>> sorted([w for w in set(fdist) if fdist[w] >= 3])
['!', '$.03', '$.07', '$1', '$1,000', '$1,500', '$1.1', '$10', '$10,000', '$100', '$100,000', '$125', '$135', '$14', '$15', '$15,000', '$150', '$17,000', '$2', '$2,000', '$20', '$20,000', '$200', '$25', '$25,000', '$250', '$28', '$3', '$3,000', '$30,000', '$300', '$37', '$4', '$40', '$400', '$45', '$450', '$5', '$5,000', 
....
'zero', 'zest', 'zigzagging', 'zinc', 'zing', 'zone', 'zones', 'zoning', 'zoo']
>>> 

The result is too long, so let's wrap this in a function with an adjustable threshold.

>>> def ExtractBrownWords(th):
...     fdist = FreqDist(brown.words())
...     return sorted([w for w in set(fdist) if fdist[w] >= th])
... 
>>> ExtractBrownWords(50)
['!', '&', "'", "''", '(', ')', ',', '-', '--', '.', '1', '10', '100', '11', '12', '14', '15', '16', '18', '1958', '1959', '1960', '1961', '2', '20', '25', '3', '30', '4', '5', '50', 
....
"you'll", "you're", 'young', 'your', 'yourself', 'youth']
>>> ExtractBrownWords(1000)
['!', "''", '(', ')', ',', '--', '.', ':', ';', '?', 'A', 'But', 'He', 'I', 'In', 'It', 'The', 'This', '``', 'a', 'about', 'all', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'been', 'but', 'by', 'can', 'could', 'do', 'first', 'for', 'from', 'had', 'has', 'have', 'he', 'her', 'him', 'his', 'if', 'in', 'into', 'is', 'it', 'its', 'like', 'made', 'man', 'may', 'me', 'more', 'most', 'must', 'my', 'new', 'no', 'not', 'now', 'of', 'on', 'one', 'only', 'or', 'other', 'our', 'out', 'over', 'said', 'she', 'so', 'some', 'such', 'than', 'that', 'the', 'their', 'them', 'then', 'there', 'these', 'they', 'this', 'time', 'to', 'two', 'up', 'was', 'we', 'were', 'what', 'when', 'which', 'who', 'will', 'with', 'would', 'you']
>>> 

Runtime is poor, presumably because the FreqDist is rebuilt from the whole Brown corpus on every call, but the results are as expected.
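
One way to avoid the repeated work is to count the words once and pass the counts in. This is a minimal sketch using collections.Counter on a toy token list. The name words_at_least and the tokens are hypothetical, and with NLTK the tokens would come from brown.words().

```python
from collections import Counter

def words_at_least(counts, threshold):
    """Return the sorted words whose count is at least `threshold`."""
    return sorted(w for w, n in counts.items() if n >= threshold)

# Toy token list (hypothetical); with NLTK this would be brown.words()
tokens = "the cat sat on the mat and the dog sat too".split()

# Build the counts once and reuse them for every threshold
counts = Counter(tokens)

print(words_at_least(counts, 2))
print(words_at_least(counts, 3))
```

A FreqDist could presumably be passed in place of the Counter, since the function only relies on items().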