Chapter 2 Exercises (12-15)
12.
>>> import nltk
>>> entries = nltk.corpus.cmudict.entries()
>>> len(entries)
133737
>>> words = [word for word, pron in entries]
>>> len(words)
133737
>>> len(set(words))
123455
>>> from __future__ import division
>>> 1 - (len(set(words)) / len(words))
0.07688223902136282
>>> len(words) - len(set(words))
10282
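Distinct words: 123455
Entries that are extra pronunciations of an already-listed word: 10282 of 133737, i.e. 1 - 123455/133737 ≈ 7.69%
Strictly, this is not the same as the fraction of words with more than one pronunciation: a word with three pronunciations contributes two extra entries, so at most 10282 of the 123455 distinct words (about 8.33%) can have more than one pronunciation. The exact fraction requires counting the entries for each word.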
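To get the fraction of distinct words that have more than one pronunciation, the entries can be counted per word. Below is a minimal Python 3 sketch of that counting step. pronunciation_stats is my own hypothetical helper, and toy_entries is a small made-up list in the same (word, pronunciation) shape that cmudict.entries() returns; no result for the full dictionary is claimed here.

```python
from collections import Counter


def pronunciation_stats(entries):
    """Return (distinct_words, words_with_several_prons, fraction).

    `entries` is a list of (word, pronunciation) pairs, the same shape
    as nltk.corpus.cmudict.entries().
    """
    # How many entries (pronunciations) each word has.
    counts = Counter(word for word, pron in entries)
    distinct = len(counts)
    several = sum(1 for n in counts.values() if n > 1)
    fraction = several / distinct if distinct else 0.0
    return distinct, several, fraction


# Hypothetical toy data, not taken from the real dictionary.
toy_entries = [
    ("tomato", ["T", "AH0", "M", "EY1", "T", "OW2"]),
    ("tomato", ["T", "AH0", "M", "AA1", "T", "OW2"]),
    ("read", ["R", "EH1", "D"]),
    ("read", ["R", "IY1", "D"]),
    ("cat", ["K", "AE1", "T"]),
]
print(pronunciation_stats(toy_entries))  # prints (3, 2, 0.6666666666666666)
```

With NLTK installed, passing nltk.corpus.cmudict.entries() instead of the toy list applies the same count to the real data. Alternatively, nltk.corpus.cmudict.dict() already maps each word to its list of pronunciations, so counting the values with len(prons) > 1 gives the same numerator.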
13.
It took me a long time to come up with this solution because I am still not very familiar with Python syntax.
>>> from nltk.corpus import wordnet as wn
>>> len(list(wn.all_synsets('n')))
82115
>>> len([synset for synset in wn.all_synsets('n') if not synset.hyponyms()])
65422
So 65422 of the 82115 noun synsets, about 79.67%, have no hyponyms.
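The session above walks over all noun synsets twice, once for the total and once for the synsets without hyponyms. A minimal Python 3 sketch that gets both counts in a single pass and returns the percentage directly is below. no_hyponym_percentage is a hypothetical helper name, and StubSynset is a made-up stand-in so the example runs without the WordNet data; with NLTK installed, wn.all_synsets('n') could be passed in instead.

```python
def no_hyponym_percentage(synsets):
    """Percentage of synsets that have no hyponyms, counted in one pass.

    `synsets` can be any iterable of objects with a hyponyms() method,
    for example wn.all_synsets('n') from nltk.corpus.wordnet.
    """
    total = 0
    leaves = 0
    for synset in synsets:
        total += 1
        if not synset.hyponyms():  # an empty list means no hyponyms
            leaves += 1
    return 100 * leaves / total if total else 0.0


class StubSynset:
    """Hypothetical stand-in so the sketch runs without WordNet data."""

    def __init__(self, hyponyms=()):
        self._hyponyms = list(hyponyms)

    def hyponyms(self):
        return self._hyponyms


leaf1, leaf2 = StubSynset(), StubSynset()
root = StubSynset([leaf1, leaf2])
print(no_hyponym_percentage([root, leaf1, leaf2]))  # 2 of the 3 are leaves
```

Because wn.all_synsets() is a generator, counting in one loop avoids materializing and traversing the whole noun hierarchy a second time.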
14.
>>> def supergloss(s):
...     synset = wn.synset(s)
...     print 'Original:'
...     print synset.definition
...     print 'Hypernyms'
...     for hypernym in set(synset.hypernyms()):
...         print hypernym, hypernym.definition
...     print 'Hyponyms'
...     for hyponym in set(synset.hyponyms()):
...         print hyponym, hyponym.definition
...
>>> supergloss('car.n.01')
Original:
a motor vehicle with four wheels; usually propelled by an internal combustion engine
Hypernyms
Synset('motor_vehicle.n.01') a self-propelled wheeled vehicle that does not run on rails
Hyponyms
Synset('beach_wagon.n.01') a car that has a long body and rear door with space behind rear seat
Synset('minicar.n.01') a car that is even smaller than a subcompact car
Synset('gas_guzzler.n.01') a car with relatively low fuel efficiency
Synset('loaner.n.02') a car that is lent as a replacement for one that is under repair
Synset('model_t.n.01') the first widely available automobile powered by a gasoline engine; mass-produced by Henry Ford from 1908 to 1927
Synset('horseless_carriage.n.01') an early term for an automobile
Synset('electric.n.01') a car that is powered by electricity
Synset('bus.n.04') a car that is old and unreliable
Synset('cab.n.03') a car driven by a person whose job is to take passengers where they want to go in exchange for money
Synset('convertible.n.01') a car that has top that can be folded or removed
Synset('compact.n.03') a small and economical car
Synset('minivan.n.01') a small box-shaped passenger van; usually has removable seats; used as a family car
Synset('racer.n.02') a fast car that competes in races
Synset('hatchback.n.01') a car having a hatchback door
Synset('roadster.n.01') an open automobile having a front seat and a rumble seat
Synset('sport_utility.n.01') a high-performance four-wheel drive car built on a truck chassis
Synset('sedan.n.01') a car that is closed and that has front and rear seats and two or four doors
Synset('jeep.n.01') a car suitable for traveling over rough terrain
Synset('limousine.n.01') large luxurious car; usually driven by a chauffeur
Synset('cruiser.n.01') a car in which policemen cruise the streets; equipped with radiotelephonic communications to headquarters
Synset('hardtop.n.01') a car that resembles a convertible but has a fixed rigid top
Synset('ambulance.n.01') a vehicle that takes people to and from hospitals
Synset('stanley_steamer.n.01') a steam-powered automobile
Synset('touring_car.n.01') large open car seating four with folding top
Synset('used-car.n.01') a car that has been previously owned; not a new car
Synset('stock_car.n.01') a car kept in dealers' stock for regular sales
Synset('subcompact.n.01') a car smaller than a compact car
Synset('pace_car.n.01') a high-performance car that leads a parade of competing cars through the pace lap and then pulls off the course
Synset('hot_rod.n.01') a car modified to increase its speed and acceleration
Synset('sports_car.n.01') a small low car with a high-powered engine; usually seats two persons
Synset('coupe.n.01') a car with two doors and front seats and a luggage compartment
Let's try some other examples:
>>> supergloss('computer.n.01')
Original:
a machine for performing calculations automatically
Hypernyms
Synset('machine.n.01') any mechanical or electrical device that transmits or modifies energy to perform or assist in the performance of human tasks
Hyponyms
Synset('turing_machine.n.01') a hypothetical computer with an infinitely long memory tape
Synset('home_computer.n.01') a computer intended for use in the home
Synset('node.n.08') (computer science) any computer that is hooked up to a computer network
Synset('number_cruncher.n.02') a computer capable of performing a large number of mathematical operations per second
Synset('server.n.03') (computer science) a computer that provides client stations with access to files and printers as shared resources to a computer network
Synset('pari-mutuel_machine.n.01') computer that registers bets and divides the total amount bet among those who won
Synset('digital_computer.n.01') a computer that represents information by numerical (binary) digits
Synset('predictor.n.03') a computer for controlling antiaircraft fire that computes the position of an aircraft at the instant of a shell's arrival
Synset('web_site.n.01') a computer connected to the internet that maintains a series of web pages on the World Wide Web
Synset('analog_computer.n.01') a computer that represents information by variable quantities (e.g., positions or voltages)
>>> supergloss('laptop.n.01')
Original:
a portable computer small enough to use in your lap
Hypernyms
Synset('portable_computer.n.01') a personal computer that can easily be carried by hand
Hyponyms
laptop.n.01 has no hyponyms, so nothing is printed after "Hyponyms".
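The supergloss function above prints its output and uses the older NLTK interface, in which definition is an attribute; in NLTK 3, name(), definition(), hypernyms() and hyponyms() are all methods. Below is a minimal Python 3 sketch, assuming that NLTK 3 Synset API, of a variant that takes a synset object and returns everything as one string instead of printing it. It sorts related synsets by name so the order is reproducible, unlike iterating over a set. StubSynset is a hypothetical stand-in (its glosses are copied from the car.n.01 output above) so the example runs without the WordNet data.

```python
def supergloss(synset):
    """Return one string: the definition of `synset`, then the definitions
    of its hypernyms and hyponyms, one per line.

    Assumes the NLTK 3 Synset API, where name(), definition(),
    hypernyms() and hyponyms() are all methods.
    """
    lines = ["Original: " + synset.definition()]
    for label, related in (("Hypernym", synset.hypernyms()),
                           ("Hyponym", synset.hyponyms())):
        # Sort by name() so the output order is stable between runs.
        for other in sorted(related, key=lambda s: s.name()):
            lines.append(f"{label} {other.name()}: {other.definition()}")
    return "\n".join(lines)


class StubSynset:
    """Hypothetical stand-in with the same four methods as an NLTK 3 Synset."""

    def __init__(self, name, definition, hypernyms=(), hyponyms=()):
        self._name = name
        self._definition = definition
        self._hypernyms = list(hypernyms)
        self._hyponyms = list(hyponyms)

    def name(self):
        return self._name

    def definition(self):
        return self._definition

    def hypernyms(self):
        return self._hypernyms

    def hyponyms(self):
        return self._hyponyms


vehicle = StubSynset("motor_vehicle.n.01",
                     "a self-propelled wheeled vehicle that does not run on rails")
hatchback = StubSynset("hatchback.n.01", "a car having a hatchback door")
coupe = StubSynset("coupe.n.01",
                   "a car with two doors and front seats and a luggage compartment")
car = StubSynset("car.n.01",
                 "a motor vehicle with four wheels; usually propelled by an internal combustion engine",
                 hypernyms=[vehicle], hyponyms=[hatchback, coupe])
print(supergloss(car))

# With NLTK 3 and the WordNet data installed (assumed setup):
#   from nltk.corpus import wordnet as wn
#   print(supergloss(wn.synset('car.n.01')))
```

Returning a string rather than printing lets the caller decide whether to print, search, or store the combined gloss.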
15.
>>> from nltk import FreqDist
>>> from nltk.corpus import brown
>>> fdist = FreqDist(brown.words())
>>> sorted(w for w in fdist if fdist[w] >= 3)
['!', '$.03', '$.07', '$1', '$1,000', '$1,500', '$1.1', '$10', '$10,000', '$100', '$100,000', '$125', '$135', '$14', '$15', '$15,000', '$150', '$17,000', '$2', '$2,000', '$20', '$20,000', '$200', '$25', '$25,000', '$250', '$28', '$3', '$3,000', '$30,000', '$300', '$37', '$4', '$40', '$400', '$45', '$450', '$5', '$5,000', .... 'zero', 'zest', 'zigzagging', 'zinc', 'zing', 'zone', 'zones', 'zoning', 'zoo']
The result is too long, so let's make the minimum frequency a parameter and try higher values.
>>> def ExtractBrownWords(th):
...     fdist = FreqDist(brown.words())
...     return sorted(w for w in fdist if fdist[w] >= th)
...
>>> ExtractBrownWords(50)
['!', '&', "'", "''", '(', ')', ',', '-', '--', '.', '1', '10', '100', '11', '12', '14', '15', '16', '18', '1958', '1959', '1960', '1961', '2', '20', '25', '3', '30', '4', '5', '50', .... "you'll", "you're", 'young', 'your', 'yourself', 'youth']
>>> ExtractBrownWords(1000)
['!', "''", '(', ')', ',', '--', '.', ':', ';', '?', 'A', 'But', 'He', 'I', 'In', 'It', 'The', 'This', '``', 'a', 'about', 'all', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'been', 'but', 'by', 'can', 'could', 'do', 'first', 'for', 'from', 'had', 'has', 'have', 'he', 'her', 'him', 'his', 'if', 'in', 'into', 'is', 'it', 'its', 'like', 'made', 'man', 'may', 'me', 'more', 'most', 'must', 'my', 'new', 'no', 'not', 'now', 'of', 'on', 'one', 'only', 'or', 'other', 'our', 'out', 'over', 'said', 'she', 'so', 'some', 'such', 'than', 'that', 'the', 'their', 'them', 'then', 'there', 'these', 'they', 'this', 'time', 'to', 'two', 'up', 'was', 'we', 'were', 'what', 'when', 'which', 'who', 'will', 'with', 'would', 'you']
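The runtime is poor, mainly because ExtractBrownWords builds a new FreqDist over the whole Brown Corpus (more than a million tokens) on every call; building fdist once and reusing it across calls would avoid that. The results are as expected, though.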
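Below is a minimal Python 3 sketch that counts the tokens only once and then answers any number of threshold queries from the stored counts. make_extractor is a hypothetical helper name, and the short token list is made-up toy data standing in for brown.words(); with NLTK installed, brown.words() could be passed in its place.

```python
from collections import Counter


def make_extractor(tokens):
    """Count `tokens` once and return a function that lists every word
    occurring at least `threshold` times, in sorted order."""
    counts = Counter(tokens)  # the only pass over the tokens

    def extract(threshold):
        return sorted(w for w, n in counts.items() if n >= threshold)

    return extract


# Hypothetical toy tokens in place of the Brown Corpus.
extract = make_extractor("the cat saw the dog and the dog saw the cat".split())
print(extract(3))  # prints ['the']
print(extract(2))  # prints ['cat', 'dog', 'saw', 'the']
```

The closure keeps the counts alive between calls, so each new threshold only costs one scan over the vocabulary rather than a recount of the whole corpus.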