Exercise: Chapter 2 (1-7)
Although it took a long time, I have now reached the end of Chapter 2 in the whale book.
1.
>>> words1 = ['green', 'yellow', 'red', 'white', 'black']
>>> words2 = ['pink', 'brown']
>>> words3 = words1 + words2
>>> words3
['green', 'yellow', 'red', 'white', 'black', 'pink', 'brown']
>>> words2 * 2
['pink', 'brown', 'pink', 'brown']
>>> words3[2]
'red'
>>> words3[:2]
['green', 'yellow']
>>> words3[-2]
'pink'
>>> words3[-2:]
['pink', 'brown']
>>> words3[2:4]
['red', 'white']
>>> ' '.join(words1)
'green yellow red white black'
>>> sorted(words3)
['black', 'brown', 'green', 'pink', 'red', 'white', 'yellow']
2.
austen-persuasion.txt is in the gutenberg corpus.
>>> len(nltk.corpus.gutenberg.words('austen-persuasion.txt'))
98171
>>> ap = nltk.corpus.gutenberg.words('austen-persuasion.txt')
>>> len(ap)
98171
>>> len(set(ap))
6132
Ans: austen-persuasion.txt consists of 98,171 word tokens. The number of unique words (word types) is 6,132.
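The two figures are a token count and a type count. Here is a minimal sketch of the same distinction on a hand-made token list, so it runs without the corpus. The `tokens` list and the variable names are illustrative only and are not taken from the book.

```python
# Toy token list standing in for nltk.corpus.gutenberg.words(...).
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', '.', 'The', 'end', '.']

num_tokens = len(tokens)            # every occurrence counts
num_types = len(set(tokens))        # distinct strings, case-sensitive
# Lowercasing first merges 'The' and 'the' into one type.
num_types_lower = len(set(t.lower() for t in tokens))

print(num_tokens)
print(num_types)
print(num_types_lower)
```

Note that len(set(ap)) is case-sensitive and also counts punctuation tokens, so the number of unique "words" depends on how you normalize.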
3.
>>> bc = nltk.corpus.brown.words()
>>> wt = nltk.corpus.webtext.words()
>>> bc
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
>>> wt
['Cookie', 'Manager', ':', '"', 'Don', "'", 't', ...]
>>> len(set(bc))
56057
>>> len(set(wt))
21537
>>> fdistbc = nltk.FreqDist([w.lower() for w in bc])
>>> fdistwt = nltk.FreqDist([w.lower() for w in wt])
>>> modals = ['what', 'why', 'when', 'which', 'who', 'how']
>>> for m in modals:
...     print m + ':', fdistbc[m]
...
what: 1908
why: 404
when: 2331
which: 3561
who: 2252
how: 834
>>> for m in modals:
...     print m + ':', fdistwt[m]
...
what: 1362
why: 400
when: 1833
which: 134
who: 404
how: 461
>>> fdistbc
<FreqDist with 49815 samples and 1161192 outcomes>
>>> fdistwt
<FreqDist with 17414 samples and 396736 outcomes>
>>> fdistbc.max()
'the'
>>> fdistbc['the']
69971
>>> fdistbc.freq('the')
0.06025790739171472
>>> fdistwt.max()
'.'
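The counting step can also be tried without the corpora. The sketch below uses collections.Counter from the standard library as a stand-in for nltk.FreqDist (an assumption made only to keep the example self-contained). The `tokens` list and the name `wh_words` are made up for illustration.

```python
from collections import Counter

# Toy tokens standing in for a corpus word list.
tokens = ['What', 'is', 'this', '?', 'Who', 'knows', 'what', 'it', 'is', '.',
          'What', 'now', '?']
wh_words = ['what', 'why', 'when', 'which', 'who', 'how']

# Lowercase before counting so 'What' and 'what' are counted together.
counts = Counter(w.lower() for w in tokens)

for w in wh_words:
    print(w + ': ' + str(counts[w]))   # unseen words count as 0

# Most frequent item and its relative frequency (like .max() and .freq()).
top_word, top_count = counts.most_common(1)[0]
rel_freq = counts[top_word] / float(sum(counts.values()))
```

As in the session above, the most frequent item in real text is often a function word or a punctuation token, so filtering may be needed before comparing corpora.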
4.
>>> from nltk.corpus import state_union
>>> state_union.fileids()
['1945-Truman.txt', '1946-Truman.txt', '1947-Truman.txt', '1948-Truman.txt', '1949-Truman.txt', '1950-Truman.txt', '1951-Truman.txt', '1953-Eisenhower.txt', '1954-Eisenhower.txt', '1955-Eisenhower.txt', '1956-Eisenhower.txt', '1957-Eisenhower.txt', '1958-Eisenhower.txt', '1959-Eisenhower.txt', '1960-Eisenhower.txt', '1961-Kennedy.txt', '1962-Kennedy.txt', '1963-Johnson.txt', '1963-Kennedy.txt', '1964-Johnson.txt', '1965-Johnson-1.txt', '1965-Johnson-2.txt', '1966-Johnson.txt', '1967-Johnson.txt', '1968-Johnson.txt', '1969-Johnson.txt', '1970-Nixon.txt', '1971-Nixon.txt', '1972-Nixon.txt', '1973-Nixon.txt', '1974-Nixon.txt', '1975-Ford.txt', '1976-Ford.txt', '1977-Ford.txt', '1978-Carter.txt', '1979-Carter.txt', '1980-Carter.txt', '1981-Reagan.txt', '1982-Reagan.txt', '1983-Reagan.txt', '1984-Reagan.txt', '1985-Reagan.txt', '1986-Reagan.txt', '1987-Reagan.txt', '1988-Reagan.txt', '1989-Bush.txt', '1990-Bush.txt', '1991-Bush-1.txt', '1991-Bush-2.txt', '1992-Bush.txt', '1993-Clinton.txt', '1994-Clinton.txt', '1995-Clinton.txt', '1996-Clinton.txt', '1997-Clinton.txt', '1998-Clinton.txt', '1999-Clinton.txt', '2000-Clinton.txt', '2001-GWBush-1.txt', '2001-GWBush-2.txt', '2002-GWBush.txt', '2003-GWBush.txt', '2004-GWBush.txt', '2005-GWBush.txt', '2006-GWBush.txt']
It seems that the first 4 characters of each file ID stand for the year.
>>> [fileid[:4] for fileid in state_union.fileids()]
['1945', '1946', '1947', '1948', '1949', '1950', '1951', '1953', '1954', '1955', '1956', '1957', '1958', '1959', '1960', '1961', '1962', '1963', '1963', '1964', '1965', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2001', '2002', '2003', '2004', '2005', '2006']
It's time to use ConditionalFreqDist().
>>> cfd = nltk.ConditionalFreqDist(
...     (target, fileid[:4])
...     for fileid in state_union.fileids()
...     for w in state_union.words(fileid)
...     for target in ['men', 'women', 'people']
...     if w.lower() == target)
>>> cfd.tabulate()
       1945 1946 1947 1948 1949 1950 1951 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
   men    2   12    7    5    2    6    8    3    2    4    2    5    2    4    2    6    6    8    3   19   12   11    4    5    2    1    1    1    0    0    3    2    0    0    1    1    1    3    3    1    2    1    1    2    3    9    4    1    1    1    2    1    2    2    5    4    3    6    7    8    7
people   10   49   12   22   15   15   10   17   15   26   30   11   19   11   10   10   10   15    3   30   35   25   17    6   23   34    9   10   22   14   18   19   26   15   12   11   17   19   27   12   14   24   17   13    9   27   27   45   66   73   43   31   22   22   41   27   14   33   21   18   22
 women    2    7    2    1    1    2    2    0    0    0    2    2    1    1    0    0    2    5    1    3    1    1    0    2    0    0    0    0    0    0    1    1    1    1    2    1    2    7    5    1    2    0    0    3    2    9    4    2    1    3    3    2    2    3    7    6    6    4    8   11    7
Yes, I can read off the numbers, but I prefer a more graphical view.
>>> cfd.plot()
What can I say from the results?
- 'people' is used most heavily in the mid-1990s (1994, 1995).
- 'women' has been used more frequently since 1978.
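To see what the conditional frequency distribution is doing, here is a rough standard-library sketch of the same grouping: one counter of years per target word. The `docs` dictionary is a hypothetical mini-corpus keyed by file ID (the file IDs follow the real naming pattern, but the tokens are invented), and defaultdict(Counter) is only a stand-in for nltk.ConditionalFreqDist.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus: file ID -> tokens (NOT real State of the Union text).
docs = {
    '1945-Truman.txt': ['Men', 'and', 'women', 'of', 'the', 'people'],
    '1946-Truman.txt': ['The', 'people', 'and', 'the', 'People'],
    '1963-Johnson.txt': ['men'],
    '1963-Kennedy.txt': ['men', 'women'],
}
targets = set(['men', 'women', 'people'])

# condition (target word) -> Counter mapping year -> count
cfd = defaultdict(Counter)
for fileid, words in docs.items():
    year = fileid[:4]               # first four characters are the year
    for w in words:
        if w.lower() in targets:
            cfd[w.lower()][year] += 1

for word in sorted(cfd):
    print(word + ' ' + str(sorted(cfd[word].items())))
```

Because the key is only the year, years with two files (1963, 1965, 1991 and 2001 in the real corpus) are merged into a single column, which is why the table has fewer columns than there are files.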
5.
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('computer.n.01').part_meronyms()
[Synset('computer_accessory.n.01'), Synset('data_converter.n.01'), Synset('hardware.n.03'), Synset('keyboard.n.01'), Synset('chip.n.07'), Synset('memory.n.04'), Synset('disk_cache.n.01'), Synset('diskette.n.01'), Synset('busbar.n.01'), Synset('computer_circuit.n.01'), Synset('monitor.n.04'), Synset('central_processing_unit.n.01'), Synset('peripheral.n.01'), Synset('cathode-ray_tube.n.01')]
>>> wn.synset('computer.n.01').substance_meronyms()
[]
>>> wn.synset('computer.n.01').member_holonyms()
[]
>>> wn.synset('laptop.n.01')
Synset('laptop.n.01')
>>> wn.synset('laptop.n.01').definition
'a portable computer small enough to use in your lap'
>>> wn.synset('laptop.n.01').part_meronyms()
[]
>>> wn.synset('fish.n.01').definition
'any of various mostly cold-blooded aquatic vertebrates usually having scales and breathing through gills'
>>> wn.synset('fish.n.01').part_meronyms()
[Synset('tail_fin.n.03'), Synset('milt.n.02'), Synset('fin.n.06'), Synset('fishbone.n.01'), Synset('roe.n.02'), Synset('fish_scale.n.01'), Synset('lateral_line.n.01')]
>>> wn.synset('fish.n.01').substance_meronyms()
[]
>>> wn.synset('picture.n.01').definition
'a visual representation (of an object or scene or person or abstraction) produced on a surface'
>>> wn.synset('picture.n.01').member_meronyms()
[]
>>> wn.synset('picture.n.01').part_meronyms()
[]
>>> wn.synset('picture.n.01').substance_meronyms()
[]
>>> wn.synset('picture.n.01').member_holonyms()
[]
>>> wn.synset('picture.n.01').part_holonyms()
[]
>>> wn.synset('picture.n.01').substance_holonyms()
[]
I cannot find a good example...
6.
>>> from nltk.corpus import swadesh
>>> de2en = swadesh.entries(['de', 'en'])
>>> it2en = swadesh.entries(['it', 'en'])
>>> translate2 = dict(de2en)
>>> translate2.update(dict(it2en))
>>> len(translate2)
411
>>> translate2['bianco']
'white'
>>> translate2['Hund']
'dog'
One possible problem: if the same word exists in both 'de' and 'it' with different English meanings, the later update (here 'it') overwrites the earlier entry, even though 'de' may deserve higher priority. Maybe each language should have its own dictionary? There must be better answers to this question, but I cannot think of one.
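One way to handle the priority concern is to keep one dictionary per language and look them up in a chosen order. This is only a sketch of that idea, not the book's answer. 'Hund' and 'bianco' come from the session above; the clashing word 'foo', its glosses, and the names `by_lang` and `translate` are made up for illustration.

```python
# Toy word pairs. 'foo' is an invented word that "exists" in both languages.
de2en = [('Hund', 'dog'), ('foo', 'german meaning')]
it2en = [('bianco', 'white'), ('foo', 'italian meaning')]

# Merging into one dict: the later update silently wins.
merged = dict(de2en)
merged.update(dict(it2en))
print(merged['foo'])                # 'italian meaning'

# Alternative: one dictionary per language, searched in priority order.
by_lang = {'de': dict(de2en), 'it': dict(it2en)}

def translate(word, priority=('de', 'it')):
    """Return the first translation found, checking languages in order."""
    for lang in priority:
        if word in by_lang[lang]:
            return by_lang[lang][word]
    return None                     # word not found in any language

print(translate('foo'))                          # German entry has priority
print(translate('foo', priority=('it', 'de')))   # Italian entry has priority
```

Keeping the dictionaries separate also makes it possible to report which language a match came from.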
Try some more.
>>> itonly = swadesh.entries(['it'])
>>> sorted(itonly)
[('a',), ('acqua',), ('aguzzo, affilato',), ('ala',), ('albero',), ('alcuni',), ('altro',), ('animale',), .... ('uomo',), ('uomo',), ('uovo',), ('vecchio',), ('vedere',), ('venire',), ('vento',), ('verde',), ('verme',), ('vicino',), ('vivere',), ('voi',), ('volare',), ('vomitare',)]
>>> translate2['uomo']
'man (human being)'
>>> translate2['uomo'][1]
'a'
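I found that some words (e.g. 'uomo') appear twice in the Italian list. How can I get the other translation? The example above is not the right way: the dictionary keeps only one value per key, so indexing with [1] just returns the second character of that string.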
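A possible fix for the duplicate problem is to map each word to a list of translations instead of a single string. The sketch below uses toy pairs: 'man (human being)' comes from the session above, while the first gloss for 'uomo' and the 'uovo' entry are assumed for illustration rather than read from the corpus.

```python
from collections import defaultdict

# Toy (Italian, English) pairs; 'uomo' appears twice on purpose.
it2en = [('uomo', 'man (adult male)'),
         ('uomo', 'man (human being)'),
         ('uovo', 'egg')]

# dict() keeps only the last value for a repeated key.
last_wins = dict(it2en)
print(last_wins['uomo'])            # 'man (human being)'
print(last_wins['uomo'][1])         # 'a' (second character, not second sense)

# Collect every translation for each word in a list.
all_senses = defaultdict(list)
for it_word, en_word in it2en:
    all_senses[it_word].append(en_word)

print(all_senses['uomo'])           # both translations
print(all_senses['uomo'][1])        # now [1] really is the second translation
```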
7.
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> text1.concordance('however')
Building index...
Displaying 25 of 95 matches:
gledy - piggledy whale statements , however authentic , in these extracts , for
lave ? Tell me that . Well , then , however the old sea - captains may order me
ea - captains may order me about -- however they may thump and punch me about ,
 needs be the sign of " The Trap ." However , I picked myself up and hearing a
 the conclusion that such an idea , however wild , might not be altogether unwa
 most obstreperously . I observed , however , that one of them held somewhat al
ade on the sea . In a few minutes , however , he was missed by his shipmates ,
bag ' s mouth . This accomplished , however , he turned round -- when , good he
te man into a purplish yellow one . However , I had never been in the South Sea
tle in the matter of my bedfellow . However , a good laugh is a mighty good thi
ight of the water it had absorbed . However , hat and coat and overshoes were o
pulpit , it had not escaped me that however convenient for a ship , these joint
lf baptized again . For the nonce , however , he proposed to sail about , and s
 own and comrade ' s bill ; using , however , my comrade ' s money . The grinni
in to say it was on the starboard . However , by dint of beating about a little
a supper for us both on one clam ?" However , a warm savory steam from the kitc
 owners till all is ready for sea . However , it is always as well to have a lo
fectly as he was known to me then . However , my thoughts were at length carrie
 I got down our traps , resolving , however , to sleep ashore till the last . B
 em !" " No need of profane words , however great the hurry , Peleg ," said Bil
a pilot . I was comforting myself , however , with the thought that in pious Bi
isely -- who knows ? Certain I am , however , that a king ' s head is solemnly
o scientific description . As yet , however , the sperm whale , scientific or p
IZONTAL TAIL . There you have him . However contracted , that definition is the
several varieties , most of which , however , are little known . Broad - nosed
>>> text2.concordance('however')
Displaying 25 of 155 matches:
hters . He meant not to be unkind , however , and , as a mark of his affection
e condition of visitors . As such , however , they were treated by her with qui
le ." His wife hesitated a little , however , in giving her consent to this pla
urned Mrs . John Dashwood . " But , however , ONE thing must be considered . Wh
 can ever afford to live in . But , however , so it is . Your father thought on
ce inquiry or remark . Conversation however was not wanted , for Sir John was v
sary to the happiness of both ; for however dissimilar in temper and outward be
al engagements at home and abroad , however , supplied all the deficiencies of
s silent and grave . His appearance however was not unpleasing , in spite of hi
n their own house . One consolation however remained for them , to which the ex
 in the country ? That is good news however ; I will ride over tomorrow , and a
 ever so rich . I am glad to find , however , from what you say , that he is a
t to the excellence of such works , however disregarded before . Their taste wa
ly excited by her sister ; and that however a general resemblance of dispositio
d Marianne . " Do not boast of it , however ," said Elinor , " for it is injust
t will be any satisfaction to you , however , to be told , that I believe his c
wo wives , I know not . A few years however will settle her opinions on the rea
n his side impossible . His concern however was very apparent ; and after expre
d her husband and mother . The idea however started by her , was immediately pu
 are determined on anything . But , however , I hope you will think better of i
 I can guess what his business is , however ," said Mrs . Jennings exultingly .
o unfortunate an event ; concluding however by observing , that as they were al
r . Willoughby ." " Mr . Willoughby however is the only person who can have a r
sed in him . There is great truth , however , in what you have now urged of the
iced by him ." " Do not blame him , however , for departing from his character
It looks like in most cases "however" is not located at the beginning of a sentence. Even when it is at the beginning of a sentence, the meaning is something like 'but' or 'although'.
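Instead of eyeballing the concordance, the position of "however" could be counted. The sketch below is a rough heuristic on an invented token list: it treats "however" as sentence-initial when it is the first token or follows '.', '!' or '?'. The `tokens` list and the `SENTENCE_END` set are assumptions for illustration.

```python
# Invented tokens in the same tokenized style as the NLTK texts.
tokens = ['He', 'left', '.', 'However', ',', 'she', 'stayed', '.',
          'It', 'was', ',', 'however', ',', 'late', '.',
          'However', 'hard', 'he', 'tried', ',', 'he', 'failed', '.']

SENTENCE_END = set(['.', '!', '?'])   # simple end-of-sentence markers

initial = 0   # 'however' at the start of a sentence
medial = 0    # 'however' anywhere else
for i, tok in enumerate(tokens):
    if tok.lower() != 'however':
        continue
    if i == 0 or tokens[i - 1] in SENTENCE_END:
        initial += 1
    else:
        medial += 1

print('initial: ' + str(initial))
print('medial: ' + str(medial))
```

This heuristic would miss cases where the previous token combines punctuation with a closing quote, such as the '?"' token visible in the Moby Dick concordance, so the set of end markers would need extending for real texts.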