Exercise: Chapter 2 (20-22)
20.
>>> def word_freq(word, section):
...     fdist = FreqDist(nltk.corpus.brown.words(categories=section))
...     return fdist[word]
...
>>> word_freq('love', 'romance')
32
>>> word_freq('city', 'government')
7
>>> word_freq('train', 'adventure')
10
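A FreqDist behaves like a dictionary of counts, so the same idea can be sketched with the standard library's collections.Counter. The toy corpus below is an assumption standing in for nltk.corpus.brown.words(categories=section), so the snippet runs without an NLTK data download:

```python
from collections import Counter

# Toy stand-in for the Brown corpus sections (assumed data, not real Brown text).
TOY_SECTIONS = {
    'romance': ['i', 'love', 'you', 'and', 'i', 'love', 'her'],
    'government': ['the', 'city', 'council', 'met', 'in', 'the', 'city'],
}

def word_freq(word, section):
    # Counter plays the role of FreqDist: it maps each word to its count
    # and returns 0 for words that never occur.
    fdist = Counter(TOY_SECTIONS[section])
    return fdist[word]

print(word_freq('love', 'romance'))      # 2 on this toy data
print(word_freq('train', 'government'))  # 0: absent words count as zero
```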
21.
We need to account for words with multiple pronunciations.
>>> text = ['love', 'new', 'yankee', 'really']
>>> new_entries = [entry for entry in nltk.corpus.cmudict.entries() if entry[0] in text]
>>> done_ent = []
>>> tot_len = 0
>>> for entry in new_entries:
...     if entry[0] not in done_ent:
...         tot_len += len(entry[1])
...         done_ent.append(entry[0])
...
>>> tot_len
14
>>> new_entries
[('love', ['L', 'AH1', 'V']), ('new', ['N', 'UW1']), ('new', ['N', 'Y', 'UW1']),
 ('really', ['R', 'IH1', 'L', 'IY0']), ('really', ['R', 'IY1', 'L', 'IY0']),
 ('yankee', ['Y', 'AE1', 'NG', 'K', 'IY0'])]
The result seems correct: counting only the first pronunciation of each word gives 3 + 2 + 4 + 5 = 14.
Now define a function and run it on a larger sample.
>>> def CountSE(text):
...     new_entries = [entry for entry in nltk.corpus.cmudict.entries() if entry[0] in text]
...     done_ent = []
...     tot_len = 0
...     for entry in new_entries:
...         if entry[0] not in done_ent:
...             tot_len += len(entry[1])
...             done_ent.append(entry[0])
...     return tot_len
...
>>> CountSE(text)
14
>>> CountSE(fdist.samples())
36162
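CountSE above rescans every cmudict entry for each call, which is slow for large inputs; a dictionary keyed by word allows one lookup per word instead. A minimal sketch, assuming a toy pronouncing dictionary in the same shape as nltk.corpus.cmudict.dict() (word → list of pronunciations, each a list of phones); count_se is an illustrative name, not part of NLTK:

```python
# Toy stand-in for nltk.corpus.cmudict.dict() (assumed data).
TOY_PRONDICT = {
    'love':   [['L', 'AH1', 'V']],
    'new':    [['N', 'UW1'], ['N', 'Y', 'UW1']],
    'really': [['R', 'IH1', 'L', 'IY0'], ['R', 'IY1', 'L', 'IY0']],
    'yankee': [['Y', 'AE1', 'NG', 'K', 'IY0']],
}

def count_se(text, prondict):
    # Sum the phone count of the first pronunciation of each distinct word,
    # matching the done_ent logic above, but with one dictionary lookup per
    # word rather than a scan over all ~130k cmudict entries.
    return sum(len(prondict[w][0]) for w in set(text) if w in prondict)

print(count_se(['love', 'new', 'yankee', 'really'], TOY_PRONDICT))  # 14
```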
22.
I am not sure I understand the question correctly, but here is my attempt.
>>> text = fdist.samples()[:50]
>>> def hedge(text):
...     new_text = []
...     counter = 1
...     for word in text:
...         new_text.append(word)
...         if counter % 3 == 0:
...             new_text.append('like')
...         counter = counter + 1
...     return new_text
...
>>> hedge(text)
[',', '.', 'the', 'like', 'and', 'to', 'a', 'like', 'of', '``', "''", 'like',
 'was', 'I', 'in', 'like', 'he', 'had', '?', 'like', 'her', 'that', 'it', 'like',
 'his', 'she', 'with', 'like', 'you', 'for', 'at', 'like', 'He', 'on', 'him', 'like',
 'said', '!', '--', 'like', 'be', 'as', ';', 'like', 'have', 'but', 'not', 'like',
 'would', 'She', 'The', 'like', 'out', 'were', 'up', 'like', 'all', 'from', 'could', 'like',
 'me', 'like', 'been', 'like', 'so', 'there']
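The manual counter can be replaced by enumerate, which yields each word together with its 1-based position; the behaviour is the same as the version above. A self-contained sketch on a made-up word list:

```python
def hedge(text):
    # Insert 'like' after every third word; enumerate(..., start=1)
    # replaces the hand-maintained counter variable.
    new_text = []
    for i, word in enumerate(text, start=1):
        new_text.append(word)
        if i % 3 == 0:
            new_text.append('like')
    return new_text

print(hedge(['one', 'two', 'three', 'four', 'five', 'six', 'seven']))
# → ['one', 'two', 'three', 'like', 'four', 'five', 'six', 'like', 'seven']
```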
I will try the remaining questions (23 onward) after going through the entire book.