Evaluation (6.3)
The Test Set (6.3.1)
>>> import random
>>> from nltk.corpus import brown
>>> tagged_sents = list(brown.tagged_sents(categories='news'))
>>> random.shuffle(tagged_sents)
>>> size = int(len(tagged_sents) * 0.1)
>>> train_set, test_set = tagged_sents[size:], tagged_sents[:size]
The problem with this split is that train_set and test_set are too similar: if sentences from the same document end up in both, we cannot trust the evaluation result, because the tagger may simply be recognizing material it has already seen. It is therefore better to make sure that train_set and test_set are taken from different documents.
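The idea of splitting on documents rather than on shuffled sentences can be sketched with plain Python. The corpus here is a made-up stand-in (toy file IDs and sentences), not the real Brown corpus:

```python
# Toy stand-in for a tagged corpus grouped by document (file ID).
# Keys and sentences are invented for illustration only.
corpus = {
    'doc1': [[('The', 'DET'), ('cat', 'N')], [('It', 'PRO'), ('ran', 'V')]],
    'doc2': [[('A', 'DET'), ('dog', 'N')]],
    'doc3': [[('Dogs', 'N'), ('bark', 'V')]],
}

file_ids = sorted(corpus)            # split on documents, not sentences
size = max(1, int(len(file_ids) * 0.1))
test_ids, train_ids = file_ids[:size], file_ids[size:]

# Every sentence of a document lands on exactly one side of the split,
# so no test sentence comes from a document seen during training.
train_set = [sent for fid in train_ids for sent in corpus[fid]]
test_set = [sent for fid in test_ids for sent in corpus[fid]]
```

Because the unit of the split is a whole file ID, sentences from the same document can never leak across the train/test boundary.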
>>> file_ids = brown.fileids(categories='news')
>>> size = int(len(file_ids) * 0.1)
>>> train_set = brown.tagged_sents(file_ids[size:])
>>> test_set = brown.tagged_sents(file_ids[:size])
Another example in the textbook takes the test set from a different category altogether.
>>> train_set = brown.tagged_sents(categories='news')
>>> test_set = brown.tagged_sents(categories='fiction')
My impression is that this works only when the features are not category-specific.
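A toy example (all data made up, and a plain dict in place of a real tagger) shows why cross-category evaluation is harsher: a unigram lookup model trained on one "category" scores perfectly on its own vocabulary but drops on another category whose words it has never seen.

```python
# Invented toy corpora standing in for two Brown categories.
news = [[('stocks', 'N'), ('fell', 'V')], [('stocks', 'N'), ('rose', 'V')]]
fiction = [[('dragons', 'N'), ('fell', 'V')], [('heroes', 'N'), ('rose', 'V')]]

def train_unigram(tagged_sents):
    # One tag per word (last seen wins; fine for this toy data).
    model = {}
    for sent in tagged_sents:
        for word, tag in sent:
            model[word] = tag
    return model

def accuracy(model, tagged_sents):
    pairs = [wt for sent in tagged_sents for wt in sent]
    hits = sum(1 for word, tag in pairs if model.get(word) == tag)
    return hits / len(pairs)

model = train_unigram(news)
same_cat = accuracy(model, news)      # 1.0: every word was seen in training
cross_cat = accuracy(model, fiction)  # 0.5: unseen nouns get no tag at all
```

The drop comes entirely from vocabulary the model never saw; features that generalize across categories (suffixes, context) would suffer less.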
Accuracy (6.3.2)
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print 'Accuracy: %4.2f' % nltk.classify.accuracy(classifier, test_set)
Accuracy was already tried in previous sections.
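As a reminder of what nltk.classify.accuracy computes, here is a minimal sketch: the fraction of test items whose predicted label matches the gold label. The classifier below is an invented stand-in, not NLTK's:

```python
# Sketch of the accuracy metric: correct predictions / total items.
def accuracy(classify, test_set):
    correct = sum(1 for features, gold in test_set if classify(features) == gold)
    return correct / len(test_set)

# Trivial made-up "classifier": words ending in 's' are tagged plural.
classify = lambda feats: 'NNS' if feats['suffix'] == 's' else 'NN'

# Toy labeled test set: (featureset, gold label) pairs.
test_set = [({'suffix': 's'}, 'NNS'),
            ({'suffix': 'e'}, 'NN'),
            ({'suffix': 's'}, 'NN')]

print('Accuracy: %4.2f' % accuracy(classify, test_set))  # 2 of 3 correct
```

Accuracy alone hides *which* labels are confused with which, which is exactly what the confusion matrix below adds.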
Confusion Matrices (6.3.4)
>>> def tag_list(tagged_sents):
...     return [tag for sent in tagged_sents for (word, tag) in sent]
...
>>> def apply_tagger(tagger, corpus):
...     return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]
...
>>> gold = tag_list(brown.tagged_sents(categories='editorial'))
Define t2 (the unigram tagger from section 5.4) here.
>>> t2 = nltk.UnigramTagger(brown.tagged_sents(categories='editorial'))
>>> test = tag_list(apply_tagger(t2, brown.tagged_sents(categories='editorial')))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> cm
<ConfusionMatrix: 57382/61604 correct>
Since an extremely large matrix was displayed, simplify the tags a little.
>>> gold = tag_list(brown.tagged_sents(categories='editorial', simplify_tags=True))
>>> t2 = nltk.UnigramTagger(brown.tagged_sents(categories='editorial', simplify_tags=True))
>>> test = tag_list(apply_tagger(t2, brown.tagged_sents(categories='editorial')))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> print cm

(The full confusion matrix over the roughly 30 simplified tags is printed here; row = reference, col = test. The diagonal holds the correct counts, e.g. <12065> for N, and the largest off-diagonal confusions are P tagged as TO (612 times) and VD tagged as VN (221 times).)
I don't know why the textbook's sample shows only 9 tags.
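The structure nltk.ConfusionMatrix prints can be sketched with plain Python, using collections.Counter over (reference, test) tag pairs. The tag sequences here are invented for illustration:

```python
from collections import Counter

# Made-up gold and predicted tag sequences (same length, aligned by token).
gold = ['N', 'V', 'N', 'ADJ', 'V', 'N']
test = ['N', 'V', 'ADJ', 'ADJ', 'N', 'N']

# Count each (reference tag, test tag) pair; the diagonal is the correct hits.
cm = Counter(zip(gold, test))
tags = sorted(set(gold) | set(test))

# Print rows = reference tags, columns = test tags, like nltk's layout.
print('     ' + ' '.join('%4s' % t for t in tags))
for ref in tags:
    print('%4s ' % ref + ' '.join('%4d' % cm[(ref, t)] for t in tags))
```

Reading a row shows how a gold tag gets mistagged: here the N row has one count in the ADJ column, meaning one noun was tagged as an adjective, exactly the kind of cell (like P mistagged as TO above) that accuracy alone never reveals.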