Evaluation (6.3)
The Test Set (6.3.1)
>>> import random
>>> from nltk.corpus import brown
>>> tagged_sents = list(brown.tagged_sents(categories='news'))
>>> random.shuffle(tagged_sents)
>>> size = int(len(tagged_sents) * 0.1)
>>> train_set, test_set = tagged_sents[size:], tagged_sents[:size]
The problem with this split is that train_set and test_set are too similar: if sentences from the same document end up in both, we cannot trust the evaluation result, because the tagger may simply be recognizing material it has already seen. It is therefore better to make sure that train_set and test_set are taken from different documents.
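The idea of splitting on documents rather than on shuffled sentences can be sketched with plain Python. The corpus here is a made-up stand-in (toy file IDs and sentences), not the real Brown corpus:

```python
# Toy stand-in for a tagged corpus grouped by document (file ID).
# Keys and sentences are invented for illustration only.
corpus = {
    'doc1': [[('The', 'DET'), ('cat', 'N')], [('It', 'PRO'), ('ran', 'V')]],
    'doc2': [[('A', 'DET'), ('dog', 'N')]],
    'doc3': [[('Dogs', 'N'), ('bark', 'V')]],
}

file_ids = sorted(corpus)            # split on documents, not sentences
size = max(1, int(len(file_ids) * 0.1))
test_ids, train_ids = file_ids[:size], file_ids[size:]

# Every sentence of a document lands on exactly one side of the split,
# so no test sentence comes from a document seen during training.
train_set = [sent for fid in train_ids for sent in corpus[fid]]
test_set = [sent for fid in test_ids for sent in corpus[fid]]
```

Because the unit of the split is a whole file ID, sentences from the same document can never leak across the train/test boundary.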
>>> file_ids = brown.fileids(categories='news')
>>> size = int(len(file_ids) * 0.1)
>>> train_set = brown.tagged_sents(file_ids[size:])
>>> test_set = brown.tagged_sents(file_ids[:size])
Another example in the textbook takes the test set from a different category altogether.
>>> train_set = brown.tagged_sents(categories='news')
>>> test_set = brown.tagged_sents(categories='fiction')
My impression is that this works only when the features are not category-specific.
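A toy example (all data made up, and a plain dict in place of a real tagger) shows why cross-category evaluation is harsher: a unigram lookup model trained on one "category" scores perfectly on its own vocabulary but drops on another category whose words it has never seen.

```python
# Invented toy corpora standing in for two Brown categories.
news = [[('stocks', 'N'), ('fell', 'V')], [('stocks', 'N'), ('rose', 'V')]]
fiction = [[('dragons', 'N'), ('fell', 'V')], [('heroes', 'N'), ('rose', 'V')]]

def train_unigram(tagged_sents):
    # One tag per word (last seen wins; fine for this toy data).
    model = {}
    for sent in tagged_sents:
        for word, tag in sent:
            model[word] = tag
    return model

def accuracy(model, tagged_sents):
    pairs = [wt for sent in tagged_sents for wt in sent]
    hits = sum(1 for word, tag in pairs if model.get(word) == tag)
    return hits / len(pairs)

model = train_unigram(news)
same_cat = accuracy(model, news)      # 1.0: every word was seen in training
cross_cat = accuracy(model, fiction)  # 0.5: unseen nouns get no tag at all
```

The drop comes entirely from vocabulary the model never saw; features that generalize across categories (suffixes, context) would suffer less.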
Accuracy (6.3.2)
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print 'Accuracy: %4.2f' % nltk.classify.accuracy(classifier, test_set)
Accuracy was already tried in previous sections.
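As a reminder of what nltk.classify.accuracy computes, here is a minimal sketch: the fraction of test items whose predicted label matches the gold label. The classifier below is an invented stand-in, not NLTK's:

```python
# Sketch of the accuracy metric: correct predictions / total items.
def accuracy(classify, test_set):
    correct = sum(1 for features, gold in test_set if classify(features) == gold)
    return correct / len(test_set)

# Trivial made-up "classifier": words ending in 's' are tagged plural.
classify = lambda feats: 'NNS' if feats['suffix'] == 's' else 'NN'

# Toy labeled test set: (featureset, gold label) pairs.
test_set = [({'suffix': 's'}, 'NNS'),
            ({'suffix': 'e'}, 'NN'),
            ({'suffix': 's'}, 'NN')]

print('Accuracy: %4.2f' % accuracy(classify, test_set))  # 2 of 3 correct
```

Accuracy alone hides *which* labels are confused with which, which is exactly what the confusion matrix below adds.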
Confusion Matrices (6.3.4)
>>> def tag_list(tagged_sents):
...     return [tag for sent in tagged_sents for (word, tag) in sent]
...
>>> def apply_tagger(tagger, corpus):
...     return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]
...
>>> gold = tag_list(brown.tagged_sents(categories='editorial'))
Define t2 (the unigram tagger from section 5.4) here.
>>> t2 = nltk.UnigramTagger(brown.tagged_sents(categories='editorial'))
>>> test = tag_list(apply_tagger(t2, brown.tagged_sents(categories='editorial')))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> cm
<ConfusionMatrix: 57382/61604 correct>
Since an extremely large matrix was displayed, simplify the tags a little.
>>> gold = tag_list(brown.tagged_sents(categories='editorial', simplify_tags=True))
>>> t2 = nltk.UnigramTagger(brown.tagged_sents(categories='editorial', simplify_tags=True))
>>> test = tag_list(apply_tagger(t2, brown.tagged_sents(categories='editorial')))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> print cm

(The full confusion matrix over the roughly 30 simplified tags is printed here; row = reference, col = test. The diagonal holds the correct counts, e.g. <12065> for N, and the largest off-diagonal confusions are P tagged as TO (612 times) and VD tagged as VN (221 times).)
I don't know why the textbook's sample shows only 9 tags.
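The structure nltk.ConfusionMatrix prints can be sketched with plain Python, using collections.Counter over (reference, test) tag pairs. The tag sequences here are invented for illustration:

```python
from collections import Counter

# Made-up gold and predicted tag sequences (same length, aligned by token).
gold = ['N', 'V', 'N', 'ADJ', 'V', 'N']
test = ['N', 'V', 'ADJ', 'ADJ', 'N', 'N']

# Count each (reference tag, test tag) pair; the diagonal is the correct hits.
cm = Counter(zip(gold, test))
tags = sorted(set(gold) | set(test))

# Print rows = reference tags, columns = test tags, like nltk's layout.
print('     ' + ' '.join('%4s' % t for t in tags))
for ref in tags:
    print('%4s ' % ref + ' '.join('%4d' % cm[(ref, t)] for t in tags))
```

Reading a row shows how a gold tag gets mistagged: here the N row has one count in the ADJ column, meaning one noun was tagged as an adjective, exactly the kind of cell (like P mistagged as TO above) that accuracy alone never reveals.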