Combine Taggers (5.5.4-5.5.5)
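When a tagger cannot assign a tag to a token, it can fall back to a more general tagger, which is specified with the backoff option. Below, the bigram tagger backs off to the unigram tagger, which in turn backs off to a default tagger that labels everything 'NN'.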
>>> t0 = nltk.DefaultTagger('NN')
>>> t1 = nltk.UnigramTagger(train_sents, backoff=t0)
>>> t2 = nltk.BigramTagger(train_sents, backoff=t1)
>>> t2.evaluate(test_sents)
0.8447124489185687
>>> t3 = nltk.TrigramTagger(train_sents, backoff=t2)
>>> t3.evaluate(test_sents)
0.8423203428685339
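To make the backoff idea concrete, here is a minimal pure-Python sketch of the lookup order: try the bigram context first, then the word alone, then a default tag. This is a simplified illustration, not NLTK's implementation; the helper names (train_unigram, train_bigram, tag) and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag_ in sent:
            counts[word][tag_] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def train_bigram(tagged_sents):
    """Map (previous tag, word) to the most frequent tag for that context."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None  # no previous tag at the start of a sentence
        for word, tag_ in sent:
            counts[(prev, word)][tag_] += 1
            prev = tag_
    return {ctx: tags.most_common(1)[0][0] for ctx, tags in counts.items()}


def tag(tokens, bigram, unigram, default="NN"):
    """Tag tokens, backing off bigram -> unigram -> default."""
    tagged = []
    prev = None
    for word in tokens:
        chosen = bigram.get((prev, word))   # most specific model first
        if chosen is None:
            chosen = unigram.get(word)      # back off to the word alone
        if chosen is None:
            chosen = default                # last resort: default tag
        tagged.append((word, chosen))
        prev = chosen
    return tagged


# Hypothetical toy training data, only for illustration.
train = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
bigram = train_bigram(train)
unigram = train_unigram(train)
result = tag(["a", "dog", "sleeps", "loudly"], bigram, unigram)
```

In this toy run, "a" and "loudly" are unseen and get the default tag, "dog" is resolved by the unigram table because the context (NN, dog) never occurred in training, and "sleeps" is resolved by the bigram table.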
Saving a Tagger (5.5.6)
>>> from cPickle import dump
>>> output = open('t2.pkl', 'wb')
>>> dump(t2, output, -1)
>>> output.close()
>>>
>>> from cPickle import load
>>> input = open('t2.pkl', 'rb')
>>> tagger = load(input)
>>> input.close()
>>>
>>> text = """The board's action shows what free enterprise
...     is up against in our complex maze of regulatory laws ."""
>>> tokens = text.split()
>>> tagger.tag(tokens)
[('The', 'AT'), ("board's", 'NN$'), ('action', 'NN'), ('shows', 'NNS'),
('what', 'WDT'), ('free', 'JJ'), ('enterprise', 'NN'), ('is', 'BEZ'),
('up', 'RP'), ('against', 'IN'), ('in', 'IN'), ('our', 'PP$'), ('complex', 'JJ'),
('maze', 'NN'), ('of', 'IN'), ('regulatory', 'NN'), ('laws', 'NNS'), ('.', '.')]
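The session above uses cPickle, which exists only in Python 2. In Python 3 the standard pickle module takes its place. The following is a sketch of the same save-and-reload round trip under that assumption, using with-blocks so the files are closed automatically. The dictionary named model is a hypothetical stand-in for a trained tagger such as t2, and the temporary directory is only there to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained tagger: a word -> tag lookup table.
model = {"the": "AT", "board": "NN", "laws": "NNS"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "t2.pkl")

    # Save: HIGHEST_PROTOCOL has the same effect as passing -1 to dump().
    with open(path, "wb") as output:
        pickle.dump(model, output, pickle.HIGHEST_PROTOCOL)

    # Load: the restored object is an independent copy of the original.
    with open(path, "rb") as source:
        restored = pickle.load(source)
```

With a real tagger in place of model, the restored object can be used directly, for example restored.tag(tokens).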