Identifying Dialogue Act Types (6.2.2)

>>> from nltk_init import *
>>> posts = nltk.corpus.nps_chat.xml_posts()[:10000]
>>> def dialogue_act_features(post):
...     features = {}
...     for word in nltk.word_tokenize(post):
...         features['contains(%s)' % word.lower()] = True
...     return features
... 
>>> features = [(dialogue_act_features(post.text), post.get('class'))
...             for post in posts]
>>> size = int(len(features) * 0.1)
>>> train_set, test_set = features[:size], features[size:]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print nltk.classify.accuracy(classifier, test_set)
0.516333333333
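
A side note on the split: the slices above put the first 10% of the posts in train_set and the remaining 90% in test_set. A minimal sketch of the opposite arrangement (hold out a small test portion, train on the rest) is below. split_holdout and the toy data are made up for illustration and are not part of NLTK.

```python
def split_holdout(items, test_fraction=0.1):
    """Return (train, test); the first test_fraction of items is held out for testing."""
    size = int(len(items) * test_fraction)
    return items[size:], items[:size]

# Toy stand-in for the (featureset, label) pairs built above.
toy = [({'contains(w%d)' % i: True}, 'Statement') for i in range(20)]
train_set, test_set = split_holdout(toy)
print("%d train, %d test" % (len(train_set), len(test_set)))
```

With the real data, split_holdout(features) would give 9000 training posts and 1000 test posts.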

What is inside features?

>>> features
[({'contains(im)': True, 'contains(now)': True, 'contains(this)': True, 'contains(left)': True, 'contains(name)': True, 'contains(with)': True, 'contains(gay)': True}, 'Statement'), ({'contains(:)': True, 'contains(p)': True}, 'Emotion'), ({'contains(part)': True}, 'System'),
....
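
To see how a single entry in that list is built, here is a small standalone sketch. It swaps nltk.word_tokenize for plain str.split so that it runs without NLTK, which means punctuation is not split off the way it is in the real output above. The sample text and label are invented.

```python
def dialogue_act_features_simple(text):
    """Bag-of-words features; whitespace split stands in for nltk.word_tokenize."""
    features = {}
    for word in text.split():
        # Repeated words collapse into one key, so only presence is recorded.
        features['contains(%s)' % word.lower()] = True
    return features

# Hypothetical post text and label, in the same (featureset, label) shape as `features`.
sample = (dialogue_act_features_simple("Hi there hi"), 'Greet')
print(sample)
```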

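Each post is labelled with a dialogue act class such as 'Statement', 'Emotion' or 'System', so each entry in features pairs the words a post contains with its class, and the classifier learns to guess the class from those words. I'm not sure whether the score (0.5163...) is good or not. One thing to note: the split above trains on only the first 10% of the posts and tests on the remaining 90%. As far as I can tell the book's version slices the other way round (train on 90%, test on 10%), so the low score may partly come from the small training set.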
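
One way to put the score in context is to compare it with a trivial baseline that always predicts the most frequent class in the training data. The sketch below is standalone and uses invented labels; majority_baseline_accuracy is a hypothetical helper, not an NLTK function.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always guessing the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return float(hits) / len(test_labels)

# Invented labels, only to show the call.
train_labels = ['Statement', 'Statement', 'Emotion', 'System']
test_labels = ['Statement', 'System', 'Statement', 'Emotion', 'Statement']
print(majority_baseline_accuracy(train_labels, test_labels))
```

For the session above, the inputs would be [label for (_, label) in train_set] and [label for (_, label) in test_set]. If the classifier's accuracy is not clearly above the baseline, the word features are not adding much.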