Checking parameter types (4.4.4-4.4.6)
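Python does not require you to declare the types of variables, so nothing stops a caller from passing a function an argument of an unexpected type, and the result can be unexpected behavior. In the following example, tag() is meant to take a single word, but when given a list it silently returns 'noun' instead of reporting an error.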
>>> def tag(word):
...     if word in ['a', 'the', 'all']:
...         return 'det'
...     else:
...         return 'noun'
...
>>> tag('the')
'det'
>>> tag('aho')
'noun'
>>> tag(['aho', 'new', 'old'])
'noun'
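Here is a revised version of the same function. It begins with an assert statement that checks isinstance(word, basestring), where basestring is Python 2's common base class of str and unicode. If the argument is neither a str nor a unicode object, an AssertionError is raised with the given message.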
>>> def tag(word):
...     assert isinstance(word, basestring), "argument to tag() must be a string"
...     if word in ['a', 'the', 'all']:
...         return 'det'
...     else:
...         return 'noun'
...
>>> tag(['aho', 'new', 'old'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in tag
AssertionError: argument to tag() must be a string
>>> tag(u'モリー')
'noun'
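basestring exists only in Python 2. A minimal Python 3 sketch of the same check might look like the following, where str already covers Unicode text. It raises TypeError instead of using assert, because assert statements are skipped when Python runs with the -O flag; choosing TypeError is this sketch's own decision, not something the book prescribes.

```python
def tag(word):
    """Return 'det' for a few determiners, otherwise 'noun'."""
    # In Python 3, str is the single text type (no basestring/unicode split)
    if not isinstance(word, str):
        raise TypeError("argument to tag() must be a string")
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'


print(tag('the'))      # det
print(tag('モリー'))    # noun
try:
    tag(['aho', 'new', 'old'])
except TypeError as err:
    print(err)         # argument to tag() must be a string
```

Unlike an assert, the explicit raise keeps the check active regardless of interpreter flags.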
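Another bad example: the following function modifies the FreqDist object passed in by the caller and prints part of the result instead of returning it.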
>>> import nltk
>>> def freq_words(url, freqdist, n):
...     text = nltk.clean_url(url)
...     for word in nltk.word_tokenize(text):
...         freqdist.inc(word.lower())
...     print freqdist.keys()[:n]
...
>>> constitution = "http://www.archives.gov/national-archives-experience" \
...                "/charters/constitution_transcript.html"
>>> fd = nltk.FreqDist()
>>> freq_words(constitution, fd, 20)
['the', 'of', 'charters', ',', 'bill', 'constitution', 'declaration', 'rights', '-', '.', 'freedom', 'impact', 'independence', 'making']
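One concrete risk of passing the FreqDist in from outside: the function keeps adding to whatever counts the object already holds, so calling it twice with the same object silently mixes the results. The sketch below illustrates this in Python 3, with collections.Counter standing in for nltk.FreqDist and a short string standing in for the downloaded page; the function name count_words_into and the sample text are hypothetical.

```python
from collections import Counter


def count_words_into(text, counts):
    """Hypothetical analogue of freq_words(): updates `counts` in place."""
    for word in text.split():
        counts[word.lower()] += 1   # side effect on the caller's object


sample = "The people of the United States"
shared = Counter()
count_words_into(sample, shared)
print(shared['the'])   # 2
count_words_into(sample, shared)   # same text again, same Counter
print(shared['the'])   # 4: counts from both calls are now mixed together
```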
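Why should the caller have to create nltk.FreqDist() outside the function? It is cleaner to create the FreqDist inside the function and return it, so the function no longer changes an object it does not own and the caller decides what to print: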
>>> def freq_words(url):
...     freqdist = nltk.FreqDist()
...     text = nltk.clean_url(url)
...     for word in nltk.word_tokenize(text):
...         freqdist.inc(word.lower())
...     return freqdist
...
>>> fd = freq_words(constitution)
>>> print fd.keys()[:20]
['the', 'of', 'charters', ',', 'bill', 'constitution', 'declaration', 'rights', '-', '.', 'freedom', 'impact', 'independence', 'making']
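nltk.clean_url() and FreqDist.inc() belong to the older NLTK 2 API and were removed in NLTK 3. A minimal Python 3 sketch of the same side-effect-free design, assuming the page text has already been fetched and cleaned, could take the text directly, build and return its own collections.Counter (used here in place of nltk.FreqDist), and leave printing to the caller. The regular-expression tokenizer is a simplification standing in for nltk.word_tokenize, not an equivalent of it.

```python
import re
from collections import Counter


def freq_words(text):
    """Return a Counter of lower-cased token frequencies in `text`.

    The function creates its own Counter and returns it, so it does not
    modify any object owned by the caller and does no printing itself.
    """
    freqdist = Counter()
    # Simplified tokenizer: runs of word characters, or single punctuation marks
    for word in re.findall(r"\w+|[^\w\s]", text):
        freqdist[word.lower()] += 1
    return freqdist


fd = freq_words("We the People of the United States, in Order to form "
                "a more perfect Union")
print(fd['the'])             # 2
print(fd.most_common(3))     # caller chooses how much to show
```

Because nothing is shared between calls, calling freq_words() twice on the same text gives two identical, independent results.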
For the general conventions on documenting functions with docstrings, see PEP 257:
http://www.python.org/dev/peps/pep-0257/
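As a rough illustration of the PEP 257 conventions (a one-line summary in triple double quotes, followed, for longer docstrings, by a blank line and a more detailed description), the tag() function could be documented as below. The wording of the docstring is this sketch's own, and the embedded interpreter examples are an optional extra that the standard doctest module can check.

```python
def tag(word):
    """Return a coarse part-of-speech tag for a single word.

    Words in a small hard-coded list of determiners are tagged 'det';
    every other string is tagged 'noun'.

    >>> tag('the')
    'det'
    >>> tag('aho')
    'noun'
    """
    if not isinstance(word, str):
        raise TypeError("argument to tag() must be a string")
    if word in ['a', 'the', 'all']:
        return 'det'
    return 'noun'


if __name__ == "__main__":
    import doctest
    doctest.testmod()   # runs the examples embedded in the docstring
```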