Structure of a Python module (4.6)

When I ran into strange behavior, it took me some time to find the source code on my laptop because I didn't know this command:

>>> nltk.metrics.distance.__file__
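__file__ gives the path of the file a module was loaded from. The inspect module can go one step further and return the file that defines a specific function, or the function's source text directly. A small sketch, using the standard-library textwrap module as a stand-in for nltk so it runs without NLTK installed:

```python
import inspect
import textwrap

# Path of the file the module was loaded from
print(textwrap.__file__)

# File that defines a particular function
print(inspect.getsourcefile(textwrap.dedent))

# Source code of the function itself, returned as a string
source = inspect.getsource(textwrap.dedent)
print(source.splitlines()[0])  # first line of the definition
```

The same calls should work on nltk.metrics.distance or any of its functions once NLTK is installed.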

To read the module's documentation, use help() (I had already learned this before):

>>> help(nltk.metrics.distance)

This printed the following documentation:

Help on module nltk.metrics.distance in nltk.metrics:

nltk.metrics.distance - Distance Metrics.


Compute the distance between two items (usually strings).
As metrics, they must satisfy the following three requirements:

1. d(a, a) = 0
2. d(a, b) >= 0
3. d(a, c) <= d(a, b) + d(b, c)

binary_distance(label1, label2)
>>> from nltk.metrics import binary_distance

>>> binary_distance(1,1)
0.0
>>> binary_distance(1,3)
1.0


edit_distance(s1, s2)
Calculate the Levenshtein edit-distance between two strings.
The edit distance is the number of characters that need to be
substituted, inserted, or deleted, to transform s1 into s2. For
example, transforming "rain" to "shine" requires three steps,
consisting of two substitutions and one insertion:
"rain" -> "sain" -> "shin" -> "shine". These operations could have
been done in other orders, but at least three steps are needed.

:param s1, s2: The strings to be analysed
:type s1: str
:type s2: str
:rtype int
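To see how the edit distance described above can be computed, here is a minimal pure-Python sketch of the standard dynamic-programming approach. It is my own illustration, not NLTK's actual code, and the name levenshtein is hypothetical. It also spot-checks the three metric requirements from the top of the help text on a handful of strings, which is a sanity check rather than a proof.

```python
from itertools import product

def levenshtein(s1, s2):
    """Minimum number of single-character substitutions, insertions and
    deletions needed to turn s1 into s2 (dynamic programming, two rows)."""
    # previous[j] = distance between the part of s1 seen so far and s2[:j]
    previous = list(range(len(s2) + 1))  # "" -> s2[:j] takes j insertions
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # s1[:i] -> "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("rain", "shine"))  # 3

# Spot-check the three metric requirements on a few sample strings
samples = ["rain", "shine", "sain", "shin", ""]
for a in samples:
    assert levenshtein(a, a) == 0  # 1. d(a, a) = 0
for a, b in product(samples, repeat=2):
    assert levenshtein(a, b) >= 0  # 2. d(a, b) >= 0
for a, b, c in product(samples, repeat=3):
    # 3. d(a, c) <= d(a, b) + d(b, c)
    assert levenshtein(a, c) <= levenshtein(a, b) + levenshtein(b, c)
print("all three requirements hold for these samples")
```

Only two rows of the table are kept at a time, since each row depends only on the one before it.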

Functions and variables whose names start with an underscore (_) are not imported by import * (unless the module explicitly lists them in __all__):

from module import *

The function find_words() has three parameters. Because the third parameter, result, has a default value (=[]), it can be omitted when calling the function.

>>> def find_words(text, wordlength, result=[]):
...     for word in text:
...         if len(word) == wordlength:
...             result.append(word)
...     return result
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3)
['omg', 'teh', 'teh', 'mat']
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],2,['ur'])
['ur', 'on']
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3)
['omg', 'teh', 'teh', 'mat', 'omg', 'teh', 'teh', 'mat']

On the first call, result is omitted, so the default empty list is used. On the second call, ['ur'] is passed in and 'on' is appended to that list. On the third call, result is omitted again, but a new empty list is NOT created: the list from the first call is reused, so the entries are duplicated. This happens because the default value [] is evaluated only once, when the def statement is executed, and every call that omits result shares that same list object.
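The shared list can be seen directly, because Python stores default values on the function object in its __defaults__ attribute. A quick sketch that redefines the same find_words() and inspects it:

```python
def find_words(text, wordlength, result=[]):
    for word in text:
        if len(word) == wordlength:
            result.append(word)
    return result

words = ['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat']

# The default list lives on the function object itself
print(find_words.__defaults__)   # ([],)

first = find_words(words, 3)
print(find_words.__defaults__)   # (['omg', 'teh', 'teh', 'mat'],)

second = find_words(words, 3)
print(first is second)           # True: both calls returned the same list object
print(len(second))               # 8
```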

In a real program that calls the same function several times, I would not omit the parameter. Instead, I would pass a list explicitly, like this:

>>> result_l = []
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3, result_l)
['omg', 'teh', 'teh', 'mat']
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],2,result_l)
['omg', 'teh', 'teh', 'mat', 'on']
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3, result_l)
['omg', 'teh', 'teh', 'mat', 'on', 'omg', 'teh', 'teh', 'mat']

If the results need to be kept separate, pass a new list to each call:

>>> result_l = []
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3, result_l)
['omg', 'teh', 'teh', 'mat']
>>> result_m = ['ur']
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],2, result_m)
['ur', 'on']
>>> result_n = []
>>> find_words(['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat'],3, result_n)
['omg', 'teh', 'teh', 'mat']
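Another common idiom (not from the book section I am following) is to use None as the default and create the list inside the function, so omitting the argument always starts from an empty list. A sketch of that version of find_words():

```python
def find_words(text, wordlength, result=None):
    # None is a sentinel meaning "no list was passed in":
    # a brand-new list is created on every such call
    if result is None:
        result = []
    for word in text:
        if len(word) == wordlength:
            result.append(word)
    return result

words = ['omg', 'teh', 'lolcat', 'sitted', 'on', 'teh', 'mat']
print(find_words(words, 3))          # ['omg', 'teh', 'teh', 'mat']
print(find_words(words, 2, ['ur']))  # ['ur', 'on']
print(find_words(words, 3))          # ['omg', 'teh', 'teh', 'mat'] (no duplicates)
```

Callers who want to accumulate results can still pass their own list, as with result_l above.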

How to debug:

Use pdb, the Python debugger.

>>> import pdb
>>> find_words(['cat'],3)
['omg', 'teh', 'teh', 'mat', 'omg', 'teh', 'teh', 'mat', 'omg', 'teh', 'teh', 'mat', 'cat']
>>> pdb.run("find_words(['dog'], 3)")
> <string>(1)<module>()
(Pdb) step
> <stdin>(1)find_words()
(Pdb) args
text = ['dog']
wordlength = 3
result = ['omg', 'teh', 'teh', 'mat', 'omg', 'teh', 'teh', 'mat', 'omg', 'teh', 'teh', 'mat', 'cat']
(Pdb) next
> <stdin>(2)find_words()
(Pdb) continue
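The (Pdb) commands can also be fed from a string instead of being typed, which is handy for replaying a session. A sketch, assuming Python 3.6 or later (for the readrc argument): pdb.Pdb reads commands from the stdin object and writes its output to stdout, and runcall() stops inside a fresh copy of find_words() so that args can be inspected before continuing.

```python
import io
import pdb

def find_words(text, wordlength, result=[]):
    for word in text:
        if len(word) == wordlength:
            result.append(word)
    return result

# Commands that would normally be typed at the (Pdb) prompt
commands = io.StringIO("args\ncontinue\n")
transcript = io.StringIO()

# readrc=False ignores any ~/.pdbrc; nosigint=True leaves Ctrl-C handling alone
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)

# runcall() calls the function under the debugger and stops at its first line
returned = debugger.runcall(find_words, ['dog'], 3)

print(transcript.getvalue())  # includes "text = ['dog']" and "wordlength = 3"
print(returned)               # ['dog']
```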