Accumulative Functions (4.5.3)

Notes on Accumulative Functions, Chapter 4.5.3 of the whale book.

filter() takes two arguments: the first is a function and the second is a sequence. It returns the elements of the sequence for which the function returns True.

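>>> sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
...         'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
>>> def is_content_word(word):
...     return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']
...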
>>> filter(is_content_word, sent)
['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']
>>> 
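
The session above is Python 2, where filter() returns a list. As a minimal sketch, assuming Python 3 (where filter() returns a lazy iterator instead), the same example needs list() to show the result. Here sent is the example sentence from the book, and the name same_words is only for illustration.

```python
# Assumes Python 3: filter() returns an iterator, so wrap it in list().
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

def is_content_word(word):
    # True for words that are not in the small stop list
    return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']

content_words = list(filter(is_content_word, sent))
print(content_words)

# The equivalent list comprehension gives the same elements
same_words = [w for w in sent if is_content_word(w)]
```

The list comprehension is often preferred in Python 3 because it returns a list directly.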

map() applies a function to every item in a sequence.

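>>> from __future__ import division  # needed in Python 2 for the float result below
>>> import nltk
>>> length = map(len, nltk.corpus.brown.sents(categories='news'))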
>>> sum(length) / len(length)
21.75081116158339
>>> 
>>> length = [len(w) for w in nltk.corpus.brown.sents(categories='news')]
>>> sum(length) / len(length)
21.75081116158339

The second version uses a list comprehension instead of map(). Which is simpler?
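
Here is a small sketch of the same comparison without the Brown corpus, assuming Python 3. toy_sents is made-up data standing in for the corpus sentences. In Python 3, map() returns an iterator that has no len(), so it has to be turned into a list first.

```python
# toy_sents is made-up data standing in for nltk.corpus.brown.sents()
toy_sents = [['Take', 'care', 'of', 'the', 'sense'],
             ['and', 'the', 'sounds'],
             ['will', 'take', 'care', 'of', 'themselves', '.']]

# Version 1: map(); list() is needed in Python 3 before calling len()
lengths_map = list(map(len, toy_sents))
avg_map = sum(lengths_map) / len(lengths_map)

# Version 2: list comprehension
lengths_comp = [len(s) for s in toy_sents]
avg_comp = sum(lengths_comp) / len(lengths_comp)

print(lengths_map, avg_map)
```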

>>> map(lambda w: len(filter(lambda c: c.lower() in "aeiou", w)), sent)
[2, 2, 1, 1, 2, 0, 1, 1, 2, 1, 2, 2, 1, 3, 0]
>>> [len([c for c in w if c.lower() in "aeiou"]) for w in sent]
[2, 2, 1, 1, 2, 0, 1, 1, 2, 1, 2, 2, 1, 3, 0]
>>> 

The first version above uses lambda, and the second uses nested list comprehensions. To tell the truth, I find the second one easier to understand.
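
One way to make the map() version easier to read is to replace the lambda with a named function. This is only a sketch, assuming Python 3; count_vowels is a hypothetical helper name, not something from the book, and sent is the example sentence from the book.

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

def count_vowels(word):
    # Number of characters in word that are vowels (case-insensitive)
    return sum(1 for c in word if c.lower() in "aeiou")

# list() is needed because map() returns an iterator in Python 3
vowel_counts = list(map(count_vowels, sent))
print(vowel_counts)
```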

Parameters can be given names and default values, so that arguments can be passed by keyword, like this.

>>> def repeat(msg='<empty>', num=1):
...     return msg * num
... 
>>> repeat(num=3)
'<empty><empty><empty>'
>>> repeat(msg='Alice')
'Alice'
>>> repeat(num=5, msg='Alice')
'AliceAliceAliceAliceAlice'
>>>
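
For comparison, the same function can also be called with positional arguments, which are matched to the parameters in order. A small sketch, assuming Python 3; the variable names are only for illustration. Giving the same parameter both positionally and by keyword raises TypeError.

```python
def repeat(msg='<empty>', num=1):
    return msg * num

# Positional: 'Alice' goes to msg, 2 goes to num
both_positional = repeat('Alice', 2)

# Mixed: the positional argument first, then a keyword argument
mixed = repeat('Alice', num=2)

# Giving msg twice (once positionally, once by keyword) is an error
try:
    repeat('Alice', msg='Bob')
    error = None
except TypeError as exc:
    error = exc

print(both_positional, mixed, type(error).__name__)
```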

When *args appears in a function's parameter list, it collects all the unnamed (positional) arguments into a tuple. **kwargs is a dictionary that collects all the keyword arguments, mapping each parameter name to its value.

>>> def generic(*args, **kwargs):
...     print args
...     print kwargs
... 
>>> generic(1, "African swallow", monty="Python")
(1, 'African swallow')
{'monty': 'Python'} 
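
A common use of this pattern is a wrapper that passes whatever it receives on to another function. The sketch below assumes Python 3 (print is a function there); logged_call is a hypothetical name used only for illustration.

```python
def repeat(msg='<empty>', num=1):
    return msg * num

def logged_call(func, *args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    print('args:', args, 'kwargs:', kwargs)
    # * and ** unpack them again when calling func
    return func(*args, **kwargs)

result = logged_call(repeat, 'Alice', num=3)
print(result)
```

The wrapper does not need to know anything about the parameters of func, which is why this form shows up so often in decorators.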

This example is also interesting. The * operator works in a function call too: *song unpacks the list, so it is equivalent to passing song[0], song[1], song[2] as separate arguments.

>>> song = [['four', 'calling', 'birds'],
...     ['three', 'French', 'hens'],
...     ['two', 'turtle', 'doves']]
>>> zip(song[0], song[1], song[2])
[('four', 'three', 'two'), ('calling', 'French', 'turtle'), ('birds', 'hens', 'doves')]
>>> zip(*song)
[('four', 'three', 'two'), ('calling', 'French', 'turtle'), ('birds', 'hens', 'doves')]
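
Since zip(*song) effectively swaps rows and columns, applying it twice gets back the original rows (as tuples). A minimal sketch, assuming Python 3, where zip() returns an iterator and needs list() to show its contents; columns and rows_again are names chosen for illustration.

```python
song = [['four', 'calling', 'birds'],
        ['three', 'French', 'hens'],
        ['two', 'turtle', 'doves']]

columns = list(zip(*song))        # same as zip(song[0], song[1], song[2])
rows_again = list(zip(*columns))  # transposing twice restores the rows

print(columns)
print(rows_again)
```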

The next example shows three ways of calling the same function. The rule is that unnamed (positional) arguments must come before named (keyword) arguments. The order of the keyword arguments does not matter, as long as each one is written as "(parameter name)=(value)".

>>> def freq_words(file, min=1, num=10):
...     text = open(file).read()
...     tokens = nltk.word_tokenize(text)
...     freqdist = nltk.FreqDist(t for t in tokens if len(t) >= min)
...     return freqdist.keys()[:num]
... 
>>> fw = freq_words('ch01.rst', 4, 10)
>>> fw = freq_words('ch01.rst', min=4, num=10)
>>> fw = freq_words('ch01.rst', num=10, min=4)
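
To try the three calling styles without a file or NLTK, here is a self-contained variant, assuming Python 3. It is not the book's implementation: it takes a string instead of a file name, splits on whitespace instead of using nltk.word_tokenize, and uses collections.Counter in place of nltk.FreqDist. The names freq_words_from_text, min_len and text are hypothetical.

```python
from collections import Counter

def freq_words_from_text(text, min_len=1, num=10):
    # Whitespace split is a stand-in for a real tokenizer
    tokens = text.split()
    counts = Counter(t for t in tokens if len(t) >= min_len)
    # most_common() returns (word, count) pairs sorted by decreasing count
    return [word for word, _ in counts.most_common(num)]

text = "the cat and the dog and the cat saw the cat"

fw1 = freq_words_from_text(text, 3, 2)                  # all positional
fw2 = freq_words_from_text(text, min_len=3, num=2)      # keywords, same order
fw3 = freq_words_from_text(text, num=2, min_len=3)      # keywords, swapped order

# freq_words_from_text(text, min_len=3, 2) would be a SyntaxError:
# a positional argument cannot follow a keyword argument.
print(fw1, fw2, fw3)
```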