Sequence (4.2.1)

Chapter 4 of the whale book looks like a review of Python grammar, so I skip ahead to section 4.2.

Using a tuple:

>>> t = 'walk', 'fem', 3
>>> t
('walk', 'fem', 3)
>>> t[0]
'walk'
>>> t[1:]
('fem', 3)
>>> len(t)
3
>>> raw = 'I turned off the spectroroute'
>>> text = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> pair = (6, 'turned')
>>> raw[2], text[3], pair[1]
('t', 'the', 'turned')
>>> raw[-3:], text[-3:], pair[-3:]
('ute', ['off', 'the', 'spectroroute'], (6, 'turned'))
>>> len(raw), len(text), len(pair)
(29, 5, 2)
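
Strings, lists, and tuples all support the same indexing, slicing, and len() operations, which is what the session above shows. Here is a minimal sketch of that idea, written so that it should run under both Python 2 and Python 3. The helper name describe is my own and does not come from the book.

```python
def describe(seq):
    """Return (first item, last two items, length) for any sequence."""
    return seq[0], seq[-2:], len(seq)

raw = 'I turned off the spectroroute'
text = ['I', 'turned', 'off', 'the', 'spectroroute']
pair = (6, 'turned')

# Slicing keeps the type: str -> str, list -> list, tuple -> tuple.
raw_info = describe(raw)
text_info = describe(text)
pair_info = describe(pair)

print(raw_info)
print(text_info)
print(pair_info)
```

A slice that asks for more items than exist, such as the last two items of a two-element tuple, simply returns everything available instead of raising an error.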

What happens when generating a set?

>>> set(text)
set(['I', 'off', 'the', 'turned', 'spectroroute'])
>>> set(raw)
set([' ', 'c', 'e', 'd', 'f', 'I', 'h', 'o', 'n', 'p', 's', 'r', 'u', 't'])
>>> set(pair)
set(['turned', 6])
>>>
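
A set keeps only one copy of each element and does not promise any particular order, which is why the output above looks shuffled. The following is a small sketch, assuming the same raw and text values as above, that uses sorted() to get a reproducible order.

```python
raw = 'I turned off the spectroroute'
text = ['I', 'turned', 'off', 'the', 'spectroroute']

# Duplicated characters such as 't', 'o', and ' ' collapse into one entry.
unique_chars = set(raw)
n_chars = len(raw)
n_unique = len(unique_chars)

# sorted() returns a list, so the order is predictable again.
sorted_words = sorted(set(text))

print(n_chars)
print(n_unique)
print(sorted_words)
```

Uppercase letters sort before lowercase ones, so 'I' comes first in the sorted list.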

A tuple is immutable (not editable), while a list can be modified.

>>> pair[1] = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> text[0] = 'you'
>>> text
['you', 'turned', 'off', 'the', 'spectroroute']
>>> 
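
Since a tuple cannot be changed in place, the usual way around the error is to build a new tuple. This is only a sketch of two common options, not something taken from the book. The variable names new_pair, as_list, and rebuilt are my own.

```python
pair = (6, 'turned')

try:
    pair[1] = 7              # tuples do not support item assignment
except TypeError as err:
    message = str(err)

# Option 1: build a new tuple from a slice plus the new value.
new_pair = pair[:1] + (7,)

# Option 2: convert to a list, edit it, and convert back.
as_list = list(pair)
as_list[1] = 7
rebuilt = tuple(as_list)

print(message)
print(new_pair)
print(pair)                  # the original tuple is unchanged
```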

Using list():

>>> raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'
>>> text = nltk.word_tokenize(raw)
>>> fdist = nltk.FreqDist(text)
>>> list(fdist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
>>> 
>>> for key in fdist:
...     print fdist[key],
... 
4 3 2 1 1 1

I got an error when I tried list(fdist). As a workaround, fdist.keys() returns the keys as a list.

>>> fdist.keys()
['lorry', ',', 'yellow', '.', 'Red', 'red']

This example rearranges the order of the elements, first with tuple assignment and then with a temporary variable.

>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> words[2], words[3], words[4] = words[3], words[4], words[2]
>>> words
['I', 'turned', 'the', 'spectroroute', 'off']
>>> 
>>> tmp = words[2]
>>> words[2] = words[3]
>>> words[3] = words[4]
>>> words[4] = tmp
>>> words
['I', 'turned', 'spectroroute', 'off', 'the']
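
The second result above differs from the first only because the temporary-variable version was applied to the list that had already been rearranged. The sketch below starts both versions from the same list to show that they are equivalent. The names original, a, and b are my own.

```python
original = ['I', 'turned', 'off', 'the', 'spectroroute']

# Tuple assignment: the whole right-hand side is evaluated first,
# then the values are assigned to the targets from left to right.
a = list(original)
a[2], a[3], a[4] = a[3], a[4], a[2]

# The same rotation written with a temporary variable.
b = list(original)
tmp = b[2]
b[2] = b[3]
b[3] = b[4]
b[4] = tmp

print(a)
print(a == b)
```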

zip()

>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> tags = ['noun', 'verb', 'prep', 'det', 'noun']
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]

What happens if the numbers of elements are not the same? The extra element is dropped.

>>> words.append('.')
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]
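
Because zip() stops at the shorter input without any warning, it can be worth checking what was dropped. Here is a minimal sketch, written to run under both Python 2 and Python 3. Padding the tags with None is just one possible choice of mine, and the names dropped, padded_tags, and full_pairs are my own.

```python
words = ['I', 'turned', 'off', 'the', 'spectroroute', '.']
tags = ['noun', 'verb', 'prep', 'det', 'noun']

# list() is needed under Python 3, where zip() returns an iterator.
pairs = list(zip(words, tags))

# zip() stopped after five pairs, so the final '.' was dropped.
dropped = words[len(pairs):]

# Pad the shorter list by hand if every word should be kept.
padded_tags = tags + [None] * (len(words) - len(tags))
full_pairs = list(zip(words, padded_tags))

print(dropped)
print(full_pairs[-1])
```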

However, I still get an error when I use list(), this time with enumerate(). Is something still missing?

>>> list(enumerate(words))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

This example splits a list into two parts, 90% for training and 10% for testing.

>>> text = nltk.corpus.nps_chat.words()
>>> cut = int(0.9 * len(text))
>>> training_data, test_data = text[:cut], text[cut:]
>>> text == training_data + test_data
True
>>> len(training_data) / len(test_data)
9.0
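
The same 90/10 split can be tried without downloading a corpus. This is a sketch under the assumption that any list of words behaves like the corpus word list here. The function split_data and the generated word list are hypothetical stand-ins of my own.

```python
def split_data(items, fraction=0.9):
    """Split a list into a leading part and the remainder."""
    cut = int(fraction * len(items))
    return items[:cut], items[cut:]

# Stand-in for nltk.corpus.nps_chat.words(): 1000 dummy tokens.
text = ['w%d' % i for i in range(1000)]

training_data, test_data = split_data(text)

# float() keeps the division exact under Python 2 without
# needing "from __future__ import division".
ratio = float(len(training_data)) / len(test_data)

print(len(training_data))
print(len(test_data))
print(ratio)
```

Slicing with the same cut point on both sides guarantees that no element is lost or duplicated, which is what the text == training_data + test_data check confirms.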

Note:
I tried the same operations in another environment and did not get any errors when using list(). The errors in my Mountain Lion (Mac) environment were most likely caused by a variable named list that had already been defined in that session, so Python recognized "list" as that variable instead of the built-in function.

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re, nltk, sys
>>> from __future__ import division
>>>
>>> raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'
>>> text = nltk.word_tokenize(raw)
>>> fdist = nltk.FreqDist(text)
>>> list(fdist)
['lorry', ',', 'yellow', '.', 'Red', 'red']
>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> tags = ['noun', 'verb', 'prep', 'det', 'noun']
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]
>>> list(enumerate(words))
[(0, 'I'), (1, 'turned'), (2, 'off'), (3, 'the'), (4, 'spectroroute')]
>>>
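
The shadowing described in the note can be reproduced on purpose. The sketch below assumes that a variable named list was assigned earlier in the session, which I cannot confirm from my old Mac session. It then restores the built-in through the builtins module (named __builtin__ in Python 2).

```python
try:
    import builtins                  # Python 3
except ImportError:
    import __builtin__ as builtins   # Python 2

words = ['I', 'turned', 'off', 'the', 'spectroroute']

# Accidentally reuse the name "list" for a variable.
list = ['some', 'data']

try:
    list(enumerate(words))
except TypeError as err:
    message = str(err)               # 'list' object is not callable

# Point the name back at the built-in type.
list = builtins.list
indexed = list(enumerate(words))

print(message)
print(indexed)
```

In an interactive session, typing del list removes the variable and has the same effect, because the name then falls through to the built-in again.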