Sequence (4.2.1)
Chapter 4 of the whale book starts out as a review of Python syntax, so I am moving straight on to section 4.2.
Using tuples.
>>> t = 'walk', 'fem', 3
>>> t
('walk', 'fem', 3)
>>> t[0]
'walk'
>>> t[1:]
('fem', 3)
>>> len(t)
3
>>> raw = 'I turned off the spectroroute'
>>> text = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> pair = (6, 'turned')
>>> raw[2], text[3], pair[1]
('t', 'the', 'turned')
>>> raw[-3:], text[-3:], pair[-3:]
('ute', ['off', 'the', 'spectroroute'], (6, 'turned'))
>>> len(raw), len(text), len(pair)
(29, 5, 2)
What happens when we build a set from each of them?
>>> set(text)
set(['I', 'off', 'the', 'turned', 'spectroroute'])
>>> set(raw)
set([' ', 'c', 'e', 'd', 'f', 'I', 'h', 'o', 'n', 'p', 's', 'r', 'u', 't'])
>>> set(pair)
set(['turned', 6])
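The order in which a set prints its elements is arbitrary, so the outputs above can differ between environments. Here is a minimal sketch, using only the standard library, that sorts the sets to get a reproducible result and shows how many duplicates collapse. The variable names `unique_chars` and `unique_words` are my own.

```python
raw = 'I turned off the spectroroute'
text = ['I', 'turned', 'off', 'the', 'spectroroute']

# A set keeps one copy of each element and has no defined order,
# so sort it when a stable, printable result is needed.
unique_chars = sorted(set(raw))
unique_words = sorted(set(text))

print(unique_chars)
print(unique_words)

# Compare the total number of characters with the number of distinct ones.
print('%d characters, %d distinct' % (len(raw), len(set(raw))))
```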
A tuple is immutable, while a list can be modified in place.
>>> pair[1] = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> text[0] = 'you'
>>> text
['you', 'turned', 'off', 'the', 'spectroroute', '7']
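The difference between the two types can be put side by side in a short script. This is only a sketch; `new_pair` is a name I introduced to show one way of getting a "modified" tuple, which is to build a new one.

```python
pair = (6, 'turned')

# Item assignment on a tuple raises TypeError.
try:
    pair[1] = 7
except TypeError as err:
    print('TypeError: %s' % err)

# To "change" a tuple, build a new one from its parts.
new_pair = pair[:1] + (7,)
print(new_pair)

# A list, by contrast, can be modified in place.
text = ['I', 'turned', 'off', 'the', 'spectroroute']
text[0] = 'you'
print(text)
```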
Using list():
>>> raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'
>>> text = nltk.word_tokenize(raw)
>>> fdist = nltk.FreqDist(text)
>>> list(fdist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
>>> for key in fdist:
...     print fdist[key],
...
4 3 2 1 1 1
I got an error when I tried list(fdist). As a workaround, fdist.keys() returns the keys as a list.
>>> fdist.keys()
['lorry', ',', 'yellow', '.', 'Red', 'red']
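The same counting can be tried without NLTK. The sketch below uses `collections.Counter` as a stand-in for `nltk.FreqDist` and a simple regular expression as a rough substitute for `nltk.word_tokenize`. Both substitutions are my assumptions for illustration and will not match NLTK on harder input.

```python
import re
from collections import Counter

raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'

# Rough stand-in for nltk.word_tokenize: runs of word characters,
# or single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", raw)

counts = Counter(tokens)

# Iterating over a Counter yields its keys, like the for loop over fdist.
keys = [key for key in counts]

# most_common() orders the entries from most to least frequent.
for token, count in counts.most_common():
    print('%s %d' % (token, count))
```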
This example rearranges the elements of a list, first with a single tuple assignment and then the traditional way with a temporary variable.
>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> words[2], words[3], words[4] = words[3], words[4], words[2]
>>> words
['I', 'turned', 'the', 'spectroroute', 'off']
>>> tmp = words[2]
>>> words[2] = words[3]
>>> words[3] = words[4]
>>> words[4] = tmp
>>> words
['I', 'turned', 'spectroroute', 'off', 'the']
zip()
>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> tags = ['noun', 'verb', 'prep', 'det', 'noun']
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]
What happens if the two lists have different numbers of elements?
>>> words.append('.')
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]
However, I still got an error when using list(). Is something still missing?
>>> list(enumerate(words))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
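In the zip() output above, the extra '.' is dropped without any warning, because zip() stops at the end of the shorter input. If the unmatched element should be kept, the standard library offers `zip_longest` in `itertools` (named `izip_longest` in Python 2). A minimal sketch, where the fill value 'UNK' is a placeholder I picked for illustration:

```python
try:
    from itertools import zip_longest                   # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest   # Python 2

words = ['I', 'turned', 'off', 'the', 'spectroroute', '.']
tags = ['noun', 'verb', 'prep', 'det', 'noun']

# zip() stops at the end of the shorter input, so '.' is silently dropped.
pairs = list(zip(words, tags))
print(len(pairs))

# zip_longest() pads the shorter input instead of truncating.
# 'UNK' is a placeholder chosen for this sketch.
padded = list(zip_longest(words, tags, fillvalue='UNK'))
print(padded[-1])
```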
This example splits a list into two parts, 90% for training and 10% for testing.
>>> text = nltk.corpus.nps_chat.words()
>>> cut = int(0.9 * len(text))
>>> training_data, test_data = text[:cut], text[cut:]
>>> text == training_data + test_data
True
>>> len(training_data) / len(test_data)
9.0
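The slicing step can be wrapped in a small helper. This is only a sketch: `split_data` is a hypothetical function name, and the token list is made-up stand-in data so that the example runs without the nps_chat corpus.

```python
def split_data(items, fraction=0.9):
    """Split a sequence into (first part, remainder) at the given fraction."""
    cut = int(fraction * len(items))
    return items[:cut], items[cut:]


# Hypothetical stand-in data so the sketch runs without the nps_chat corpus.
text = ['tok%d' % i for i in range(1000)]

training_data, test_data = split_data(text)
print('%d training items, %d test items' % (len(training_data), len(test_data)))

# Joining the two slices gives back the original list.
print(text == training_data + test_data)
```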
Note:
I tried the same operations in another environment and did not get any errors from list(). The errors in my Mountain Lion (Mac) environment were most likely caused by a variable named list that I had already defined in that session, so Python resolved "list" to the variable instead of the built-in type.
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re, nltk, sys
>>> from __future__ import division
>>> raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'
>>> text = nltk.word_tokenize(raw)
>>> fdist = nltk.FreqDist(text)
>>> list(fdist)
['lorry', ',', 'yellow', '.', 'Red', 'red']
>>> words = ['I', 'turned', 'off', 'the', 'spectroroute']
>>> tags = ['noun', 'verb', 'prep', 'det', 'noun']
>>> zip(words, tags)
[('I', 'noun'), ('turned', 'verb'), ('off', 'prep'), ('the', 'det'), ('spectroroute', 'noun')]
>>> list(enumerate(words))
[(0, 'I'), (1, 'turned'), (2, 'off'), (3, 'the'), (4, 'spectroroute')]
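The explanation in the note can be reproduced on purpose. The sketch below is meant to be run as a top-level script or typed at the interactive prompt. It rebinds the name `list` to an ordinary list, triggers the same TypeError, and then restores the built-in with `del`. The contents of the shadowing variable are made up, since I do not know what the original session had assigned to it.

```python
words = ['I', 'turned', 'off', 'the', 'spectroroute']

# Rebinding the name "list" hides the built-in type in this namespace.
# The value itself is made up for this sketch.
list = ['some', 'earlier', 'value']

try:
    list(enumerate(words))
except TypeError as err:
    message = str(err)
    print('TypeError: %s' % message)

# Deleting the variable makes the name resolve to the built-in again.
del list
indexed = list(enumerate(words))
print(indexed)
```

In an interactive session where this has happened, `del list` is usually enough to recover without restarting the interpreter.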