Combining Different Sequence Types (4.2.2-4.2.3)
Let's continue.
>>> words = 'I turned off the spectroroute'.split()
>>> wordlens = [(len(word), word) for word in words]
>>> wordlens.sort()
>>> ' '.join(w for (_, w) in wordlens)
'I off the turned spectroroute'
>>>
The first line splits the sentence into a list of words. The second builds a list of tuples pairing each word's length with the word itself. The next step sorts the list; since tuples compare element by element, this orders the words by length (with ties broken alphabetically). Finally, the words are joined back together with spaces. The underscore (_) is the conventional name for a value we don't intend to use.
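As an aside (not from the text), the same length-based ordering can be obtained more directly with the key argument of sorted(), skipping the tuple step. Note the small difference: sorted() is stable, so words of equal length keep their original order, whereas the tuple version breaks ties alphabetically. For this sentence both give the same result.

```python
# Alternative sketch: sort by length directly with key=len
# instead of building (length, word) tuples first.
words = 'I turned off the spectroroute'.split()
result = ' '.join(sorted(words, key=len))
print(result)  # 'I off the turned spectroroute'
```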
>>> lexicon = [
...     ('the', 'det', ['Di:', 'D@']),
...     ('off', 'prep', ['Qf', 'O:f'])
... ]
>>> lexicon
[('the', 'det', ['Di:', 'D@']), ('off', 'prep', ['Qf', 'O:f'])]
>>>
list vs. tuple: a list is mutable (its items can be sorted, replaced, and deleted in place), while a tuple is immutable.
>>> lexicon.sort()
>>> lexicon[1] = ('turned', 'VBD', ['t3:nd', 't3`nd'])
>>> del lexicon[0]
>>> lexicon
[('turned', 'VBD', ['t3:nd', 't3`nd'])]
>>> lexicon = tuple(lexicon)
>>> lexicon[1] = ('turned', 'ADJ', ['t3:nd', 't31nd'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>
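One subtlety worth noting (my own illustration, not from the text): a tuple's immutability is shallow. Converting the lexicon to a tuple blocks item assignment, but mutable objects stored inside the tuple can still change.

```python
# Immutability is shallow: the tuple itself cannot be reassigned,
# but the nested pronunciation list is still a mutable object.
entry = ('turned', 'VBD', ['t3:nd', 't3`nd'])
lexicon = (entry,)            # lexicon[0] = ... would raise TypeError
lexicon[0][2].append('t3nd')  # yet the list inside can still grow
print(lexicon[0][2])          # ['t3:nd', 't3`nd', 't3nd']
```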
Generator Expressions:
>>> import nltk
>>> text = '''"When I use a word," Humpty Dumpty said in rather a scornful tone,
... "it means just what I choose it to mean - neither more nor less."'''
>>> [w.lower() for w in nltk.word_tokenize(text)]
['``', 'when', 'i', 'use', 'a', 'word', ',', "''", 'humpty', 'dumpty', 'said',
'in', 'rather', 'a', 'scornful', 'tone', ',', "''", 'it', 'means', 'just',
'what', 'i', 'choose', 'it', 'to', 'mean', '-', 'neither', 'more', 'nor',
'less', '.', "''"]
>>> max([w.lower() for w in nltk.word_tokenize(text)])
'word'
>>> max(w.lower() for w in nltk.word_tokenize(text))
'word'
In the second call the square brackets are omitted, turning the list comprehension into a generator expression: the tokens are streamed to max() one at a time instead of being stored in an intermediate list. On large data this saves memory and can improve runtime.
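A small sketch of the same idea on larger (made-up) data: both forms give the same answer, but only the bracketed version materializes every intermediate value in memory.

```python
# Hypothetical large word list to illustrate the generator-expression benefit.
words = ['neither', 'more', 'nor', 'less'] * 100000

longest_via_list = max([len(w) for w in words])  # builds a 400,000-int list
longest_via_gen = max(len(w) for w in words)     # streams lengths lazily

assert longest_via_list == longest_via_gen == 7  # 'neither' has 7 letters
```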