From list to string (3.9) - Deutschina's Tech Diary

Now on to section 3.9 of the whale book.

How to use join().

>>> silly = ['We', 'called', 'him', 'Tortoise', 'because', 'he', 'taught', 'us', '.']
>>> ' '.join(silly)
'We called him Tortoise because he taught us .'
>>> ';'.join(silly)
'We;called;him;Tortoise;because;he;taught;us;.'
>>> ''.join(silly)
'WecalledhimTortoisebecausehetaughtus.'
>>>
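
One detail I want to keep in mind: join() only accepts strings, and split() on the same separator reverses it. Here is a small sketch of my own (the `numbers` list is made up and not from the book), written with print() as a function so it also runs on Python 3.

```python
silly = ['We', 'called', 'him', 'Tortoise', 'because', 'he', 'taught', 'us', '.']

# split() on the same separator undoes join()
sentence = ' '.join(silly)
round_trip = sentence.split(' ')

# Hypothetical list of numbers, only for illustration
numbers = [1, 2, 3]

# join() raises TypeError when the items are not strings
try:
    ', '.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item with str() first makes it work
joined_numbers = ', '.join(str(n) for n in numbers)
print(joined_numbers)
```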

Nothing new to me.

>>> word = 'cat'
>>> sentence = """hello
... world"""
>>> print word
cat
>>> print sentence
hello
world
>>> word
'cat'
>>> sentence
'hello\nworld'
>>>

I had already learned that printing a variable and just evaluating it at the prompt show different results.
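
As far as I understand it, the prompt shows the repr() of a value while print shows its str(). A minimal sketch of that assumption (the variable names `shown_by_print` and `shown_by_prompt` are my own):

```python
sentence = """hello
world"""

# What print displays: the string itself, with a real line break
shown_by_print = str(sentence)

# What the interactive prompt displays: quotes added, newline escaped as \n
shown_by_prompt = repr(sentence)

print(shown_by_print)
print(shown_by_prompt)
```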

>>> import nltk
>>> fdist = nltk.FreqDist(['dog', 'cat', 'dog', 'cat', 'dog', 'snake', 'dog', 'cat'])
>>> for word in fdist:
...     print word, '->', fdist[word], ';',
... 
dog -> 4 ; cat -> 3 ; snake -> 1 ;
>>> for word in fdist:
...     print '%s->%d;' % (word, fdist[word]),
... 
dog->4; cat->3; snake->1;
>>>
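
The trailing comma trick only exists for the Python 2 print statement. A rough alternative is to build the pieces with '%' and glue them with join(). In this sketch I use collections.Counter as a stand-in for nltk.FreqDist so that it runs without NLTK; that swap is my own assumption, not the book's code.

```python
from collections import Counter

# Counter is used here only as a stand-in for nltk.FreqDist
counts = Counter(['dog', 'cat', 'dog', 'cat', 'dog', 'snake', 'dog', 'cat'])

# most_common() yields (word, count) pairs, highest count first
pieces = ['%s->%d;' % (word, n) for word, n in counts.most_common()]

# One join() instead of several print statements with trailing commas
line = ' '.join(pieces)
print(line)
```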

One of the reasons I like this book is that it lets me try something first, even if I don't fully understand it yet, and then gives a detailed explanation in a later part.

Actually, it was hard to understand how '%' works in a print statement when I first saw it in an earlier section of the textbook. All I could do at that time was guess.

Now I can check whether my guess was correct.

>>> '%s->%d;' % ('cat', 3)
'cat->3;'
>>> '%s->%d;' % 'cat'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> '%s->' % 'cat'
'cat->'
>>> '%d' % 3
'3'
>>> 'I want a %s right now' % 'coffee'
'I want a coffee right now'
>>> "%s wants a %s %s" % ("Lee", "sandwich", "for lunch")
'Lee wants a sandwich for lunch'
>>> template = 'Lee wants a %s right now'
>>> menu = ['sandwich', 'spam fritter', 'pancake']
>>> for snack in menu:
...     print template % snack
... 
Lee wants a sandwich right now
Lee wants a spam fritter right now
Lee wants a pancake right now
>>>
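
To summarize what the right-hand side of '%' can be, here is a small sketch of my own. The name `pair` and the dictionary keys `word` and `count` are made up for illustration.

```python
# Sketch only: 'pair' and the dictionary keys are names made up for illustration.
pair = ('cat', 3)

# A tuple on the right supplies one value per conversion specifier
as_fields = '%s->%d;' % pair

# To format the tuple itself as a single value, wrap it in a one-element tuple
as_one_value = '%s' % (pair,)

# A dictionary on the right lets the specifiers refer to values by name
by_name = '%(word)s->%(count)d;' % {'word': 'cat', 'count': 3}

# Too few values raises TypeError, as in the session above
try:
    '%s->%d;' % 'cat'
    too_few_ok = True
except TypeError:
    too_few_ok = False

print(as_fields)
print(as_one_value)
print(by_name)
```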

To specify the width of a field, put a number (or '*' to take the width from a variable) between '%' and the conversion character.

>>> '%6s' % 'dog'
'   dog'
>>> '%-6s' % 'dog'
'dog   '
>>> width = 6
>>> '%-*s' % (width, 'dog')
'dog   '
>>> '%*s' % (width, 'dog')
'   dog'

By default, field values are right-aligned. To left-align them, add '-'.

Now I understand more clearly how the tabulate function lays out its output.
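
A quick sketch of my own to check two things: the '*' form is handy when the width is computed from the data, and the width is only a minimum, so longer values are not cut off. The `words` list is a made-up sample.

```python
words = ['dog', 'snake', 'cat']  # made-up sample, not from the book

# Use the longest word as the column width
width = max(len(w) for w in words)

# '*' takes the width from the tuple; '-' switches to left alignment
right_aligned = ['%*s' % (width, w) for w in words]
left_aligned = ['%-*s' % (width, w) for w in words]

# The width is a minimum: a value longer than the field is not truncated
not_truncated = '%2s' % 'snake'

for text in right_aligned:
    print(text)
```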

>>> def tabulate(cfdist, words, categories):
...     print '%-16s' % 'Category',
...     for word in words:
...         print '%6s' % word,
...     print
...     for category in categories:
...         print '%-16s' % category,
...         for word in words:
...             print '%6d' % cfdist[category][word],
...         print
... 
>>> from nltk.corpus import brown
>>> cfd = nltk.ConditionalFreqDist(
...     (genre, word)
...     for genre in brown.categories()
...     for word in brown.words(categories=genre))
>>> genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
>>> modals = ['can', 'could', 'may', 'might', 'must', 'will']
>>> tabulate(cfd, modals, genres)
Category            can  could    may  might   must   will
news                 93     86     66     38     50    389
religion             82     59     78     12     54     71
hobbies             268     58    131     22     83    264
science_fiction      16     49      4     12      8     16
romance              74    193     11     51     45     43
humor                16     30      8      8      9     13
>>>
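
To convince myself how the column layout works, here is a variant that returns the table as a list of lines instead of printing it. This is only a sketch: `tabulate_lines` is my own hypothetical helper, and a plain dict of dicts stands in for the ConditionalFreqDist so it runs without the Brown corpus. The counts are copied from the table above.

```python
def tabulate_lines(counts, words, categories):
    """Build the table rows as strings (hypothetical helper, not from the book)."""
    # Left-aligned 16-character label column, then one space plus a
    # right-aligned 6-character column per word
    header = ('%-16s' % 'Category') + ''.join(' %6s' % w for w in words)
    lines = [header]
    for category in categories:
        row = '%-16s' % category
        row += ''.join(' %6d' % counts[category][w] for w in words)
        lines.append(row)
    return lines

# A plain dict of dicts standing in for the ConditionalFreqDist
counts = {
    'news': {'can': 93, 'could': 86, 'may': 66},
    'science_fiction': {'can': 16, 'could': 49, 'may': 4},
}

lines = tabulate_lines(counts, ['can', 'could', 'may'], ['news', 'science_fiction'])
print('\n'.join(lines))
```

Every row comes out with the same length because each field has a fixed minimum width and none of the values here is wider than its field.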

Export the result to a file.

>>> output_file = open('output.txt', 'w')
>>> words = set(nltk.corpus.genesis.words('english-kjv.txt'))
>>> for word in sorted(words):
...     output_file.write(word + "\n")
... 
>>>

What happens if I omit "\n"?

>>> output_file = open('output2.txt', 'w')
>>> for word in sorted(words):
...     output_file.write(word)
...

Needless to say, all the words run together on a single line.
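
The difference can be checked by reading both files back. This is a minimal sketch under my own assumptions: a tiny made-up word list replaces the Genesis vocabulary, and the files go into a temporary directory.

```python
import os
import tempfile

words = ['Abel', 'Abraham', 'Adam']  # tiny made-up stand-in for the real word set

tmp_dir = tempfile.mkdtemp()
with_newlines = os.path.join(tmp_dir, 'output.txt')
without_newlines = os.path.join(tmp_dir, 'output2.txt')

# 'with' closes each file automatically when the block ends
with open(with_newlines, 'w') as f:
    for word in sorted(words):
        f.write(word + '\n')

with open(without_newlines, 'w') as f:
    for word in sorted(words):
        f.write(word)

with open(with_newlines) as f:
    text_with = f.read()
with open(without_newlines) as f:
    text_without = f.read()

print(repr(text_with))
print(repr(text_without))
```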

When writing non-text data, you need to convert it into a string first.

>>> len(words)
2789
>>> str(len(words))
'2789'
>>> output_file.write(str(len(words)) + "\n")
>>> output_file.close()

The count was added at the end of the file (output2.txt, the one opened last).
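
A short sketch of why the conversion is needed. I use io.StringIO on Python 3 as an in-memory stand-in for the file, which is my own choice and not the book's; the number is the word count from the session above.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for output_file (assumes Python 3)
count = 2789

# Writing the integer directly is rejected
try:
    buffer.write(count)
    wrote_int = True
except TypeError:
    wrote_int = False

# Either str() or a format string turns the number into text first
buffer.write(str(count) + '\n')
buffer.write('%d\n' % count)

contents = buffer.getvalue()
print(repr(contents))
```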

>>> saying = ['After', 'all', 'is', 'said', 'and', 'done', ',',
...     'more', 'is', 'said', 'than', 'done', '.']
>>> for word in saying:
...     print word, '(' + str(len(word)) + ')',
...
After (5) all (3) is (2) said (4) and (3) done (4) , (1) more (4) is (2) said (4) than (4) done (4) . (1)
>>>

An example of wrapping text.

>>> from textwrap import fill
>>> format = '%s (%d),'
>>> pieces = [format % (word, len(word)) for word in saying]
>>> output = ' '.join(pieces)
>>> wrapped = fill(output)
>>> print wrapped
After (5), all (3), is (2), said (4), and (3), done (4), , (1), more
(4), is (2), said (4), than (4), done (4), . (1),
>>>
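
Two small variations I tried as a sketch: joining with ', ' avoids the trailing comma, and fill() accepts a width argument (the book's example relies on the default of 70). The width of 40 is an arbitrary value picked for illustration.

```python
from textwrap import fill, wrap

saying = ['After', 'all', 'is', 'said', 'and', 'done', ',',
          'more', 'is', 'said', 'than', 'done', '.']

# Put the comma in the separator so the last piece has no trailing comma
pieces = ['%s (%d)' % (word, len(word)) for word in saying]
output = ', '.join(pieces)

# width=40 is an arbitrary choice for this sketch
wrapped = fill(output, width=40)
lines = wrapped.split('\n')

# wrap() returns the same lines as a list instead of one string
same_lines = wrap(output, width=40)

print(wrapped)
```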

Chapter 3 is almost over. Let's move on to the exercises!