From list to string (3.9)
Now I'm on section 3.9 of the whale book.
How to use join().
>>> silly = ['We', 'called', 'him', 'Tortoise', 'because', 'he', 'taught', 'us', '.']
>>> ' '.join(silly)
'We called him Tortoise because he taught us .'
>>> ';'.join(silly)
'We;called;him;Tortoise;because;he;taught;us;.'
>>> ''.join(silly)
'WecalledhimTortoisebecausehetaughtus.'
>>>
Nothing new to me.
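One detail that is easy to forget: join() only accepts strings. Here is a minimal sketch of my own (not from the book), written so that it also runs on Python 3; the counts list is made-up example data.

```python
# join() raises TypeError if any item is not a string.
counts = [3, 1, 4]  # hypothetical example data
try:
    ' '.join(counts)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item with str() first makes it work.
joined = ' '.join(str(n) for n in counts)

# split() undoes join() as long as the separator does not occur inside an item.
silly = ['We', 'called', 'him', 'Tortoise']
round_trip = ';'.join(silly).split(';')

print(joined)
print(round_trip == silly)
```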
>>> word = 'cat'
>>> sentence = """hello
... world"""
>>> print word
cat
>>> print sentence
hello
world
>>> word
'cat'
>>> sentence
'hello\nworld'
>>>
I already learned earlier how the output differs when using the print statement.
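As far as I understand it, the difference comes down to str() versus repr(): print writes the str() form, while the interactive prompt echoes the repr() form. A small sketch that makes this explicit (the variable names are mine):

```python
sentence = """hello
world"""

# What print writes: the string itself, with a real line break.
shown_by_print = str(sentence)

# What the interpreter echoes: a quoted literal with the newline escaped.
shown_by_prompt = repr(sentence)

print(shown_by_print)
print(shown_by_prompt)
```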
>>> import nltk
>>> fdist = nltk.FreqDist(['dog', 'cat', 'dog', 'cat', 'dog', 'snake', 'dog', 'cat'])
>>> for word in fdist:
...     print word, '->', fdist[word], ';',
...
dog -> 4 ; cat -> 3 ; snake -> 1 ;
>>> for word in fdist:
...     print '%s->%d;' % (word, fdist[word]),
...
dog->4; cat->3; snake->1;
>>>
One of the reasons I like this book is that it lets me try something first, even if I don't understand what it means, and then gives a detailed explanation in a later part.
Actually, it was hard to understand how '%' works in a print statement when I first saw it in an earlier section of the textbook. All I could do at that time was guess.
Now I can check whether my guess was correct.
>>> '%s->%d;' % ('cat', 3)
'cat->3;'
>>> '%s->%d;' % 'cat'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> '%s->' % 'cat'
'cat->'
>>> '%d' % 3
'3'
>>> 'I want a %s right now' % 'coffee'
'I want a coffee right now'
>>> "%s wants a %s %s" % ("Lee", "sandwich", "for lunch")
'Lee wants a sandwich for lunch'
>>> template = 'Lee wants a %s right now'
>>> menu = ['sandwich', 'spam fritter', 'pancake']
>>> for snack in menu:
...     print template % snack
...
Lee wants a sandwich right now
Lee wants a spam fritter right now
Lee wants a pancake right now
>>>
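The TypeError above shows that the right-hand side has to supply exactly one value per conversion specifier. A related trap, sketched below with my own variable names, is that a tuple on the right is always unpacked, so formatting a tuple as a single value needs a one-element tuple around it. The dictionary form at the end is an alternative I find easier to read; it is not covered in this section of the book.

```python
pair = ('cat', 3)

# A tuple is unpacked: one item per conversion specifier.
two_fields = '%s->%d;' % pair

# With only one specifier, the same tuple has too many items.
try:
    '%s' % pair
    too_many = False
except TypeError:
    too_many = True

# Wrapping it in a one-element tuple formats the whole tuple as one value.
one_field = '%s' % (pair,)

# Named specifiers take their values from a dictionary instead.
named = '%(word)s->%(count)d;' % {'word': 'cat', 'count': 3}

print(two_fields)
print(one_field)
print(named)
```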
To specify the width of a field, put a number between '%' and the conversion character. With '*', the width is taken from the argument tuple instead.
>>> '%6s' % 'dog'
'   dog'
>>> '%-6s' % 'dog'
'dog   '
>>> width = 6
>>> '%-*s' % (width, 'dog')
'dog   '
>>> '%*s' % (width, 'dog')
'   dog'
By default, field values are right-aligned. For left alignment, use '-'.
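To convince myself, here is a short sketch comparing the '%' width syntax with the string methods rjust() and ljust() and with str.format(). The comparison is my own addition, not something from the book. It also shows that the width is only a minimum: longer values are not cut off unless a precision is given.

```python
width = 6

right = '%*s' % (width, 'dog')   # padded on the left, so right-aligned
left = '%-*s' % (width, 'dog')   # padded on the right, so left-aligned

# The same results with string methods and with str.format().
same_right = 'dog'.rjust(width) == '{:>{w}}'.format('dog', w=width) == right
same_left = 'dog'.ljust(width) == '{:<{w}}'.format('dog', w=width) == left

# The width is a minimum, so a longer value is kept whole.
untruncated = '%3s' % 'tortoise'

# A precision ('.3') is what actually truncates a string.
truncated = '%.3s' % 'tortoise'

print(repr(right), repr(left))
print(untruncated, truncated)
```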
Now I understand more clearly how tabulate lays out its output.
>>> def tabulate(cfdist, words, categories):
...     print '%-16s' % 'Category',
...     for word in words:
...         print '%6s' % word,
...     print
...     for category in categories:
...         print '%-16s' % category,
...         for word in words:
...             print '%6d' % cfdist[category][word],
...         print
...
>>> from nltk.corpus import brown
>>> cfd = nltk.ConditionalFreqDist(
...           (genre, word)
...           for genre in brown.categories()
...           for word in brown.words(categories=genre))
>>> genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
>>> modals = ['can', 'could', 'may', 'might', 'must', 'will']
>>> tabulate(cfd, modals, genres)
Category            can  could    may  might   must   will
news                 93     86     66     38     50    389
religion             82     59     78     12     54     71
hobbies             268     58    131     22     83    264
science_fiction      16     49      4     12      8     16
romance              74    193     11     51     45     43
humor                16     30      8      8      9     13
>>>
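To try the layout without loading the Brown corpus, the same formatting can be sketched with a plain nested dictionary. This is my own variation, not the book's code: tabulate_lines is a hypothetical name, it returns the lines instead of printing them, and the counts are just four numbers copied from the table above. It is written so that it also runs on Python 3.

```python
def tabulate_lines(counts, words, categories):
    """Build the table rows as strings: a 16-wide label, then 6-wide columns."""
    header = '%-16s' % 'Category' + ''.join(' %6s' % word for word in words)
    lines = [header]
    for category in categories:
        row = '%-16s' % category
        for word in words:
            # Plain dicts have no default count, so fall back to 0.
            row += ' %6d' % counts[category].get(word, 0)
        lines.append(row)
    return lines

# A small subset of the counts shown in the table above.
counts = {
    'news': {'can': 93, 'could': 86},
    'romance': {'can': 74, 'could': 193},
}

table = tabulate_lines(counts, ['can', 'could'], ['news', 'romance'])
print('\n'.join(table))
```

Because every row is built from the same field widths, all lines come out the same length, which is what keeps the columns lined up.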
Export the result to a file.
>>> output_file = open('output.txt', 'w')
>>> words = set(nltk.corpus.genesis.words('english-kjv.txt'))
>>> for word in sorted(words):
...     output_file.write(word + "\n")
...
>>>
What happens if I omit "\n"?
>>> output_file = open('output2.txt', 'w')
>>> for word in sorted(words):
...     output_file.write(word)
...
Needless to say, every word ends up on one line with nothing separating them.
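The two cases can be compared side by side in a small sketch. It uses Python 3 syntax and a temporary directory so nothing is left behind, and the three-word list is a made-up stand-in for the Genesis vocabulary. The with statement is my own choice here; it closes each file automatically.

```python
import os
import tempfile

words = ['Adam', 'Abel', 'Abram']  # hypothetical stand-in for the real word set

with tempfile.TemporaryDirectory() as tmp:
    with_newlines = os.path.join(tmp, 'output.txt')
    without_newlines = os.path.join(tmp, 'output2.txt')

    # write() adds nothing by itself, so the newline must be explicit.
    with open(with_newlines, 'w') as f:
        for word in sorted(words):
            f.write(word + '\n')

    # Without "\n" the words are written back to back.
    with open(without_newlines, 'w') as f:
        for word in sorted(words):
            f.write(word)

    with open(with_newlines) as f:
        separated = f.read()
    with open(without_newlines) as f:
        run_together = f.read()

print(repr(separated))
print(repr(run_together))
```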
When writing non-text data, you need to convert it into a string first.
>>> len(words)
2789
>>> str(len(words))
'2789'
>>> output_file.write(str(len(words)) + "\n")
>>> output_file.close()
The count was added as the last line of the file.
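What happens without the conversion? A quick sketch, using Python 3 and an in-memory io.StringIO object as a stand-in for a real file (my assumption, to avoid touching the disk): write() rejects an integer, while both str() and '%d' formatting produce something it accepts.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for an open text file
count = 2789

# Passing the integer directly is rejected.
try:
    buffer.write(count)
    rejected = False
except TypeError:
    rejected = True

# Either conversion produces a string that write() accepts.
buffer.write(str(count) + '\n')
buffer.write('%d\n' % count)

contents = buffer.getvalue()
print(repr(contents))
```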
>>> saying = ['After', 'all', 'is', 'said', 'and', 'done', ',',
...           'more', 'is', 'said', 'than', 'done', '.']
>>> for word in saying:
...     print word, '(' + str(len(word)) + ')',
...
After (5) all (3) is (2) said (4) and (3) done (4) , (1) more (4) is (2) said (4) than (4) done (4) . (1)
>>>
Wrapping text example.
>>> from textwrap import fill
>>> format = '%s (%d),'
>>> pieces = [format % (word, len(word)) for word in saying]
>>> output = ' '.join(pieces)
>>> wrapped = fill(output)
>>> print wrapped
After (5), all (3), is (2), said (4), and (3), done (4), , (1), more
(4), is (2), said (4), than (4), done (4), . (1),
>>>
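fill() wraps at 70 columns by default, which is why the line breaks after "more". The width can be changed with the width argument, and wrap() returns the lines as a list instead of one joined string. A short sketch of both, with 40 as an arbitrary width I picked for illustration:

```python
from textwrap import fill, wrap

saying = ['After', 'all', 'is', 'said', 'and', 'done', ',',
          'more', 'is', 'said', 'than', 'done', '.']
pieces = ['%s (%d),' % (word, len(word)) for word in saying]
output = ' '.join(pieces)

# wrap() gives a list of lines; fill() is the same lines joined with "\n".
narrow_lines = wrap(output, width=40)
narrow_text = fill(output, width=40)

# With the default width of 70, the first line ends at "more".
default_lines = wrap(output)

print(narrow_text)
```

Note that the break can fall between a word and its length, because textwrap only knows about whitespace, not about which pieces belong together.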
Chapter 3 is almost over. Let's move on to the exercises!