Learning about Function (Chapter 4.4)
For reuse, create a function and save as a file.
import re def get_text(file): """Read text from a file, normalizing whitespace and stripping HTML markup.""" text = open(file).read() text = re.sub('\s+', ' ', text) text = re.sub(r'<.*?>', ' ', text) return text
To call the function, set the path.
>>> sys.path.append('/Users/xxx/Documents/workspace/NLTK Learning/scripts') >>> sys.path ['', '/Library/Python/2.7/site-packages/setuptools-0.6c11-py2.7.egg', '/Library/Python/2.7/site-packages/pip-1.3.1-py2.7.egg', '/Library/Python/2.7/site-packages/ipython-0.13.2-py2.7.egg', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python', '/Library/Python/2.7/site-packages/feedparser-5.1.3-py2.7.egg', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages', '/Library/Python/2.7/site-packages', '/Users/xxx/Documents/workspace/NLTK Learning/scripts'] >>> >>> from get_text import * >>> get_text('/Users/xxx/Documents/workspace/NLTK Learning/text files/document2.txt') "I was born in Tokyo, Japan in mid of 1970's. After graduated university, I have been working as an IT engineer since then. Now my base is in Shanghai, China. Many of colleagues were surprised when they heard my major at university was Law. That's not so rare cases in Japan, as many of Japanese IT companies were hiring fresh graduates not only from computer science or mathematics area but also social sciences like law or economics. At that time ---- the end of 1990's, many of IT companies in Japan were eager to hire as many as possible even they need to educate them in the companies. "
To get help.
Help on function get_text in module get_text: get_text(file) Read text from a file, normalizing whitespace and stripping HTML markup. (END)
Input/Output of Functions:
>>> def repeat(msg, num): ... return ' '.join([msg] * num) ... >>> monty = 'Monty Python' >>> repeat(monty, 3) 'Monty Python Monty Python Monty Python' >>> >>> def monty(): ... return "Monty Python" ... >>> monty() 'Monty Python' >>> repeat(monty(),3) 'Monty Python Monty Python Monty Python'
Usually functions should either modify argument or return value.
>>> def my_sort1(mylist): ... mylist.sort() ... >>> def my_sort2(mylist): ... return sorted(mylist) ... >>> def my_sort3(mylist): ... mylist.sort() ... return mylist ...
In terms of that, the third one is not in good manner.
>>> def set_up(word, properties): ... word = 'lolcat' ... properties.append('noun') ... properties = 5 ... >>> w = '' >>> p = >>> set_up(w,p) >>> w '' >>> p ['noun']