Entries for 2013-05-13 (1 day)

Processing HTML (3.1.2) / Search engine (3.1.3)

Continuing from section 3.1.2 of the whale book.

    >>> url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"
    >>> html = urlopen(url).read()
    >>> html[:60]
    '<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN'

Can display the sour…
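The session above fetches a page and shows the first 60 characters of raw HTML. As a minimal sketch of the step the section title points at (getting plain text out of HTML), the following uses only the Python 3 standard library, not the Python 2 `urlopen` from the session and not any NLTK helper. `TextExtractor`, `html_to_text`, and the `sample` string are hypothetical names and data made up for illustration, and nothing is downloaded.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping what is inside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks with spaces, then collapse all runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample document; it only borrows the doctype seen in the session.
sample = (
    '<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">'
    "<html><head><title>Sample</title><style>p {color: red}</style></head>"
    "<body><p>Hello, <b>whale</b> book.</p></body></html>"
)

print(html_to_text(sample))  # Sample Hello, whale book.
```

Because chunks are joined with a space, text split by inline tags in the middle of a word would gain a space. That is acceptable for a rough look at a page, but a dedicated HTML library is the safer choice for real pages.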

Accessing the text source (3.1.1)

Now starting Chapter 3 of the whale book. Let's load "Crime and Punishment" from Project Gutenberg.

    >>> from __future__ import division
    >>> import nltk, re, pprint
    >>>
    >>> from urllib import urlopen
    >>> url = "http://www.gutenberg.org/fil…
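The session above is Python 2, where `urlopen` lives in `urllib` and `read()` returns a plain string. As a rough sketch of the same fetch step in Python 3, where `urlopen` lives in `urllib.request` and `read()` returns bytes that need decoding, something like the following should work. `fetch_text` is a hypothetical helper name, the UTF-8 default is an assumption, and a small temporary local file read through a `file://` URL stands in for the remote e-book so the example runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Read a resource by URL and decode its bytes into a str.

    The encoding default is an assumption; check what the source really uses.
    """
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
    return raw.decode(encoding)


# Stand-in for a remote text: a temporary local file opened via a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_bytes(b"Crime and Punishment\nby Fyodor Dostoevsky\n")
    text = fetch_text(path.as_uri())

print(type(text).__name__)  # str
print(text[:20])            # Crime and Punishment
```

The same function can be pointed at an `http://` or `https://` URL. The decoded string can then be sliced or tokenized as in the rest of the chapter.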