O'Reilly: Chapter 1 Exercise 23-29

23.

>>> for w in [w for w in text6 if w.isupper()]:
...     print w
...
....
ARTHUR
GALAHAD
KNIGHTS
TIM
ROBIN
ARTHUR
ROBIN
GALAHAD
ARTHUR
ROBIN
KNIGHTS
ARTHUR
TIM
I
KNIGHTS
....

24.

At first I thought the exercise wanted the words that meet all of the conditions at once, but I could not find any. So I ran each condition separately.

>>> [w for w in text6 if w.endswith('ize')]
[]
>>> [w for w in text6 if 'z' in w]
['zone', 'amazes', 'Fetchez', 'Fetchez', 'zoop', 'zoo', 'zhiv', 'frozen', 'zoosh']
>>> [w for w in text6 if 'pt' in w]
['empty', 'aptly', 'Thpppppt', 'Thppt', 'Thppt', 'empty', 'Thppppt', 'temptress', 'temptation', 'ptoo', 'Chapter', 'excepting', 'Thpppt']
>>> set([w for w in text6 if (w.istitle() and len(w) > 1)])
set(['Welcome', 'Winter', 'Lead', 'Uugh', 'Does', 'Saint', 'Until', 'Today', 'Thou', 'Burn', 'Lucky', 'Uhh', 'Not', 'Now', 'Twenty', 'Where', 'Just', 'Course', 'Go', 'Erbert', 'Uther', 'Actually', 'Cherries', 'Thpppt', 'Bloody', 'Aramaic', 'Mmm', 'Put', 'Haw', 'True', 'Pull', 'Fiends', 'Agh', 'Yup', 'We', 'Arthur', 'Zoot', 'English', 'Alright', 'My', 'Silence', 'Clark', 'Bedevere', 'Bors', 'Back', 'Maynard', 'Fetchez', 'Seek', 'Exactly', 'Doctor', 'Rather', 'When', 'Three', 'Providence', 'Book', 'Therefore', 'Huh', 'Stay', 'Umhm', 'Aaaaaaaah', 'Huy', 'Those', 'Dingo', 'Cider', 'Chop', 'Aauuugh', 'So', 'Found', 'Guy', 'Oui', 'Anarcho', 'Torment', 'Our', 'Your', 'Lie', 'Almighty', 'Galahad', 'Britons', 'Lord', 'Who', 'Beast', 'Loimbard', 'Why', 'Don', 'Guards', 'Oooh', 'All', 'Aaauugh', 'Assyria', 'Yeaaah', 'One', 'Farewell', 'Greetings', 'Beyond', 'Blue', 'What', 'Ayy', 'His', 'Recently', 'Here', 'Hic', 'Away', 'Wait', 'Concorde', 'Herbert', 'Ere', 'Bad', 'She', 'Mother', 'Shh', 'Erm', 'Tower', 'Robin', 'Summer', 'Chaste', 'Enchanter', 'Skip', 'Four', 'Say', 'Anthrax', 'Mud', 'Armaments', 'Build', 'Which', 'Nador', 'Hiyaah', 'Woa', 'More', 'Picture', 'Holy', 'Very', 'Practice', 'Packing', 'Uuh', 'Hold', 'Huyah', 'Throw', 'Must', 'None', 'This', 'Leaving', 'Ives', 'Nine', 'Stand', 'Firstly', 'Brother', 'Oooo', 'Eh', 'Amen', 'Jesus', 'Camaaaaaargue', 'Divine', 'Speak', 'Even', 'Hallo', 'Dappy', 'Yay', 'Iiiives', 'Prepare', 'There', 'Please', 'Black', 'Pure', 'Quoi', 'Excalibur', 'Iesu', 'Hmm', 'Midget', 'Angnor', 'Splendid', 'Aggh', 'Lancelot', 'Victory', 'See', 'Will', 'Shrubberies', 'Court', 'Aauuuves', 'God', 'Father', 'Patsy', 'It', 'Peng', 'Other', 'Then', 'Halt', 'Thee', 'Ridden', 'Aaaah', 'Knight', 'Antioch', 'They', 'Ask', 'With', 'Gallahad', 'Off', 'Thy', 'Well', 'Didn', 'Anybody', 'Isn', 'Grail', 'Neee', 'The', 'Bridge', 'Thsss', 'Hiyah', 'Yapping', 'Robinson', 'Hah', 'Explain', 'Aauuggghhh', 'Hill', 'Forward', 'Behold', 'European',
'Shut', 'Meanwhile', 'Chickennn', 'French', 'Psalms', 'Auuuuuuuugh', 'Ector', 'Aah', 'Keep', 'Quick', 'Once', 'Right', 'Help', 'Over', 'Anyway', 'Aaaugh', 'For', 'France', 'Umm', 'Walk', 'Dramatically', 'Good', 'Run', 'That', 'Arimathea', 'Forgive', 'Ecky', 'King', 'Could', 'Quiet', 'Hooray', 'Himself', 'African', 'Launcelot', 'Gable', 'Bravest', 'Bring', 'Shrubber', 'Aaah', 'Yes', 'Death', 'Christ', 'Would', 'Hey', 'Waa', 'Hee', 'Sorry', 'Heh', 'Get', 'Crapper', 'But', 'Hiyya', 'Aaaaaaaaah', 'Schools', 'Hurry', 'Princess', 'Together', 'Dragon', 'Honestly', 'Caerbannog', 'Action', 'Knights', 'Round', 'And', 'Old', 'How', 'Winston', 'Mercea', 'Battle', 'Follow', 'Aaaaugh', 'Open', 'Ahh', 'Bedwere', 'Hya', 'Tis', 'Til', 'Tim', 'Charge', 'Wood', 'You', 'Nay', 'Tell', 'Stop', 'Aaaaaah', 'Excuse', 'Riiight', 'Supposing', 'Aaauggh', 'Attila', 'Do', 'Clear', 'Alice', 'Apples', 'Bristol', 'Order', 'Try', 'Piglet', 'Tall', 'Spring', 'Is', 'Mind', 'Mine', 'Have', 'In', 'Table', 'Dennis', 'If', 'Wayy', 'Thank', 'Ninepence', 'Said', 'Hyy', 'Churches', 'Be', 'Augh', 'Ewing', 'Far', 'Oooohoohohooo', 'Surely', 'Consult', 'By', 'On', 'Unfortunately', 'Oh', 'Did', 'Of', 'Supreme', 'Morning', 'Tale', 'Ow', 'England', 'Or', 'Dis', 'Brave', 'Ohh', 'Pin', 'Pendragon', 'Are', 'Bones', 'Fine', 'Prince', 'Too', 'Iiiiives', 'Since', 'Pie', 'Idiom', 'Between', 'Whoa', 'Listen', 'Monsieur', 'Oooooooh', 'Frank', 'Quite', 'Let', 'Ho', 'Hm', 'Nothing', 'Ha', 'He', 'Chapter', 'Look', 'Thppppt', 'Um', 'Un', 'Uh', 'Bon', 'Hello', 'First', 'Ages', 'Autumn', 'Looks', 'Olfin', 'Message', 'Really', 'Ni', 'Use', 'Cut', 'No', 'Make', 'Aauuuuugh', 'Two', 'Quickly', 'Everything', 'Thpppppt', 'Nu', 'Rheged', 'Most', 'Hang', 'Ooh', 'Hand', 'Gawain', 'Every', 'Aaagh', 'Come', 'Bread', 'Peril', 'Steady', 'Thppt', 'Ulk', 'Silly', 'Defeat', 'Eee', 'Castle', 'Grenade', 'Camelot', 'Aagh', 'Britain', 'Joseph', 'Badon', 'Sir', 'Hoa', 'Perhaps', 'Hoo', 'Saxons', 'Lake', 'Thursday', 'To',
'Shall', 'May', 'Never', 'Eternal', 'As', 'Cornwall', 'Running', 'Five', 'Gorge', 'Lady', 'Man', 'Great', 'Like', 'Yeaah', 'Remove', 'Swamp', 'Heee', 'Ah', 'Am', 'Yeah', 'An', 'Bravely', 'Allo', 'At', 'Ay', 'Roger', 'Chicken'])
>>>

Although no words were found for the first condition, there are some which end with 'ise' instead of 'ize'.

>>> [w for w in text6 if w.endswith('ise')]
['wise', 'wise', 'apologise', 'surprise', 'surprise', 'surprise', 'noise', 'surprise']
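
For comparison, here is a rough sketch of what checking every condition at once could look like. It does not use text6: `sample_words` is a made-up list (an assumption for illustration, including the invented word 'Aptize'), so the results say nothing about the Monty Python script itself.

```python
# Hypothetical sample standing in for text6; these words are NOT from the corpus.
sample_words = ['Prize', 'zone', 'empty', 'Aptize', 'SIZE', 'apologise']

# Each condition checked on its own, as in the session above.
ends_ize = [w for w in sample_words if w.endswith('ize')]   # ['Prize', 'Aptize']
has_z = [w for w in sample_words if 'z' in w]               # ['Prize', 'zone', 'Aptize']
has_pt = [w for w in sample_words if 'pt' in w]             # ['empty', 'Aptize']
titlecase = [w for w in sample_words if w.istitle()]        # ['Prize', 'Aptize']

# All conditions at once: join the tests with `and`.
all_conditions = [w for w in sample_words
                  if w.endswith('ize') and 'z' in w and 'pt' in w and w.istitle()]
# all_conditions is ['Aptize']
```

Only a word that passes every test survives the combined filter, which is probably why nothing in text6 matched when I tried it that way.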

25.

>>> sentx = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']
>>> sentx
['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']
>>> for w in [w for w in sentx if w.startswith('sh')]:
...     print w
...
she
shells
shore
>>> for w in [w for w in sentx if len(w) > 4]:
...     print w
...
sells
shells
shore
>>>

26.

>>> sum([len(w) for w in text1])
999044

This code sums the length ("len") of every token in text1. The result is the total number of characters in all the tokens of text1, punctuation tokens included. Whitespace between tokens is not counted.
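
To see what the sum is doing on a small scale, here is a minimal sketch with a four-token list. `tokens` is a made-up stand-in for text1 (an assumption, so that it runs without NLTK), and the average at the end is my own addition, not something the exercise asks for.

```python
# Hypothetical token list standing in for text1.
tokens = ['Call', 'me', 'Ishmael', '.']

# Length of each token: [4, 2, 7, 1]
lengths = [len(w) for w in tokens]

# Total characters across all tokens: 4 + 2 + 7 + 1 = 14
total_chars = sum(lengths)

# Average token length; float() avoids integer division on Python 2.
average_len = total_chars / float(len(tokens))   # 3.5
```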

27.

>>> def vocab_size(text):
...     return len(set(text))
...
>>> vocab_size(text1)
19317
>>> len(set(text1))
19317

28.

>>> from __future__ import division
>>> def percent(word, text):
...     wf = text.count(word)
...     pt = wf / len(text) * 100
...     rt = []
...     rt.append(wf)
...     rt.append(pt)
...     return rt
...
>>> percent('whale', text1)
[906, 0.3473673313677301]

Not beautiful. I know I don't have a good sense of coding...
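
A possibly tidier version, as a sketch only: it returns the two values directly instead of building a list, and multiplies by 100.0 so the division stays floating-point even without the `__future__` import. The `sample` list is made up for illustration, and returning a tuple rather than a list is my own choice here.

```python
def percent(word, text):
    """Return (count, percentage) of `word` among the tokens in `text`."""
    count = text.count(word)
    # 100.0 forces float arithmetic, so this also works on plain Python 2.
    # Note: an empty `text` would raise ZeroDivisionError.
    return count, 100.0 * count / len(text)

# Hypothetical five-token sample, not taken from text1.
sample = ['the', 'whale', 'and', 'the', 'sea']
count, pct = percent('the', sample)   # count is 2, pct is 40.0
```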

29.

>>> set(sent3) < set(text1)
True
>>> set(text3) < set(text1)
False
>>> len(set(text3))
2789
>>> len(set(text1))
19317

Until now I had always used set() together with len(), so I assumed this expression was just comparing sizes. That cannot explain the second result, though: set(text3) is smaller than set(text1), yet set(text3) < set(text1) is False.

According to the Python library reference (http://docs.python.org/2/library/sets.html), sets support set-to-set comparison: a < b is True when a is a proper subset of b, that is, every element of a is also in b and the two sets are not equal. So the first expression means that set(text1) contains every element of set(sent3), and therefore it returns True.

>>> set(sent3) < set(text1)
True

The next one checks whether set(text1) contains every element of set(text3).

>>> set(text3) < set(text1)
False

It returns False because at least one element of set(text3) is not in set(text1).

If so, this would be very helpful when comparing two sets.
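
To double-check my understanding, here is a small sketch with toy sets instead of the book's texts. The names `small`, `big` and `other` are made up for illustration. It also shows the set difference, which lists the elements that make the comparison fail.

```python
# Toy sets standing in for set(sent3), set(text1) and set(text3).
small = set(['a', 'b'])
big = set(['a', 'b', 'c'])
other = set(['a', 'x'])

proper = small < big          # True: every element of small is in big
same_proper = big < big       # False: a set is not a proper subset of itself
same_subset = big <= big      # True: <= also allows equal sets
not_subset = other < big      # False: 'x' is not in big

# The elements of `other` that are missing from `big`.
missing = other - big         # a set containing only 'x'
```

So the size does not decide the result: `other` is smaller than `big`, but the comparison is still False because of the one element that `big` does not have.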