Comparing Vocabulary Lists

Working through section 2.4.3 of the O'Reilly textbook.
The title of this post may not match the English edition of the textbook, as I am using the Japanese edition.

Swadesh Vocabulary List:

>>> from nltk.corpus import swadesh
>>> swadesh.fileids()
['be', 'bg', 'bs', 'ca', 'cs', 'cu', 'de', 'en', 'es', 'fr', 'hr', 'it', 'la', 'mk', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sr', 'sw', 'uk']

The output of fileids() shows that 24 languages are included.
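
The count does not have to be done by eye. A minimal sketch, with the list copied from the output above so that it runs without the corpus installed (the variable name fileids is my own choice):

```python
# File IDs copied from the swadesh.fileids() output above.
fileids = ['be', 'bg', 'bs', 'ca', 'cs', 'cu', 'de', 'en', 'es', 'fr',
           'hr', 'it', 'la', 'mk', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk',
           'sl', 'sr', 'sw', 'uk']

# len() gives the number of languages; 'in' checks for a specific one.
print(len(fileids))      # 24
print('en' in fileids)   # True
```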

Which words are included for English ('en')?

>>> swadesh.words('en')
['I', 'you (singular), thou', 'he', 'we', 'you (plural)', 'they', 'this', 'that', 'here', 'there', 'who', 'what', 'where', 'when', 'how', 'not', 'all', 'many', 'some', 'few', 'other', 'one', 'two', 'three', 'four', 'five', 'big', 'long', 'wide', 'thick', 'heavy', 'small', 'short', 'narrow', 'thin', 'woman', 'man (adult male)', 'man (human being)', 'child', 'wife', 'husband', 'mother', 'father', 'animal', 'fish', 'bird', 'dog', 'louse', 'snake', 'worm', 'tree', 'forest', 'stick', 'fruit', 'seed', 'leaf', 'root', 'bark (from tree)', 'flower', 'grass', 'rope', 'skin', 'meat', 'blood', 'bone', 'fat (noun)', 'egg', 'horn', 'tail', 'feather', 'hair', 'head', 'ear', 'eye', 'nose', 'mouth', 'tooth', 'tongue', 'fingernail', 'foot', 'leg', 'knee', 'hand', 'wing', 'belly', 'guts', 'neck', 'back', 'breast', 'heart', 'liver', 'drink', 'eat', 'bite', 'suck', 'spit', 'vomit', 'blow', 'breathe', 'laugh', 'see', 'hear', 'know (a fact)', 'think', 'smell', 'fear', 'sleep', 'live', 'die', 'kill', 'fight', 'hunt', 'hit', 'cut', 'split', 'stab', 'scratch', 'dig', 'swim', 'fly (verb)', 'walk', 'come', 'lie', 'sit', 'stand', 'turn', 'fall', 'give', 'hold', 'squeeze', 'rub', 'wash', 'wipe', 'pull', 'push', 'throw', 'tie', 'sew', 'count', 'say', 'sing', 'play', 'float', 'flow', 'freeze', 'swell', 'sun', 'moon', 'star', 'water', 'rain', 'river', 'lake', 'sea', 'salt', 'stone', 'sand', 'dust', 'earth', 'cloud', 'fog', 'sky', 'wind', 'snow', 'ice', 'smoke', 'fire', 'ashes', 'burn', 'road', 'mountain', 'red', 'green', 'yellow', 'white', 'black', 'night', 'day', 'year', 'warm', 'cold', 'full', 'new', 'old', 'good', 'bad', 'rotten', 'dirty', 'straight', 'round', 'sharp', 'dull', 'smooth', 'wet', 'dry', 'correct', 'near', 'far', 'right', 'left', 'at', 'in', 'with', 'and', 'if', 'because', 'name']

Using two languages ('fr', 'en'), generate a list of tuples that pairs each French word with its English equivalent.

>>> fr2en = swadesh.entries(['fr', 'en'])
>>> fr2en
[('je', 'I'), ('tu, vous', 'you (singular), thou'), ('il', 'he'), ('nous', 'we'), ('vous', 'you (plural)'), ('ils, elles', 'they'), ('ceci', 'this'), ('cela', 'that'), ('ici', 'here'), ('l\xc3\xa0', 'there'), ('qui', 'who'), ('quoi', 'what'), ('o\xc3\xb9', 'where'), ('quand', 'when'), ('comment', 'how'), ('ne...pas', 'not'), ('tout', 'all'), ('plusieurs', 'many'), ('quelques', 'some'), ('peu', 'few'), ('autre', 'other'), ('un', 'one'), ('deux', 'two'), ('trois', 'three'), ('quatre', 'four'), ('cinq', 'five'), ('grand', 'big'), ('long', 'long'), ('large', 'wide'), ('\xc3\xa9pais', 'thick'), ('lourd', 'heavy'), ('petit', 'small'), ('court', 'short'), ('\xc3\xa9troit', 'narrow'), ('mince', 'thin'), ('femme', 'woman'), ('homme', 'man (adult male)'), ('homme', 'man (human being)'), ('enfant', 'child'), ('femme, \xc3\xa9pouse', 'wife'), ('mari, \xc3\xa9poux', 'husband'), ('m\xc3\xa8re', 'mother'), ('p\xc3\xa8re', 'father'), ('animal', 'animal'), ('poisson', 'fish'), ('oiseau', 'bird'), ('chien', 'dog'), ('pou', 'louse'), ('serpent', 'snake'), ('ver', 'worm'), ('arbre', 'tree'), ('for\xc3\xaat', 'forest'), ('b\xc3\xa2ton', 'stick'), ('fruit', 'fruit'), ('graine', 'seed'), ('feuille', 'leaf'), ('racine', 'root'), ('\xc3\xa9corce', 'bark (from tree)'), ('fleur', 'flower'), ('herbe', 'grass'), ('corde', 'rope'), ('peau', 'skin'), ('viande', 'meat'), ('sang', 'blood'), ('os', 'bone'), ('graisse', 'fat (noun)'), ('\xc5\x93uf', 'egg'), ('corne', 'horn'), ('queue', 'tail'), ('plume', 'feather'), ('cheveu', 'hair'), ('t\xc3\xaate', 'head'), ('oreille', 'ear'), ('\xc5\x93il', 'eye'), ('nez', 'nose'), ('bouche', 'mouth'), ('dent', 'tooth'), ('langue', 'tongue'), ('ongle', 'fingernail'), ('pied', 'foot'), ('jambe', 'leg'), ('genou', 'knee'), ('main', 'hand'), ('aile', 'wing'), ('ventre', 'belly'), ('entrailles', 'guts'), ('cou', 'neck'), ('dos', 'back'), ('sein, poitrine', 'breast'), ('c\xc5\x93ur', 'heart'), ('foie', 'liver'), ('boire', 'drink'), ('manger', 'eat'), ('mordre', 
'bite'), ('sucer', 'suck'), ('cracher', 'spit'), ('vomir', 'vomit'), ('souffler', 'blow'), ('respirer', 'breathe'), ('rire', 'laugh'), ('voir', 'see'), ('entendre', 'hear'), ('savoir', 'know (a fact)'), ('penser', 'think'), ('sentir', 'smell'), ('craindre, avoir peur', 'fear'), ('dormir', 'sleep'), ('vivre', 'live'), ('mourir', 'die'), ('tuer', 'kill'), ('se battre', 'fight'), ('chasser', 'hunt'), ('frapper', 'hit'), ('couper', 'cut'), ('fendre', 'split'), ('poignarder', 'stab'), ('gratter', 'scratch'), ('creuser', 'dig'), ('nager', 'swim'), ('voler', 'fly (verb)'), ('marcher', 'walk'), ('venir', 'come'), ("s'\xc3\xa9tendre", 'lie'), ("s'asseoir", 'sit'), ('se lever', 'stand'), ('tourner', 'turn'), ('tomber', 'fall'), ('donner', 'give'), ('tenir', 'hold'), ('serrer', 'squeeze'), ('frotter', 'rub'), ('laver', 'wash'), ('essuyer', 'wipe'), ('tirer', 'pull'), ('pousser', 'push'), ('jeter', 'throw'), ('lier', 'tie'), ('coudre', 'sew'), ('compter', 'count'), ('dire', 'say'), ('chanter', 'sing'), ('jouer', 'play'), ('flotter', 'float'), ('couler', 'flow'), ('geler', 'freeze'), ('gonfler', 'swell'), ('soleil', 'sun'), ('lune', 'moon'), ('\xc3\xa9toile', 'star'), ('eau', 'water'), ('pluie', 'rain'), ('rivi\xc3\xa8re', 'river'), ('lac', 'lake'), ('mer', 'sea'), ('sel', 'salt'), ('pierre', 'stone'), ('sable', 'sand'), ('poussi\xc3\xa8re', 'dust'), ('terre', 'earth'), ('nuage', 'cloud'), ('brouillard', 'fog'), ('ciel', 'sky'), ('vent', 'wind'), ('neige', 'snow'), ('glace', 'ice'), ('fum\xc3\xa9e', 'smoke'), ('feu', 'fire'), ('cendres', 'ashes'), ('br\xc3\xbbler', 'burn'), ('route', 'road'), ('montagne', 'mountain'), ('rouge', 'red'), ('vert', 'green'), ('jaune', 'yellow'), ('blanc', 'white'), ('noir', 'black'), ('nuit', 'night'), ('jour', 'day'), ('an, ann\xc3\xa9e', 'year'), ('chaud', 'warm'), ('froid', 'cold'), ('plein', 'full'), ('nouveau', 'new'), ('vieux', 'old'), ('bon', 'good'), ('mauvais', 'bad'), ('pourri', 'rotten'), ('sale', 'dirty'), ('droit', 'straight'), 
('rond', 'round'), ('tranchant, pointu, aigu', 'sharp'), ('\xc3\xa9mouss\xc3\xa9', 'dull'), ('lisse', 'smooth'), ('mouill\xc3\xa9', 'wet'), ('sec', 'dry'), ('juste, correct', 'correct'), ('proche', 'near'), ('loin', 'far'), ('\xc3\xa0 droite', 'right'), ('\xc3\xa0 gauche', 'left'), ('\xc3\xa0', 'at'), ('dans', 'in'), ('avec', 'with'), ('et', 'and'), ('si', 'if'), ('parce que', 'because'), ('nom', 'name')]
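
Conceptually, each tuple pairs the words found at the same position in the two word lists. The sketch below imitates that with plain zip() on the first three French and English entries shown above. It only illustrates the pairing and is not a claim about how swadesh.entries() is implemented; the names fr_words, en_words and pairs are mine.

```python
# First three entries of each list, copied from the output above.
fr_words = ['je', 'tu, vous', 'il']
en_words = ['I', 'you (singular), thou', 'he']

# zip() pairs items by position; list() makes the result printable.
pairs = list(zip(fr_words, en_words))
print(pairs)
```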

Convert the pairs into a dictionary and do some translation.

>>> translate = dict(fr2en)
>>> translate['chien']
'dog'
>>> translate['jeter']
'throw'

This dictionary can only translate from French to English.
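
To go the other way, the pairs can be swapped before building the dictionary. A minimal sketch using a few pairs copied from the fr2en output above (fr2en_sample and en2fr are names I made up). It also shows a side effect of dict(): 'homme' appears twice in the French list, so only the last English value survives.

```python
# A few (French, English) pairs copied from the fr2en output above.
fr2en_sample = [('chien', 'dog'), ('jeter', 'throw'),
                ('homme', 'man (adult male)'),
                ('homme', 'man (human being)')]

# Swap each pair to get an English -> French dictionary.
en2fr = dict((en, fr) for fr, en in fr2en_sample)
print(en2fr['dog'])      # chien
print(en2fr['throw'])    # jeter

# Duplicate keys: the later pair overwrites the earlier one.
fr2en_dict = dict(fr2en_sample)
print(fr2en_dict['homme'])   # man (human being)
```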

Add more languages: German and Spanish.

>>> de2en = swadesh.entries(['de', 'en'])
>>> es2en = swadesh.entries(['es', 'en'])
>>> translate.update(dict(de2en))
>>> translate.update(dict(es2en))
>>> translate['Hund']
'dog'
>>> translate['perro']
'dog'
>>> 
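
One thing to keep in mind: after update(), the merged dictionary no longer records which language a word came from, and a word spelled the same in two languages would keep only the value added last. A possible alternative, sketched here with a few pairs taken from the outputs in this post, is to keep one dictionary per language (samples, by_language and translate_from are hypothetical names, not part of NLTK).

```python
# (word, English) pairs taken from the outputs in this post.
samples = {
    'fr': [('chien', 'dog'), ('dire', 'say')],
    'de': [('Hund', 'dog'), ('sagen', 'say')],
    'es': [('perro', 'dog'), ('decir', 'say')],
}

# One word -> English dictionary per source language.
by_language = dict((lang, dict(pairs)) for lang, pairs in samples.items())

def translate_from(lang, word):
    """Return the English word for `word` in `lang`, or None if unknown."""
    return by_language[lang].get(word)

print(translate_from('de', 'Hund'))   # dog
print(translate_from('fr', 'Hund'))   # None: not a French entry
```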

Displaying multiple languages at the same time.

>>> languages = ['en', 'de', 'nl', 'es', 'fr', 'pt', 'la']
>>> for i in [139, 140, 141, 142]:
...     print(swadesh.entries(languages)[i])
... 
('say', 'sagen', 'zeggen', 'decir', 'dire', 'dizer', 'dicere')
('sing', 'singen', 'zingen', 'cantar', 'chanter', 'cantar', 'canere')
('play', 'spielen', 'spelen', 'jugar', 'jouer', 'jogar, brincar', 'ludere')
('float', 'schweben', 'zweven', 'flotar', 'flotter', 'flutuar, boiar', 'fluctuare')
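
Each row follows the order of the languages list, so a row can be labelled by zipping it with that list. A small sketch using the four rows above as inline data (rows and index are my own names). In the real session, calling swadesh.entries(languages) once and slicing it with [139:143] should give the same four rows without rebuilding the list on every loop.

```python
languages = ['en', 'de', 'nl', 'es', 'fr', 'pt', 'la']

# The four rows printed above, copied as-is.
rows = [
    ('say', 'sagen', 'zeggen', 'decir', 'dire', 'dizer', 'dicere'),
    ('sing', 'singen', 'zingen', 'cantar', 'chanter', 'cantar', 'canere'),
    ('play', 'spielen', 'spelen', 'jugar', 'jouer', 'jogar, brincar', 'ludere'),
    ('float', 'schweben', 'zweven', 'flotar', 'flotter', 'flutuar, boiar', 'fluctuare'),
]

# Index each row by its English word, labelling columns by language code.
index = dict((row[0], dict(zip(languages, row))) for row in rows)

print(index['sing']['la'])   # canere
print(index['play']['nl'])   # spelen
```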

Toolbox (section 2.4.4):

Let's look at the dictionary for the Rotokas language.

>>> from nltk.corpus import toolbox
>>> toolbox.fileids()
[]
>>> toolbox.entries('rotokas.dic')
[('kaa', [('ps', 'V'), ('pt', 'A'), ('ge', 'gag'), ('tkp', 'nek i pas'), ('dcsv', 'true'), ('vx', '1'), ('sc', '???'), ('dt', '29/Oct/2005'), ('ex', 'Apoka ira kaaroi aioa-ia reoreopaoro.'), ('xp', 'Kaikai i pas long nek bilong Apoka bikos em i kaikai na toktok.'), ('xe', 'Apoka is gagging from food while talking.')]), ('kaa', [('ps', 'V'), ('pt', 'B'), ('ge', 'strangle'), ('tkp', 'pasim nek'), ('arg', 'O'), ('vx', '2'), ('dt', '07/Oct/2006'), ('ex', 'Rera rauroro rera kaarevoi.'), ('xp', 'Em i holim pas em na nekim em.'),
...
('kuvuto', [('ps', 'N'), ('pt', '???'), ('ge', 'clothes'), ('ge', 'clothing'), ('tkp', 'laplap'), ('dt', '28/Jul/2004')])]
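
Each entry is a (headword, fields) pair, where fields is a list of (marker, value) tuples. Because a marker can repeat (kuvuto has two 'ge' fields), converting fields with dict() would lose values. The sketch below collects all values for a marker instead. It uses two entries copied from the output above, with the first one trimmed; field_values is a hypothetical helper, and I am assuming 'ge' holds the English gloss.

```python
# Two entries copied from the output above (the 'kaa' fields are trimmed).
entries = [
    ('kaa', [('ps', 'V'), ('pt', 'A'), ('ge', 'gag'),
             ('tkp', 'nek i pas'), ('dt', '29/Oct/2005')]),
    ('kuvuto', [('ps', 'N'), ('pt', '???'), ('ge', 'clothes'),
                ('ge', 'clothing'), ('tkp', 'laplap'),
                ('dt', '28/Jul/2004')]),
]

def field_values(fields, marker):
    """Return every value stored under `marker`, in original order."""
    return [value for key, value in fields if key == marker]

for headword, fields in entries:
    print('%s: %s' % (headword, ', '.join(field_values(fields, 'ge'))))
```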

At this stage, the textbook just introduces Toolbox and its varied, flexible features. I will learn more in chapter 11.