EN

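Dictation Challenge

This is about a personal hang-up of mine. Whether it is English or Chinese, I feel terribly envious when I see someone who is better at it than I am. In both languages I have somehow reached a level where I can get by at work (English) and for survival (Chinese), but I have not built on that since, and while wandering…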

I Get Asked for Directions a Lot

EN

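This may be outside the scope of a tech blog, but I figured it is fine once in a while. I pass through the Shinjuku Station area on my commute, and lately I get asked for directions in English surprisingly often. Given the location, there are many inbound tourists, so the opportunities naturally come up. Here are three such examples. In the morning…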

Pinyin Trainer - 锻炼拼音 Version 1.0.2 released now!

Apple's review finished much faster than I expected. Version 1.0.2 is now released on the App Store! Corrections: revised the master data for the word "埋没" (mai2 mo4); eliminated obviously incorrect Pinyin such as "lialg" from the options. Pinyin Train…

Pinyin Trainer - 锻炼拼音 Version 1.0.2 to be released

Because I will change some core logic in the next version, 1.1, I decided to release a separate minor bug-fix version, 1.0.2, in advance. The following bugs will be fixed in version 1.0.2. Incorrect pinyin for the word "埋没": (Wrong) mai2 mei2 (Right) m…

New Bug report: Incorrect pinyin (version 1.0 - 1.0.1)

During functional testing, I found that incorrect pinyin is stored for the following word: (Wrong) 埋没 mai2 mei2 (Right) 埋没 mai2 mo4. In addition, it is still possible for strange pinyin to be displayed as a dummy choice. "lialg" should be displayed …

Pinyin Trainer - 锻炼拼音 Version 1.0.1 released now!

I am pleased to announce that the new version of Pinyin Trainer - 锻炼拼音 has been released on the App Store! This version (1.0.1) includes some minor bug fixes, such as: font size adjustment; elimination of obviously incorrect Pinyin like lelg; Adjust p…

Still waiting for Apple's review

As I mentioned previously, I have already submitted version 1.0.1, which contains small bug fixes, to the App Store. The status is still "Waiting for review". Once the review starts, the new version will be released shortly. By the way, I am very…

Pinyin Trainer (锻炼拼音) - Version 1.0.1 to be released

I uploaded the new version (1.0.1), which contains bug fixes, to the App Store today. This version will be available after Apple's review. (Japanese version) Today I uploaded a new version (1.0.1) containing bug fixes to the App Store. After going through Apple's review, this version…

Pinyin Trainer - 锻炼拼音 - 2013/09/25 Update

(Japanese version follows) Finally released on the App Store! (English) Version 1.0 was released on the App Store on September 25th, 2013! I have already found the following small issues in the currently released version (1.0), and they will be fixed at the …

Pinyin Trainer - 锻炼拼音 - Support page (EN)

News: The application was released on the App Store on Sep 25th, 2013! Welcome to the Pinyin Trainer support page. One of the difficulties of learning Chinese is remembering pronunciations. Even though you have multiple years of experience learning Chin…

Chunking (7.2)

Now I'll jump to Chapter 7.2. Noun Phrase Chunking (7.2.1) >>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ... ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")] >>> grammar = "NP: {<DT>?<JJ>*<NN>}" >>> cp = n…
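To make the chunk grammar above concrete, here is a minimal standard-library sketch of what the pattern `<DT>?<JJ>*<NN>` selects. It is not the NLTK chunk parser used in the post; the function name `np_chunk` and the approach of matching a regular expression against the joined tag string are assumptions made for illustration only.

```python
import re

def np_chunk(tagged_sentence):
    """Return word lists for spans whose tags match <DT>?<JJ>*<NN>.

    Hypothetical helper: a stdlib stand-in, not NLTK's chunker.
    """
    # Encode the POS tags as one string, e.g. "<DT><JJ><JJ><NN><VBD>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_sentence)
    pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")
    chunks = []
    for match in pattern.finditer(tag_string):
        # Each "<" marks one token, so counting them maps back to indices.
        start = tag_string[:match.start()].count("<")
        length = match.group().count("<")
        chunks.append([word for word, _ in tagged_sentence[start:start + length]])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]
print(np_chunk(sentence))
```

On the sample sentence this picks out "the little yellow dog" and "the cat" as noun phrases: an optional determiner, any number of adjectives, then a noun.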

Decision Trees (6.4)

Entropy and Information Gain (6.4.1) Let's run the entropy calculation sample. >>> import nltk >>> from nltk_init import * >>> import math >>> def entropy(labels): ... freqdist = nltk.FreqDist(labels) ... probs = [freqdist.freq(l) for l …
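For readers without the NLTK setup at hand, the same calculation can be sketched with only the standard library. This version assumes the quantity being computed is the Shannon entropy in bits of the label distribution, and it uses `collections.Counter` in place of `nltk.FreqDist`; that substitution is mine, not the post's.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the label distribution."""
    counts = Counter(labels)
    total = len(labels)
    # Relative frequency of each distinct label.
    probs = [count / total for count in counts.values()]
    # Sum of -p * log2(p) over the distinct labels.
    return sum(-p * math.log2(p) for p in probs)

print(entropy(["male", "male", "male", "male"]))
print(entropy(["male", "female", "male", "female"]))
```

Four identical labels carry no uncertainty, so the entropy is zero, while an even two-way split gives exactly one bit.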

Evaluation (6.3)

The Test Set (6.3.1) >>> import random >>> from nltk.corpus import brown >>> tagged_sents = list(brown.tagged_sents(categories='news')) >>> random.shuffle(tagged_sents) >>> size = int(len(tagged_sents) * 0.1) >>> train_set, test_set = tagg…
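The shuffle-and-slice pattern in this excerpt can be sketched without the Brown corpus. The helper name `split_train_test`, the fixed seed, and the choice to put the first 10% of the shuffled items in the test set are assumptions for illustration; the excerpt is truncated before the actual slicing is shown.

```python
import random

def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of items and hold out a fraction for testing.

    Hypothetical helper; the seed only makes the sketch repeatable.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    size = int(len(shuffled) * test_fraction)
    # Assumed convention: first `size` items are the test set.
    return shuffled[size:], shuffled[:size]

train_set, test_set = split_train_test(range(100))
print(len(train_set), len(test_set))
```

Shuffling before slicing matters because sentences from the same document tend to sit next to each other; without it, the test set could end up drawn from only a few documents.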

Recognizing Textual Entailment (6.2.3)

Recognizing Textual Entailment (6.2.3) Save the following source code as rte_features.py. import nltk def rte_features(rtepair): extractor = nltk.RTEFeatureExtractor(rtepair) features = {} features['word_overlap'] = len(extractor.overlap('w…

Identifying Dialogue Act Types (6.2.2)

Identifying Dialogue Act Types (6.2.2) >>> from nltk_init import * >>> posts = nltk.corpus.nps_chat.xml_posts()[:10000] >>> def dialogue_act_features(post): ... features = {} ... for word in nltk.word_tokenize(post): ... features['contains…

Further Examples of Supervised Classification (6.2)

Sentence Segmentation (6.2.1) >>> sents = nltk.corpus.treebank_raw.sents() >>> tokens = [] >>> boundaries = set() >>> offset = 0 >>> for sent in nltk.corpus.treebank_raw.sents(): ... tokens.extend(sent) ... offset += len(sent) ... boundari…
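The loop above merges all sentences into one token list while tracking where each sentence ends. The excerpt cuts off at the last statement, so the sketch below assumes it records the index of each sentence's final token; the toy sentences and the function name `collect_boundaries` are invented for illustration and replace the treebank corpus.

```python
def collect_boundaries(sents):
    """Flatten sentences into one token list and record sentence-final indices."""
    tokens = []
    boundaries = set()
    offset = 0
    for sent in sents:
        tokens.extend(sent)
        offset += len(sent)
        # Assumption: the boundary is the index of the sentence's last token.
        boundaries.add(offset - 1)
    return tokens, boundaries

toy_sents = [["Hello", "world", "."], ["How", "are", "you", "?"]]
tokens, boundaries = collect_boundaries(toy_sents)
print(tokens)
print(sorted(boundaries))
```

With the boundary set in hand, each punctuation token can be labeled as sentence-final or not, which is the kind of labeled data a segmentation classifier needs.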

Sequence Classification (6.1.6)

I started this series of typing out the book's examples (learning NLTK) in English, so I will keep writing in English for now. There is no particularly deep reason. Sequence Classification (6.1.6) This sa…

Using context (6.1.5)

This example uses the previous word as well as the suffix as features. >>> train_set, test_set = featuresets[size:], featuresets[:size] >>> classifier = nltk.DecisionTreeClassifier.train(train_set) >>> >>> def pos_features(sentence, i): ... features = {"suf…
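A context-aware feature extractor of this kind might look like the sketch below. The excerpt is truncated right after the dictionary opens, so the feature names ("suffix(1)" through "suffix(3)", "prev-word") and the "<START>" marker for the first word are assumptions, not necessarily what the post uses.

```python
def pos_features(sentence, i):
    """Features for the word at position i: its suffixes plus the previous word."""
    word = sentence[i]
    features = {
        "suffix(1)": word[-1:],
        "suffix(2)": word[-2:],
        "suffix(3)": word[-3:],
    }
    # Assumed convention: a sentinel stands in for the missing previous word.
    if i == 0:
        features["prev-word"] = "<START>"
    else:
        features["prev-word"] = sentence[i - 1]
    return features

print(pos_features(["the", "dog", "barked"], 2))
```

Taking the whole sentence and an index, rather than a single word, is what lets the extractor look at neighboring words.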