KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching
![Page 1: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/1.jpg)
KIWI: A Multi-Lingual Usage Consultation Tool
based on Internet Searching
Kumiko TANAKA-Ishii, Masato YAMAMOTO, Hiroshi NAKAGAWA
Language Informatics Laboratory, University of Tokyo
![Page 2: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/2.jpg)
How do you say
「無線 LAN」 (wireless LAN) in French?
It can never be found in dictionaries…
wireless
=
![Page 3: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/3.jpg)
What could be done
1. Look up part of the key in the dictionary
無線 → sans fil
2. Enter the translation into a search engine
![Page 4: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/4.jpg)
What could be done
1. Look up part of the key in the dictionary
無線 = sans fil
2. Enter the translation into a search engine
Top 20: le réseau sans fil, le net sans fil, l’accès sans fil
![Page 5: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/5.jpg)
What could be done
1. Look up part of the key in the dictionary
無線 = sans fil
2. Enter the translation into a search engine
Sum up top 500: les réseaux sans fil, l’internet sans fil
![Page 6: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/6.jpg)
Similar Occasions…
• Up-to-date expression: * sans fil, les réseaux sans fil
• Commonness of expression: (le réseaux)/(les réseaux) sans fil
• Simple Q&A: * Zidane
• Grammar check
  - noun gender: un/une langage, un/une langue
  - preposition: discuter ?
  - articles: du/de Japon, du/de Nancy
Have clues but can’t remember exactly…
![Page 7: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/7.jpg)
Our Idea
• Multiple candidates: which one?
• Minority candidate: 300th candidate?
Impossible to manually scan 500 candidates!
A tool for scanning search engine results
Kiwi
![Page 8: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/8.jpg)
Related Work 1: www.webcorp.org.uk
The Web as Corpus (1999)
![Page 9: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/9.jpg)
Related Work 1: www.webcorp.org.uk
The Web as Corpus
• English only
• Sums up fixed-length words
• Slow!!
![Page 10: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/10.jpg)
Related work 2: Google Fight, Google Duel
• Compare the frequency of 2 phrases
• Multilingual
![Page 11: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/11.jpg)
Related work 2 : Google Fight, Google Duel
•Comparison of two phrases only
![Page 12: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/12.jpg)
Kiwi’s characteristics
• Flexible query
  - comparison: A/B
  - wild card: *A, A*B, B*
• Multilingual aspect
  - String-based processing (vs. language-dependent analysis)
User has clues
English web users: 36.5% (globstats 2002)
![Page 13: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/13.jpg)
Online Language Populations
http://www.glreach.com/globstats
![Page 14: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/14.jpg)
The Process
1. Obtain search results
summaries only
2. Extract candidates *A, A*
3. Order candidates
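The first two steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kiwi implementation: the snippet strings stand in for search engine summaries, and `continuations`, `count_prefixes`, and `max_len` are hypothetical names. For an entry of the form A*, it collects the text following A in each summary and counts every prefix as a raw candidate string.

```python
from collections import Counter

def continuations(snippets, key, max_len=20):
    """Collect the text that follows `key` in each snippet (entry of the form A*).

    `max_len` is an assumed cap on how much context is kept per match.
    """
    found = []
    for text in snippets:
        start = text.find(key)
        while start != -1:
            begin = start + len(key)
            tail = text[begin:begin + max_len]
            if tail:
                found.append(tail)
            start = text.find(key, start + 1)
    return found

def count_prefixes(tails):
    """Count every prefix of every continuation as a raw candidate string."""
    counts = Counter()
    for tail in tails:
        for end in range(1, len(tail) + 1):
            counts[tail[:end]] += 1
    return counts

# Made-up summaries standing in for real search engine results.
snippets = [
    "le réseau sans fil est rapide",
    "un réseau sans fil public",
    "l'accès sans fil gratuit",
]
tails = continuations(snippets, "réseau ")
counts = count_prefixes(tails)
```

Because the processing is purely string based, the same sketch applies unchanged to Japanese, Chinese, or Korean summaries.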
![Page 15: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/15.jpg)
Characteristics of candidates (when the entry is A*)
• Frequent
• Moderately long
• Various succeeding characters
![Page 16: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/16.jpg)
Characteristics of candidates (when the entry is A*)
• Frequent, moderately long → Ordering
• Various succeeding characters → Extraction
![Page 17: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/17.jpg)
Extraction: number of kinds of succeeding characters
[Figure: character tree for the entry "human *". With longer context, the branching degree decreases; where it increases again, the string is cut.]
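One possible reading of this extraction step, sketched in Python. The exact cut rule used by Kiwi is not given on the slide, so treat this as an assumption: a prefix is kept as a candidate when the number of distinct characters following it is larger than for the prefix one character shorter. The function names and the end marker are hypothetical.

```python
from collections import defaultdict

def branching_degrees(tails):
    """Map each prefix to the number of distinct characters that follow it."""
    nexts = defaultdict(set)
    for tail in tails:
        padded = tail + "\0"  # assumed end marker, counted as one character kind
        for i in range(len(tail) + 1):
            nexts[padded[:i]].add(padded[i])
    return {prefix: len(chars) for prefix, chars in nexts.items()}

def cut_points(tails):
    """Return prefixes at which the branching degree increases."""
    degree = branching_degrees(tails)
    candidates = set()
    for tail in tails:
        for i in range(1, len(tail) + 1):
            # Longer context normally lowers the degree; an increase
            # suggests a boundary, so the string is cut here.
            if degree[tail[:i]] > degree[tail[:i - 1]]:
                candidates.add(tail[:i])
    return candidates

# Made-up continuations for the entry "human *".
tails = ["rights watch", "rights act", "rights", "nature"]
found = cut_points(tails)
```

In this toy input, "rights" is followed by either a space or the end of the string, so its degree rises from 1 to 2 and it is extracted. "nature" occurs only once, so no boundary is detected for it.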
![Page 18: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/18.jpg)
Ordering
- Shorter strings are more frequent
Ex. “international” includes “in”
Eval-fun(candidate) = freq(candidate) × log(length(candidate) + 1)
Empirically defined
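A small sketch of this ordering function. The slide does not state the base of the logarithm; the natural logarithm is assumed here, which does not change the resulting order. The frequencies are made up for illustration.

```python
import math

def eval_fun(freq, candidate):
    """freq(candidate) * log(length(candidate) + 1), natural log assumed."""
    return freq * math.log(len(candidate) + 1)

def rank(counts):
    """Order candidates by descending evaluation score."""
    return sorted(counts, key=lambda c: eval_fun(counts[c], c), reverse=True)

# Made-up frequencies: the short string "in" is the most frequent,
# but the length factor lets "international" rank first.
counts = {"in": 50, "international": 30, "inter": 32}
ordered = rank(counts)
```

The length factor compensates for the fact that every occurrence of a long string is also an occurrence of its shorter prefixes.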
![Page 19: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/19.jpg)
Examples
german_demo_viewlet_swf.html
![Page 20: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/20.jpg)
German
“Atemwegssyndrom”
Other candidates
・ Respiratorische Syndrom
・ oder Chronische
Gesundheits ・ Erkrankung
Etc…
![Page 21: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/21.jpg)
Japanese
“ 重症急性呼吸器”( SARS)
Other candidates
・シックハウス (Sick Building)
・エコノミークラス (Economy class)
・慢性疲労 (Chronic Fatigue)
Etc…
![Page 22: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/22.jpg)
French
“Respiratoire Aigu Sévère ”
Other candidates
・de Marfan
・Prémenstruel
・de la classe économique
Etc…
![Page 23: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/23.jpg)
Chinese
“ 嚴重急性呼吸道”
Other candidates
・經前 ・電腦視覺 ・腕道 ・睡眠呼吸中止 ・後天免疫缺乏
Etc…
![Page 24: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/24.jpg)
Korean
“ 중증급성호흡기”
Other candidates
・급성호흡기 ・만성피로 ・ 과민성 대장
Etc…
![Page 25: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/25.jpg)
Evaluation: English collocations
Kiwi: 1000 matches totalized; examine top n (exact match)
Baseline: search engine results; top n (included or not)

| | tail | head |
|---|---|---|
| Example collocation | and so on | in spite of |
| Example query | and so * | * spite of |
| Number of items | 300 | 300 |
![Page 26: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/26.jpg)
Results
[Figure: results for head and tail queries at n = 1, n = 10, and n = 1000; ③ marks the upper bound of Kiwi.]
![Page 27: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/27.jpg)
Results
③ - ②: Extraction error
② - ①: Ordering error
+ test set problem (Ex. be anxious for / to)
≠ search engine
[Figure: results for head and tail queries at n = 1, n = 10, and n = 1000.]
![Page 28: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/28.jpg)
Data amount: rank transition of best & correct candidate
[Figure: ranking plotted against the number of matches, with regions labeled "Insufficient" and "Sufficient".]
![Page 29: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/29.jpg)
Evaluation: Using different search engines

| | Answer in Top 1 (n = 1) | Answer in Top 10 (n = 10) | Answer in candidates (n = 1000) | Mean Reciprocal Ranking |
|---|---|---|---|---|
| AltaVista head | 77.0% | 93.3% | 97.0% | 0.83 |
| AllTheWeb head | 74.8% | 91.5% | 97.6% | 0.80 |
| Google head | 76.4% | 92.7% | 97.2% | 0.82 |
| AltaVista tail | 78.5% | 92.8% | 96.3% | 0.85 |
| AllTheWeb tail | 73.6% | 93.2% | 97.8% | 0.80 |
| Google tail | 75.8% | 93.8% | 98.1% | 0.82 |

On the original slide, the best score is shown in red.
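For reference, the two kinds of measure on this slide can be computed as in the following sketch. It uses the standard definition of mean reciprocal rank and assumes that a query whose answer is not among the candidates contributes 0; the slides do not spell out that detail. The data and function names are made up for illustration.

```python
def mean_reciprocal_rank(ranked_lists, answers):
    """Average of 1/rank of the correct answer; 0 when it is absent (assumed)."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, answers):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(answers)

def top_n_rate(ranked_lists, answers, n):
    """Fraction of queries whose answer appears in the top n candidates."""
    hits = sum(
        answer in candidates[:n]
        for candidates, answer in zip(ranked_lists, answers)
    )
    return hits / len(answers)

# Made-up ranked candidate lists for three "and so *" style queries.
ranked = [["on", "forth"], ["far", "on"], ["what"]]
answers = ["on", "on", "on"]
mrr = mean_reciprocal_rank(ranked, answers)
top1 = top_n_rate(ranked, answers, 1)
```

Here the answer is ranked first, second, and missing, so the reciprocal ranks are 1, 0.5, and 0.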
![Page 30: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/30.jpg)
Related work 3: NL based on search engines
Ex. Q&A, Brill et al. (2002)
Obtain Q&A answers from search engine results
already totalized results
What does this mean to NL?
![Page 31: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/31.jpg)
Comparison of Results
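Using different search engines

| Top n candidates | head, n = 1 | head, n = 3 | head, n = 10 | tail, n = 1 | tail, n = 3 | tail, n = 10 |
|---|---|---|---|---|---|---|
| AltaVista - AllTheWeb | 87.4% | 72.9% | 56.2% | 84.5% | 71.0% | 57.8% |
| AllTheWeb - Google | 86.2% | 75.1% | 59.4% | 87.0% | 75.8% | 61.8% |
| Google - AltaVista | 87.8% | 72.2% | 57.9% | 82.3% | 75.5% | 59.9% |

Using different segments of search results (AltaVista)

| Top n candidates | head, n = 1 | head, n = 3 | head, n = 10 | tail, n = 1 | tail, n = 3 | tail, n = 10 |
|---|---|---|---|---|---|---|
| 1st 1000 matches - next 1000 matches | 91.1% | 69.1% | 60.0% | 87.0% | 70.9% | 59.9% |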
![Page 32: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/32.jpg)
Conclusion
• Usage consultation tool
up-to-date expression, grammar check
• Totalize search engine results
• Multi-lingual & flexible entry
• String based candidate extraction and ordering
• Evaluation
![Page 33: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching](https://reader035.fdocument.pub/reader035/viewer/2022062422/56813e76550346895da893f3/html5/thumbnails/33.jpg)
Thank you!
Demonstration at ACL
(demo session & Univ.Tokyo booth)
Please come!