Transcript of Natural Language Models and Interfaces (ivan-titov.org/teaching/nlmi-15/lecture-9.pdf)
Natural Language Models and Interfaces: lecture 9
Ivan Titov
Institute for Logic, Language and Computation
Today
- Machine translation: outlook
- motivating the task
- word-based models
- Integrating phrases and syntax
Machine Translation
美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件,威胁将会向机场等公众地方发动生化袭击後,关岛经保持高度戒备。
The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
[In this part, many slides are from Kevin Knight]
Thousands of Languages Are Spoken

| Language | Speakers |
|---|---|
| MANDARIN | 885,000,000 |
| SPANISH | 332,000,000 |
| ENGLISH | 322,000,000 |
| BENGALI | 189,000,000 |
| HINDI | 182,000,000 |
| PORTUGUESE | 170,000,000 |
| RUSSIAN | 170,000,000 |
| JAPANESE | 125,000,000 |
| GERMAN | 98,000,000 |
| WU (China) | 77,175,000 |
| JAVANESE | 75,500,800 |
| KOREAN | 75,000,000 |
| FRENCH | 72,000,000 |
| VIETNAMESE | 67,662,000 |
| TELUGU | 66,350,000 |
| YUE (China) | 66,000,000 |
| MARATHI | 64,783,000 |
| TAMIL | 63,075,000 |
| TURKISH | 59,000,000 |
| URDU | 58,000,000 |
| MIN NAN (China) | 49,000,000 |
| JINYU (China) | 45,000,000 |
| GUJARATI | 44,000,000 |
| POLISH | 44,000,000 |
| ARABIC | 42,500,000 |
| UKRAINIAN | 41,000,000 |
| ITALIAN | 37,000,000 |
| XIANG (China) | 36,015,000 |
| MALAYALAM | 34,022,000 |
| HAKKA (China) | 34,000,000 |
| KANNADA | 33,663,000 |
| ORIYA | 31,000,000 |
| PANJABI | 30,000,000 |
| SUNDA | 27,000,000 |
Warren Weaver (1947)

ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv rqcffnw cw owgcnwf kowazoanv ...

Decoding proceeds one guess at a time, starting with e, then the, of and is, filled in above the cipher letters (a tentative guess is tried and undone along the way), until the plaintext emerges:

decipherment is the analysis of documents written in ancient languages ...
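The decoding above treats the ciphertext as a letter-for-letter substitution. A minimal Python sketch (the function names are hypothetical, not from the lecture) that reads the key off the aligned ciphertext/plaintext pair and applies a partial key, as in the intermediate steps where only some letters are known:

```python
def learn_key(ciphertext: str, plaintext: str) -> dict[str, str]:
    """Read a substitution key off an aligned ciphertext/plaintext pair."""
    key: dict[str, str] = {}
    for c, p in zip(ciphertext, plaintext):
        if c == " ":
            continue
        # A substitution cipher must map each cipher letter to one plain letter.
        if key.setdefault(c, p) != p:
            raise ValueError(f"inconsistent mapping for {c!r}")
    return key


def partial_decode(ciphertext: str, key: dict[str, str]) -> str:
    """Decode with a (possibly partial) key; unknown letters become '_'."""
    return "".join(ch if ch == " " else key.get(ch, "_") for ch in ciphertext)


CIPHER = "ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv rqcffnw cw owgcnwf kowazoanv"
PLAIN = "decipherment is the analysis of documents written in ancient languages"

key = learn_key(CIPHER, PLAIN)
print(partial_decode(CIPHER, {"n": "e"}))  # first guess only: n stands for e
print(partial_decode(CIPHER, key))         # full key
```

Because `learn_key` raises on any conflict, running it over the whole pair also confirms that the slide's plaintext is consistent with a single substitution key.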
“When I look at an article in Russian, I say: this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”
- Warren Weaver, March 1947
“... as to the problem of mechanical translation, I frankly am afraid that the [semantic] boundaries of words in different languages are too vague ... to make any quasi-mechanical translation scheme very hopeful.”
- Norbert Wiener, April 1947
Spanish/English corpus
1a. Garcia and associates . 1b. Garcia y asociados .
2a. Carlos Garcia has three associates . 2b. Carlos Garcia tiene tres asociados .
3a. his associates are not strong . 3b. sus asociados no son fuertes .
4a. Garcia has a company also . 4b. Garcia tambien tiene una empresa .
5a. its clients are angry . 5b. sus clientes estan enfadados .
6a. the associates are also angry . 6b. los asociados tambien estan enfadados .
7a. the clients and the associates are enemies . 7b. los clientes y los asociados son enemigos .
8a. the company has three groups . 8b. la empresa tiene tres grupos .
9a. its groups are in Europe . 9b. sus grupos estan en Europa .
10a. the modern groups sell strong pharmaceuticals . 10b. los grupos modernos venden medicinas fuertes .
11a. the groups do not sell zenzanine . 11b. los grupos no venden zanzanina .
12a. the small groups are not modern . 12b. los grupos pequenos no son modernos .
Translate: Clients do not sell pharmaceuticals in Europe.
Centauri/Arcturan [Knight 97]
1a. ok-voon ororok sprok . 1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok . 2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok . 3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok . 4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok . 5b. totat jjat quat cat .
6a. lalok sprok izok jok stok . 6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok . 7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok nok . 8b. iat lat pippat rrat nnat .
9a. wiwok nok izok kantok ok-yurp . 9b. totat nnat quat oloat at-yurp .
10a. lalok mok nok yorok ghirok clok . 10b. wat nnat gat mat bat hilat .
11a. lalok nok crrrok hihok yorok zanzanok . 11b. wat nnat arrat mat zanzanat .
12a. lalok rarok nok izok hihok mok . 12b. wat nnat forat arrat vat gat .
Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
process of elimination
cognate?
Your assignment, put these words in order: { jjat, arrat, mat, bat, oloat, at-yurp }
zero fertility
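The "process of elimination" and "cognate?" hints can be made mechanical with sentence-level co-occurrence counts. A rough Python sketch over the twelve sentence pairs above, assuming the Dice coefficient as the association score (a common heuristic chosen here for illustration, not the method used in the lecture; the function names are hypothetical):

```python
# The twelve Centauri/Arcturan sentence pairs from the slides (final "." dropped).
PAIRS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok", "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]


def occurrences(side: int) -> dict[str, set[int]]:
    """Map each word on one side (0 = Centauri, 1 = Arcturan) to the sentences containing it."""
    occ: dict[str, set[int]] = {}
    for i, pair in enumerate(PAIRS):
        for w in pair[side].split():
            occ.setdefault(w, set()).add(i)
    return occ


SRC, TGT = occurrences(0), occurrences(1)


def dice(c: str, a: str) -> float:
    """Dice coefficient: 2 * |sentences with both| / (|with c| + |with a|)."""
    both = len(SRC[c] & TGT[a])
    return 2 * both / (len(SRC[c]) + len(TGT[a]))


def candidates(c: str) -> list[tuple[str, float]]:
    """Arcturan words ranked by Dice score with the Centauri word c (stable on ties)."""
    scored = [(a, dice(c, a)) for a in TGT]
    return sorted(scored, key=lambda x: -x[1])


for word in "farok crrrok hihok yorok clok kantok ok-yurp".split():
    print(word, candidates(word)[:3])
```

Co-occurrence alone gives oloat and at-yurp the same score for both kantok and ok-yurp, since all four words occur only in sentence pair 9; the spelling resemblance between ok-yurp and at-yurp is what breaks that tie.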
The required statistical tables have millions of entries: too much for the computers of Weaver’s day → not enough RAM!
IBM Candide Project [Brown et al 93]

French → Broken English → English, where the first step is learned by statistical analysis of French/English bilingual text and the second by statistical analysis of English text.

Example:
J’ ai si faim
→ broken English: What hunger have I, Hungry I am so, I am so hungry, Have me that hunger …
→ English: I am so hungry
Mathematical Formulation

J’ ai si faim → I am so hungry (French → Broken English → English)

- Translation model: $P(f \mid e)$
- Language model: $P(e)$
- Decoding algorithm: $\arg\max_e P(e) \cdot P(f \mid e)$

Given source sentence f:

$$
\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e \frac{P(f \mid e) \cdot P(e)}{P(f)} = \arg\max_e P(f \mid e) \cdot P(e)
$$

The second step is Bayes' rule; the last step drops $P(f)$, which is the same for all e.
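The derivation can be checked on the Candide example. The sketch below assigns made-up probabilities to the broken-English candidates listed for "J’ ai si faim" (every number is hypothetical, chosen only for illustration) and shows that the translation model alone can prefer a disfluent output, that the language model corrects this, and that dividing by P(f) does not change the winner:

```python
import math

# Hypothetical model scores for f = "J' ai si faim" (illustrative numbers only).
P_F_GIVEN_E = {  # translation model P(f | e)
    "What hunger have I": 0.010,
    "Hungry I am so": 0.008,
    "I am so hungry": 0.006,
    "Have me that hunger": 0.012,
}
P_E = {  # language model P(e)
    "What hunger have I": 1e-7,
    "Hungry I am so": 5e-8,
    "I am so hungry": 3e-5,
    "Have me that hunger": 2e-8,
}


def noisy_channel_decode(candidates):
    """argmax_e P(f | e) * P(e), computed in log space to avoid underflow."""
    return max(candidates, key=lambda e: math.log(P_F_GIVEN_E[e]) + math.log(P_E[e]))


def posterior_decode(candidates):
    """argmax_e P(e | f) via Bayes' rule, explicitly dividing by P(f)."""
    # P(f) approximated by summing over this small candidate set only.
    p_f = sum(P_F_GIVEN_E[e] * P_E[e] for e in candidates)
    return max(candidates, key=lambda e: P_F_GIVEN_E[e] * P_E[e] / p_f)


cands = list(P_E)
print(noisy_channel_decode(cands))
print(posterior_decode(cands))
```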
Language Modeling

Goal of a language model for MT: prefer the better of competing English outputs, for example:
- He is on the soccer field / He is in the soccer field
- Is table the on cup the / The cup is on the table
- American shrine / American company

The language model needs to make these decisions because the translation model may not have much context information!
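One simple way to build such a language model is a bigram model. A minimal sketch, assuming add-one smoothing and a tiny hypothetical training corpus (neither comes from the lecture), that scores the word-order pair from the slide:

```python
import math
from collections import Counter

# Tiny hypothetical training corpus (illustrative only).
CORPUS = [
    "the cup is on the table",
    "the book is on the shelf",
    "he is on the soccer field",
    "the table is in the kitchen",
]


def train_bigram(sentences):
    """Count histories and bigrams, with <s> and </s> sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that serves as a history
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.lower().split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """log P(e) under an add-one smoothed bigram model."""
    unigrams, bigrams, v = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + v))
        for w1, w2 in zip(tokens, tokens[1:])
    )


model = train_bigram(CORPUS)
for e in ["The cup is on the table", "Is table the on cup the"]:
    print(f"{log_prob(e, model):8.3f}  {e}")
```

Both sentences contain the same words, so the translation model cannot tell them apart; only the word-order statistics in P(e) separate them.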
Translation Model?

Process model of translation:

Mary did not slap the green witch → Maria no dió una bofetada a la bruja verde

Possible steps: source-language morphological analysis → source parse tree → semantic representation → generate target structure.

What are all the possible moves, and what probability tables control those moves?
The Classic Translation Model: Word Substitution/Permutation [Brown et al., 1993]

Process model of translation (trainable):

Mary did not slap the green witch
→ fertility, n(3|slap): Mary not slap slap slap the green witch
→ NULL insertion, P-Null: Mary not slap slap slap NULL the green witch
→ word translation, t(la|the): Maria no dió una bofetada a la verde bruja
→ distortion (reordering), d(j|i): Maria no dió una bofetada a la bruja verde

Table sizes: n(3|slap) 50k entries; d(j|i) 2500 entries; P-Null 1 entry; t(la|the) 25m entries.
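The four moves can be replayed for this one sentence pair. In the sketch below, every choice (fertilities, where NULL goes, each word translation, and the final reordering) is hand-picked purely for illustration. In the actual model, each choice is drawn from the learned tables n, P-Null, t and d listed above. All names are hypothetical.

```python
# Hand-picked choices for this single example; NOT learned probabilities.
SOURCE = "Mary did not slap the green witch".split()
# Stand-in for n(phi | e): how many copies of each English word to make.
FERTILITY = {"Mary": 1, "did": 0, "not": 1, "slap": 3, "the": 1, "green": 1, "witch": 1}
# Stand-in for the P-Null decision: insert NULL before "the".
NULL_POSITION = 5
# Stand-in for draws from t(f | e): one foreign word per token, in order.
CHOICES = [("Mary", "Maria"), ("not", "no"), ("slap", "dió"), ("slap", "una"),
           ("slap", "bofetada"), ("NULL", "a"), ("the", "la"), ("green", "verde"),
           ("witch", "bruja")]
# Stand-in for draws from d(j | i): 1-based target position of each word.
TARGET_POSITIONS = [1, 2, 3, 4, 5, 6, 7, 9, 8]


def fertility_step(words):
    """Copy each English word n times (n = 0 deletes it, as with 'did')."""
    return [w for w in words for _ in range(FERTILITY[w])]


def null_step(words):
    """Insert a NULL token, which generates a word with no English source."""
    return words[:NULL_POSITION] + ["NULL"] + words[NULL_POSITION:]


def translation_step(words):
    """Replace each token by its pre-chosen foreign word."""
    out = []
    for word, (english, foreign) in zip(words, CHOICES):
        assert word == english, (word, english)
        out.append(foreign)
    return out


def distortion_step(words):
    """Move the word at position i to its target position j."""
    out = [None] * len(words)
    for word, j in zip(words, TARGET_POSITIONS):
        out[j - 1] = word
    return out


expanded = fertility_step(SOURCE)
with_null = null_step(expanded)
translated = translation_step(with_null)
for step in (expanded, with_null, translated, distortion_step(translated)):
    print(" ".join(step))
```

Each printed line matches one intermediate sentence on the slide; training replaces the hand-picked constants with probability tables estimated from bilingual text.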
Unsupervised EM Training
… la maison … la maison bleue … la fleur … … the house … the blue house … the flower …
Initially, all P(french-word | english-word) are equally likely.
“la” and “the” observed to co-occur frequently, so P(la | the) is increased.
“maison” co-occurs with both “the” and “house”, but P(maison | house) can be raised all the way to 1.0, while P(maison | the) is held down because the probability mass of “the” must also cover “la” (pigeonhole principle).
The probabilities settle down after another iteration.
Inherent hidden structure revealed by EM training!
- “A Statistical MT Tutorial Workbook” (Knight, 1999). Promises free beer.
- “The Mathematics of Statistical Machine Translation” (Brown et al., 1993)
- Software: GIZA++
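The EM behaviour described above can be reproduced on the three-sentence corpus. This sketch assumes IBM Model 1, the simplest of the Brown et al. models, because the slides do not name a specific model. It also omits the NULL word for brevity, and the function name is hypothetical:

```python
from collections import defaultdict

# The toy corpus from the slides: (French, English) sentence pairs.
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]


def train_model1(corpus, iterations=10):
    french_vocab = {f for fs, _ in corpus for f in fs}
    # Start with all P(french-word | english-word) equally likely.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: split each French word's unit of mass over the English words
        # in its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


t = train_model1(CORPUS)
for e in ["the", "house", "blue", "flower"]:
    row = {f: round(p, 3) for (f, e2), p in t.items() if e2 == e}
    print(e, dict(sorted(row.items(), key=lambda kv: -kv[1])))
```

After a few iterations, each English word's most probable French word is its intended translation, and t(maison | house) exceeds t(maison | the), as the pigeonhole argument predicts.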
Translation Model P(f | e) (sample entries):
e = national: nationale 0.47, national 0.42, nationaux 0.05, nationales 0.03
e = the: le 0.50, la 0.21, les 0.16, l’ 0.09, ce 0.02, cette 0.01
e = farmers: agriculteurs 0.44, les 0.42, cultivateurs 0.05, producteurs 0.02
Language Model P(w2 | w1) (sample entries):
w1 = of: the 0.13, a 0.09, another 0.01, some 0.01
w1 = hong: kong 0.98, said 0.01, stated 0.01
For a new French sentence f, each potential translation e is scored by P(f | e) · P(e).
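To make the noisy-channel scoring concrete, here is a minimal sketch that combines a word-based translation model with a bigram language model in log space. The t(f | e) entries are a subset of the table above; the bigram probabilities, the `FLOOR` value for unseen events, and all function names are hypothetical. The translation model is a simplified IBM Model 1 (no NULL word, no length term), which ignores English word order, so only the language model separates the two candidate orderings.

```python
import math

# Translation model t(f | e): a subset of the entries from the slide.
T = {
    ("les", "the"): 0.16,
    ("la", "the"): 0.21,
    ("agriculteurs", "farmers"): 0.44,
    ("les", "farmers"): 0.42,
}

# Bigram LM P(w2 | w1): HYPOTHETICAL numbers for illustration only.
# <s> and </s> mark the sentence boundaries.
LM = {
    ("<s>", "the"): 0.2, ("the", "farmers"): 0.01, ("farmers", "</s>"): 0.1,
    ("<s>", "farmers"): 0.001, ("farmers", "the"): 0.005, ("the", "</s>"): 0.001,
}
FLOOR = 1e-9  # stand-in probability for unseen events (an assumption)

def log_tm(f_words, e_words):
    """log P(f | e), simplified Model 1: each French word picks an English
    word uniformly and is generated from it with probability t(f | e)."""
    total = 0.0
    for f in f_words:
        p = sum(T.get((f, e), 0.0) for e in e_words) / len(e_words)
        total += math.log(max(p, FLOOR))
    return total

def log_lm(e_words):
    """log P(e) under the bigram language model."""
    padded = ["<s>"] + e_words + ["</s>"]
    return sum(math.log(LM.get(bigram, FLOOR)) for bigram in zip(padded, padded[1:]))

def score(f_words, e_words):
    """Noisy-channel score: log P(f | e) + log P(e)."""
    return log_tm(f_words, e_words) + log_lm(e_words)

f = "les agriculteurs".split()
candidates = [["the", "farmers"], ["farmers", "the"]]
best = max(candidates, key=lambda e: score(f, e))
print(" ".join(best))  # the farmers
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over longer sentences.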
Search for Best Translation
voulez – vous vous taire !
Successive candidate translations considered during search:
you – you you quiet !
you – you quiet !
quiet you – you you !
shut you – you you !
you shut !
you shut up !
Classic Decoding Algorithm
Given f, find the English string e that maximizes P(e) · P(f | e).
This is NP-complete [Knight 99]. Brown et al 93: “In this paper, we focus on the translation modeling problem. We hope to deal with the [decoding] problem in a later paper.”
Beam search can be used instead.
Beam Search Decoding [Brown et al US Patent #5,477,451]
[Figure: search lattice from start to end, with columns for the 1st, 2nd, 3rd and 4th English word; the end state is reached when all source words are covered]
Each partial translation hypothesis contains:
- Last English word chosen + source words covered by it
- Next-to-last English word chosen
- Entire coverage vector (so far) of source sentence
- Language model and translation model scores (so far)
[Jelinek 69; Och, Ueffing, and Ney, 01]
The same lattice again, now with a best predecessor link stored for each hypothesis, so the best translation can be read off by following these links back from the end state.
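The hypothesis bookkeeping described above can be sketched as a small stack decoder. This is a minimal illustration under several assumptions: the translation options and bigram probabilities are hypothetical toy values, each source word produces exactly one English word, the LM is a bigram (so the state keeps only the last English word, not the next-to-last as on the slide), and there is no future-cost estimate. Hypotheses with the same coverage and last word are recombined, and each keeps a best predecessor link. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL toy models. OPTIONS[f] lists English words e with P(f | e).
OPTIONS = {
    "la": [("the", 0.7), ("it", 0.2)],
    "maison": [("house", 0.8), ("home", 0.2)],
    "bleue": [("blue", 0.9), ("sad", 0.1)],
}
BIGRAM = {
    ("<s>", "the"): 0.5, ("the", "blue"): 0.1, ("blue", "house"): 0.3,
    ("the", "house"): 0.2, ("house", "blue"): 0.01,
    ("house", "</s>"): 0.3, ("blue", "</s>"): 0.05,
}
UNSEEN = 1e-4  # probability for unseen bigrams (an assumption)

def lm(prev, word):
    return math.log(BIGRAM.get((prev, word), UNSEEN))

@dataclass
class Hyp:
    coverage: frozenset           # source positions translated so far
    last: str                     # last English word (bigram LM state)
    score: float                  # log P(f | e) + log P(e) so far
    word: Optional[str] = None    # English word added by this step
    back: Optional["Hyp"] = None  # best predecessor link

def decode(src, beam_size=10):
    # stacks[k] maps (coverage, last word) -> best hypothesis covering k words
    stacks = [dict() for _ in range(len(src) + 1)]
    start = Hyp(frozenset(), "<s>", 0.0)
    stacks[0][(start.coverage, start.last)] = start
    for k in range(len(src)):
        # Beam pruning: expand only the best `beam_size` hypotheses.
        beam = sorted(stacks[k].values(), key=lambda h: h.score, reverse=True)[:beam_size]
        for hyp in beam:
            for j, f in enumerate(src):
                if j in hyp.coverage:
                    continue  # each source word is translated once
                for e, p in OPTIONS[f]:
                    new = Hyp(hyp.coverage | {j}, e,
                              hyp.score + math.log(p) + lm(hyp.last, e), e, hyp)
                    key = (new.coverage, new.last)
                    # Recombination: keep only the best hypothesis per state.
                    if key not in stacks[k + 1] or new.score > stacks[k + 1][key].score:
                        stacks[k + 1][key] = new
    # All source words covered: add the end-of-sentence LM probability.
    best = max(stacks[-1].values(), key=lambda h: h.score + lm(h.last, "</s>"))
    words = []
    while best.back is not None:  # follow best predecessor links
        words.append(best.word)
        best = best.back
    return " ".join(reversed(words))

print(decode(["la", "maison", "bleue"]))  # the blue house
```

Because any uncovered source word may be translated next, the decoder can reorder (here "bleue" is translated before "maison"); real decoders usually also limit reordering distance and add a future-cost estimate so that hypotheses covering different words compete fairly.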
Flaws of Word-Based MT
• Can’t translate multiple English words to one French word
• Can’t translate phrases – “real estate”, “note that”, “interest in”
• Isn’t sensitive to syntax
  – Adjectives/nouns should swap order
  – Verb comes at the beginning in Arabic
• Doesn’t understand the meaning (?)
The MT Triangle
[Figure: levels of analysis on the SOURCE and TARGET sides (words, syntax, logical form), rising to a shared interlingua at the apex]
Commercial Rule-Based Systems
[Figure: the MT triangle (SOURCE and TARGET words, syntax, logical form; interlingua at the apex)]
Knight et al 95
- meaning-based translation
- composition rules
[Figure: the MT triangle (SOURCE and TARGET words, syntax, logical form; interlingua at the apex), annotated with a Language Model]
Wu 97, Alshawi 98
- inducing syntactic structure as a by-product of aligning words in bilingual text
[Figure: the MT triangle (SOURCE and TARGET words, syntax, logical form; interlingua at the apex), annotated with a Language Model]
Yamada/Knight (01,02)
- tree/string model
- used existing target language parser
[Figure: the MT triangle (SOURCE and TARGET words, syntax, logical form; interlingua at the apex), annotated with a Language Model]
Phrases
[Figure: the MT triangle (words, syntax, logical form, interlingua), extended with a “phrases” level on both the SOURCE and TARGET sides]
How do you translate “real estate” into French? Compare: real estate, real number, dance number, dance card, memory card, memory stick, …
Phrase-Based Statistical MT
• Foreign input segmented into phrases
  – “phrase” just means “word sequence”
• Each phrase is probabilistically translated into English
  – P(to the conference | zur Konferenz)
  – P(into the meeting | zur Konferenz)
• Phrases are probabilistically re-ordered
See [Koehn et al, 2003] for an overview.
Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
How to Learn the Phrase Translation Table?
• One method: “alignment templates” [Och et al 99]
• Start with word alignment
• Collect all phrase pairs that are consistent with the word alignment
Word Alignment Induced Phrases
Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde
Phrase pairs consistent with the word alignment, built up from short to long:
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap) (bruja verde, green witch)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
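The "consistent with the word alignment" criterion can be sketched as follows: for every English span, take the French span its links reach, reject it if any French word inside that span links to an English word outside the English span, then also extend the French side over unaligned neighbouring words (here “a”). This is one common formulation, not necessarily the exact procedure of Och et al.; the alignment links are read off the phrase pairs above, and `extract_phrases` and `max_len` are hypothetical names.

```python
def extract_phrases(e_words, f_words, alignment, max_len=7):
    """Return all (f_phrase, e_phrase) pairs consistent with `alignment`,
    a set of (english_index, french_index) links. Phrases longer than
    `max_len` words are skipped (a practical cap; an assumption here)."""
    pairs = set()
    n_e, n_f = len(e_words), len(f_words)
    f_aligned = {j for _, j in alignment}
    for e1 in range(n_e):
        for e2 in range(e1, min(e1 + max_len, n_e)):
            # French positions linked to the English span [e1, e2].
            fs = [j for (i, j) in alignment if e1 <= i <= e2]
            if not fs:
                continue
            f1, f2 = min(fs), max(fs)
            # Consistency: no French word in [f1, f2] may link outside [e1, e2].
            if any(f1 <= j <= f2 and not e1 <= i <= e2 for (i, j) in alignment):
                continue
            # Also extend the French span over unaligned words at either edge.
            start = f1
            while True:
                end = f2
                while True:
                    if end - start < max_len:
                        pairs.add((" ".join(f_words[start:end + 1]),
                                   " ".join(e_words[e1:e2 + 1])))
                    end += 1
                    if end >= n_f or end in f_aligned:
                        break
                start -= 1
                if start < 0 or start in f_aligned:
                    break
    return pairs

E = "Mary did not slap the green witch".split()
F = "Maria no dió una bofetada a la bruja verde".split()
# (English index, French index) links; "a" (French index 5) is unaligned.
A = {(0, 0), (1, 1), (2, 1), (3, 2), (3, 3), (3, 4), (4, 6), (5, 8), (6, 7)}
pairs = extract_phrases(E, F, A, max_len=9)
for f_phrase, e_phrase in sorted(pairs, key=lambda p: len(p[0])):
    print(f"({f_phrase}, {e_phrase})")
```

Note that (no, did) is rejected: “no” also links to “not”, which lies outside the one-word English span.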
Phrase Pair Probabilities
• A certain phrase pair (f-f-f, e-e-e) may appear many times across the bilingual corpus.
• No EM training
• Just relative frequency:
P(f-f-f | e-e-e) = count(f-f-f, e-e-e) / count(e-e-e)
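The relative-frequency estimate above amounts to two counters and a division. A minimal sketch, assuming the extracted phrase pairs are available as a list with one entry per occurrence; the extraction counts below are hypothetical, and `phrase_probs` is a hypothetical name.

```python
from collections import Counter

def phrase_probs(extracted):
    """Relative frequency P(f | e) = count(f, e) / count(e), from a list of
    extracted (f_phrase, e_phrase) pairs with one entry per occurrence."""
    pair_counts = Counter(extracted)
    e_counts = Counter(e for _, e in extracted)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

# HYPOTHETICAL extraction results pooled over a tiny corpus.
extracted = [
    ("zur Konferenz", "to the conference"),
    ("zur Konferenz", "to the conference"),
    ("zur Konferenz", "to the conference"),
    ("zu der Konferenz", "to the conference"),
    ("zur Konferenz", "into the meeting"),
]
probs = phrase_probs(extracted)
print(probs[("zur Konferenz", "to the conference")])  # 0.75
```

No EM is involved: each probability is fixed by counting once over the extracted pairs.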
Phrase-Based MT
• It was the best way to do statistical MT until very recently
• Now syntax is starting to play a role
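Tree Output
[Figure: English output tree for “The gunman was killed by police .” (POS tags DT NN AUX VBN IN NN; constituents NPB, PP, NP-C, VP, S), aligned with the Chinese sentence “枪手 被 警方 击毙 .” (gunman / passive marker / police / shoot dead)]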
Synchronous CFGs [Chiang, 2005]
• Developed in the 1960s for programming-language compilation [Aho1969]
• In NLP, synchronous CFGs have been used for
  – Machine translation
  – Semantic interpretation
• Like CFGs, but productions have two right-hand sides
  – Source side
  – Target side
  – Related through linked non-terminal symbols
• E.g. VP → <V[1] NP[2], NP[2] V[1]>
• One-to-one correspondence between linked non-terminals
• Productions are applied in parallel to both sides, to linked non-terminals
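A synchronous derivation can be sketched by letting each child node rewrite the same linked non-terminal on both sides. Only the VP rule below comes from the slide; the S rule, the lexical rules and the romanized Japanese target (an SOV language, hence the VP reordering) are hypothetical illustrations, as are the names `Rule`, `Node` and `expand`.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    lhs: str
    src: list  # terminals are str; linked non-terminals are (label, link index)
    tgt: list

@dataclass
class Node:
    rule: Rule
    children: dict = field(default_factory=dict)  # link index -> Node

def expand(node):
    """Return the (source words, target words) pair derived from `node`.
    The child with link index k rewrites the k-th linked non-terminal on BOTH sides."""
    sub = {k: expand(child) for k, child in node.children.items()}

    def side(symbols, which):
        words = []
        for sym in symbols:
            if isinstance(sym, tuple):  # linked non-terminal, e.g. ("NP", 2)
                label, k = sym
                assert node.children[k].rule.lhs == label, "label mismatch"
                words.extend(sub[k][which])
            else:  # terminal word
                words.append(sym)
        return words

    return side(node.rule.src, 0), side(node.rule.tgt, 1)

S = Rule("S", [("NP", 1), ("VP", 2)], [("NP", 1), ("VP", 2)])   # hypothetical
VP = Rule("VP", [("V", 1), ("NP", 2)], [("NP", 2), ("V", 1)])  # rule from the slide
NP_I = Rule("NP", ["I"], ["watashi", "wa"])                    # hypothetical lexicon
NP_MAN = Rule("NP", ["the", "man"], ["otoko", "o"])
V_SAW = Rule("V", ["saw"], ["mita"])

tree = Node(S, {1: Node(NP_I), 2: Node(VP, {1: Node(V_SAW), 2: Node(NP_MAN)})})
src, tgt = expand(tree)
print(" ".join(src))  # I saw the man
print(" ".join(tgt))  # watashi wa otoko o mita
```

The reordering happens only among the children of one production (V and NP under VP), which is the "sister-reordering only" limitation of synchronous CFGs.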
• Limitations
  – No Chomsky normal form (has implications for the complexity of the decoder)
  – Sister-reordering only
Summary MT
• An important application
• There has been important progress
• Interdisciplinary work
  – Natural language processing
  – Machine learning
  – Linguistics
  – Automata theory
• More classes in Masters of AI