Dual Learning for Machine Translation (NIPS 2016)


“Dual Learning for Machine Translation”, Di He et al.

January 2017, Toru Fujino

The University of Tokyo, Graduate School of Frontier Sciences, Human and Engineered Environmental Studies, Chen Lab, first-year PhD student (D1)

Paper information

• Authors: Di He et al. (Microsoft Research Asia)
• Conference: NIPS 2016
• Date: 11/01/2016 (arXiv)
• Times cited: 1

Overview

• What
  • Introduce an autoencoder-like mechanism, “dual learning”, to utilize monolingual datasets
• Results
  • Dual learning with 10% data ≈ baseline model with 100% data

1) “Dual Learning: A New Learning Paradigm”, https://www.youtube.com/watch?v=HzokNo3g63E&feature=youtu.be


Neural machine translation

• Learn the conditional probability $P(y \mid x; \Theta)$ from an input $x = \{x_1, x_2, \dots, x_{T_x}\}$ to an output $y = \{y_1, y_2, \dots, y_{T_y}\}$

• Maximize the log probability over the bilingual corpus $D$:

$$\Theta^* = \arg\max_{\Theta} \sum_{(x,y) \in D} \sum_{t=1}^{T_y} \log P(y_t \mid y_{<t}, x; \Theta)$$
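As a concrete illustration of the inner sum (a sketch; `log_prob` is a hypothetical stand-in for the model's next-token distribution, not the authors' implementation):

```python
import math

def sentence_log_likelihood(x, y, log_prob):
    """Inner sum of the objective: sum_t log P(y_t | y_<t, x; Theta)
    for one sentence pair (x, y); training maximizes this over the corpus D."""
    return sum(log_prob(y[t], y[:t], x) for t in range(len(y)))

# Toy stand-in: a uniform distribution over a 10-token vocabulary.
uniform = lambda y_t, y_prev, x: math.log(1.0 / 10)
print(sentence_log_likelihood("the cat".split(), "le chat".split(), uniform))
# -> 2 * log(0.1) ≈ -4.61
```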

Difficulty in getting large bilingual data

• Solution: utilization of monolingual data
  • Train a language model of the target language, and then integrate it with the MT model 1)2)
    <- does not fundamentally address the shortage of parallel data

  • Generate pseudo bilingual data from monolingual data 3)4)
    <- no guarantee on the quality of the pseudo bilingual data

1) T. Brants et al., “Large language models in machine translation”, EMNLP 2007
2) C. Gulcehre et al., “On using monolingual corpora in neural machine translation”, arXiv 2015
3) R. Sennrich et al., “Improving neural machine translation models with monolingual data”, ACL 2016
4) N. Ueffing et al., “Semi-supervised model adaptation for statistical machine translation”, Machine Translation Journal 2008

Dual learning algorithm

• Use monolingual datasets to train translation models through dual learning

• Things required:
  $D_A$: corpus of language A
  $D_B$: corpus of language B (not necessarily aligned with $D_A$)
  $P(\cdot \mid s; \Theta_{AB})$: translation model from A to B
  $P(\cdot \mid s; \Theta_{BA})$: translation model from B to A
  $LM_A(\cdot)$: learned language model of A
  $LM_B(\cdot)$: learned language model of B

Dual learning algorithm

1. Generate $K$ translated sentences $s_{\mathrm{mid},1}, s_{\mathrm{mid},2}, \dots, s_{\mathrm{mid},K}$ from $P(\cdot \mid s; \Theta_{AB})$ based on beam search
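A toy illustration of this step: the sketch below is a minimal beam search, with `next_log_probs` as a hypothetical stand-in for $P(\cdot \mid s; \Theta_{AB})$ (the real model also conditions on the source sentence $s$):

```python
import math

def beam_search(next_log_probs, vocab, K, max_len):
    """Minimal beam search: keep the K highest-scoring prefixes at each step.

    `next_log_probs(prefix)` is a hypothetical stand-in for P(.|s; Theta_AB);
    it returns {token: log-probability} for the next token given the prefix.
    """
    beams = [([], 0.0)]                      # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "</s>":
                candidates.append((prefix, score))   # finished hypothesis
                continue
            log_probs = next_log_probs(prefix)
            for tok in vocab:
                candidates.append((prefix + [tok], score + log_probs[tok]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]
    return beams                             # the K sentences s_mid,1 .. s_mid,K

# Toy model: the same next-token distribution regardless of prefix.
vocab = ["le", "chat", "</s>"]
toy = lambda prefix: {"le": math.log(0.5), "chat": math.log(0.3), "</s>": math.log(0.2)}
for sent, score in beam_search(toy, vocab, K=2, max_len=4):
    print(sent, round(score, 2))
```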

Dual learning algorithm

2. Compute intermediate rewards $r_{1,1}, r_{1,2}, \dots, r_{1,K}$ from $LM_B$ for each sentence as

$$r_{1,k} = LM_B(s_{\mathrm{mid},k})$$
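As a toy stand-in for $LM_B$, the sketch below scores a sentence with a unigram table; the paper uses a real language model trained on the monolingual corpus, so this only illustrates the shape of the reward:

```python
import math

def lm_b_score(sentence, unigram_probs):
    """Toy stand-in for LM_B: sum of unigram log-probabilities.

    A language model trained on D_B would be used in practice; the unigram
    table here is only the simplest illustration of the reward r_{1,k}.
    """
    return sum(math.log(unigram_probs.get(w, 1e-6)) for w in sentence)

probs = {"le": 0.4, "chat": 0.3, "dort": 0.3}
print(lm_b_score(["le", "chat", "dort"], probs))  # reward r_{1,k} for one candidate
```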

Dual learning algorithm

3. Get communication rewards $r_{2,1}, r_{2,2}, \dots, r_{2,K}$ for each sentence as

$$r_{2,k} = \ln P(s \mid s_{\mathrm{mid},k}; \Theta_{BA})$$

Dual learning algorithm

4. Set the total reward of the $k$-th sentence as

$$r_k = \alpha \, r_{1,k} + (1 - \alpha) \, r_{2,k}$$
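Putting steps 2-4 together numerically (a sketch; the reward values and the small $\alpha$ below are illustrative, not taken from the paper):

```python
# Total reward (step 4): linear interpolation of the language-model reward
# r1 (step 2) and the reconstruction reward r2 (step 3).
alpha = 0.005          # small weight on the LM reward (illustrative value)
r1 = -12.3             # r_{1,k} = LM_B(s_mid,k), a sentence log-probability
r2 = -4.7              # r_{2,k} = ln P(s | s_mid,k; Theta_BA)
r_k = alpha * r1 + (1 - alpha) * r2
print(r_k)             # -> -4.738
```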

Dual learning algorithm

5. Compute the stochastic gradients with respect to $\Theta_{AB}$ and $\Theta_{BA}$:

$$\nabla_{\Theta_{AB}} E[r] = \frac{1}{K} \sum_{k=1}^{K} \left[ r_k \, \nabla_{\Theta_{AB}} \ln P(s_{\mathrm{mid},k} \mid s; \Theta_{AB}) \right]$$

$$\nabla_{\Theta_{BA}} E[r] = \frac{1}{K} \sum_{k=1}^{K} \left[ (1 - \alpha) \, \nabla_{\Theta_{BA}} \ln P(s \mid s_{\mathrm{mid},k}; \Theta_{BA}) \right]$$

Dual learning algorithm

6. Update the model parameters:

$$\Theta_{AB} \leftarrow \Theta_{AB} + \gamma_1 \nabla_{\Theta_{AB}} E[r]$$
$$\Theta_{BA} \leftarrow \Theta_{BA} + \gamma_2 \nabla_{\Theta_{BA}} E[r]$$
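Steps 1-6 amount to one REINFORCE-style update per monolingual sentence. Below is a minimal sketch of a single round of the game; every callable (`beam_search_ab`, `lm_b`, `log_p_ba`, `grad_log_p_ab`, `grad_log_p_ba`) is a hypothetical stand-in for the trained models, and gradients are plain lists of floats for simplicity:

```python
def dual_learning_step(s, K, alpha, gamma1, gamma2, theta_ab, theta_ba,
                       beam_search_ab, lm_b, log_p_ba,
                       grad_log_p_ab, grad_log_p_ba):
    """One play of the game for a monolingual sentence s of language A.

    Hypothetical stand-ins:
      beam_search_ab(s, K)     -> K candidate translations s_mid (step 1)
      lm_b(s_mid)              -> LM reward r1 = LM_B(s_mid) (step 2)
      log_p_ba(s, s_mid)       -> reconstruction reward r2 = ln P(s|s_mid) (step 3)
      grad_log_p_ab(s_mid, s)  -> gradient of ln P(s_mid|s; Theta_AB)
      grad_log_p_ba(s, s_mid)  -> gradient of ln P(s|s_mid; Theta_BA)
    """
    grad_ab = [0.0] * len(theta_ab)
    grad_ba = [0.0] * len(theta_ba)
    for s_mid in beam_search_ab(s, K):                     # step 1
        r1 = lm_b(s_mid)                                   # step 2
        r2 = log_p_ba(s, s_mid)                            # step 3
        r = alpha * r1 + (1 - alpha) * r2                  # step 4
        g_ab = grad_log_p_ab(s_mid, s)                     # step 5: REINFORCE-style
        g_ba = grad_log_p_ba(s, s_mid)                     #         sample gradients
        grad_ab = [a + r * g / K for a, g in zip(grad_ab, g_ab)]
        grad_ba = [a + (1 - alpha) * g / K for a, g in zip(grad_ba, g_ba)]
    theta_ab = [t + gamma1 * g for t, g in zip(theta_ab, grad_ab)]  # step 6:
    theta_ba = [t + gamma2 * g for t, g in zip(theta_ba, grad_ba)]  # gradient ascent
    return theta_ab, theta_ba
```

In the paper the same game is also played in the opposite direction, starting from a sentence of language B with the roles of the two models swapped.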


Experiment settings

• Baseline models
  • Bahdanau et al., “Neural Machine Translation by Jointly Learning to Align and Translate”
  • Sennrich et al., “Improving Neural Machine Translation Models with Monolingual Data”

Dataset

• WMTʼ14
  • 12M sentence pairs
  • English -> French, French -> English

• Data usage (for dual learning)
  • Small
    1. Train translation models with 10% bilingual data.
    2. Train translation models with 10% bilingual data and monolingual data through the dual learning algorithm.
    3. Train translation models only with monolingual data through the dual learning algorithm.
  • Large
    1. Train translation models with 100% bilingual data.
    2. Train translation models with 100% bilingual data and monolingual data through the dual learning algorithm.
    3. Train translation models only with monolingual data through the dual learning algorithm.

Evaluation

• BLEU: geometric mean of n-gram precisions, multiplied by a brevity penalty
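For reference, a minimal sentence-level BLEU sketch; real evaluations use corpus-level BLEU over the whole test set (e.g. the multi-bleu script), so this toy version only shows the geometric-mean structure:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions, multiplied by a brevity penalty (no smoothing)."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())    # clipped matches
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n  # floor avoids log(0)
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_prec)

print(bleu("the cat sat on the mat".split(),
           "the cat sat on a mat".split()))   # -> ~0.54
```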

Results

• Outperforms the baseline models
  • In Fr->En, dual learning with 10% data ≈ baseline models with 100% data.
  • Dual learning is especially effective when bilingual data is scarce.

Results

• By source sentence length
  • The improvement is especially significant for long sentences.

Results

• Reconstruction performance (BLEU)
  • Huge improvement over the baseline models, especially in En->Fr->En (S)

Results

• Reconstruction examples

Future extensions & final words

• Application in other domains (see the table below)
• Generalization of dual learning
  • Dual -> Triple -> ... -> n-loop
• Learn from scratch
  • Only with monolingual data
  • Maybe plus a lexical dictionary

Application          | Primal task        | Dual task
Speech processing    | Speech recognition | Text to speech
Image understanding  | Image captioning   | Image generation
Conversation engine  | Question           | Response
Search engine        | Search             | Query/keyword suggestion

Summary

• What
  • Introduce the “dual learning algorithm” to utilize monolingual data
• Results
  • With 100% data, the model outperforms the baseline models
  • With 10% data, the model shows results comparable to the baseline models
• Future
  • The dual learning mechanism can be applied to other domains
  • Learn from scratch

Some notes

• Does dual learning not learn word-to-word correspondences?
• Is training from bilingual data a must? Or would a lexical dictionary suffice?

Appendix: Stochastic gradient of models