Transcript of "Language Generation with Continuous Outputs"
Language Generation with Continuous Outputs
Yulia Tsvetkov
Carnegie Mellon University
August 9, 2018
Encoder–Decoder Architectures

[Figure: an encoder–decoder translating «Я увидела кОшку» ("I saw a cat") into "I saw a <unk> </s>"; the rare word falls outside the output vocabulary and is emitted as <unk>.]
(Conditional) Language Generation

Task + Data + Language

NLG: Machine Translation, Summarization, Dialogue, Caption Generation, Speech Recognition, . . .
(Conditional) Language Generation – 2D

[Figure: a 2D grid of NLP technologies/applications against the world's ~6,000 languages.]

- NLP Technologies/Applications: lemmatization, POS tagging, parsing, NER, coreference, SRL, . . . , summarization, QA, dialogue, MT, ASR
- 6K World's Languages: English; Chinese, Arabic, French, Spanish, Portuguese, Russian, . . . (UN and some European languages); Czech, Hindi, Hebrew, . . . (medium-resourced languages, dozens); resource-poor languages (thousands)
(Conditional) Language Generation – 3D

[Figure: the same technologies-by-languages grid, extended with a third axis of data domains.]

- Data Domains: Bible, parliamentary proceedings, newswire, Wikipedia, novels, Twitter, TED talks, telephone conversations, . . .
NLP ≠ Task + Data

"The common misconception is that language has to do with words and what they mean. It doesn't. It has to do with people and what they mean."
— Herbert H. Clark & Michael F. Schober, 1992; cf. Dan Jurafsky's keynotes at CVPR'17 and EMNLP'17
(Conditional) Language Generation – ∞D

[Figure: the technologies-by-languages-by-domains grid; the space of task, language, and domain combinations keeps growing.]
Outline

[Figure: the technologies-by-languages-by-domains grid, with Part 1, Part 2, and Part 3 of the talk mapped onto it.]
Language Generation with Continuous Outputs

[Figure: a sequence-to-sequence model reading "a b c </s>" and emitting "a b c </s>".]
[Figure: the encoder–decoder example again; «Я увидела кОшку» is translated as "I saw a <unk> </s>".]
Rare Words Are Common in Language

[Figure: word frequency distribution. By SergioJimenez, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45516736]
Softmax

h → W → softmax → w_i

- A multinomial distribution over discrete and mutually exclusive alternatives
- High computational and memory complexity
- Vocabulary size is limited to a small fraction of words, plus <unk>
- Words are represented as 1-hot vectors
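The standard output layer described above can be sketched as follows. This is a minimal NumPy illustration, not the talk's actual model: the sizes `V` and `D` are taken from the slides, but the random weights and the function name are hypothetical.

```python
import numpy as np

def softmax_output_layer(h, W):
    """Project hidden state h (D,) through W (V, D) and normalize:
    a multinomial distribution over V mutually exclusive words."""
    logits = W @ h            # O(V * D) work; W alone holds V * D parameters
    logits -= logits.max()    # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
V, D = 50_000, 300            # vocabulary and embedding sizes from the talk
h = rng.standard_normal(D)
W = 0.01 * rng.standard_normal((V, D))
p = softmax_output_layer(h, W)
```

The `V`-sized matrix multiply and normalization at every decoding step is exactly the cost the slide flags as "high computational and memory complexity".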
Alternatives to Softmax

Sampling-based approximations
- Importance sampling: evaluate the denominator over a subset
- Noise contrastive estimation: convert to a proxy binary classification problem
- . . .

Structure-based approximations
- Differentiated softmax: divide the vocabulary into multiple classes; first predict a class, then predict a word within the class
- Hierarchical softmax: a binary tree with words as leaves
- . . .

Subword units
- Byte pair encoding (BPE) (Sennrich et al., 2016)
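The class-factorized idea behind the structure-based approximations above can be sketched as a two-level softmax: p(w | h) = p(class | h) · p(w | class, h). This is a toy illustration under assumed sizes, not any particular paper's implementation; the point is that each prediction only scores one small class's word matrix, yet the factorized probabilities still sum to one over the full vocabulary.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def two_level_prob(h, W_class, W_words, c, i):
    """p(w | h) = p(class c | h) * p(word i | class c, h).

    Only one class's word matrix is scored per prediction, so each
    softmax runs over far fewer than V alternatives."""
    p_class = softmax(W_class @ h)        # distribution over C classes
    p_word = softmax(W_words[c] @ h)      # distribution within class c
    return p_class[c] * p_word[i]

rng = np.random.default_rng(1)
D, C, Vc = 8, 4, 5                        # toy sizes: 4 classes of 5 words
h = rng.standard_normal(D)
W_class = rng.standard_normal((C, D))
W_words = [rng.standard_normal((Vc, D)) for _ in range(C)]
# the factorized probabilities form a proper distribution over all C*Vc words
total = sum(two_level_prob(h, W_class, W_words, c, i)
            for c in range(C) for i in range(Vc))
```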
Alternatives to Softmax

(✓ = favorable, ✗ = unfavorable)

|                  | Sampling-based | Structure-based | Subword units |
|------------------|----------------|-----------------|---------------|
| Training time    | ✓              | ✓               | ✓             |
| Test time        | ✗              | ✗               | ✓             |
| Accuracy         | ✗              | ✗               | ✓             |
| Memory           | ✗              | ✗               | ✓             |
| Very large vocab | ✗              | ✗               | ✓             |
Our Proposal: No Softmax

[Figure: the encoder–decoder example, «Я увидела кОшку» → "I saw a <unk> </s>", with the softmax output layer as the component to replace.]
Our Proposal

- Word-embedding output layer: O(D) (300–500)
- Softmax output layer: O(V) (16K–50K)

Represent each word by its pre-trained embedding instead of a 1-hot vector.
Seq2Seq with Continuous Outputs

[Figure: the encoder–decoder example again, now emitting the embedding of "cat" instead of <unk>.]

At each time step t, generate the word's embedding instead of a probability distribution over the vocabulary.
- Training: next slides
- Decoding: kNN
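The kNN decoding step above can be sketched as a nearest-neighbor lookup in the pre-trained embedding table. A minimal sketch with a made-up 2-dimensional five-word vocabulary; the embedding values and the predicted vector are hypothetical.

```python
import numpy as np

def knn_decode(e_hat, emb_table, k=1):
    """Return indices of the k vocabulary words whose embeddings have the
    highest cosine similarity to the predicted output vector e_hat."""
    norms = np.linalg.norm(emb_table, axis=1, keepdims=True)
    sims = (emb_table / norms) @ (e_hat / np.linalg.norm(e_hat))
    return np.argsort(-sims)[:k]

# toy 2-d embedding table for a five-word vocabulary (hypothetical values)
vocab = ["i", "saw", "a", "cat", "</s>"]
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.7, 0.7],
                [0.9, -0.3],
                [-1.0, 0.1]])
e_hat = np.array([0.95, -0.25])   # decoder output at some time step
best = knn_decode(e_hat, emb, k=1)[0]
```

With k = 1 this is greedy decoding in embedding space; larger k gives candidates for beam-search-style rescoring.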
Training Seq2Seq with Continuous Outputs: Empirical Losses

Euclidean loss: $\mathcal{L}_{L2} = \|\hat{e} - e(w)\|^2$

Cosine loss: $\mathcal{L}_{\mathrm{cosine}} = 1 - \dfrac{\hat{e}^{\top} e(w)}{\|\hat{e}\|\,\|e(w)\|}$

Max-margin loss: $\mathcal{L}_{\mathrm{mm}} = \sum_{w' \in V,\, w' \neq w} \max\{0,\ \gamma + \cos(\hat{e}, e(w')) - \cos(\hat{e}, e(w))\}$

where $\hat{e}$ is the predicted output vector and $e(w)$ is the pre-trained embedding of the target word $w$.
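The three empirical losses above can be sketched directly from their definitions. A minimal NumPy sketch; the margin `gamma=0.5` and the toy vectors are illustrative choices, not the talk's settings, and a real implementation would batch the max-margin sum over the vocabulary.

```python
import numpy as np

def cos(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def l2_loss(e_hat, e_w):
    """Squared Euclidean distance between prediction and target embedding."""
    return float(np.sum((e_hat - e_w) ** 2))

def cosine_loss(e_hat, e_w):
    return 1.0 - cos(e_hat, e_w)

def max_margin_loss(e_hat, e_w, negative_embs, gamma=0.5):
    """Hinge on the cosine gap between the target word and every other word."""
    return sum(max(0.0, gamma + cos(e_hat, e_n) - cos(e_hat, e_w))
               for e_n in negative_embs)

e_w = np.array([1.0, 0.0])                              # target embedding
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
perfect = np.array([1.0, 0.0])    # prediction equal to the target embedding
```

A prediction equal to the target embedding drives all three losses to zero, which is the intended optimum.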
Training Seq2Seq with Continuous Outputs: Probabilistic Loss

von Mises–Fisher (vMF) distribution: $p(e(w); \mu, \kappa) = C_m(\kappa)\, e^{\kappa \mu^{\top} e(w)}$

We use $\kappa = \|\hat{e}\|$, so $p(e(w); \hat{e}) = C_m(\|\hat{e}\|)\, e^{\hat{e}^{\top} e(w)}$

vMF loss: $\mathcal{L}_{\mathrm{NLLvMF}} = -\log C_m(\|\hat{e}\|) - \hat{e}^{\top} e(w)$

With regularization:

$\mathcal{L}_{\mathrm{NLLvMF-reg1}} = -\log C_m(\|\hat{e}\|) - \hat{e}^{\top} e(w) + \lambda_1 \|\hat{e}\|$

$\mathcal{L}_{\mathrm{NLLvMF-reg2}} = -\log C_m(\|\hat{e}\|) - \lambda_2\, \hat{e}^{\top} e(w)$
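The NLLvMF loss above can be sketched numerically using the standard vMF normalizer $C_m(\kappa) = \kappa^{m/2-1} / \big((2\pi)^{m/2} I_{m/2-1}(\kappa)\big)$ and SciPy's exponentially scaled Bessel function. This is a minimal sketch, not the authors' implementation; `lam1`/`lam2` correspond to the two regularized variants on the slide, and the 3-dimensional example vectors are made up.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel I_v

def log_cm(kappa, m):
    """log C_m(kappa) = log[ kappa^(m/2-1) / ((2*pi)^(m/2) * I_{m/2-1}(kappa)) ].

    Uses ive(v, k) = I_v(k) * exp(-k), so log I_v(k) = log ive(v, k) + k,
    which stays finite for large kappa."""
    v = m / 2.0 - 1.0
    log_iv = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (m / 2.0) * np.log(2.0 * np.pi) - log_iv

def nllvmf(e_hat, e_w, lam1=0.0, lam2=1.0):
    """-log C_m(||e_hat||) - lam2 * e_hat^T e(w) + lam1 * ||e_hat||.

    lam1 = 0, lam2 = 1 recovers the unregularized NLLvMF."""
    kappa = float(np.linalg.norm(e_hat))
    return -log_cm(kappa, len(e_hat)) - lam2 * (e_hat @ e_w) + lam1 * kappa

e_w = np.array([1.0, 0.0, 0.0])          # unit-norm target embedding
loss_aligned = nllvmf(2.0 * e_w, e_w)    # prediction points at the target
loss_opposed = nllvmf(-2.0 * e_w, e_w)   # prediction points away from it
```

Since both predictions have the same norm (hence the same $C_m$ term), the aligned prediction must score a strictly lower loss.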
Training Seq2Seq with Continuous Outputs: Research Questions

[Figure: the encoder–decoder example.]

- Objective function: empirical and probabilistic losses
- Embeddings: word2vec, fastText, syntactic, morphological, ELMo, etc.
- Attention: words vs. BPE in the input
- Decoding: scheduled sampling; kNN approximations; beam search; post-processing/interpolation with LMs
- OOVs: scheduled sampling; tied embeddings
Experimental Setup

IWSLT fr–en (test: tst2015 + tst2016): train 220K, dev 2.3K, test 2.2K

- Setup follows "Stronger Baselines for Trustable Results in NMT" (Denkowski & Neubig '17)
- Evaluation: BLEU
- 50K word vocabulary; 16K BPE vocabulary
- 300-dimensional embeddings
- More setups in the paper: IWSLT de–en, IWSLT en–fr, WMT de–en
Translation Quality

| Source type → Target type | Loss             | BLEU (fr–en) |
|---------------------------|------------------|--------------|
| word → word               | cross-entropy    | 30.98        |
| word → BPE                | cross-entropy    | 29.06        |
| BPE → BPE                 | cross-entropy    | 31.44        |
| BPE → word2vec            | L2               | 16.78        |
| BPE → word2vec            | cosine           | 26.92        |
| word → word2vec           | L2               | 27.16        |
| word → word2vec           | cosine           | 29.14        |
| word → word2vec           | max-margin       | 29.56        |
| word → fastText           | max-margin       | 30.98        |
| word → fastText + tied    | max-margin       | 32.12        |
| word → fastText           | NLLvMF reg1+reg2 | 30.38        |
| word → fastText + tied    | NLLvMF reg1+reg2 | 31.63        |
Training Time & Memory

(on 1 GeForce GTX TITAN X GPU)

|            | Softmax baseline | BPE baseline | NLLvMF best model |
|------------|------------------|--------------|-------------------|
| fr–en      | 4h               | 4.5h         | 1.9h              |
| de–en      | 3h               | 3.5h         | 1.5h              |
| en–fr      | 1.8h             | 2.8h         | 1.3h              |
| WMT de–en  | 4.3d             | 4.5d         | 1.6d              |

Number of parameters in the output layer:

| Model   | Parameters       |
|---------|------------------|
| Softmax | 51.2M (1.0×)     |
| BPE     | 16.384M (0.32×)  |
| NLLvMF  | 307.2K (0.006×)  |
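The parameter counts above are consistent with a roughly 1024-unit decoder output feeding the final layer; that hidden size is an inference from the numbers (51.2M / 50K = 1024), not something the slide states. A quick arithmetic check:

```python
# Output-layer parameter counts: (vocab size or embedding dim) x hidden units.
# hidden = 1024 is inferred from 51.2M / 50K; the slide does not state it.
hidden = 1024
vocab_or_dim = {"Softmax": 50_000,   # word vocabulary
                "BPE": 16_000,       # BPE vocabulary
                "NLLvMF": 300}       # embedding dimension
params = {name: n * hidden for name, n in vocab_or_dim.items()}
ratios = {name: p / params["Softmax"] for name, p in params.items()}
```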
Encoder–Decoder with Continuous Outputs: Convergence Time

[Figure: two convergence curves; vertical axis 13–38, horizontal axis 1–15.]
Encoder–Decoder with Continuous Outputs: Output Example

GOLD: An education is critical, but tackling this problem is going to require each and every one of us to step up and be better role models for the women and girls in our own lives.

BPE2BPE: education is critical, but it's going to require that each of us will come in and if you do a better example for women and girls in our lives.

WORD2FASTTEXT: education is critical, but fixed this problem is going to require that all of us engage and be a better example for women and girls in our lives.
Seq2Seq with Continuous Outputs

(✓ = favorable, ✗ = unfavorable)

|                         | Sampling-based | Structure-based | Subword units | Semfit |
|-------------------------|----------------|-----------------|---------------|--------|
| Training time           | ✓              | ✓               | ✓             | ✓      |
| Test time               | ✗              | ✗               | ✓             | ✓      |
| Accuracy                | ✗              | ✗               | ✓             | ✓      |
| Memory                  | ✗              | ✗               | ✓             | ✓      |
| Handle very large vocab | ✗              | ✗               | ✓             | ✓      |
Encoder–Decoder with Continuous Outputs: Future Research Questions

[Figure: the encoder–decoder example.]

- Decoding
- Translation into morphologically rich languages
- Low-resource NMT
- More generation tasks, e.g. style transfer with GANs