11,001 New Features for Statistical Machine Translation


David Chiang, Kevin Knight, Wei Wang

Presenter: Li Xianhua (李贤华), 2009.11.12

Outline: introduction, MIRA training, feature selection, experimental results, analysis, discussion

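Introduction

Systems used: Hiero and a syntax-based system.

Method: add a large number of features and train the weights with MIRA.

Results: on Chinese-English, BLEU improves by 1.5 and 1.1 points, respectively.

Most of the added features can only be used by a syntax-based system, which highlights the advantage of syntax-based systems.

Compared with MERT, MIRA is better suited to tuning the weights of many features.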

Baseline

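Hiero: a string-to-string translation system with 12 features; weights trained with MERT.

Syntax-based system: a string-to-tree translation system with 25 features; weights trained with MERT.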


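MIRA training

The new features are added to the linear model, and the weights are trained with MIRA. Notation: e is an output string, h(e) is the feature vector of e, and w is the vector of feature weights. The training loop is:

1. Select a batch of input sentences f1…fm and decode them.
2. For each input sentence, select its 10-best translations.
3. For each input sentence, select an oracle translation.
4. For each candidate translation, compute its loss.
5. Update w to w', where w' minimizes the MIRA objective.

For each sentence, the decoder passes a forest to the trainer; the trainer updates the weights and passes them back to the decoder.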
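The weight update in step 5 can be illustrated with a deliberately simplified sketch. It is not the optimization used in the paper, which handles all candidate translations of a batch at once. Instead it applies the closed-form passive-aggressive step for a single oracle/hypothesis pair. The function names, the default value of C, and the list-based feature vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def mira_update(w, h_oracle, h_hyp, loss, C=0.01):
    """One simplified MIRA step for a single (oracle, hypothesis) pair.

    Returns the w' that minimizes
        0.5 * ||w' - w||^2 + C * max(0, loss - (h_oracle - h_hyp) . w')
    C is an illustrative default, not a value taken from the slides.
    """
    delta_h = [o - h for o, h in zip(h_oracle, h_hyp)]
    # How far the current weights fall short of the required margin.
    violation = loss - dot(delta_h, w)
    norm_sq = dot(delta_h, delta_h)
    if violation <= 0 or norm_sq == 0:
        return list(w)  # margin already satisfied, or nothing to move along
    step = min(C, violation / norm_sq)  # step size is capped by C
    return [wi + step * d for wi, d in zip(w, delta_h)]


# With a large C the step is uncapped and the margin equals the loss afterwards.
new_w = mira_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], loss=1.0, C=10.0)
```

Capping the step at C keeps any single noisy example from moving the weights too far, which matters when thousands of sparse features are being tuned.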

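Feature selection

Discount features

Many rules with a count of 1 are selected, which suggests that their probabilities are overestimated.

Count features are used to reward or penalize rules; the value of each feature depends on the rule's count.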
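A minimal sketch of how count-based indicator features might be computed for a rule. The bin boundaries and the feature names (`count_1`, `count_3_5`, and so on) are hypothetical; the slides do not state which count ranges were used.

```python
def count_features(rule_count, bins=((1, 1), (2, 2), (3, 5), (6, 10))):
    """Return indicator features that fire according to a rule's training count.

    `bins` is a hypothetical choice of inclusive (low, high) count ranges.
    Rules seen more often than the largest bin get no count feature.
    """
    feats = {}
    for low, high in bins:
        if low <= rule_count <= high:
            name = f"count_{low}" if low == high else f"count_{low}_{high}"
            feats[name] = 1.0
    return feats


singleton = count_features(1)   # fires the count_1 indicator
frequent = count_features(50)   # fires nothing
```

Because each bin has its own weight, training can learn a separate penalty or reward for singleton rules without touching the weights that apply to frequent rules.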

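Target-side features

Rule overlap features

Rules meet at overlap points, and some nonterminals are more reliable than others as the overlap point. Rules are rewarded or penalized differently depending on which nonterminal forms the overlap.

Bad single-level rewrites

Some problematic rules with very narrow applicability are penalized. Based on inspection of the development set, the following rules are penalized:

PP -> VBN NP-C
PP-BAR -> NP-C IN
VP -> NP-C PP
CONJP -> RB IN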

Node count features

The occurrences of each nonterminal in the tree are counted, so that no nonterminal appears too often or too rarely.
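A minimal sketch of node counting over an output tree. The tree encoding (nested `(label, children)` tuples with plain strings as leaves) and the `node_` feature-name prefix are assumptions for illustration, not the representation used by the system described in the slides.

```python
from collections import Counter


def node_count_features(tree):
    """Count how often each nonterminal label occurs in a tree.

    `tree` is a (label, children) tuple; leaves (words) are plain strings.
    """
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # words are not nonterminals
        label, children = node
        counts[f"node_{label}"] += 1
        stack.extend(children)
    return counts


example = ("S", [("NP", ["he"]), ("VP", [("VBD", ["saw"]), ("NP", ["it"])])])
features = node_count_features(example)
```

Each count becomes one feature value, so a learned positive or negative weight on, say, `node_NP` nudges the decoder toward trees with more or fewer NP nodes.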

Insertion features

Some rules insert words on the English side. One feature is added for each of the words most likely to appear in insertion rules.

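Feature selection

Source-side features

Soft syntactic constraints

A parse tree is built over the source sentence. Rules whose source side is consistent with the tree are rewarded, and rules that are not consistent are penalized.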
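One way to picture soft syntactic constraints is as features that fire when a rule's source span either matches a constituent of the source parse or crosses a constituent boundary. The sketch below assumes half-open `(start, end)` word spans, a dictionary from spans to labels, and hypothetical feature names `match_X` and `cross_X`; none of these details come from the slides.

```python
def soft_constraint_features(span, constituents):
    """Fire match/cross features for a rule's source span.

    span:         (start, end) half-open word span covered by the rule.
    constituents: dict mapping (start, end) spans of the source parse
                  tree to their labels (one label per span in this sketch).
    """
    start, end = span
    feats = {}
    for (s, e), label in constituents.items():
        if (s, e) == (start, end):
            feats[f"match_{label}"] = 1.0  # span is exactly a constituent
        elif s < start < e < end or start < s < end < e:
            feats[f"cross_{label}"] = 1.0  # span partially overlaps a constituent
    return feats


parse = {(0, 2): "NP", (2, 5): "VP", (0, 5): "IP"}
matching = soft_constraint_features((0, 2), parse)
crossing = soft_constraint_features((1, 3), parse)
```

The constraints are soft because a crossing span is not forbidden; it only triggers features whose weights are learned during training.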

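Structural distortion features

Let S be the number of source-language words covered by a nonterminal, and let R indicate whether it is reordered. P(R|S) can be computed during rule extraction and added to the model as a new feature, thereby influencing reordering.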
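A minimal sketch of estimating P(R|S) by relative frequency from observations collected during rule extraction. The input format (pairs of span length and a reordering flag) is an assumption, and any smoothing or binning of span lengths is left out.

```python
from collections import Counter


def estimate_reordering_prob(observations):
    """Estimate P(reordered | span length) by relative frequency.

    observations: iterable of (span_length, reordered) pairs, one per
    nonterminal occurrence seen during rule extraction (hypothetical format).
    """
    totals = Counter()
    reordered = Counter()
    for span_length, was_reordered in observations:
        totals[span_length] += 1
        if was_reordered:
            reordered[span_length] += 1
    return {s: reordered[s] / totals[s] for s in totals}


obs = [(1, False), (1, False), (2, True), (2, False), (3, True)]
p_reorder = estimate_reordering_prob(obs)
```

A model would typically use the log of these probabilities as the feature value, which requires smoothing so that zero estimates like the one for length 1 above do not produce an infinite penalty.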

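Experimental results

A Chinese-English parallel corpus of 260m words was used.

For the syntax-based system, Collins's parser was reimplemented to produce parse trees for the English side. Syntactic rules were extracted from a 65m-word subset of the training data.

For Hiero, rules with two nonterminals were extracted from a 38m-word subset, and the remaining rules were extracted from the rest of the training data.

Three 5-gram language models were trained:

1. trained on the entire English corpus, used by both systems
2. trained on 1 billion words, used by the syntax-based system
3. trained on 2 billion words, used by Hiero

All language models use KN smoothing.

Development set: 2010 sentences. Test set: 1994 sentences. Both were drawn from newswire data from NIST 2004, NIST 2005, and the GALE program.

Hiero used the source-side features, the syntax-based system used the target-side features, and both systems used the discount features.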

Analysis

Discount feature:

+ indicates a penalty, - indicates a reward.

Word insertion feature:

Forms of the verb be, a: +; the, period, comma: -

Rule-overlap feature

Weights for generated English nonterminals

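Conclusions

1. New features can yield gains even on top-performing translation systems.
2. MIRA outperforms MERT.
3. A syntax-based system can exploit features that cannot be used in other systems; a syntax-based system and MIRA make a powerful combination.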
