Summary of Rule-based Reordering Space in Statistical Machine Translation

Literature review, Nagaoka University of Technology, Natural Language Processing Laboratory, 松本宏

Transcript of Summary of Rule-based Reordering Space in Statistical Machine Translation

Page 1: Summary of Rule-based Reordering Space in Statistical Machine Translation

Literature review, Nagaoka University of Technology, Natural Language Processing Laboratory

松本宏

Page 2: Summary of Rule-based Reordering Space in Statistical Machine Translation

Paper

• Title:

• Rule-based Reordering Space in Statistical Machine Translation

• Author:

• Nicolas Pécheux, Alexandre Allauzen, and François Yvon

• Booktitle:

• Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

• Pages:

• 1800--1806

Page 3: Summary of Rule-based Reordering Space in Statistical Machine Translation

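In statistical machine translation

• Reordering is important

• The reordering problem involves

• Combinatorial explosion

• Ambiguity

• Rules are needed to narrow the search down to the most likely reorderings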
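The combinatorial explosion mentioned above can be made concrete with a small sketch. It assumes the unconstrained case, where any permutation of the source words is allowed, so a sentence of n words has n! candidate orders. The function name `num_permutations` is only illustrative.

```python
import math


def num_permutations(n_words: int) -> int:
    """Number of word orders for a sentence when no reordering constraint applies."""
    return math.factorial(n_words)


# Print the size of the unconstrained reordering space for a few sentence lengths.
for n in (5, 10, 20):
    print(f"{n} words: {num_permutations(n)} possible orders")
```

Even at 20 words the count is far beyond what a decoder can enumerate, which is why rules that keep only plausible reorderings are needed.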

Page 4: Summary of Rule-based Reordering Space in Statistical Machine Translation

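In phrase-based SMT

• Reordering is performed phrase by phrase

• Reordering inside a phrase is taken into account

• However, only a search space restricted by pruning is explored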

Page 5: Summary of Rule-based Reordering Space in Statistical Machine Translation

Contributions of this paper

1. n-gram SMT system:

• Split into 2 steps

1. Reordering

• Build a permutation lattice of the source sentence

2. Decoding

2. Introduction of the NCODE SMT system

• Crego, Josep, François Yvon, and José Mariño. "Ncode: an open source bilingual n-gram smt toolkit." The Prague Bulletin of Mathematical Linguistics 96 (2011): 49-58.

Page 6: Summary of Rule-based Reordering Space in Statistical Machine Translation

Reordering

Alignment

Reordering

Reordering rules

Page 7: Summary of Rule-based Reordering Space in Statistical Machine Translation

Reordering Rules Extraction

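Source sentence word order

Associated tag sequence

Word order after reordering

Permutation

Set of permutations

Extracting reordering rules

For each subsequence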
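As a rough illustration of rule extraction, the sketch below derives tag-level reordering rules from one word-aligned sentence pair. It is a simplified assumption about the procedure, not the paper's or NCODE's actual implementation: the helper names (`source_permutation`, `extract_rules`), the `max_len` limit, the handling of unaligned words, and the example sentence with its tags and alignment are all hypothetical.

```python
from collections import Counter


def source_permutation(n_src, alignment):
    """Order source positions by the target position they are aligned to.

    alignment is a list of (source_index, target_index) links. Assumption:
    an unaligned source word follows the word to its left.
    """
    first_tgt = {}
    for s, t in alignment:
        first_tgt[s] = min(t, first_tgt.get(s, t))
    keys = []
    prev = -1
    for s in range(n_src):
        prev = first_tgt.get(s, prev)
        keys.append(prev)
    # perm[k] is the source index that ends up at position k after reordering.
    return sorted(range(n_src), key=lambda s: (keys[s], s))


def extract_rules(tags, perm, max_len=4):
    """Collect (tag sequence, relative permutation) pairs for reordered spans."""
    rules = Counter()
    n = len(tags)
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            window = perm[i:j]
            identity = list(range(i, j))
            # Keep spans that are permuted internally but stay in place as a block.
            if sorted(window) == identity and window != identity:
                relative = tuple(p - i for p in window)
                rules[(tuple(tags[i:j]), relative)] += 1
    return rules


# Hypothetical example: "the European Union" aligned to "l' Union européenne".
tags = ["DET", "ADJ", "NOUN"]
alignment = [(0, 0), (1, 2), (2, 1)]
perm = source_permutation(len(tags), alignment)
rules = extract_rules(tags, perm)
for (lhs, order), freq in rules.items():
    print(" ".join(lhs), "->", order, "count:", freq)
```

Storing the rule on the tag sequence rather than on the words is what lets one observed swap apply to other sentences with the same tag pattern.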

Page 8: Summary of Rule-based Reordering Space in Statistical Machine Translation

Reordering Lattices Generation

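1. Build a lattice based on the source sentence

2. For each word subsequence that matches a reordering rule, add a sub-path

3. NCODE searches the lattice for the best hypothesis with beam search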
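The lattice construction can be sketched as follows, under simplifying assumptions: the lattice is a plain list of edges `(from_state, to_state, word)`, a rule is a pair of a tag sequence and a relative permutation, and every matching span gets its own sub-path. The names `build_lattice` and `paths`, the rule format, and the example sentence are hypothetical and do not reflect NCODE's internal data structures. The beam search itself is not shown.

```python
from itertools import count


def build_lattice(words, tags, rules):
    """Return lattice edges: the monotone path plus one sub-path per rule match."""
    n = len(words)
    # Step 1: monotone path through states 0..n in the original word order.
    edges = [(k, k + 1, words[k]) for k in range(n)]
    fresh = count(n + 1)  # generator of new intermediate state ids
    # Step 2: add a reordered sub-path wherever a rule's tag sequence matches.
    for lhs, order in rules:
        m = len(lhs)
        for i in range(n - m + 1):
            if tuple(tags[i:i + m]) != tuple(lhs):
                continue
            state = i
            for pos, rel in enumerate(order):
                # The last edge rejoins the monotone path at state i + m.
                nxt = i + m if pos == m - 1 else next(fresh)
                edges.append((state, nxt, words[i + rel]))
                state = nxt
    return edges


def paths(edges, start, end):
    """Enumerate every word sequence from start to end (small lattices only)."""
    if start == end:
        yield []
        return
    for a, b, word in edges:
        if a == start:
            for rest in paths(edges, b, end):
                yield [word] + rest


# Hypothetical example with one rule: swap an adjective and the noun after it.
words = ["the", "European", "Union"]
tags = ["DET", "ADJ", "NOUN"]
rules = [(("ADJ", "NOUN"), (1, 0))]
edges = build_lattice(words, tags, rules)
all_paths = list(paths(edges, 0, len(words)))
for p in all_paths:
    print(" ".join(p))
```

Each path through the lattice is one candidate source order, so the decoder only has to score the permutations licensed by the rules instead of all n! orders.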

Page 9: Summary of Rule-based Reordering Space in Statistical Machine Translation

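Experiment

• Data:

• English-French Basic Traveling Expression Corpus

• English-French and English-German News Commentary from WMT’12

• Difficulty: English-German is considered much harder than English-French

• SMT tool

• NCODE

• Notation

• m: translation, l: lattice taken into account, u: target-language order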

Page 10: Summary of Rule-based Reordering Space in Statistical Machine Translation

• oracle: Tromble, Roy W., et al. "Lattice Minimum Bayes-Risk decoding for statistical machine translation." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2008.

Page 11: Summary of Rule-based Reordering Space in Statistical Machine Translation
Page 12: Summary of Rule-based Reordering Space in Statistical Machine Translation
Page 13: Summary of Rule-based Reordering Space in Statistical Machine Translation
Page 14: Summary of Rule-based Reordering Space in Statistical Machine Translation

Reordering Space Sizes

Page 15: Summary of Rule-based Reordering Space in Statistical Machine Translation

Reordering Space Sizes

Page 16: Summary of Rule-based Reordering Space in Statistical Machine Translation

Generalization

• Rewrite rules based on POS tags

• POS (spos): 12 POS tags

• Enhanced POS (e50pos): 50 POS tags

• Brown classes (classes): clustering

Page 17: Summary of Rule-based Reordering Space in Statistical Machine Translation

Alternative Tagsets