Nguyen, Minh Thuan and Nguyen, Phuong Thai and Nguyen, Van Vinh and Nguyen Hoang, Minh Cong
(2020)
Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs.
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation. pp. 207-215.
Abstract
Research on providing machine translation systems for unseen language pairs has gained increasing attention in recent years. However, the quality of these systems is poor for most language pairs, especially for less-common pairs such as Khmer-Vietnamese. In this paper, we show a simple iterative training-generating-filtering-training process that utilizes all available pivot parallel data to generate synthetic data for unseen directions. In addition, we propose a filtering method based on word alignments and the longest parallel phrase to filter out noisy sentence pairs in the synthetic data. Experimental results on zero-shot Khmer→Vietnamese and Indonesian→Vietnamese directions show that our proposed model outperforms some strong baselines and achieves a promising result under the zero-resource condition on ALT benchmarks. The results also indicate that the quality of our model can easily be improved with a small amount of real parallel data.