relation: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4604/
title: Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs
creator: Nguyen, Minh Thuan
creator: Nguyen, Phuong Thai
creator: Nguyen, Van Vinh
creator: Nguyen Hoang, Minh Cong
subject: Information Technology (IT)
description: Research on providing machine translation systems for unseen language pairs has gained increasing attention in recent years. However, the quality of these systems is poor for most language pairs, especially for less-common pairs such as Khmer-Vietnamese. In this paper, we show a simple iterative training-generating-filtering-training process that utilizes all available pivot parallel data to generate synthetic data for unseen directions. In addition, we propose a filtering method based on word alignments and the longest parallel phrase to filter out noisy sentence pairs in the synthetic data. Experimental results on zero-shot Khmer→Vietnamese and Indonesian→Vietnamese directions show that our proposed model outperforms some strong baselines and achieves a promising result under the zero-resource condition on ALT benchmarks. The results also indicate that our model can easily improve its quality with a small amount of real parallel data.
publisher: Association for Computational Linguistics
date: 2020-10
type: Article
type: NonPeerReviewed
format: application/pdf
language: en
identifier: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4604/1/2020.paclic-1.24.pdf
identifier: Nguyen, Minh Thuan and Nguyen, Phuong Thai and Nguyen, Van Vinh and Nguyen Hoang, Minh Cong (2020) Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs. Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation. pp. 207-215.
relation: https://aclanthology.org/2020.paclic-1.24
relation: 2020.paclic-1.24