eprintid: 4604
rev_number: 6
eprint_status: archive
userid: 404
dir: disk0/00/00/46/04
datestamp: 2021-08-15 08:42:28
lastmod: 2021-08-15 08:42:28
status_changed: 2021-08-15 08:42:28
type: article
metadata_visibility: show
creators_name: Nguyen, Minh Thuan
creators_name: Nguyen, Phuong Thai
creators_name: Nguyen, Van Vinh
creators_name: Nguyen Hoang, Minh Cong
creators_id: thuannm@vnu.edu.vn
creators_id: thainp@vnu.edu.vn
creators_id: vinhnv@vnu.edu.vn
creators_id: minhcongnguyen1508@gmail.com
title: Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs
ispublished: pub
subjects: IT
divisions: fac_fit
abstract: Research on providing machine translation systems for unseen language pairs has been gaining increasing attention in recent years. However, the quality of these systems is poor for most language pairs, especially for less-common pairs such as Khmer-Vietnamese. In this paper, we show a simple iterative training-generating-filtering-training process that utilizes all available pivot parallel data to generate synthetic data for unseen directions. In addition, we propose a filtering method based on word alignments and the longest parallel phrase to filter out noisy sentence pairs in the synthetic data. Experimental results on zero-shot Khmer→Vietnamese and Indonesian→Vietnamese directions show that our proposed model outperforms some strong baselines and achieves a promising result under the zero-resource condition on ALT benchmarks. The results also indicate that the quality of our model can easily be improved with a small amount of real parallel data.
date: 2020-10
date_type: published
publisher: Association for Computational Linguistics
official_url: https://aclanthology.org/2020.paclic-1.24
id_number: 2020.paclic-1.24
full_text_status: public
publication: Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
pagerange: 207-215
refereed: FALSE
citation: Nguyen, Minh Thuan and Nguyen, Phuong Thai and Nguyen, Van Vinh and Nguyen Hoang, Minh Cong (2020) Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs. Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation. pp. 207-215.
document_url: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4604/1/2020.paclic-1.24.pdf