VNU-UET Repository

Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs

Nguyen, Minh Thuan and Nguyen, Phuong Thai and Nguyen, Van Vinh and Nguyen Hoang, Minh Cong (2020) Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs. Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation. pp. 207-215.

Abstract

Research on machine translation systems for unseen language pairs has gained increasing attention in recent years. However, the quality of these systems is poor for most language pairs, especially less-common pairs such as Khmer-Vietnamese. In this paper, we present a simple iterative training-generating-filtering-training process that uses all available pivot parallel data to generate synthetic data for unseen directions. In addition, we propose a filtering method based on word alignments and the longest parallel phrase to remove noisy sentence pairs from the synthetic data. Experimental results on the zero-shot Khmer→Vietnamese and Indonesian→Vietnamese directions show that our proposed model outperforms some strong baselines and achieves promising results under the zero-resource condition on the ALT benchmarks. The results also indicate that the model's quality can easily be improved with a small amount of real parallel data.
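
The filtering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact criterion, so the function names, the two scores (alignment coverage and the longest run of consecutively aligned words as a stand-in for the "longest parallel phrase"), and both thresholds are assumptions made only for illustration. Word alignments are assumed to be given as (source index, target index) pairs from some external aligner.

```python
from typing import Set, Tuple

# A word alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def alignment_coverage(n_src: int, n_tgt: int, links: Alignment) -> float:
    """Fraction of words with at least one link, taken on the weaker side."""
    if n_src == 0 or n_tgt == 0:
        return 0.0
    src_cov = len({i for i, _ in links}) / n_src
    tgt_cov = len({j for _, j in links}) / n_tgt
    return min(src_cov, tgt_cov)


def longest_parallel_phrase(links: Alignment) -> int:
    """Length of the longest run of links (i, j), (i+1, j+1), ...

    Hypothetical proxy for the "longest parallel phrase": it only counts
    monotone, one-to-one consecutive links.
    """
    best = 0
    for i, j in links:
        if (i - 1, j - 1) in links:
            continue  # not the start of a run
        length = 1
        while (i + length, j + length) in links:
            length += 1
        best = max(best, length)
    return best


def keep_pair(
    n_src: int,
    n_tgt: int,
    links: Alignment,
    min_coverage: float = 0.5,      # assumed threshold, not from the paper
    min_phrase_ratio: float = 0.3,  # assumed threshold, not from the paper
) -> bool:
    """Keep a synthetic sentence pair only if both scores pass."""
    if n_src == 0 or n_tgt == 0:
        return False
    coverage = alignment_coverage(n_src, n_tgt, links)
    phrase_ratio = longest_parallel_phrase(links) / min(n_src, n_tgt)
    return coverage >= min_coverage and phrase_ratio >= min_phrase_ratio


# A well-aligned pair (5 source words, 4 target words) is kept,
# while a pair with a single stray link is filtered out.
good_links = {(0, 0), (1, 1), (2, 2), (4, 3)}
noisy_links = {(0, 3)}
print(keep_pair(5, 4, good_links), keep_pair(5, 4, noisy_links))
```

In an iterative setup of the kind the abstract describes, a filter like this would sit between the generating and retraining steps, so that only synthetic pairs passing both checks are added to the training data for the next round.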

Item Type: Article
Subjects: Information Technology (IT)
Divisions: Faculty of Information Technology (FIT)
Depositing User: Nguyễn Minh Thuận
Date Deposited: 15 Aug 2021 08:42
Last Modified: 15 Aug 2021 08:42
URI: http://eprints.uet.vnu.edu.vn/eprints/id/eprint/4604
