eprintid: 3413
rev_number: 10
eprint_status: archive
userid: 405
dir: disk0/00/00/34/13
datestamp: 2019-01-09 13:56:08
lastmod: 2019-01-09 13:56:08
status_changed: 2019-01-09 13:56:08
type: conference_item
metadata_visibility: show
creators_name: Nguyen, Minh Thuan
creators_name: Bui, Van Tan
creators_name: Vu, Huy Hien
creators_name: Nguyen, Phuong Thai
creators_name: Luong, Chi Mai
creators_id: thuannm@vnu.edu.vn
creators_id: bvtan@uneti.edu.vn
creators_id: hienvuhuy@gmail.com
creators_id: thainp@vnu.edu.vn
creators_id: lcmai@ioit.ac.vn
corp_creators: Department of Computer Science, University of Engineering and Technology, VNU Hanoi
corp_creators: Department of Information Technology, University of Economic and Technical Industries
corp_creators: Department of Computer Science, University of Engineering and Technology, VNU Hanoi
corp_creators: Department of Computer Science, University of Engineering and Technology, VNU Hanoi
corp_creators: Department of Language and Speech Processing, Institute of Information Technology
title: Enhancing the quality of Phrase-table in Statistical Machine Translation for Less-Common and Low-Resource Languages
ispublished: pub
subjects: IT
divisions: fac_fit
abstract: The phrase-table plays an important role in traditional phrase-based statistical machine translation (SMT) systems. During translation, a phrase-based SMT system relies heavily on the phrase-table to generate outputs. In this paper, we propose two methods for enhancing the quality of the phrase-table. The first method recomputes phrase-table weights using the similarity of vector representations. The second method enriches the phrase-table by integrating new phrase-pairs from an extended dictionary and from projections of word vector representations onto the target language space. Our methods yield improvements of up to 0.21 and 0.44 BLEU points on in-domain and cross-domain (Asian Language Treebank - ALT) English-Vietnamese datasets, respectively.
date: 2018
date_type: published
full_text_status: public
pres_type: paper
event_title: International Association of Logopedics and Phoniatrics
event_type: conference
refereed: FALSE
citation: Nguyen, Minh Thuan and Bui, Van Tan and Vu, Huy Hien and Nguyen, Phuong Thai and Luong, Chi Mai (2018) Enhancing the quality of Phrase-table in Statistical Machine Translation for Less-Common and Low-Resource Languages. In: International Association of Logopedics and Phoniatrics.
document_url: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/3413/1/30.pdf