VNU-UET Repository: No conditions. Results ordered -Date Deposited.

2024-07-27T16:32:44Z EPrints https://eprints.uet.vnu.edu.vn/eprints/

2015-07-09T14:53:50Z 2015-07-09T14:53:50Z
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1180
An Efficient Framework for Extracting Parallel Sentences from Non-Parallel Corpora
Cuong Hoang; Anh Cuong Le (cuongla@vnu.edu.vn); Phuong Thai Nguyen (thainp@vnu.edu.vn); Bao Son Pham (sonpb@vnu.edu.vn); Tu Bao Ho

2013-01-08T07:43:51Z 2015-05-22T08:07:06Z
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/114
Exploiting Non-Parallel Corpora for Statistical Machine Translation

Constructing a corpus of parallel sentence pairs is an important task in building a Statistical Machine Translation system, and it strongly affects the translation quality the system can achieve. The more parallel sentence pairs are used to train the system, the better its translation quality. Comparable non-parallel corpora have therefore become important resources for alleviating the scarcity of parallel corpora. The problem is how to extract parallel sentence pairs automatically yet accurately from comparable non-parallel corpora, which are usually very "noisy". This paper presents how a reinforcement-learning scheme can be applied, together with our newly proposed algorithm, to detect parallel sentence pairs. We show that, starting from an initial set of parallel sentences in one domain, the proposed model can extract a large number of new parallel sentence pairs from non-parallel corpora in different domains, while gradually increasing the system's translation ability.

Cuong Hoang; Anh Cuong Le (cuongla@vnu.edu.vn); Phuong Thai Nguyen (thainp@vnu.edu.vn); Tu Bao Ho

2013-01-03T04:00:38Z 2015-05-22T04:49:43Z
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/98
A Systematic Comparison between Various Statistical Alignment Models for Statistical English-Vietnamese Phrase-Based Translation

In statistical phrase-based machine translation, the phrase-learning step relies heavily on word alignments. This paper provides a systematic comparison of various statistical alignment models applied to statistical English-Vietnamese phrase-based machine translation. We also investigate a heuristic method for raising the translation quality obtained with higher word-alignment models by improving the quality of lexical modelling. In detail, we show experimentally that improving lexical translation appears to be an appropriate way to let "higher" word-based translation models efficiently "boost" their merits. We hope this work will serve as a reliable comparison benchmark for other studies on using and improving statistical alignment models for English-Vietnamese machine translation systems.

Cuong Hoang; Anh Cuong Le (cuongla@vnu.edu.vn); Bao Son Pham (sonpb@vnu.edu.vn)