Ngoc Khuong Nguyen and Duc Hong Pham and Anh Cuong Le and Hong Thai Pham (2016) Improving Maximum Entropy Part-of-Speech Tagger with Transformation Based Learning Model. In: SW4PHD: the 2016 Scientific Workshop for PhD Students, 26 March 2016, Hanoi.
As is well known, part-of-speech (POS) tagging is a basic and central problem in natural language processing. The quality of a POS tagger influences several downstream modules of a natural language understanding system, including information extraction and retrieval, machine translation, partial parsing, and word sense disambiguation. In recent years, there has been growing interest in data-driven machine learning disambiguation methods for POS tagging, with results very close to the state of the art. Improving the performance of these methods further remains a challenge for researchers. In this paper, we use one of the best machine learning methods (the Maximum Entropy Model, MEM) as the baseline for automatic part-of-speech annotation, then improve its accuracy using Transformation Based Learning (TBL). Our approach is based on an incremental knowledge acquisition method in which rules are stored in an exception structure and new rules are selected only when they yield a net positive correction of errors on the corpus. Experimental results on the English Penn Treebank show that our method substantially improves accuracy over the naive baseline model, reaching state-of-the-art accuracy (97.14%). In addition, we run experiments on the Vietnamese Viet TreeBank corpus, and the results show that the proposed Vietnamese POS tagging system outperforms other state-of-the-art Vietnamese taggers with 93.50% overall accuracy.
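The correction step described in the abstract can be illustrated with a minimal sketch of greedy, Brill-style transformation-based learning applied to the output of a baseline tagger. This is not the authors' implementation: the rule template (change a tag based on the previous tag), the function names, and the toy data are all assumptions made for illustration, the baseline tags merely stand in for Maximum Entropy output, and the exception structure mentioned in the abstract is not modelled here.

```python
def score_rule(tags, gold, rule):
    """Net benefit of a rule: errors fixed minus correct tags broken."""
    from_tag, to_tag, prev_tag = rule
    score = 0
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            if gold[i] == to_tag:
                score += 1      # rule would fix this position
            elif gold[i] == from_tag:
                score -= 1      # rule would break a correct tag
    return score


def apply_rule(tags, rule):
    """Apply a rule everywhere it matches, reading contexts from the input tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def learn_rules(baseline, gold, max_rules=10):
    """Greedily pick rules with a positive net score (hypothetical template:
    change from_tag to to_tag when the previous tag is prev_tag)."""
    tags = list(baseline)
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining error.
        candidates = {
            (tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags))
            if tags[i] != gold[i]
        }
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: score_rule(tags, gold, r))
        if score_rule(tags, gold, best) <= 0:
            break               # keep only rules that help on the corpus
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules, tags


def accuracy(tags, gold):
    return sum(t == g for t, g in zip(tags, gold)) / len(gold)


# Toy data (assumed): one flat tag sequence, no sentence boundaries.
baseline = ["DT", "VB", "TO", "NN", "DT", "VB"]   # stand-in for baseline output
gold = ["DT", "NN", "TO", "VB", "DT", "NN"]

rules, corrected = learn_rules(baseline, gold)
print(rules)
print(accuracy(baseline, gold), accuracy(corrected, gold))
```

In a real setting the rules would be learned on a training corpus and then applied in order to fresh baseline output; richer templates (surrounding words, following tags) are typical but omitted here for brevity.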
|Item Type:||Conference or Workshop Item (Poster)|
|Subjects:||Information Technology (IT)|
|Divisions:||Faculty of Information Technology (FIT)|
|Deposited By:||Dr Ngoc Thang Bui|
|Deposited On:||23 May 2016 03:31|
|Last Modified:||23 May 2016 03:32|