eprintid: 2504
rev_number: 17
eprint_status: archive
userid: 286
dir: disk0/00/00/25/04
datestamp: 2017-06-14 08:59:55
lastmod: 2017-06-14 08:59:55
status_changed: 2017-06-14 08:59:55
type: article
metadata_visibility: show
creators_name: Pham, Thi Ngan
creators_name: Nguyen, Van Quang
creators_name: Tran, Van Hien
creators_name: Nguyen, Tri Thanh
creators_name: Ha, Quang Thuy
creators_id: nganpt.di12@vnu.edu.vn
creators_id: ntthanh@vnu.edu.vn
creators_id: thuyhq@vnu.edu.vn
title: A semi-supervised multi-label classification framework with feature reduction and enrichment
ispublished: pub
subjects: IT
divisions: fac_fit
abstract: Multi-label classification has drawn much attention thanks to its usefulness and omnipresence in real-world applications in which objects may be characterized by more than one label, rather than by a single label as in the traditional approach. Obtaining multi-label examples is costly and time-consuming; therefore, a semi-supervised learning approach should be considered to take advantage of both labeled and unlabeled data. In this work, we propose a semi-supervised multi-label classification algorithm that exploits the specific features of the prominent class label(s) chosen by a greedy approach, as an extension of the LIFT algorithm, together with the unlabeled-data consumption mechanism from the TESC algorithm. We also build a semi-supervised multi-label classification application framework for Vietnamese texts with several feature enrichment steps, including a) a stage of enriching features by adding hidden topic features; b) a stage of dimensionality reduction for removing irrelevant features. Experimental results on a dataset of hotel reviews (for tourism) indicate that a reasonable amount of unlabeled data helps to increase the F1 score. Interestingly, with a small amount of labeled data, our algorithm can reach performance comparable to that obtained with a larger amount of labeled data.
date: 2017
date_type: published
full_text_status: none
publication: Journal of Information and Telecommunication
refereed: TRUE
citation: Pham, Thi Ngan and Nguyen, Van Quang and Tran, Van Hien and Nguyen, Tri Thanh and Ha, Quang Thuy (2017) A semi-supervised multi-label classification framework with feature reduction and enrichment. Journal of Information and Telecommunication.