eprintid: 2472
rev_number: 3
eprint_status: archive
userid: 307
dir: disk0/00/00/24/72
datestamp: 2017-06-10 11:32:27
lastmod: 2017-06-10 11:32:27
status_changed: 2017-06-10 11:32:27
type: article
succeeds: 2001
metadata_visibility: show
creators_name: Le, Hoang Quynh
creators_name: Tran, Mai Vu
creators_name: Dang, Thanh Hai
creators_name: Ha, Quang Thuy
creators_name: Collier, Nigel
creators_id: lhquynh@gmail.com
creators_id: vutm@vnu.edu.vn
creators_id: hai.dang@vnu.edu.vn
creators_id: thuyhq@vnu.edu.vn
title: Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction
ispublished: pub
subjects: IT
subjects: isi
divisions: fac_fit
abstract: The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate progress of text mining in facilitating integrative understanding of chemical substances, diseases and their relations. In this article, we describe an extension of the UET-CAM system for mining chemical-disease relations from text data, whose performance was ranked 4th among 18 participating corresponding systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employs joint learning with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing (SSI) and Skip-gram for named entity normalization (NEN). Crucially, for solving the chemical-induced disease (CID) sub-task, we propose a pipeline that includes a coreference resolution module and an SVM intra-sentence relation extraction model. The former module utilizes a multi-pass sieve to identify inter-sentence references for entities, while the latter is trained on both the CDR data and our silverCID corpus with a rich feature set. SilverCID is a silver-standard corpus containing more than 50 thousand sentences, built automatically from the CTD database to provide evidence for CID relation extraction. We critically evaluated our method on the CDR test set in order to clarify the contribution of our system components. Results show an F1 of 82.44 for the DNER task, and a best performance of F1 58.90 on the CID task. The comparisons also demonstrate the significant contribution of the multi-pass sieve coreference resolution method and the silverCID corpus.
date: 2016
date_type: published
official_url: http://doi.org/10.1093/database/baw102
id_number: doi:10.1093/database/baw102
full_text_status: public
publication: Database
volume: 2016
pagerange: baw102
refereed: TRUE
issn: 1758-0463
related_url_url: http://database.oxfordjournals.org/content/2016/baw102
citation: Le, Hoang Quynh and Tran, Mai Vu and Dang, Thanh Hai and Ha, Quang Thuy and Collier, Nigel (2016) Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction. Database, 2016. baw102. ISSN 1758-0463
document_url: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/2472/1/Sieve-based%20coreference%20resolution%20enhances%20semi-supervised%20learning%20model%20for%20chemical-induced%20disease%20relations%20extraction.pdf