VNU-UET Repository

Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction

Le, Hoang Quynh and Tran, Mai Vu and Dang, Thanh Hai and Ha, Quang Thuy and Collier, Nigel (2016) Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction. Database, 2016. baw102. ISSN 1758-0463

Warning: There is a more recent version of this item available.


The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate progress in text mining toward an integrative understanding of chemical substances, diseases and their relations. In this article, we describe an extension of the UET-CAM system for mining chemical-disease relations from text, whose performance was ranked 4th among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employs joint learning with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing (SSI) and Skip-gram for named entity normalization (NEN). Crucially, for the chemical-induced disease (CID) sub-task, we propose a pipeline that includes a coreference resolution module and an SVM intra-sentence relation extraction model. The former uses a multi-pass sieve to identify inter-sentence references to entities, while the latter is trained on both the CDR data and our silverCID corpus with a rich feature set. SilverCID is a silver-standard corpus of more than 50 thousand sentences, built automatically from the CTD database to provide evidence for CID relation extraction. We critically evaluated our method on the CDR test set to clarify the contribution of each system component. Results show an F1 of 82.44 for the DNER task and a best F1 of 58.90 on the CID task. The comparisons also demonstrate the significant contribution of the multi-pass sieve coreference resolution method and the silverCID corpus.
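The abstract names a multi-pass sieve but does not list its passes. The sketch below is illustrative, not the authors' implementation. It only shows the general sieve idea: ordered passes run from high to low precision, and a link made by an earlier pass is never overridden by a later one. The three sieves (exact string match, shared concept ID, nominal anaphor such as "the drug"), the `Mention` fields, and the `ANAPHOR_TYPES` table are all hypothetical. Resolving a nominal anaphor to an entity in an earlier sentence is presumably what lets an intra-sentence classifier reach relations that span sentences.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mention:
    idx: int                          # position in document order
    sentence: int                     # sentence index in the abstract
    text: str                         # surface string
    etype: str                        # hypothetical labels: "Chemical", "Disease", "Anaphor"
    concept_id: Optional[str] = None  # normalized concept ID, if the NEN step found one


# A sieve sees one mention plus all mentions before it and returns an antecedent or None.
Sieve = Callable[[Mention, List[Mention]], Optional[Mention]]

# Hypothetical mapping from nominal anaphors to the entity type they refer to.
ANAPHOR_TYPES = {"the drug": "Chemical", "this drug": "Chemical", "the disease": "Disease"}


def exact_match_sieve(m: Mention, candidates: List[Mention]) -> Optional[Mention]:
    # Highest precision: same type and identical string (case-insensitive), nearest first.
    for c in reversed(candidates):
        if c.etype == m.etype and c.text.lower() == m.text.lower():
            return c
    return None


def concept_id_sieve(m: Mention, candidates: List[Mention]) -> Optional[Mention]:
    # Link mentions that were normalized to the same concept identifier.
    if m.concept_id is None:
        return None
    for c in reversed(candidates):
        if c.concept_id == m.concept_id:
            return c
    return None


def nominal_anaphor_sieve(m: Mention, candidates: List[Mention]) -> Optional[Mention]:
    # Lowest precision: resolve e.g. "the drug" to the nearest preceding entity of the implied type.
    target = ANAPHOR_TYPES.get(m.text.lower())
    if target is None:
        return None
    for c in reversed(candidates):
        if c.etype == target:
            return c
    return None


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> Dict[int, int]:
    """Apply sieves in order; return a map from mention idx to antecedent idx."""
    links: Dict[int, int] = {}
    for sieve in sieves:                # one full pass over the document per sieve
        for i, m in enumerate(mentions):
            if m.idx in links:
                continue                # decisions from earlier, more precise sieves are kept
            antecedent = sieve(m, mentions[:i])
            if antecedent is not None:
                links[m.idx] = antecedent.idx
    return links
```

For a toy document with the placeholder IDs "C1" and "D1", `resolve` links the later "cisplatin" to the first mention in the exact-match pass. It links "the drug" in the anaphor pass, which yields `{3: 0, 2: 0}`. Ordering the sieves by precision is the key design choice: cheap, reliable matches are fixed first, so the noisier anaphor rule only sees mentions that nothing else could resolve.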

Item Type: Article
Subjects: Information Technology (IT); ISI-indexed journals
Divisions: Faculty of Information Technology (FIT)
Depositing User: Hà Quang Thụy
Date Deposited: 07 Dec 2016 08:24
Last Modified: 07 Dec 2016 08:26

