VNU-UET Repository

Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction

Hoang Quynh Le and Mai Vu Tran and Thanh Hai Dang and Quang Thuy Ha and Nigel Collier (2016) Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction. Database, 2016. baw102. ISSN 1758-0463

This is the latest version of this item.

Official URL: http://doi.org/10.1093/database/baw102

Abstract

The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate progress of text mining in facilitating integrative understanding of chemical substances, diseases and their relations. In this article, we describe an extension of the UET-CAM system for mining chemical-disease relations from text data, whose performance was ranked 4th among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employs joint learning with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing (SSI) and Skip-gram for named entity normalization (NEN). Crucially, for solving the chemical-induced disease (CID) sub-task, we propose a pipeline that includes a coreference resolution module and an SVM intra-sentence relation extraction model. The former module utilizes a multi-pass sieve to identify inter-sentence references for entities, while the latter is trained on both the CDR data and our silverCID corpus with a rich feature set. SilverCID is a silver-standard corpus containing more than 50 thousand sentences, built automatically from the CTD database to provide evidence for CID relation extraction. We critically evaluated our method on the CDR test set in order to clarify the contribution of our system components. Results show an F1 of 82.44 for the DNER task, and a best performance of F1 58.90 on the CID task. The comparisons also demonstrate the significant contribution of the multi-pass sieve coreference resolution method and the silverCID corpus.
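
As a rough illustration of the multi-pass sieve idea mentioned in the abstract, the sketch below applies an ordered list of sieves, from most to least precise, and lets each later sieve act only on mentions that earlier ones left unresolved. The two sieves shown (exact string match and initial-letter abbreviation match), the `Mention` representation, and the function names are assumptions made for this example; they are not the sieves or data structures of the UET-CAM system.

```python
from typing import Callable, Optional

# A mention is (sentence_index, text). This representation is an assumption
# made for the example, not the system's actual data structure.
Mention = tuple[int, str]
# A sieve takes a mention and the mentions preceding it, and returns the
# position of an antecedent among those preceding mentions, or None.
Sieve = Callable[[Mention, list[Mention]], Optional[int]]


def exact_match_sieve(mention: Mention, earlier: list[Mention]) -> Optional[int]:
    """Hypothetical high-precision pass: nearest earlier mention with the same text."""
    for idx in range(len(earlier) - 1, -1, -1):
        if earlier[idx][1].lower() == mention[1].lower():
            return idx
    return None


def abbreviation_sieve(mention: Mention, earlier: list[Mention]) -> Optional[int]:
    """Hypothetical lower-precision pass: mention equals the initials of an earlier one."""
    if len(mention[1]) < 2:
        return None
    for idx in range(len(earlier) - 1, -1, -1):
        initials = "".join(word[0] for word in earlier[idx][1].split())
        if initials.lower() == mention[1].lower():
            return idx
    return None


def resolve(mentions: list[Mention], sieves: list[Sieve]) -> dict[int, int]:
    """Run the sieves in order; return a map from mention index to antecedent index."""
    links: dict[int, int] = {}
    for sieve in sieves:
        for i, mention in enumerate(mentions):
            if i in links:
                continue  # already resolved by an earlier, more precise sieve
            antecedent = sieve(mention, mentions[:i])
            if antecedent is not None:
                links[i] = antecedent
    return links


mentions = [(0, "acute renal failure"), (1, "ARF"), (2, "acute renal failure")]
links = resolve(mentions, [exact_match_sieve, abbreviation_sieve])
```

Here the exact-match pass links the third mention to the first, and the abbreviation pass then links "ARF" to the first mention. Linking mentions across sentences in this way is what would let an intra-sentence relation classifier see entity pairs that never co-occur in a single sentence.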

Item Type: Article
Subjects: Information Technology (IT); ISI/Scopus-indexed journals
Divisions: Faculty of Information Technology (FIT)
ID Code: 2472
Deposited By: Hà Quang Thụy
Deposited On: 10 Jun 2017 11:32
Last Modified: 10 Jun 2017 11:32
