VNU-UET Repository

Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relations extraction

Hoang Quynh Le, Mai Vu Tran, Thanh Hai Dang, Quang Thuy Ha and Collier Nigel (2016) Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relations extraction. In: SW4PHD: the 2016 Scientific Workshop for PhD Students, 26 March 2016, Hanoi.

Abstract

The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate progress in text mining for the integrative understanding of chemical substances, diseases and their relations. In this article, we describe an extension of the UET-CAM system for mining chemical-disease relations from text data, whose performance was ranked 4th among the 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employs joint learning with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing (SSI) and Skip-gram for named entity normalization (NEN). Crucially, to solve the chemical-induced disease (CID) sub-task, we propose a pipeline that includes a coreference resolution module and an SVM intra-sentence relation extraction model. The former uses a multi-pass sieve to identify inter-sentence references for entities, while the latter is trained with a rich feature set on both the CDR data and our silverCID corpus. SilverCID is a silver-standard corpus of more than 50 thousand sentences, built automatically from the CTD database to provide evidence for CID relation extraction. We critically evaluated our method on the CDR test set to clarify the contribution of each system component. Results show an F1 of 82.44 for the DNER task and a best F1 of 58.90 on the CID task. The comparisons also demonstrate the significant contribution of the multi-pass sieve coreference resolution method and the silverCID corpus.
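
The multi-pass sieve idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the actual sieves, so the two rules here (`exact_match` and `abbreviation`), the `Mention` class and the `resolve` function are hypothetical names chosen only for illustration. The sketch shows the general pattern of applying rules in order, from most to least precise, with each pass linking only the mentions that earlier passes left unresolved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Mention:
    text: str
    sentence: int                              # index of the containing sentence
    antecedent: Optional["Mention"] = None     # earlier mention this one refers to


# A sieve decides whether `mention` corefers with the earlier `candidate`.
Sieve = Callable[[Mention, Mention], bool]


def exact_match(mention: Mention, candidate: Mention) -> bool:
    """Illustrative high-precision rule: case-insensitive string identity."""
    return mention.text.lower() == candidate.text.lower()


def abbreviation(mention: Mention, candidate: Mention) -> bool:
    """Illustrative lower-precision rule: mention is the initials of candidate."""
    words = candidate.text.split()
    initials = "".join(word[0] for word in words).upper()
    return len(words) > 1 and mention.text.upper() == initials


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> List[Mention]:
    """Apply the sieves in order; later passes see only unresolved mentions."""
    for sieve in sieves:
        for i, mention in enumerate(mentions):
            if mention.antecedent is not None:
                continue  # already linked by an earlier, more precise sieve
            # Scan earlier mentions, nearest first.
            for candidate in reversed(mentions[:i]):
                if sieve(mention, candidate):
                    mention.antecedent = candidate
                    break
    return mentions


# Assumed toy input: three disease mentions spread over three sentences.
mentions = resolve(
    [
        Mention("acute renal failure", 0),
        Mention("ARF", 1),
        Mention("Acute renal failure", 2),
    ],
    [exact_match, abbreviation],
)
```

In a pipeline like the one described, links of this kind would let a mention in one sentence stand in for its antecedent in another, so that an intra-sentence relation classifier can be applied to entity pairs that never co-occur in a single sentence.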

Item Type: Conference or Workshop Item (Poster)
Subjects: Information Technology (IT)
Divisions: Faculty of Information Technology (FIT)
ID Code: 1525
Deposited By: Dr Ngoc Thang Bui
Deposited On: 23 May 2016 07:38
Last Modified: 23 May 2016 07:38