Nguyen, Minh Tien and Tran, Duc Vu and Nguyen, Le Minh and Phan, Xuan Hieu (2018) Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data. ISSN 1556-4681 (In Press)
Full text not available from this repository.

Abstract
Relevant user posts of a Web document, such as comments or tweets, provide additional valuable information that enriches the content of the document. When writing user posts, readers tend to borrow salient words or phrases from the document's sentences; this can be regarded as word variation. This paper proposes a framework that models this word variation to improve the quality of Web document summarization. Technically, the framework consists of two steps: scoring and selection. In the first step, the social information of a Web document, such as its user posts, is exploited to model intra-relations and inter-relations at the lexical and semantic levels. These relations are represented by a mutual reinforcement similarity graph, which is used to score each sentence and each user post. After scoring, summaries are extracted by either a ranking approach or a concept-based method formulated as Integer Linear Programming. To confirm the effectiveness of the framework, sentence extraction and story highlight extraction were taken as case studies on three datasets in two languages, English and Vietnamese. Experimental results show that (i) the framework can improve ROUGE scores compared with state-of-the-art social context summarization baselines and (ii) the combination of the two relations benefits sentence extraction from single Web documents.
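The scoring and selection steps can be pictured with a small co-ranking sketch. This is not the authors' implementation: it assumes plain bag-of-words cosine similarity for both intra- and inter-relations (the semantic level is omitted), a hypothetical mixing weight `alpha` between intra-relation and inter-relation evidence, and simple top-k ranking in place of the ILP concept-based selection. All function names are illustrative only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercased word counts; a crude stand-in for lexical features.
    # Assumption: no Vietnamese word segmentation or stop-word removal.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count vectors (0.0 if either is empty).
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(rows, cols, intra):
    # Pairwise cosine similarities; for intra-relations the diagonal
    # (an item compared with itself) is zeroed so it cannot reinforce itself.
    return [[0.0 if intra and i == j else cosine(r, c) for j, c in enumerate(cols)]
            for i, r in enumerate(rows)]


def propagate(matrix, scores):
    # Each row node gathers its neighbours' scores weighted by similarity.
    return [sum(w * s for w, s in zip(row, scores)) for row in matrix]


def normalise(scores):
    # Rescale to sum to 1; fall back to uniform scores if everything is 0.
    total = sum(scores)
    return [s / total for s in scores] if total else [1.0 / len(scores)] * len(scores)


def mutual_reinforcement(sentences, posts, alpha=0.5, iterations=100, tol=1e-9):
    # Hypothetical parameter alpha weights intra-relations (same node type)
    # against inter-relations (sentence <-> post).
    s_vecs = [bag_of_words(t) for t in sentences]
    p_vecs = [bag_of_words(t) for t in posts]
    w_ss = similarity_matrix(s_vecs, s_vecs, intra=True)    # sentence-sentence
    w_pp = similarity_matrix(p_vecs, p_vecs, intra=True)    # post-post
    w_sp = similarity_matrix(s_vecs, p_vecs, intra=False)   # sentence-post
    w_ps = [list(col) for col in zip(*w_sp)]                # post-sentence (transpose)
    s = [1.0 / len(sentences)] * len(sentences)
    p = [1.0 / len(posts)] * len(posts)
    for _ in range(iterations):
        # Sentences are reinforced by similar sentences and similar posts;
        # posts by similar posts and similar sentences.
        new_s = normalise([alpha * x + (1 - alpha) * y
                           for x, y in zip(propagate(w_ss, s), propagate(w_sp, p))])
        new_p = normalise([alpha * x + (1 - alpha) * y
                           for x, y in zip(propagate(w_pp, p), propagate(w_ps, s))])
        delta = max(abs(x - y) for x, y in zip(new_s + new_p, s + p))
        s, p = new_s, new_p
        if delta < tol:
            break
    return s, p


def top_k(items, scores, k):
    # Ranking-based selection: keep the k best items, in document order.
    best = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)[:k]
    return [items[i] for i in sorted(best)]


if __name__ == "__main__":
    sentences = [
        "The city council approved a new budget for public transport.",
        "The budget adds ten new bus routes next year.",
        "The weather was sunny during the meeting.",
    ]
    posts = [
        "Great news, more bus routes in the new budget!",
        "Public transport budget finally approved by the council.",
    ]
    s_scores, p_scores = mutual_reinforcement(sentences, posts)
    print(top_k(sentences, s_scores, 2))
```

With these toy inputs the off-topic third sentence shares only the word "the" with the posts, so it receives the lowest score and `top_k` keeps the first two sentences. A concept-based ILP selector would replace `top_k` with a solver call; that variant is not sketched here.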
| Field | Value |
|---|---|
| Item Type | Article |
| Subjects | Information Technology (IT); Scopus-indexed journals; ISI-indexed journals |
| Divisions | Faculty of Information Technology (FIT) |
| Depositing User | A/Prof. Xuan Hieu Phan |
| Date Deposited | 08 Jun 2018 08:51 |
| Last Modified | 08 Jun 2018 08:51 |
| URI | http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2979 |