eprintid: 2979
rev_number: 10
eprint_status: archive
userid: 290
dir: disk0/00/00/29/79
datestamp: 2018-06-08 08:51:26
lastmod: 2018-06-08 08:51:26
status_changed: 2018-06-08 08:51:26
type: article
metadata_visibility: show
creators_name: Nguyen, Minh Tien
creators_name: Tran, Duc Vu
creators_name: Nguyen, Le Minh
creators_name: Phan, Xuan Hieu
creators_id: tiennm@jaist.ac.jp
creators_id: vu.tran@jaist.ac.jp
creators_id: nguyenml@jaist.ac.jp
creators_id: hieupx@vnu.edu.vn
title: Exploiting User Posts for Web Document Summarization
ispublished: inpress
subjects: IT
subjects: Scopus
subjects: isi
divisions: fac_fit
abstract: Relevant user posts, such as comments or tweets about a Web document, provide additional valuable information that enriches the content of the document. When creating user posts, readers tend to borrow salient words or phrases in sentences. This can be considered word variation. This paper proposes a framework which models the word variation aspect to enhance the quality of Web document summarization. Technically, the framework consists of two steps: scoring and selection. In the first step, the social information of a Web document, such as user posts, is exploited to model intra-relations and inter-relations at the lexical and semantic levels. These relations are denoted by a mutual reinforcement similarity graph used to score each sentence and user post. After scoring, summaries are extracted by using a ranking approach or a concept-based method formulated as an Integer Linear Program. To confirm the efficiency of our framework, sentence and story highlight extraction tasks were taken as a case study on three datasets in two languages, English and Vietnamese. Experimental results show that: (i) the framework can improve ROUGE scores compared to state-of-the-art baselines of social context summarization and (ii) the combination of the two relations benefits the sentence extraction of single Web documents.
date: 2018
date_type: published
publisher: ACM
official_url: https://tkdd.acm.org
full_text_status: none
publication: ACM Transactions on Knowledge Discovery from Data
refereed: TRUE
issn: 1556-4681
funders: Vietnam National University, Hanoi
projects: QG.15.29
citation: Nguyen, Minh Tien and Tran, Duc Vu and Nguyen, Le Minh and Phan, Xuan Hieu (2018) Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data. ISSN 1556-4681 (In Press)
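
The two-step framework described in the abstract (score, then select) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes a simplified one-pass score that mixes an intra-relation term (average lexical similarity to units of the same kind) with an inter-relation term (average lexical similarity to units of the other kind). The function names, the `alpha` mixing weight, and the bag-of-words cosine similarity are all hypothetical choices made for illustration; the semantic-level relations and the Integer Linear Programming selection are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(vec, others):
    """Average cosine similarity of vec to a list of vectors (0.0 if empty)."""
    return sum(cosine(vec, o) for o in others) / len(others) if others else 0.0


def score_units(sentences, posts, alpha=0.5):
    """Score sentences and posts (hypothetical formula, not from the paper).

    score = alpha * intra + (1 - alpha) * inter, where intra compares a unit
    with the other units of its own kind and inter with units of the other kind.
    """
    s_vecs = [Counter(tokenize(s)) for s in sentences]
    p_vecs = [Counter(tokenize(p)) for p in posts]

    def score(own, other):
        result = []
        for i, vec in enumerate(own):
            intra = mean_similarity(vec, own[:i] + own[i + 1:])
            inter = mean_similarity(vec, other)
            result.append(alpha * intra + (1 - alpha) * inter)
        return result

    return score(s_vecs, p_vecs), score(p_vecs, s_vecs)


def top_k(texts, scores, k):
    """Ranking-based selection: keep the k best units, in original order."""
    best = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)[:k]
    return [texts[i] for i in sorted(best)]


sentences = [
    "The city opened a new bridge today.",
    "Officials praised the bridge design.",
    "Weather was mild.",
]
posts = ["Love the new bridge!", "bridge design looks great"]

sentence_scores, post_scores = score_units(sentences, posts)
summary = top_k(sentences, sentence_scores, 2)
```

In this toy input the third sentence shares no words with any other sentence or post, so it receives a score of zero and is left out of the two-sentence summary. A sentence that readers echo in their posts gains score through the inter-relation term, which is the word variation signal the abstract refers to.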