relation: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/2979/
title: Exploiting User Posts for Web Document Summarization
creator: Nguyen, Minh Tien
creator: Tran, Duc Vu
creator: Nguyen, Le Minh
creator: Phan, Xuan Hieu
subject: Information Technology (IT)
subject: Scopus-indexed journals
subject: ISI-indexed journals
description: Relevant user posts such as comments or tweets of a Web document provide additional valuable information to enrich the content of this document. When creating user posts, readers tend to borrow salient words or phrases in sentences. This can be considered as word variation. This paper proposes a framework which models the word variation aspect to enhance the quality of Web document summarization. Technically, the framework consists of two steps: scoring and selection. In the first step, the social information of a Web document such as user posts is exploited to model intra-relations and inter-relations in lexical and semantic levels. These relations are denoted by a mutual reinforcement similarity graph used to score each sentence and user post. After scoring, summaries are extracted by using a ranking approach or concept-based method formulated in the form of Integer Linear Programming. To confirm the efficiency of our framework, sentence and story highlight extraction tasks were taken as a case study on three datasets in two languages, English and Vietnamese. Experimental results show that: (i) the framework can improve ROUGE-scores compared to state-of-the-art baselines of social context summarization and (ii) the combination of the two relations benefits the sentence extraction of single Web documents.
publisher: ACM
date: 2018
type: Article
type: PeerReviewed
identifier: Nguyen, Minh Tien and Tran, Duc Vu and Nguyen, Le Minh and Phan, Xuan Hieu (2018) Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data. ISSN 1556-4681 (In Press)
relation: https://tkdd.acm.org
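
The two-step idea in the description (score, then select) can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes whitespace tokenization, plain bag-of-words cosine similarity at the lexical level only, a hypothetical mixing weight `alpha`, and simple top-k ranking in place of the Integer Linear Programming selection. It also scores sentences only, whereas the paper scores both sentences and user posts. All function names are illustrative.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(sentences, posts, alpha=0.5):
    """Mix an intra-relation and an inter-relation score for each sentence.

    intra: mean similarity to the other sentences of the document.
    inter: best similarity to any user post.
    alpha is an assumed weight, not a value taken from the paper.
    """
    s_vecs = [bow(s) for s in sentences]
    p_vecs = [bow(p) for p in posts]
    scores = []
    for i, sv in enumerate(s_vecs):
        others = [cosine(sv, ov) for j, ov in enumerate(s_vecs) if j != i]
        intra = sum(others) / len(others) if others else 0.0
        inter = max((cosine(sv, pv) for pv in p_vecs), default=0.0)
        scores.append(alpha * intra + (1 - alpha) * inter)
    return scores


def summarize(sentences, posts, k=1):
    """Selection by ranking: keep the k best sentences in document order."""
    scores = score_sentences(sentences, posts)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, given the sentences "the storm flooded the city center" and "officials held a meeting" plus the post "storm flooded the city", the first sentence is selected because the post borrows its salient words.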