eprintid: 3296
rev_number: 10
eprint_status: archive
userid: 299
dir: disk0/00/00/32/96
datestamp: 2018-12-17 02:59:18
lastmod: 2018-12-17 02:59:18
status_changed: 2018-12-17 02:59:18
type: article
metadata_visibility: show
creators_name: Cao, Thinh
creators_name: Yamada, Koichi
creators_name: Unehara, Muneyuki
creators_name: Suzuki, Izumi
creators_name: Nguyen, Do Van
creators_id: ngdovan@gmail.com
title: Parallel Computation of Rough Set Approximations in Information Systems with Missing Decision Data
ispublished: pub
subjects: IT
subjects: Scopus
divisions: fac_fit
abstract: The paper discusses the use of parallel computation to obtain rough set approximations from large-scale information systems where missing data exist in both condition and decision attributes. To date, many studies have focused on missing condition data, but very few have accounted for missing decision data, especially in enlarging datasets. One of the approaches for dealing with missing data in condition attributes is named twofold rough approximations. The paper aims to extend the approach to deal with missing data in the decision attribute. In addition, computing twofold rough approximations is very intensive, thus the approach is not suitable when input datasets are large. We propose parallel algorithms to compute twofold rough approximations in large-scale datasets. Our method is based on MapReduce, a distributed programming model for processing large-scale data. We introduce the original sequential algorithm first and then the parallel version is introduced. Comparison between the two approaches through experiments shows that our proposed parallel algorithms are suitable for and perform efficiently on large-scale datasets that have missing data in condition and decision attributes.
date: 2018
official_url: https://www.mdpi.com/2073-431X/7/3/44
full_text_status: none
publication: Computers
refereed: TRUE
citation: Cao, Thinh and Yamada, Koichi and Unehara, Muneyuki and Suzuki, Izumi and Nguyen, Do Van (2018) Parallel Computation of Rough Set Approximations in Information Systems with Missing Decision Data. Computers.