eprintid: 2776
rev_number: 4
eprint_status: archive
userid: 297
dir: disk0/00/00/27/76
datestamp: 2017-12-14 03:34:49
lastmod: 2017-12-14 03:34:49
status_changed: 2017-12-14 03:34:49
type: conference_item
succeeds: 2100
metadata_visibility: show
creators_name: Nguyen, Duc Linh
creators_name: Man, Duc Chuc
creators_name: Bui, Quang Hung
creators_name: Nguyen, Thi Nhat Thanh
creators_id: chucmd@fimo.edu.vn
creators_id: hungbq@vnu.edu.vn
creators_id: thanhntn@vnu.edu.vn
title: Standardization procedure for automatic environmental data: A case study in Hanoi, Vietnam
ispublished: pub
subjects: IT
divisions: FIMO
divisions: fac_fit
keywords: Atmospheric measurements;Correlation;Data models;Filling;Monitoring;Pollution measurement;Training;PM10;abnormal detection;environmental data;missing filling
abstract: In Vietnam, environmental data collected from ground-based stations may contain abnormal or missing values due to problems during operation, e.g. sensor problems. This paper proposes a standardization procedure that tries to detect unusual values and fill in missing data. Experiments were conducted on PM10 data. Two datasets measured in 01/2011 and 01/2012 at Nguyen Van Cu station in Hanoi, Vietnam are used for the experiments. For the abnormal detection process, unusual data can be reported to the data analyzers at ground stations for assessment. For the missing filling process, the first dataset is used as the training dataset to construct regression models for predicting missing data, and the second dataset is used as the testing data. In the worst case, in which 100% of the PM10 data is missing, the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) are 51 µg/m3 and 45% respectively. The correlation coefficient (R) between the original PM10 data and the predicted PM10 data is 0.56. In addition, different scenarios defined by the percentage of missing data in the whole testing dataset are also considered. Experimental results showed that the missing filling process performs best on datasets that contain 10% to 30% missing data. For this case, RMSE ranges from 15 to 25 µg/m3 and MAPE varies from 5% to 13%.
date: 2016-12
date_type: published
full_text_status: none
pres_type: paper
pagerange: 321-326
event_title: The 8th International Conference on Knowledge and Systems Engineering (KSE)
event_location: Hanoi, Vietnam
event_dates: 6-8 October 2016
event_type: conference
refereed: TRUE
citation: Nguyen, Duc Linh and Man, Duc Chuc and Bui, Quang Hung and Nguyen, Thi Nhat Thanh (2016) Standardization procedure for automatic environmental data: A case study in Hanoi, Vietnam. In: The 8th International Conference on Knowledge and Systems Engineering (KSE), 6-8 October 2016, Hanoi, Vietnam.