relation: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4374/
title: Big Data Analytics and Machine Learning for Industry 4.0: An Overview
creator: Le, Nguyen Tuan Thanh
creator: Pham, Manh Linh
subject: Information Technology (IT)
description: The concept of “Big data” was mentioned for the first time by Roger Mougalas in 2005. Volume refers to the size and/or scale of datasets. There is still no universal threshold of data volume above which data is considered big data, because such a threshold changes over time and depends on the diversity of datasets. Velocity indicates the speed at which data is processed; processing falls into three categories: streaming, real-time, or batch processing. Value refers to the usefulness of data for decision making. Veracity denotes the quality and trustworthiness of datasets. Parallelization improves computation time by dividing large problems into smaller instances, distributing the smaller tasks across multiple threads, and executing them simultaneously. Feature selection is useful for preparing large-scale datasets. Sampling is a data reduction method that helps derive patterns in big datasets by generating, manipulating, and analyzing subsets of the original data.
publisher: CRC Press - Taylor & Francis Group, LLC
contributor: Rajesh, G.
contributor: X. Mercilin, Raajini
contributor: Dang, Thi Thu Hien
date: 2021-01-31
type: Book Section
type: PeerReviewed
format: application/pdf
language: en
identifier: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4374/6/BookCRCPressThanhLinh.pdf
identifier: Le, Nguyen Tuan Thanh and Pham, Manh Linh (2021) Big Data Analytics and Machine Learning for Industry 4.0: An Overview. In: Industry 4.0 Interoperability, Analytics, Security, and Case studies. Big Data for Industry 4.0: Challenges and Applications. CRC Press - Taylor & Francis Group, LLC, Boca Raton, FL, USA, pp. 1-11. ISBN 9781003048855
relation: https://doi.org/10.1201/9781003048855