eprintid: 4445
rev_number: 8
eprint_status: archive
userid: 274
dir: disk0/00/00/44/45
datestamp: 2021-05-31 11:05:29
lastmod: 2021-05-31 11:05:29
status_changed: 2021-05-31 11:05:29
type: conference_item
metadata_visibility: show
creators_name: Nguyen, Thi Thu Trang
creators_name: Nguyen, Dai Tho
creators_name: Vu, Duy Loi
creators_id: trangngtt@vnu.edu.vn
creators_id: nguyendaitho@vnu.edu.vn
creators_id: vuduyloi55@gmail.com
corp_creators: VNU University of Engineering and Technology
title: A Hypercuboid-Based Machine Learning Algorithm for Malware Classification
ispublished: inpress
subjects: IT
subjects: isi_scopus_conf
divisions: avitech
divisions: fac_fit
keywords: Malware classification, machine learning, k-nearest neighbors algorithms, prototype-based learning, hypercuboids
abstract: Malware attacks have been among the most serious threats to cybersecurity over the last decade. Anti-malware software helps safeguard information systems and minimize their exposure to malware. Most anti-malware programs detect malware instances by signature or pattern matching. Data mining and machine learning techniques can automatically learn the models and patterns behind different malware variants. However, traditional machine learning techniques such as SVM, decision trees and naive Bayes appear suitable only for detecting malicious code and are not effective enough for more complex problems such as classifying malware into families. In this article, we propose a new prototype extraction method for non-traditional, prototype-based machine learning classification. The prototypes are extracted using hypercuboids: each hypercuboid covers all training data points of one malware family, and the data points nearest to its bounding hyperplanes are chosen as the prototypes. Malware samples are then classified based on their distances to the prototypes.
Experimental results show that the proposed method achieves an F1 score of 96.5% for classifying known malware and 97.7% for classifying unknown malware, both higher than those of the original prototype-based classification method.
date: 2021-06-05
date_type: completed
official_url: http://rivf.net
contact_email: nguyendaitho@vnu.edu.vn
full_text_status: public
pres_type: paper
event_title: The 15th IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF 2021)
event_location: Hanoi, Vietnam
event_dates: June 3-5, 2021
event_type: conference
refereed: TRUE
referencetext:
[1] D. Ucci, L. Aniello, R. Baldoni, Survey of machine learning techniques for malware analysis, Computers & Security 81 (2019) 123–147.
[2] F. Ahmed, H. Hameed, M. Z. Shafiq, M. Farooq, Using spatio-temporal information in API calls with machine learning algorithms for malware detection, in: Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence, ACM, 2009, pp. 55–62.
[3] G. Liang, J. Pang, C. Dai, A behavior-based malware variant classification technique, International Journal of Information and Education Technology 6 (4) (2016) 291.
[4] G. Suarez-Tangil, J. E. Tapiador, P. Peris-Lopez, J. Blasco, Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families, Expert Systems with Applications 41 (4, Part 1) (2014) 1104–1117, ISSN 0957-4174.
[5] J. Upchurch, X. Zhou, Variant: a malware similarity testing framework, in: 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), IEEE, 2015, pp. 31–39.
[6] K. Rieck, P. Trinius, T. Holz, Automatic analysis of malware behavior using machine learning, Journal of Computer Security 19 (4) (2011) 639–668.
[7] K. Rieck, P. Laskov, Linear-time computation of similarity measures for sequential data, Journal of Machine Learning Research 9 (Jan) (2008) 23–48.
[8] M. G. Schultz, E. Eskin, F. Zadok, S. J. Stolfo, Data mining methods for detection of new malicious executables, in: Proceedings of the 2001 IEEE Symposium on Security and Privacy (SP 2001), 2001, pp. 38–49.
[9] M. Kruczkowski, E. N. Szynkiewicz, Support vector machine for malware analysis and classification, in: Web Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE Computer Society, 2014, pp. 415–420.
[10] P. Khodamoradi, M. Fazlali, F. Mardukhi, M. Nosrati, Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms, in: 2015 18th CSI International Symposium on Computer Architecture and Digital Systems (CADS), IEEE, 2015, pp. 1–6.
[11] P. M. Comar, L. Liu, S. Saha, P. N. Tan, A. Nucci, Combining supervised and unsupervised learning for zero-day malware detection, in: Proceedings of IEEE INFOCOM 2013, 2013, pp. 2022–2030.
[12] P. Shrestha, S. Maharajan, G. Ramirez de la Rosa, A. Sprague, T. Solorio, G. Warner, Using string information for malware family identification, in: A. L. C. Bazzan, K. Pichara (Eds.), IBERAMIA 2014, LNAI 8864, Springer International Publishing Switzerland, 2014, pp. 686–697, DOI: 10.1007/978-3-319-12027-0_55.
[13] J. R. Quinlan, Combining instance-based and model-based learning, in: Proceedings of ICML, 1993.
[14] S. Attaluri, S. McGhee, M. Stamp, Profile hidden Markov models and metamorphic virus detection, Journal in Computer Virology 5 (2) (2009) 151–169.
[15] T. Gonzalez, Clustering to minimize the maximum intercluster distance, Theoretical Computer Science 38 (1985) 293–306.
[16] Z. Chen, M. Roussopoulos, Z. Liang, Y. Zhang, Z. Chen, A. Delis, Malware characteristics and threats on the internet ecosystem, Journal of Systems and Software 85 (7) (2012) 1650–1672.
funders: Vingroup Joint Stock Company
projects: Domestic Master/PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA), code VINIF.2020.ThS.62
citation: Nguyen, Thi Thu Trang and Nguyen, Dai Tho and Vu, Duy Loi (2021) A Hypercuboid-Based Machine Learning Algorithm for Malware Classification. In: The 15th IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF 2021), June 3-5, 2021, Hanoi, Vietnam. (In Press)
document_url: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/4445/1/A-Hypercuboid-Based-Machine-Learning-Algorithm-for-Malware-Classification.pdf
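The abstract describes the prototype extraction step only at a high level. The Python sketch below is one possible reading of it, not the authors' implementation: it assumes numeric feature vectors, axis-aligned hypercuboids (the per-family minimum and maximum in each dimension), Euclidean distance, and nearest-prototype assignment. The function names, the per-face count `k`, and the rejection threshold `max_dist` (one hypothetical way to flag samples from unknown families) are illustrative assumptions.

```python
import math
from collections import defaultdict


def hypercuboid_prototypes(points, labels, k=1):
    """Hypothetical sketch: for each family, build the axis-aligned hypercuboid
    covering all its training points, then keep the k points nearest to each
    of the 2*d bounding hyperplanes as that family's prototypes."""
    by_family = defaultdict(list)
    for x, y in zip(points, labels):
        by_family[y].append(tuple(x))

    prototypes = []  # list of (vector, family) pairs
    for family, members in by_family.items():
        dim = len(members[0])
        chosen = set()
        for j in range(dim):
            lo = min(x[j] for x in members)  # lower face lies on x_j = lo
            hi = max(x[j] for x in members)  # upper face lies on x_j = hi
            # Distance from a point to the axis-aligned hyperplane x_j = c is |x_j - c|;
            # every member is inside the box, so these differences are non-negative.
            chosen.update(sorted(members, key=lambda x: x[j] - lo)[:k])
            chosen.update(sorted(members, key=lambda x: hi - x[j])[:k])
        prototypes.extend((p, family) for p in sorted(chosen))
    return prototypes


def classify(sample, prototypes, max_dist=None):
    """Assign the family of the nearest prototype (Euclidean distance).
    max_dist is a hypothetical rejection threshold: if even the nearest
    prototype is farther away, return None (treated as an unknown family)."""
    best_vec, best_family = min(prototypes, key=lambda pf: math.dist(sample, pf[0]))
    if max_dist is not None and math.dist(sample, best_vec) > max_dist:
        return None
    return best_family


if __name__ == "__main__":
    # Toy 2-D data with two made-up families.
    X = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.4), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
    y = ["famA", "famA", "famA", "famB", "famB", "famB"]
    protos = hypercuboid_prototypes(X, y)
    print(classify((0.6, 0.5), protos))  # famA
    print(classify((5.8, 5.1), protos))  # famB
```

With `k=1`, each face contributes a training point lying on it, so the prototypes are the family's extreme points along each axis; in the toy data the interior point `(0.5, 0.4)` is dropped. Comparing a new sample against only these prototypes, rather than against every training point as plain k-nearest neighbors would, is what keeps classification cheap.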