eprintid: 3763
rev_number: 8
eprint_status: archive
userid: 274
dir: disk0/00/00/37/63
datestamp: 2019-12-09 09:16:51
lastmod: 2019-12-09 09:16:51
status_changed: 2019-12-09 09:16:51
type: conference_item
metadata_visibility: show
creators_name: Tran, Nghi Phu
creators_name: Le, Huy Hoang
creators_name: Nguyen, Ngoc Toan
creators_name: Nguyen, Dai Tho
creators_name: Nguyen, Ngoc Binh
creators_id: tnphvan@gmail.com
creators_id: hoangle.hvan@gmail.com
creators_id: ngoctoan.hvan@gmail.com
creators_id: nguyendaitho@vnu.edu.vn
creators_id: nn_binh@kcg.edu
corp_creators: VNU University of Engineering and Technology
corp_creators: People’s Security Academy
corp_creators: Kyoto College of Graduate Studies for Informatics
title: CFDVex: A Novel Feature Extraction Method for Detecting Cross-Architecture IoT Malware
ispublished: pub
subjects: IT
divisions: fac_fit
abstract: The widespread adoption of Internet of Things (IoT) devices built on different architectures has given rise to multi-architecture malware designed for mass compromise. Cross-architecture malware detection plays an important role in detecting malware early on devices that use new or unfamiliar architectures. Prior knowledge of malware detection on traditional architectures can be reused for the same task on new and uncommon ones. Based on CFD and the Vex intermediate representation, we propose a feature selection method for detecting cross-architecture malware, called CFDVex. In an experimental evaluation on our large IoT dataset, the proposed approach achieved good results for cross-architecture malware detection: with an SVM model trained only on Intel 80386 samples, our method detected IoT malware among MIPS samples with 95.72% accuracy and a 2.81% false positive rate.
date: 2019-12
date_type: published
official_url: https://soict.org/
full_text_status: public
pres_type: paper
pagerange: 248-254
event_title: 10th International Symposium on Information and Communication Technology (SoICT 2019)
event_location: Ha Noi - Ha Long
event_dates: December 4 – 6, 2019
event_type: conference
refereed: TRUE
referencetext: [1] Kaspersky IoT Lab Report. New IoT malware grew three fold in H1 2018. [Online]. Available: https://www.kaspersky.com/about/press-releases/2018_new-iotmalware-grew-three-fold-in-h1-2018. [Accessed: 02-Sep-2019].
[2] Yin Minn Pa Pa, Shogo Suzuki, Katsunari Yoshioka, Tsutomu Matsumoto, Takahiro Kasama, and Christian Rossow. IoTPOT: Analysing the Rise of IoT Compromises. In Proceedings of the 9th USENIX Conference on Offensive Technologies, 9–19. WOOT’15. Berkeley, CA, USA: USENIX Association, 2015.
[3] Alhanahnah, Mohannad, Qicheng Lin, Qiben Yan, Ninh Zhang, and Zhenxiang Chen. Efficient Signature Generation for Classifying Cross-Architecture IoT Malware. 2018 IEEE Conference on Communications and Network Security (CNS), 1–9, 2018.
[4] N. Idika, A.P. Mathur. A Survey of Malware Detection Techniques. Technical Report, Purdue University, 2007.
[5] Evanson Mwangi Karanja, Shedden Masupe, Jeffrey Mandu. Internet of Things Malware: A Survey. IJCSES, vol. 8, no. 3, 2017.
[6] Xuxian Jiang, Xinyuan Wang, Dongyan Xu. Stealthy malware detection and monitoring through VMM-based out-of-the-box semantic view reconstruction. ACM Transactions on Information and System Security (TISSEC), Volume 13, Issue 2, February 2010.
[7] Shahid Alam, R. Nigel Horspool, and Issa Traore. MAIL: Malware Analysis Intermediate Language: A Step Towards Automating and Optimizing Malware Detection. In Proceedings of the 6th International Conference on Security of Information and Networks, 233–240. SIN ’13. New York, NY, USA: ACM, 2013.
[8] Ralf Huuck. IoT: The internet of threats and static program analysis defense. Embedded World 2015, Exhibition & Conferences, pp. 493–495.
[9] Rafiqul Islam, Ronghua Tian and Lynn Batten. Classification of Malware Based on String and Function Feature Selection. Second Cybercrime and Trustworthy Computing Workshop, 2010.
[10] Huy Trung Nguyen, Quoc Dung Ngo, and Van Hoang Le. IoT Botnet Detection Approach Based on PSI Graph and DGCNN Classifier. In 2018 IEEE International Conference on Information Communication and Signal Processing (ICICSP), 118–122, 2018.
[11] Igor Santos, Felix Brezo, Xabier Ugarte-Pedrero and Pablo Garcia Bringas. Opcode Sequences as Representation of Executables for Data-Mining-Based Unknown Malware Detection. Information Sciences, Data Mining for Information Security, 231 (May 10, 2013): 64–82.
[12] Yuxin Ding, Wei Dai, Shengli Yan and Yumei Zhang. Control Flow-Based Opcode Behavior Analysis for Malware Detection. Computers & Security 44 (July 1, 2014): 65–74.
[13] Soomin Kim, Markus Faerevaag, Minkyu Jung, SeungIl Jung, DongYeop Oh, JongHyup Lee, and Sang Kil Cha. Testing Intermediate Representations for Binary Analysis. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 353–364. ASE 2017. Piscataway, NJ, USA: IEEE Press, 2017.
[14] Alexander Sepp, Bogdan Mihaila, and Axel Simon. Precise Static Analysis of Binaries by Extracting Relational Information. In 18th Working Conference on Reverse Engineering, 357–366. Limerick, Ireland: IEEE, 2011.
[15] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Not., 42(6):89–100, June 2007.
[16] Intermediate Representation in Angr. Available: https://docs.angr.io/advancedtopics/ir
[17] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. Bitblaze: A new approach to computer security via binary analysis. In ICISS ’08, pages 1–25, Berlin, Heidelberg, 2008. Springer-Verlag.
[18] H. Yin and D. Song. Privacy-Breaching Behavior Analysis. In Automatic Malware Analysis. Springer Briefs in Computer Science, pages 27–42. Springer New York, 2013.
[19] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel and Giovanni Vigna. State of The Art of War: Offensive Techniques in Binary Analysis. IEEE Symposium on Security and Privacy (SP), 2016.
[20] Frequently Asked Questions. [Online]. Available: https://docs.angr.io/introductoryerrata/faq. [Accessed: 16-Jun-2019].
[21] Daniel Bilar. Opcodes as Predictor for Malware. International Journal of Electronic Security and Digital Forensics 1, no. 2 (2007): 156.
[22] Robert Moskovitch, Clint Feher, Nir Tzachar, Eugene Berger, Marina Gitelman, Shlomi Dolev and Yuval Elovici. Unknown Malcode Detection Using OPCODE Representation. In Intelligence and Security Informatics, 204–215. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2008.
[23] Igor Santos, Felix Brezo, Javier Nieves, Yoseba K. Penya, Borja Sanz, Carlos Laorden, and Pablo Garcia Bringas. Idea: Opcode-Sequence-Based Malware Detection. In Engineering Secure Software and Systems, Second International Symposium, ESSoS 2010, Pisa, Italy, pp. 35–43, 2010.
[24] Tran Nghi Phu, Nguyen Ngoc Toan, Le Hoang, Nguyen Dai Tho, Nguyen Ngoc Binh. C500-CFG: A Novel Algorithm to Extract Control Flow-based Features for IoT Malware Detection. 19th International Symposium on Communications and Information Technologies (ISCIT), 2019, Hochiminh, Vietnam.
[25] Shunichi Amari, Si Wu. Improving support vector machine classifiers by modifying kernel functions. Neural Netw 1999;12:783–789.
[26] Andrei Costin, Jonas Zaddach, Aurélien Francillon and Davide Balzarotti. A large-scale analysis of the security of embedded firmwares. In Proceedings of the 23rd USENIX Security Symposium, 2014, pp. 95–110.
[27] Pa Yin Minn Pa, Shogo Suzuki, Katsunari Yoshioka, Tsutomu Matsumoto, Takahiro Kasama, and Christian Rossow. IoTPOT: A Novel Honeypot for Revealing Current IoT Threats. Journal of Information Processing 24, no. 3 (2016): 522–533.
[28] Detux. [Online]. Available: https://github.com/detuxsandbox/detux
[29] David Brash. Recent Additions to the ARMv7-A Architecture. In 2010 IEEE International Conference on Computer Design, 2010.
[30] Vex IR Document. https://github.com/angr/vex/blob/master/pub/libvex_ir.h
[31] Hiroshi Ogura, Hiromi Amano and Masato Kondo. Feature Selection with a Measure of Deviations from Poisson in Text Categorization. Expert Systems with Applications 36, no. 3, Part 2 (April 1, 2009): 6826–6832.
[32] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. Proceedings of the 14th International Conference on Machine Learning (ICML ’97), pp. 412–420, 1997.
[33] Virusshare. [Online]. Available: https://virusshare.com/
[34] Virus Total. [Online]. Available: https://virustotal.com/
[35] Tran Nghi Phu, Nguyen Ngoc Binh, Ngo Quoc Dung, and Le Van Hoang. Towards Malware Detection in Routers with C500-Toolkit. In 2017 5th International Conference on Information and Communication Technology (ICoIC7), 1–5, 2017.
[36] Christopher Kruegel and Yan Shoshitaishvili. Using static binary analysis to find vulnerabilities and backdoors in firmware. In Black Hat USA, 2015.
citation: Tran, Nghi Phu and Le, Huy Hoang and Nguyen, Ngoc Toan and Nguyen, Dai Tho and Nguyen, Ngoc Binh (2019) CFDVex: A Novel Feature Extraction Method for Detecting Cross-Architecture IoT Malware. In: 10th International Symposium on Information and Communication Technology (SoICT 2019), December 4 – 6, 2019, Ha Noi - Ha Long.
document_url: https://eprints.uet.vnu.edu.vn/eprints/id/eprint/3763/1/p248-tran-nghi-phu.pdf