Abstract
As the popularity of the Android mobile operating system grows, the amount of software
developed to harm its users increases as well. Consequently, many studies have addressed
the detection of malicious Android software. Beyond classifying Android software as
malicious or benign, classifying malicious software into families is also important for
the security of the Android operating system. In this study, a machine learning based
classification system is developed that analyzes malicious Android software and predicts
its family. The system extracts the requested permissions and API calls of each malicious
application and uses them as features for machine learning algorithms that classify the
malware. The performance of the system is evaluated on various data sets, and the results
show that all classification algorithms classified the malware with high accuracy. In
addition, the detection of unknown malware belonging to a family that has never been seen
before is studied, and such unknown malware is classified with a high success rate.
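
The idea of using permissions and API calls as classification features can be illustrated with a minimal sketch. It is not the system developed in the study: it assumes each sample is represented as a set of feature strings, swaps in a simple k-nearest-neighbour vote over Jaccard distance, and flags a sample as "unknown" when even its nearest neighbour is too far away. The family names, the feature strings, the value of k, and the distance threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Distance between two feature sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def classify_family(sample, training, k=3, unknown_threshold=0.8):
    """Majority family among the k nearest training samples.

    Returns "unknown" when the nearest training sample is farther away than
    unknown_threshold (an assumed heuristic, not taken from the study).
    """
    ranked = sorted(
        (jaccard_distance(sample, features), family)
        for features, family in training
    )
    if not ranked or ranked[0][0] > unknown_threshold:
        return "unknown"
    votes = Counter(family for _, family in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature set, family label).
# "perm:" marks a requested permission, "api:" marks an API call.
TRAINING = [
    (frozenset({"perm:SEND_SMS", "perm:READ_CONTACTS",
                "api:SmsManager.sendTextMessage"}), "FamilyA"),
    (frozenset({"perm:SEND_SMS", "perm:RECEIVE_SMS",
                "api:SmsManager.sendTextMessage"}), "FamilyA"),
    (frozenset({"perm:INTERNET", "perm:READ_PHONE_STATE",
                "api:TelephonyManager.getDeviceId"}), "FamilyB"),
    (frozenset({"perm:INTERNET", "perm:ACCESS_FINE_LOCATION",
                "api:TelephonyManager.getDeviceId"}), "FamilyB"),
]

if __name__ == "__main__":
    sms_sample = {"perm:SEND_SMS", "api:SmsManager.sendTextMessage"}
    novel_sample = {"perm:CAMERA", "api:Camera.takePicture"}
    print(classify_family(sms_sample, TRAINING))
    print(classify_family(novel_sample, TRAINING))
```

Set-based features keep the sketch dependency-free. A fuller pipeline would more likely encode the same features as binary vectors and pass them to library classifiers.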