Analysis of dimensionality reduction on ransomware detection using machine learning techniques
Abstract
Ransomware attacks continue to evolve into a pervasive cybersecurity threat, causing data loss, financial losses, and potential disruption of critical services, which has prompted the need for robust detection mechanisms. Machine learning techniques have gained recognition for ransomware detection; however, the high dimensionality of the feature space poses challenges to model efficiency and effectiveness. This research therefore explores whether two well-known dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), can enhance ransomware detection with five widely used machine learning algorithms: K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM) and Naive Bayes (NB). Through comprehensive experimentation, LDA and PCA were each applied with the selected algorithms on a publicly available Ransomware PE Header Feature Dataset with 1028 features, and the classifiers were evaluated using Accuracy, Recall, Precision and F1-Score. The comparative analysis reveals discernible differences in classifier performance between the two techniques: the classifiers perform better with PCA than with LDA. In addition, the Decision Tree and Random Forest classifiers outperform the other three algorithms both without dimensionality reduction and with either PCA or LDA.
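The experimental design described above (no reduction vs. PCA vs. LDA, crossed with five classifiers and scored on four metrics) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dataset is replaced by a hypothetical synthetic stand-in (`make_stand_in_data`) with 1028 features and binary labels, and the 70/30 split, the 50 PCA components, and all classifier hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_stand_in_data(seed=0):
    """Hypothetical synthetic stand-in for the 1028-feature PE header dataset
    (assumed binary labels: 1 = ransomware, 0 = benign)."""
    return make_classification(n_samples=600, n_features=1028,
                               n_informative=30, random_state=seed)


# Factories so every pipeline gets a fresh, unfitted estimator.
# Hyperparameters are illustrative assumptions, not the paper's settings.
CLASSIFIERS = {
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    "DT": lambda: DecisionTreeClassifier(random_state=0),
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": lambda: SVC(kernel="rbf"),
    "NB": lambda: GaussianNB(),
}

REDUCERS = {
    "none": lambda: "passthrough",  # baseline: keep all 1028 features
    "PCA": lambda: PCA(n_components=50, random_state=0),  # component count is assumed
    # LDA keeps at most (n_classes - 1) components, i.e. 1 for a binary task.
    "LDA": lambda: LinearDiscriminantAnalysis(n_components=1),
}


def evaluate(X, y, seed=0):
    """Fit scaler -> reducer -> classifier for every combination and
    score each on a held-out split with accuracy, precision, recall, F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    results = {}
    for r_name, make_reducer in REDUCERS.items():
        for c_name, make_clf in CLASSIFIERS.items():
            pipe = Pipeline([
                ("scale", StandardScaler()),
                ("reduce", make_reducer()),
                ("clf", make_clf()),
            ])
            pipe.fit(X_tr, y_tr)  # LDA receives y here; PCA ignores it
            pred = pipe.predict(X_te)
            results[(r_name, c_name)] = {
                "accuracy": accuracy_score(y_te, pred),
                "precision": precision_score(y_te, pred, zero_division=0),
                "recall": recall_score(y_te, pred, zero_division=0),
                "f1": f1_score(y_te, pred, zero_division=0),
            }
    return results


if __name__ == "__main__":
    X, y = make_stand_in_data()
    for (r_name, c_name), m in evaluate(X, y).items():
        print(f"{r_name:>4} {c_name:>3}  "
              + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

One design point worth noting when reading such a comparison: PCA is unsupervised and its number of components is a free tuning choice, whereas LDA is supervised and can produce at most one fewer component than there are classes, so on a binary ransomware/benign task it compresses the data to a single dimension. The sketch makes no attempt to reproduce the paper's reported numbers.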