Enhancing XGBoost Performance in Malware Detection through Chi-Squared Feature Selection
DOI:
https://doi.org/10.32736/sisfokom.v13i3.2293
Keywords:
malware detection, XGBoost, chi-squared, machine learning, feature selection
Abstract
The increasing prevalence of malware poses significant risks, including data loss and unauthorized access. These threats take various forms, such as viruses, Trojans, worms, and ransomware, each of which continually evolves to exploit system vulnerabilities. Ransomware has seen a particularly rapid increase, as evidenced by the devastating WannaCry attack of 2017, which crippled critical infrastructure and caused immense economic damage. Traditional anti-malware solutions rely heavily on signature-based techniques and struggle to keep pace with malware's evolving nature, because even slight code modifications can allow malware to evade detection. This weakness in current cybersecurity defenses underscores the need for more sophisticated detection methods. To address these challenges, this study proposes an enhanced malware detection approach that combines Extreme Gradient Boosting (XGBoost) with Chi-Squared Feature Selection. The research applied XGBoost to a malware dataset after preprocessing steps such as class balancing and feature scaling. Adding Chi-Squared Feature Selection improved the model's accuracy from 99.1% to 99.2% and reduced testing time by 89.28%. These results indicate that prioritizing relevant features improves both the accuracy and the computational speed of the model. Ultimately, combining feature selection with machine learning techniques proves effective in addressing modern malware detection challenges.
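The abstract does not give implementation details, so the following is only a minimal sketch of the feature-selection step it describes, under stated assumptions. It scales features to [0, 1] (chi-squared scoring needs non-negative values), scores each feature against the class label using the per-class feature-sum formulation commonly used for chi-squared feature ranking, and keeps the top k columns. The function names (`min_max_scale`, `chi2_scores`, `select_top_k`), the toy data, and the choice of k are hypothetical and not taken from the study; the XGBoost training step is omitted.

```python
def min_max_scale(rows):
    """Scale each column to [0, 1]; a constant column becomes all zeros."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def chi2_scores(rows, labels):
    """Chi-squared statistic of each non-negative feature against the labels.

    Observed counts are per-class feature sums. Expected counts assume the
    feature total is spread across classes in proportion to class frequency.
    A feature whose total is zero gets a score of 0.0 (an assumption made
    here to keep the ranking well defined).
    """
    classes = sorted(set(labels))
    n_samples, n_features = len(rows), len(rows[0])
    class_prob = {c: labels.count(c) / n_samples for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, y in zip(rows, labels) if y == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features and the scaled,
    reduced feature matrix that a classifier would then be trained on."""
    scaled = min_max_scale(rows)
    scores = chi2_scores(scaled, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[j] for j in keep] for row in scaled]
    return keep, reduced


# Toy data (invented for illustration): column 0 tracks the label,
# column 1 is constant, column 2 is weakly related to the label.
rows = [
    [9.0, 5.0, 1.0],
    [8.0, 5.0, 4.0],
    [10.0, 5.0, 2.0],
    [1.0, 5.0, 3.0],
    [0.0, 5.0, 1.0],
    [2.0, 5.0, 4.0],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = malware, 0 = benign (hypothetical encoding)

keep, reduced = select_top_k(rows, labels, k=2)
print("selected feature indices:", keep)
```

In a fuller pipeline the reduced matrix would be passed to the classifier in place of the full feature set. Fewer columns means less work per prediction, which is consistent with the shorter testing time the abstract reports.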
License
Copyright of an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors are allowed to distribute published articles by sharing the link or DOI of the article. Authors are allowed to use their articles for legal purposes deemed necessary without the written permission of the journal, provided they acknowledge the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer).
The Copyright Transfer Form for Jurnal Sisfokom (Sistem Informasi dan Komputer) is available for download.
This agreement is to be signed by at least one of the authors who has obtained the assent of the co-author(s). After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted. The original signed copyright form should be sent to the editorial office as a scanned document at sisfokom@atmaluhur.ac.id.