Comparative Analysis of Random Forest and Logistic Regression Methods in Predicting Leukemia Blood Cancer Using Microscopic Blood Cell Images

Jepri Banjarnahor; Galuh Wira Relungwangi

doi:10.32736/sisfokom.v14i3.2393

Authors

Jepri Banjarnahor Universitas Prima Indonesia
Galuh Wira Relungwangi Universitas Prima Indonesia

Keywords:

Random Forest, Logistic Regression, Prediction, Leukemia, Google Colab

Abstract

Leukemia is one of the deadliest blood cancers, and early detection is critical for effective treatment. However, conventional diagnostic methods are often subjective, time-consuming, and expensive, which poses challenges especially in resource-constrained settings. This study presents a comparative analysis of two widely used machine learning algorithms, Random Forest (RF) and Logistic Regression (LR), for leukemia prediction using an open-access Kaggle dataset of 10,661 preprocessed microscopic blood cell images. The dataset was partitioned into training (80%) and testing (20%) sets, and preprocessing included image normalization and feature extraction. Performance was evaluated with accuracy, sensitivity, specificity, and the area under the ROC curve (AUC). Random Forest performed better overall, with an accuracy of 85.23%, specificity of 0.9351, sensitivity of 0.6774, and AUC of 0.8881, compared with LR's accuracy of 78.11%, specificity of 0.8363, sensitivity of 0.6742, and AUC of 0.8120. RF's advantage lies in accuracy, specificity, and AUC; sensitivity was nearly identical for the two models. These findings suggest that ensemble methods such as RF are well suited to leukemia detection, possibly because of their ability to model complex feature interactions in medical imaging data. Both algorithms show potential for clinical decision support, and future research could evaluate deep learning techniques and larger datasets to improve model accuracy and reliability.
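The four reported metrics can be computed from a classifier's test-set outputs as sketched below. This is a minimal, dependency-free illustration, not the authors' code: it assumes binary labels in which 1 marks the positive (leukemia) class and that the model outputs a probability score for that class. The function names, the 0.5 decision threshold, and the toy data are hypothetical. AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming label 1 is the positive (leukemia) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and AUC from probability scores."""
    # Hypothetical threshold: scores at or above it are predicted positive.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),  # threshold-independent ranking quality
    }


if __name__ == "__main__":
    # Toy example (hypothetical data): 3 positives, 5 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    metrics = evaluate(y_true, scores)
    print({k: round(v, 4) for k, v in metrics.items()})
    # -> {'accuracy': 0.75, 'sensitivity': 0.6667, 'specificity': 0.8, 'auc': 0.8667}
```

Accuracy equals the prevalence-weighted average of sensitivity and specificity, so a model with high specificity can report high accuracy even when sensitivity is modest, which is the pattern in the RF results above. In practice, scikit-learn's `confusion_matrix` and `roc_auc_score` compute the same quantities more efficiently than the quadratic AUC loop used here.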
