Application of SMOTE-ENN Method in Data Balancing for Classification of Diabetes Health Indicators with C4.5 Algorithm
DOI:
https://doi.org/10.32736/sisfokom.v14i2.2350
Keywords:
SMOTE-ENN, Data Imbalance, C4.5, Diabetes, Classification
Abstract
Class imbalance in health datasets often degrades the performance of classification models, especially in detecting minority classes such as people with diabetes. This study evaluates how the SMOTE-ENN method affects the performance of the C4.5 algorithm in classifying diabetes health indicators. The dataset is the 2021 Diabetes Binary Health Indicators BRFSS dataset from Kaggle, which contains 236,378 respondent records with an imbalanced class distribution: 85.80% non-diabetic and 14.20% diabetic. SMOTE was used to add synthetic samples to the minority class, and ENN was then applied to remove samples considered noise. After balancing, the C4.5 algorithm was used for classification. The models were evaluated using accuracy, precision, recall, and F1-score. Applying SMOTE-ENN improved accuracy from 79.49% to 80.33% and precision from 29% to 30%. Recall did not increase, but the method improved the overall stability of the predictions, particularly how often positive-class predictions were correct. The novelty of this research lies in applying SMOTE-ENN together with the C4.5 algorithm to a large-scale health dataset, a combination that has not been widely explored. Further exploration of other balancing techniques and algorithms is needed to obtain better classification results on imbalanced data.
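The balancing step described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the helper names (`sq_dist`, `k_nearest`, `smote`, `enn`), the toy two-feature data, the neighbour counts, and the random seeds are all assumptions made for illustration. The sketch assumes binary labels and at least two minority samples, and its ENN step uses a simple majority vote among neighbours, which is one common variant. Training the C4.5 classifier is left out.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_nearest(point, rows, k, skip=None):
    """Indices of the k rows closest to `point`, ignoring index `skip`."""
    candidates = [i for i in range(len(rows)) if i != skip]
    candidates.sort(key=lambda i: sq_dist(point, rows[i]))
    return candidates[:k]


def smote(X, y, minority, k=5, seed=0):
    """Oversample `minority` until it matches the other class (binary labels)."""
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(pool)) - len(pool)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(pool))
        j = rng.choice(k_nearest(pool[i], pool, k, skip=i))
        gap = rng.random()  # position along the segment between the two points
        X_out.append([a + gap * (b - a) for a, b in zip(pool[i], pool[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in k_nearest(x, X, k, skip=i))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 30 majority rows and 5 minority rows.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(5)]
y = [0] * 30 + [1] * 5

X_bal, y_bal = smote(X, y, minority=1)   # SMOTE: add synthetic minority rows
X_clean, y_clean = enn(X_bal, y_bal)     # ENN: remove rows that look like noise

print("original:   ", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:  ", Counter(y_clean))
```

In practice, resampling of this kind is usually applied to the training split only, so that the test set keeps the original class distribution. Libraries such as imbalanced-learn provide a combined `SMOTEENN` resampler that would normally be preferred over a hand-written version like this one.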
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors are allowed to distribute published articles by sharing the link or DOI of the article. Authors are allowed to use their articles for legal purposes deemed necessary without the written permission of the journal, provided that the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer) is acknowledged.
The Copyright Transfer Form is available for download: Copyright Transfer Form Jurnal Sisfokom (Sistem Informasi dan Komputer).
This agreement is to be signed by at least one of the authors, who must have obtained the assent of the co-author(s). After the agreement signed by the corresponding author has been submitted, changes of authorship or in the order of the authors listed will not be accepted. The copyright form should bear an original signature and be sent to the editorial office as a scanned document at sisfokom@atmaluhur.ac.id.