The Effect of the SMOTE Method on the Classification of Toddler Nutritional Status Using the Naïve Bayes Method

Dewi Sartika

doi:10.32736/sisfokom.v14i3.2381

Authors

Dewi Sartika Department of Informatics Management, University of Sriwijaya

Keywords:

Multinomial Naive Bayes, SMOTE, Stunting

Abstract

The first five years of life are a golden age for growth and development, so adequate nutritional intake during this period is essential to prevent stunting, or growth failure. Stunting remains a government priority because nutrition is a key factor in developing qualified human resources and in national development. According to the Ministry of Health, the 2023 Indonesian Health Survey showed that the prevalence of stunting had decreased over the past 10 years but had not yet met the 14% target set for 2024 in the 2020-2024 National Medium-Term Development Plan. This study classifies the nutritional status of toddlers using the Naive Bayes method, a probabilistic technique based on Bayes' theorem that assumes the attributes are mutually independent and contribute equally. The Naive Bayes probabilities in this study are calculated with the Multinomial distribution because the data are discrete. A total of 245 toddler nutritional status records were obtained, of which 4 were invalid. The number of samples per class label in this data set was imbalanced. One method for handling such imbalance is the Synthetic Minority Oversampling Technique (SMOTE), an oversampling method that randomly generates synthetic samples to balance the minority classes. The analysis and testing results showed that Multinomial Naive Bayes with 10-fold cross-validation achieved a g-means value of 44.98% on the original data set and 80.06% on the balanced data set. With split validation, the g-means value was 44.20% on the original data set and 80.06% on the balanced data set. This corresponds to an increase in the g-means value of about 35 percentage points. It can be concluded that the SMOTE method effectively improves the overall capability of the Multinomial Naive Bayes model.
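The g-means figures above can be read with the help of a small sketch. The abstract does not state the exact formula used, so the code below assumes a common multiclass definition: the geometric mean of the per-class recall values. The function name `g_mean` and the toy labels are hypothetical and are not data from the study. The final lines only redo the arithmetic on the percentages reported in the abstract.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition, not taken from the paper)."""
    support = Counter(y_true)  # number of true samples in each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical, imbalanced toy labels (8 majority samples, 2 minority samples).
y_true = ["normal"] * 8 + ["stunted"] * 2

# A classifier that always predicts the majority class: 80% accuracy,
# but the minority recall is 0, so the g-mean collapses to 0.
majority_only = ["normal"] * 10

# A classifier that recovers part of the minority class: lower accuracy (70%),
# but a clearly higher g-mean because both recalls are non-zero.
balanced_pred = ["normal"] * 6 + ["stunted"] * 2 + ["stunted", "normal"]

score_majority = g_mean(y_true, majority_only)
score_balanced = g_mean(y_true, balanced_pred)

# Arithmetic on the percentages reported in the abstract (in percentage points).
gain_cross_validation = round(80.06 - 44.98, 2)
gain_split_validation = round(80.06 - 44.20, 2)
```

Because the metric multiplies the recall of every class, a model that ignores a minority class scores near zero regardless of its accuracy, which is why g-means is a reasonable yardstick for judging the effect of oversampling on imbalanced data.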
