The Effect of SMOTE and Optuna Hyperparameter Optimization on TabNet Performance for Heart Disease Classification

Authors

  • Danang Wijayanto, Department of Computer Science, University of Amikom, Yogyakarta, Indonesia
  • Robert Marco, Department of Computer Science, University of Amikom, Yogyakarta, Indonesia
  • Acihmah Sidauruk, Department of Computer Science, University of Amikom, Yogyakarta, Indonesia
  • Mulia Sulistiyono, Department of Computer Science, University of Amikom, Yogyakarta, Indonesia

DOI:

https://doi.org/10.32736/sisfokom.v14i2.2348

Keywords:

TabNet, SMOTE, Optuna, Classification, Heart Disease

Abstract

Heart disease is a medical condition affecting the cardiovascular system that disrupts blood circulation and reduces the efficiency of cardiac function, which can lead to severe health complications. Early diagnosis is crucial because delayed detection can significantly worsen patient outcomes and survival rates. Although numerous studies have explored approaches to heart disease classification, data imbalance and improper hyperparameter settings remain persistent problems that degrade model performance. This research evaluated the effectiveness of combining TabNet with SMOTE and Optuna hyperparameter optimization for heart disease classification. We ran four experimental scenarios on a heart disease dataset with 303 instances: baseline TabNet, TabNet with SMOTE, TabNet with Optuna, and TabNet with both SMOTE and Optuna. Applying SMOTE alone to TabNet decreased model performance (accuracy fell from 85.24% to 77.04% and AUC from 0.89 to 0.83). Combining SMOTE with Optuna hyperparameter optimization, however, gave the best performance: 90.16% accuracy, 93.33% precision, 87.50% recall, 90.32% F1-score, and 0.93 AUC. This was a significant improvement over the other configurations and over several previous classification approaches. The integration of SMOTE with Optuna optimization provided an effective framework for heart disease classification that outperformed traditional methods, particularly in discriminative capability, as evidenced by its higher AUC score.
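
To make the oversampling step concrete, the following is a minimal pure-Python sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority neighbors. The abstract does not state which implementation the authors used, so the function name `smote_sketch`, its parameters, and the toy 2-D data are all hypothetical and serve only as illustration; they are not the study's code or data.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors.

    Illustrative only: a simplified version of the SMOTE idea, not the
    implementation used in the study.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbors (excluding the base itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Interpolate at a random position along the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical toy data: 6 majority points and 3 minority points in 2-D.
majority = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.9), (0.8, 0.4), (0.3, 0.3), (0.6, 0.7)]
minority = [(2.0, 2.0), (2.5, 2.2), (2.1, 2.8)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data; whether the study followed this convention is not stated in the abstract.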

Published

2025-05-26

Section

Articles