Decision Tree based Data Modelling for First Detection of Thalassemia Major

Authors

  • Yohanes Setiawan Institut Teknologi Telkom Surabaya
  • Oktavia Ayu Permata Institut Teknologi Telkom Surabaya
  • M. Pradata Yuda Institut Teknologi Telkom Surabaya

DOI:

https://doi.org/10.32736/sisfokom.v13i1.1949

Keywords:

Data Mining, Decision Tree, Thalassemia Major

Abstract

Thalassemia is an inherited blood disorder in which the body lacks sufficient hemoglobin, the protein that carries oxygen throughout the body. Its severe form, Thalassemia Major, requires special care involving blood transfusion. Using a rule-based method to build an inference for the first diagnosis of Thalassemia Major is not effective, because the rules must be elicited through lengthy interviews with medical personnel. This research aims to build a decision tree-based model for the first detection of Thalassemia Major. The dataset was obtained from interviews on Thalassemia symptoms and from primary medical-record data from a hospital. The classical decision tree models used are ID3, C4.5, and CART. The models are evaluated with a train-test split of 70% training and 30% testing data, and with k-fold cross-validation to check whether the models overfit or underfit. The output of this research is a final tree model taken from the best-performing decision tree model. The results show that C4.5 performs best, reaching 100% accuracy without overfitting or underfitting. C4.5 also performs feature selection while building its tree, which simplifies the inference. In brief, decision tree-based modeling is effective for the first detection of Thalassemia Major from interviewed symptoms, because the rules are generated automatically from the tree model.
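The abstract compares ID3, C4.5, and CART without stating how they choose splits. In their textbook forms, ID3 ranks attributes by information gain, C4.5 by gain ratio, and CART by Gini impurity. The sketch below computes all three on a tiny hypothetical dataset; the attribute names (`pale`, `fatigue`) and labels are illustrative assumptions, not the paper's data or code.

```python
import math
from collections import Counter

# Hypothetical toy records (NOT the paper's dataset): symptom attributes plus a class label.
ROWS = [
    {"pale": "yes", "fatigue": "yes", "label": "major"},
    {"pale": "yes", "fatigue": "no", "label": "major"},
    {"pale": "no", "fatigue": "yes", "label": "not_major"},
    {"pale": "no", "fatigue": "no", "label": "not_major"},
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def partition(rows, attr):
    """Group the class labels of `rows` by each value of `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["label"])
    return groups

def information_gain(rows, attr):
    """ID3-style criterion: parent entropy minus weighted child entropy."""
    labels = [r["label"] for r in rows]
    groups = partition(rows, attr)
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

def gain_ratio(rows, attr):
    """C4.5-style criterion: information gain divided by split information."""
    groups = partition(rows, attr)
    split_info = -sum(len(g) / len(rows) * math.log2(len(g) / len(rows)) for g in groups.values())
    return information_gain(rows, attr) / split_info if split_info > 0 else 0.0

def gini_decrease(rows, attr):
    """Gini impurity decrease for a multiway split on `attr`.
    Simplification: CART proper searches binary splits only."""
    labels = [r["label"] for r in rows]
    groups = partition(rows, attr)
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini(labels) - weighted

if __name__ == "__main__":
    for attr in ("pale", "fatigue"):
        print(attr, information_gain(ROWS, attr), gain_ratio(ROWS, attr), gini_decrease(ROWS, attr))
```

On this toy data `pale` separates the classes perfectly, so it scores highest under all three criteria, while `fatigue` carries no information.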
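The evaluation described in the abstract uses a 70/30 train-test split and k-fold cross-validation. The following stdlib-only sketch produces the index sets for both schemes. The abstract does not state k, whether the data were shuffled, or a random seed, so `k=5`, shuffling, and `seed=42` are assumptions chosen for illustration.

```python
import random

def train_test_split_indices(n, test_fraction=0.3, seed=42):
    """Shuffle indices 0..n-1 and split them; test_fraction=0.3 gives a 70/30 split."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)

def k_fold_indices(n, k=5, seed=42):
    """Yield (train, validation) index lists for k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # The first n % k folds receive one extra sample.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

A common way to use the folds is to compare mean training accuracy with mean validation accuracy: a large gap suggests overfitting, while low accuracy on both suggests underfitting.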
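The abstract notes that a decision tree yields rules automatically, unlike a hand-built rule base. One way to see this is to walk every root-to-leaf path and emit an IF-THEN rule per leaf. The sketch below assumes a hypothetical nested-dict tree format (`attr`, `branches`, string leaves) and hypothetical attribute names; it is not the tree reported in the paper.

```python
def tree_to_rules(node, conditions=()):
    """Return one IF-THEN rule per leaf of a nested-dict decision tree."""
    if isinstance(node, str):  # a leaf holds the predicted class label
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        # Extend the path with the test taken on this branch.
        rules.extend(tree_to_rules(child, conditions + ((node["attr"], value),)))
    return rules

# Hypothetical tree; attribute names and structure are illustrative only.
TREE = {
    "attr": "pale",
    "branches": {
        "yes": {"attr": "enlarged_spleen", "branches": {"yes": "major", "no": "not_major"}},
        "no": "not_major",
    },
}

if __name__ == "__main__":
    for rule in tree_to_rules(TREE):
        print(rule)
```

Each rule corresponds to exactly one path, so the rule set is complete and mutually exclusive by construction; attributes the tree never splits on simply do not appear in any rule.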

Published

2024-02-12

Section

Articles