Decision Tree based Data Modelling for First Detection of Thalassemia Major

Yohanes Setiawan(1), Oktavia Ayu Permata(2*), M. Pradata Yuda(3)

(1) Institut Teknologi Telkom Surabaya
(2) Institut Teknologi Telkom Surabaya
(3) Institut Teknologi Telkom Surabaya
(*) Corresponding Author


Thalassemia is an inherited blood disorder in which the body produces too little hemoglobin, the protein that carries oxygen through the body. The most severe form is Thalassemia Major, which requires special care involving blood transfusion. Using a rule-based method to infer a first diagnosis of Thalassemia Major is not effective, because the rules have to be elicited through long interviews with medical personnel. This research aims to build a decision tree based model for the first detection of Thalassemia Major. The dataset was obtained from interviews about Thalassemia symptoms and from primary medical record data from a hospital. The classical decision tree models used are ID3, C4.5, and CART. The models are evaluated with a train-test split of 70% training and 30% testing data, and with k-fold validation to check whether a model is overfitting or underfitting. The output of this research is the final tree model taken from the best-performing decision tree algorithm. The results show that C4.5 performs best, with an accuracy of 100% and no overfitting or underfitting. In addition, C4.5 performs feature selection while building its tree, which simplifies the inference. In brief, decision tree based modeling is effective for the first detection of Thalassemia Major from interviewed symptoms, because the rules are generated automatically from the tree model.
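
As a rough illustration of how the tree models named above choose which symptom to split on, the sketch below computes the information gain used by ID3 and the gain ratio used by C4.5. It is not the authors' implementation: the symptom names (`pale`, `fatigue`), the four toy records, and the class labels are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on `feature` (ID3 criterion)."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        # Class labels of the records that take this value of the feature.
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(rows, labels, feature):
    """Information gain divided by split information (C4.5 criterion)."""
    split_info = entropy([row[feature] for row in rows])
    if split_info == 0:
        return 0.0  # the feature has a single value, so it cannot split the data
    return information_gain(rows, labels, feature) / split_info


# Hypothetical interview answers; NOT data from the study.
rows = [
    {"pale": "yes", "fatigue": "yes"},
    {"pale": "yes", "fatigue": "no"},
    {"pale": "no", "fatigue": "yes"},
    {"pale": "no", "fatigue": "no"},
]
labels = ["major", "major", "not_major", "not_major"]

# The symptom with the highest gain ratio would become the root of the tree.
best_feature = max(["pale", "fatigue"], key=lambda f: gain_ratio(rows, labels, f))
```

In this toy data the `pale` answer separates the two classes completely while `fatigue` carries no information, so a symptom like `fatigue` would never appear in the tree. This is the sense in which tree construction acts as a form of feature selection.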


Data Mining; Decision Tree; Thalassemia Major

