Classification Comparison Performance of Supervised Machine Learning Random Forest and Decision Tree Algorithms Using Confusion Matrix

Authors

  • Ellya Helmud Sistem Informasi, Fakultas Teknologi Informasi ISB Atma Luhur
  • Fitriyani Fitriyani Sistem Informasi, Fakultas Teknologi Informasi ISB Atma Luhur
  • Parlia Romadiana Sistem Informasi, Fakultas Teknologi Informasi ISB Atma Luhur

DOI:

https://doi.org/10.32736/sisfokom.v13i1.1985

Keywords:

Data Mining, Classification, Random Forest, Decision Tree, Confusion Matrix

Abstract

Classification is a data mining technique used to predict the class of existing and future cases, and it works on supervised (labeled) data. The random forest method builds several decision trees and combines their outputs to obtain better and more precise predictions, while the decision tree method turns a collection of data into a single tree that represents the rules of a decision. This study compares the prediction accuracy of the two methods on the same dataset, an employee dataset from India, to predict whether an employee leaves or remains with the company. The dataset contains 4654 records, of which 90% were used as training data and 10% as test data. In testing, the random forest method reached an accuracy of 86.45%, while the decision tree method reached 84.30%. The confusion matrix was then used to show the distribution of correct and incorrect predictions and to calculate precision, recall (sensitivity), specificity, and F1-score. The random forest algorithm obtained a precision of 96.7%, a sensitivity of 84.7%, a specificity of 91.4%, and an F1-score of 90.2%. The decision tree algorithm obtained a precision of 95.7%, a sensitivity of 82.9%, a specificity of 88.4%, and an F1-score of 88.8%.
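The metrics named in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below is an illustration only, not the authors' implementation: the class `ConfusionMatrix`, the helper `from_labels`, and the example counts are hypothetical and are not taken from the paper's results. It assumes the standard definitions of accuracy, precision, sensitivity (recall), specificity, and F1-score.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative

    @property
    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def sensitivity(self) -> float:
        # Also called recall or true positive rate.
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate.
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


def from_labels(y_true: Sequence[int], y_pred: Sequence[int],
                positive: int = 1) -> ConfusionMatrix:
    """Count the four cells from true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Made-up counts for illustration; these are NOT the study's figures.
    cm = ConfusionMatrix(tp=80, fp=5, fn=15, tn=100)
    for name in ("accuracy", "precision", "sensitivity", "specificity", "f1"):
        print(f"{name}: {getattr(cm, name):.3f}")
```

Applying the same function to the test-set predictions of each model would give one matrix per algorithm, so the two can be compared metric by metric rather than on accuracy alone.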

Published

2024-02-15

Section

Articles