Comparison of the Performance of Random Forest and K-Nearest Neighbor in Classifying Leukemia Using Principal Component Analysis

Sriani Sriani(1), Muhammad Ikhsan(2), Lailan Sofinah Harahap(3*)

(1) Universitas Islam Negeri Sumatera Utara Medan
(2) Universitas Islam Negeri Sumatera Utara Medan
(3) Universitas Islam Negeri Sumatera Utara Medan
(*) Corresponding Author

Abstract


Leukemia is the most common blood cancer in Asia, including in Indonesia. It can affect blood cells, bone marrow, lymph nodes, and other parts of the lymphatic system. One way to detect leukemia is to analyze gene expression data obtained with microarray technology. Microarray data contain a very large number of genes, so the number of genes must be reduced to eliminate irrelevant features and improve classification accuracy. In this study, feature (gene) reduction was carried out with Principal Component Analysis (PCA), and classification was carried out with Random Forest (RF) and K-Nearest Neighbor (KNN). RF with n_estimators = 100 achieved an accuracy of 78.57%. KNN achieved 78.57% with K=1, 85.71% with K=3 and K=5, and 71.42% with K=7 and K=9. The best accuracy, 85.71%, was obtained with KNN at K=3 and K=5.
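To illustrate how the choice of K can change KNN accuracy, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance and majority voting, which the abstract does not specify. The two-dimensional points are made-up toy values standing in for samples already projected by PCA, and the labels "A" and "B", as well as the function names knn_predict and knn_accuracy, are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample (assumed metric).
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Toy data (hypothetical): two clusters plus one class "A" sample
# that sits close to the class "B" cluster.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (4.6, 4.6),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["A", "A", "A", "A", "B", "B", "B"]
test_X = [(0.2, 0.2), (4.7, 4.7), (5.5, 5.5)]
test_y = ["A", "B", "B"]

# Odd K values avoid tied votes when there are two classes.
results = {k: knn_accuracy(train_X, train_y, test_X, test_y, k)
           for k in (1, 3, 5, 7)}

for k, acc in results.items():
    print(f"K={k}: accuracy={acc:.2%}")
```

On this toy data, K=1 is misled by the single outlying training point, while K=7 uses every training point and therefore always predicts the majority class. Intermediate values of K avoid both failure modes, which is consistent in spirit with the pattern reported above, although the toy numbers themselves carry no meaning for the leukemia data.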

Keywords


Leukemia; Microarray Data; PCA; Random Forest; KNN


DOI: https://doi.org/10.32736/sisfokom.v13i2.2165

Creative Commons License
Jurnal Sisfokom (Sistem Informasi dan Komputer), ISSN 2301-7988 and e-ISSN 2581-0588, is published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) ISB Atma Luhur under a Creative Commons Attribution-ShareAlike 4.0 International License.