Analyze Important Features of PIMA Indian Database For Diabetes Prediction Using KNN
DOI: https://doi.org/10.32736/sisfokom.v12i1.1598
Keywords: diabetes prediction, knn, pidd, importance features, machine learning
Abstract
Diabetes is a chronic, non-communicable disease: a long-term health condition that affects how the body uses glucose, the type of sugar that provides energy. In Indonesia, diabetes ranks as the sixth highest cause of death, following conditions related to childbirth. In 2021, Indonesia had a total of 19.5 million people with diabetes, the fifth-highest number in the world. Several machine learning studies have used the PIMA Indian Diabetes Dataset (PIDD) to predict diabetes. This research treats data complexity as being as important as prediction accuracy: it analyzes the important features of the PIMA Indian database using the k-nearest neighbor (KNN) method for classification. The results show that KNN with k = 22 achieves the highest accuracy, 83.12%. The analysis also found that the features the KNN method requires to reach high accuracy on the PIMA Indian database are, in order of importance, glucose, age, insulin, blood pressure, body mass index (BMI), number of pregnancies, skin thickness, and diabetes pedigree function. However, for KNN classification the diabetes pedigree function feature was found to be unnecessary and irrelevant, so it can be removed.
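The abstract describes two steps: classifying with KNN while searching for the best k, and ranking features by how much each one contributes to accuracy. The sketch below illustrates both in plain Python using only the standard library. It assumes min-max scaling fitted on the training split, Euclidean distance, and drop-one-feature ablation as the importance measure (retrain without a feature and record the accuracy lost). The abstract does not state which scaling, distance, or importance procedure the paper used, so these are illustrative choices rather than the authors' method, and all function names are hypothetical.

```python
import math
from collections import Counter


def min_max_scale(train, test):
    """Scale each column to [0, 1] using min/max from the training rows only."""
    cols = list(zip(*train))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def scale(row):
        # Constant columns map to 0.0 to avoid division by zero.
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train], [scale(r) for r in test]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k, features=None):
    """Test accuracy of KNN, optionally restricted to a subset of column indices."""
    idx = list(range(len(train_x[0]))) if features is None else list(features)

    def select(rows):
        return [[r[i] for i in idx] for r in rows]

    tr, te = min_max_scale(select(train_x), select(test_x))
    hits = sum(knn_predict(tr, train_y, x, k) == y for x, y in zip(te, test_y))
    return hits / len(test_y)


def best_k(train_x, train_y, test_x, test_y, ks):
    """Return (k, accuracy) for the candidate k with the highest accuracy."""
    scores = ((k, accuracy(train_x, train_y, test_x, test_y, k)) for k in ks)
    return max(scores, key=lambda t: t[1])  # ties keep the first (smallest listed) k


def drop_one_importance(train_x, train_y, test_x, test_y, k, names=None):
    """Accuracy lost when each feature is removed, most important first.

    A drop of zero or less suggests the feature adds nothing for KNN.
    """
    n = len(train_x[0])
    names = names or list(range(n))
    base = accuracy(train_x, train_y, test_x, test_y, k)
    drops = []
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        drops.append((names[j], base - accuracy(train_x, train_y, test_x, test_y, k, keep)))
    return sorted(drops, key=lambda t: t[1], reverse=True)
```

Applied to PIDD, each row would hold the eight attributes listed in the abstract and each label the diabetes outcome; a feature whose accuracy drop is zero or negative (as the abstract reports for diabetes pedigree function) is a candidate for removal. Choosing k on the same split that is used to report accuracy gives an optimistic estimate, so a separate validation split or cross-validation is the safer way to pick k.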
License
The copyright of an article accepted for publication is assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and distribute the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In all cases, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors may distribute their published articles by sharing the article's link or DOI. Authors may use their articles for any lawful purpose they deem necessary without written permission from the journal, provided they give notice of the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer).
The Copyright Transfer Form is available for download as "Copyright Transfer Form Jurnal Sisfokom (Sistem Informasi dan Komputer)".
This agreement is to be signed by at least one of the authors, who has obtained the consent of the co-author(s). After the corresponding author submits the signed agreement, changes to the authorship or to the order of the listed authors will not be accepted. The copyright form must bear an original signature and be sent to the editorial office as a scanned document at sisfokom@atmaluhur.ac.id.