Analyze Important Features of PIMA Indian Database For Diabetes Prediction Using KNN

Aziz Perdana(1*), Arief Hermawan(2), Donny Avianto(3)

(1) Universitas Teknologi Yogyakarta
(2) Universitas Teknologi Yogyakarta
(3) Universitas Teknologi Yogyakarta
(*) Corresponding Author


Diabetes is a chronic, non-communicable disease: a long-term condition that affects how the body uses glucose, the sugar that supplies its energy. In Indonesia, diabetes ranks as the sixth highest cause of death, following conditions related to childbirth. In 2021, Indonesia had 19.5 million diabetes patients, the fifth-highest total in the world. Several machine learning studies have used the PIDD (PIMA Indian Diabetes Dataset) to predict diabetes. In this research, data complexity is considered alongside prediction accuracy. This research analyzes the important features of the PIMA Indian database using the KNN (k-nearest neighbor) method for classification. The results show that KNN with k = 22 gives the highest accuracy, 83.12%. The analysis also ranked the features by their importance to KNN accuracy on the PIMA Indian database: glucose, age, insulin, blood pressure, Body Mass Index, pregnancy, skin thickness, and diabetes pedigree function. However, the lowest-ranked feature, diabetes pedigree function, was found to be unnecessary and irrelevant for KNN classification, so it can be removed from the feature set.
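The workflow summarized in the abstract (choose k by accuracy, then judge each feature by its effect on KNN accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state how feature importance was measured, so the sketch uses a simple drop-one-feature comparison; the data are synthetic stand-ins rather than the PIMA records; and the helper names (knn_predict, accuracy, drop_feature, make_toy_data) are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda row: math.dist(row[0], x))[:k]
    # On a tied vote (possible for even k), the label seen first among the
    # nearest neighbours wins.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def drop_feature(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


def make_toy_data(n, seed=0):
    """Synthetic stand-in, NOT the PIMA data.

    Feature 0 separates the classes strongly, feature 1 weakly,
    and feature 2 is pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([
            rng.gauss(2.0 * label, 1.0),
            rng.gauss(0.5 * label, 1.0),
            rng.gauss(0.0, 1.0),
        ])
        y.append(label)
    return X, y


feature_names = ["f0_signal", "f1_weak", "f2_noise"]  # hypothetical names
X, y = make_toy_data(400)
train_X, train_y = X[:300], y[:300]
test_X, test_y = X[300:], y[300:]

# Step 1: sweep k and keep the value with the highest held-out accuracy.
scores = {k: accuracy(train_X, train_y, test_X, test_y, k) for k in range(1, 31)}
best_k = max(scores, key=scores.get)
baseline = scores[best_k]

# Step 2: drop one feature at a time; a large accuracy loss suggests an
# important feature, a loss near zero (or a gain) suggests it can be removed.
importance = {}
for j, name in enumerate(feature_names):
    reduced_acc = accuracy(
        drop_feature(train_X, j), train_y, drop_feature(test_X, j), test_y, best_k
    )
    importance[name] = baseline - reduced_acc

print(f"best k = {best_k}, accuracy = {baseline:.3f}")
for name, loss in sorted(importance.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: accuracy loss when dropped = {loss:+.3f}")
```

A single train/test split is used here only to keep the sketch short. On a dataset as small as PIDD, cross-validation would give a steadier estimate, and because KNN relies on distances, features on different scales would normally be standardized first.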


diabetes prediction; knn; pidd; importance features; machine learning




Jurnal Sisfokom (Sistem Informasi dan Komputer) has ISSN 2301-7988 and e-ISSN 2581-0588 which is published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) ISB Atma Luhur under a Creative Commons Attribution-ShareAlike 4.0 International License.