Clustering OKU Timur Script Images Using VGG Feature Extraction and K-Means

Authors

  • Liu Toriko, Intelligent Systems Research Group, Faculty of Science Technology, Bina Darma University
  • Susan Dian Purnamasari, Intelligent Systems Research Group, Faculty of Science Technology, Bina Darma University
  • Yesi Novaria Kunang, Intelligent Systems Research Group, Faculty of Science Technology, Bina Darma University
  • Ilman Zuhri Yadi, Intelligent Systems Research Group, Faculty of Science Technology, Bina Darma University
  • Andri Andri, Intelligent Systems Research Group, Faculty of Science Technology, Bina Darma University

DOI:

https://doi.org/10.32736/sisfokom.v14i1.2292

Keywords:

OKU Timur Script, VGG16 Model, Clustering, K-Means, Manuscript Images

Abstract

This study applies a clustering model to group manuscript images from the OKU Timur region according to their visual characteristics. OKU Timur has a rich cultural heritage that includes its own writing system, the OKU Timur script. Intelligent systems technology can be used to recognize this script, but doing so requires a dataset of OKU Timur script images that can later be used to classify them. One challenge in preparing such a dataset is sorting a large number of script image samples into groups that match the number of characters. This research proposes grouping the script images automatically with the K-Means algorithm. The dataset comprises 2,280 images representing 19 characters and 228 variations with different diacritics. Features are extracted from the images with the VGG16 model and then clustered with K-Means. Clustering performance is measured as the percentage of characters grouped correctly. With 19 groups (the character count), the model groups 82.6% of characters correctly; with 228 groups (variations and diacritics), it groups 48.16% correctly. Although the finer-grained grouping remains difficult, the results indicate that the model has potential for further refinement. The study contributes an efficient clustering approach for cultural manuscripts that supports digital preservation and advances automatic recognition of the OKU Timur script, with the aim of preserving the script for future generations.
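
The abstract reports clustering quality as the percentage of correctly grouped characters but does not say how a sample is counted as correct. The sketch below shows one common reading, offered as an assumption rather than the authors' procedure: each cluster is assigned the most frequent true label among its members, and a sample counts as correct when its own label matches. The function name `grouping_accuracy` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def grouping_accuracy(true_labels, cluster_ids):
    """Share of samples whose label equals the majority label of their cluster.

    Assumption: "correctly grouped" means majority-vote agreement; the
    abstract does not define the metric.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    # Collect the true labels that landed in each cluster.
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)

    # For every cluster, count the samples carrying its most common label.
    correct = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return correct / len(true_labels)


# Toy example with made-up labels (not taken from the OKU Timur dataset).
true_labels = ["char_a", "char_a", "char_a", "char_b", "char_b", "char_c"]
cluster_ids = [0, 0, 1, 1, 1, 2]
print(f"{grouping_accuracy(true_labels, cluster_ids):.2%}")  # prints 83.33%
```

In a full pipeline, `cluster_ids` would come from K-Means run on the VGG16 feature vectors with the number of clusters set to 19 or 228. Under this definition the score can only stay the same or rise when clusters are split further, so a drop from 82.6% to 48.16% would reflect the harder 228-way label set rather than the metric itself.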

Published

2024-12-31

Section

Articles