Multi-Label Classification of Bukhari's Hadith in Indonesian Translation Using Mutual Information and k-Nearest Neighbor

Afrian Hanafi(1), Adiwijaya Adiwijaya(2), Widi Astuti(3*)

(1) School of Computing, Telkom University
(2) School of Computing, Telkom University
(3) School of Computing, Telkom University
(*) Corresponding Author

Abstract


Hadith is the second source of law for Muslims after the Qur'an. It consists of the words, actions, and stipulations of the Prophet Muhammad, collectively referred to as his sunnah. To make it easier for Muslims to apply the teachings of the hadiths, a classification system is needed that can assign a hadith to one class or to a combination of two of three classes, a task known as multi-label classification. Among the many techniques for building a text classification system is k-Nearest Neighbor (KNN). KNN is a simple and effective method for text classification, but it handles high-dimensional feature vectors poorly: computation time increases and text classification becomes very inefficient. Mutual Information (MI) is therefore used as a feature selection method to reduce vector dimensionality, because it measures how informative a feature is for correctly predicting a class. This study applies the problem transformation method with the Binary Relevance (BR) approach, which decomposes the multi-label task into one binary classification problem per label. The best result, obtained using MI feature selection without stemming, is a Hamming loss of 0.0886 (i.e., about 91.14% of label assignments were correct) with a computation time of 595 seconds.
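The abstract combines four pieces: MI feature selection, KNN, Binary Relevance, and Hamming loss. The sketch below wires them together in plain Python. It is not the authors' implementation, and several choices are assumptions: documents are already tokenized (no stemming), MI is computed per label between binary term presence and that label, features are raw term counts compared by cosine similarity, and each per-label KNN uses a majority vote. All function names and the parameters `n_features` and `k` are hypothetical. Hamming loss here is the fraction of (document, label) slots predicted wrongly, so a loss of 0.0886 means 1 - 0.0886 = 0.9114 of slots are correct.

```python
import math
from collections import Counter

# Hypothetical sketch: Binary Relevance + per-label MI feature selection + KNN.
# Assumes documents are already tokenized lists of terms (no stemming).


def mutual_information(doc_sets, y, term):
    """MI (in nats) between presence of `term` and a binary label vector `y`."""
    n = len(doc_sets)
    # Count documents per (term present?, label value) cell; empty cells never appear.
    joint = Counter((term in d, label) for d, label in zip(doc_sets, y))
    mi = 0.0
    for (t, c), n_tc in joint.items():
        p_tc = n_tc / n
        p_t = sum(v for (t2, _), v in joint.items() if t2 == t) / n
        p_c = sum(v for (_, c2), v in joint.items() if c2 == c) / n
        mi += p_tc * math.log(p_tc / (p_t * p_c))
    return mi


def select_features(doc_sets, y, n_features):
    """Keep the n_features terms with the highest MI for one label (ties broken alphabetically)."""
    vocab = sorted(set().union(*doc_sets))
    ranked = sorted(vocab, key=lambda t: (-mutual_information(doc_sets, y, t), t))
    return ranked[:n_features]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_vote(train_vecs, y, query, k):
    """Return 1 if more than half of the k most similar training documents have label 1."""
    # Stable sort: equally similar documents keep their training order.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    positives = sum(y[i] for i in ranked[:k])
    return 1 if positives * 2 > k else 0


def br_knn_predict(train_docs, train_Y, test_docs, n_features=50, k=3):
    """Binary Relevance: an independent MI + KNN binary classifier for each label."""
    n_labels = len(train_Y[0])
    train_sets = [set(d) for d in train_docs]
    preds = [[0] * n_labels for _ in test_docs]
    for j in range(n_labels):
        y = [row[j] for row in train_Y]
        feats = set(select_features(train_sets, y, n_features))

        def vectorize(doc):
            # Term counts restricted to the features selected for label j.
            return Counter(t for t in doc if t in feats)

        train_vecs = [vectorize(d) for d in train_docs]
        for i, doc in enumerate(test_docs):
            preds[i][j] = knn_vote(train_vecs, y, vectorize(doc), k)
    return preds


def hamming_loss(Y_true, Y_pred):
    """Fraction of (document, label) slots predicted incorrectly."""
    n_slots = len(Y_true) * len(Y_true[0])
    errors = sum(a != b for yt, yp in zip(Y_true, Y_pred) for a, b in zip(yt, yp))
    return errors / n_slots


if __name__ == "__main__":
    # Toy data with three hypothetical labels; not the paper's dataset.
    train_docs = [["zakat", "harta", "fakir"], ["zakat", "sedekah", "fakir"],
                  ["riba", "haram", "harta"], ["riba", "haram", "dosa"]]
    train_Y = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1]]
    test_docs = [["zakat", "fakir"], ["riba", "dosa"]]
    pred = br_knn_predict(train_docs, train_Y, test_docs, n_features=4, k=1)
    print(pred, hamming_loss([[1, 0, 0], [0, 1, 1]], pred))
```

Because Binary Relevance treats each label independently, each label gets its own MI-selected vocabulary, which is one way MI can shrink the vectors KNN has to compare. Whether the paper selects features per label or over all labels combined is not stated in the abstract.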

Keywords


multi-label classification; Bukhari’s hadith; k-nearest neighbor; mutual information; hamming loss

DOI: https://doi.org/10.32736/sisfokom.v9i3.980

Jurnal Sisfokom (Sistem Informasi dan Komputer), ISSN 2301-7988 and e-ISSN 2581-0588, is published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) ISB Atma Luhur under a Creative Commons Attribution-ShareAlike 4.0 International License.