Building Synsets for Indonesian WordNet using ROCK (Robust Clustering Using Links) Algorithm

Mubaroq Iqbal(1*), Moch. Arif Bijaksana(2), Widi Astuti(3)

(1) Telkom University
(2) Telkom University
(3) Telkom University
(*) Corresponding Author

Abstract


In the development of the Indonesian WordNet, the synonym set (synset) is an important component that represents the similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. Extracting entries from the Indonesian Thesaurus yields synonym sets whose members share a similar meaning, or word sense. In general, WordNet differs from a dictionary in its main focus: a dictionary usually describes one word at a time, whereas WordNet focuses on the meanings of words and on how words are connected to one another. Previous research constructed synonym sets using several approaches, including clustering and WSD (Word Sense Disambiguation). In this article, synonym sets are produced with the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values between words. The resulting synonym sets will then be used to develop the lexical database. The main focus of this article is therefore to produce synonym sets through clustering and to evaluate their accuracy with the F-measure, using a gold standard for performance calculation and evaluation.
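The abstract names two technical ingredients, ROCK clustering driven by similarity and link values, and F-measure evaluation against a gold standard, without giving formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. It follows the ROCK formulation of Guha et al.: Jaccard similarity, neighbors defined by a threshold θ, link(p, q) as the number of common neighbors, and the goodness measure with f(θ) = (1 - θ)/(1 + θ). It assumes each word is represented by a hypothetical set of words listed with it in thesaurus entries (including itself) and that a word is not its own neighbor. It scores the result with a pairwise F-measure, which is one common choice; the article may compute the F-measure differently. The word lists, `theta`, `k`, and the gold standard are invented for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| between two attribute sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rock(items: dict[str, set[str]], theta: float, k: int) -> list[set[str]]:
    """Minimal ROCK sketch: neighbors are pairs with Jaccard >= theta,
    link(p, q) is the number of common neighbors, and clusters are merged
    greedily by the ROCK goodness measure until k remain or no links are left."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    words = list(items)
    # Assumption: a word is not counted as its own neighbor.
    nbrs: dict[str, set[str]] = {w: set() for w in words}
    for p, q in combinations(words, 2):
        if jaccard(items[p], items[q]) >= theta:
            nbrs[p].add(q)
            nbrs[q].add(p)
    # link(p, q) = number of neighbors shared by p and q.
    link = {frozenset((p, q)): len(nbrs[p] & nbrs[q])
            for p, q in combinations(words, 2)}
    # f(theta) = (1 - theta) / (1 + theta), the choice Guha et al. use for
    # market-basket data; a cluster of size n is expected to hold n ** e links.
    e = 1 + 2 * (1 - theta) / (1 + theta)

    def goodness(ci: set[str], cj: set[str]) -> float:
        cross = sum(link[frozenset((p, q))] for p in ci for q in cj)
        ni, nj = len(ci), len(cj)
        return cross / ((ni + nj) ** e - ni ** e - nj ** e)

    clusters = [{w} for w in words]
    while len(clusters) > k:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            g = goodness(clusters[i], clusters[j])
            if g > best:
                best, pair = g, (i, j)
        if pair is None:  # no remaining pair of clusters shares a link
            break
        i, j = pair  # i < j, so deleting index j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def same_cluster_pairs(clusters: list[set[str]]) -> set[frozenset[str]]:
    """All unordered word pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_f_measure(predicted: list[set[str]],
                       gold: list[set[str]]) -> tuple[float, float, float]:
    """Pairwise precision, recall and F1 of predicted clusters against gold synsets."""
    pred, ref = same_cluster_pairs(predicted), same_cluster_pairs(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical attribute sets: words listed together in thesaurus entries.
items = {
    "besar": {"besar", "agung", "raya"},
    "agung": {"agung", "besar", "raya"},
    "raya": {"raya", "besar", "agung"},
    "cepat": {"cepat", "lekas", "segera"},
    "lekas": {"lekas", "cepat", "segera"},
    "segera": {"segera", "cepat", "lekas"},
    "kencang": {"kencang", "cepat", "keras"},
}
# Hypothetical gold-standard synsets.
gold = [{"besar", "agung", "raya"}, {"cepat", "lekas", "segera", "kencang"}]

clusters = rock(items, theta=0.5, k=2)
print(sorted(sorted(c) for c in clusters))
print(pairwise_f_measure(clusters, gold))
```

With θ = 0.5 the two synonym groups form separate clusters. `kencang` overlaps the `cepat` group only weakly (Jaccard 0.2), so it has no neighbors and stays a singleton. Merging stops early because no remaining pair of clusters shares a link, which reflects how ROCK keeps weakly connected words out of a synset. Against the hypothetical gold standard this gives precision 1.0, recall 2/3, and F-measure 4/5 (up to floating-point rounding).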

Keywords


ROCK Clustering; Synonym Sets; WordNet


References


G. A. Miller, R. Beckwith, and C. Fellbaum, “Introduction to WordNet: An on-line lexical database,” International Journal of Lexicography, vol. 3, no. 4, pp. 235–244, 1990.

C. Fellbaum, “WordNet,” The Encyclopedia of Applied Linguistics, Wiley Online Library, 1998.

M. Montazery and H. Faili, “Automatic Persian WordNet construction,” in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics, 2010, pp. 846–850.

E. Barbu and V. B. Mititelu, “Automatic Building of WordNets,” Recent Advances in Natural Language Processing IV: Selected Papers from RANLP 2005, vol. 292, p. 217, 2007.

D. M. Bikel, “Automatic WordNet mapping using word sense disambiguation,” in 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000, pp. 142–147.

S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes,” Information Systems, vol. 25, Elsevier, pp. 345–366, 1999.

A. Saputra and G., “Building synsets for Indonesian WordNet with monolingual lexical resources,” in 2010 International Conference on Asian Language Processing, Harbin, IEEE, 2010, pp. 297–300.

G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

C. Havasi, R. Speer, and J. Alonso, “ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge,” in Recent Advances in Natural Language Processing, Citeseer, 2007, pp. 27–29.

G., “Akuisisi Gloss Berbasis Ekstraksi Synonim Set Menggunakan Supervised Learning” [Gloss acquisition based on synonym set extraction using supervised learning], Sepuluh Nopember Institute of Technology, Surabaya, 2016.

E. Endarmoko, Tesaurus Bahasa Indonesia [Indonesian Thesaurus], Gramedia Pustaka Utama, 2007.

I. P. P. Ananda, M. A. Bijaksana, and I. Asror, “Pembangunan Synsets untuk WordNet Bahasa Indonesia dengan Metode Komutatif” [Building synsets for Indonesian WordNet with the commutative method], eProceedings of Engineering, vol. 3, p. 5, 2018.

P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data, Springer, 2006, pp. 25–71.

H. Garcia-Molina, Database systems: the complete book, Pearson Education India, 2008.

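N. S. Belinda, I. R. Hg, and H. Yozza, “Penerapan Analisis Cluster Ensemble dengan Metode Rock Untuk Mengelompokkan Provinsi di Indonesia Berdasarkan Indikator Kesejahteraan Rakyat” [Applying cluster ensemble analysis with the ROCK method to group provinces in Indonesia based on people's welfare indicators], Andalas University, Padang, 2019.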

Y. Nan, K. Chai, W. Lee, and H. Chieu, “Optimizing F-measure: A tale of two approaches,” arXiv preprint arXiv:1206.4625, 2012.

J. Euzenat, “Semantic Precision and Recall for Ontology Alignment Evaluation,” IJCAI, vol. 7, pp. 348–353, 2007.

D. R. Musicant, V. Kumar, and A. Ozgur, “Optimizing F-Measure with Support Vector Machines,” in FLAIRS Conference, 2003, pp. 356–360.

T. D. Latifah, C. Suhaeni, and R. Anisa, “Segmentasi Pelanggan Susu Formula Menggunakan Cluster Ensemble Berbasis Algoritme ROCK” [Segmentation of formula milk customers using a ROCK-based cluster ensemble], Bogor Agricultural University, Bogor, 2018.

D. P. Sari, M. A. Bijaksana, and T. Suhardijanto, “Membangun synset untuk wordnet bahasa indonesia menggunakan hierarchical clustering” [Building synsets for Indonesian WordNet using hierarchical clustering], Telkom University, Bandung, 2019.

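L. D. Anggaraini, M. A. Bijaksana, and D. Puspandari, “Analisis Pembangunan Word Sense pada WordNet Bahasa Indonesia Menggunakan Metode Hierarchical Clustering” [Analysis of word sense construction in Indonesian WordNet using the hierarchical clustering method], Telkom University, Bandung, 2019.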




DOI: https://doi.org/10.32736/sisfokom.v9i2.853








Jurnal Sisfokom (Sistem Informasi dan Komputer) has ISSN 2301-7988 and e-ISSN 2581-0588 which is published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) ISB Atma Luhur under a Creative Commons Attribution-ShareAlike 4.0 International License.