Building Synsets for Indonesian WordNet using ROCK (Robust Clustering Using Links) Algorithm
DOI: https://doi.org/10.32736/sisfokom.v9i2.853
Keywords: ROCK Clustering, Synonym Sets, WordNet
Abstract
In the development of the Indonesian WordNet, the synonym set (synset) is an important component that represents the similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. Extraction from the Indonesian Thesaurus yields synonym sets whose members share a similar meaning or word sense. In general, WordNet and a dictionary differ in their main focus: a dictionary usually focuses on individual words, while WordNet focuses on word meanings and the connections between words. As explained in previous research, synonym sets have been constructed using several approaches, including clustering and WSD (Word Sense Disambiguation). In this article, synonym sets are produced with the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values. The resulting synonym sets will then be used for lexical database development. The main focus of this article is therefore to produce synonym sets through clustering and to measure their accuracy with the F-Measure, using a gold standard for performance calculation and evaluation.
References
G. A. Miller, R. Beckwith, and C. Fellbaum, “Introduction to WordNet: An on-line lexical database,” International Journal of Lexicography, vol. 3, no. 4, pp. 235–244, 1990.
C. Fellbaum, “WordNet,” The Encyclopedia of Applied Linguistics, Wiley Online Library, 1998.
M. Montazery and H. Faili, “Automatic Persian WordNet construction,” in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics, 2010, pp. 846–850.
E. Barbu and V. B. Mititelu, “Automatic Building of WordNets,” Recent Advances in Natural Language Processing IV: Selected Papers from RANLP 2005, vol. 292, p. 217, 2007.
D. M. Bikel, “Automatic WordNet mapping using word sense disambiguation,” in 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000, pp. 142–147.
S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes,” Information Systems, vol. 25, Elsevier, pp. 345–366, 1999.
A. Saputra and G., “Building synsets for Indonesian WordNet with monolingual lexical resources,” in 2010 International Conference on Asian Language Processing, Harbin, IEEE, 2010, pp. 297–300.
G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
C. Havasi, R. Speer, and J. Alonso, “ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge,” in Recent Advances in Natural Language Processing, Citeseer, 2007, pp. 27–29.
G., “Akuisisi Gloss Berbasis Ekstraksi Synonim Set Menggunakan Supervised Learning,” Sepuluh Nopember Institute of Technology, Surabaya, 2016.
E. Endarmoko, Tesaurus Bahasa Indonesia, Gramedia Pustaka Utama, 2007.
I. P. P. Ananda, M. A. Bijaksana, and I. Asror, “Pembangunan Synsets untuk WordNet Bahasa Indonesia dengan Metode Komutatif,” eProceedings of Engineering, vol. 3, p. 5, 2018.
P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data, Springer, 2006, pp. 25–71.
H. Garcia-Molina, Database Systems: The Complete Book, Pearson Education India, 2008.
N. S. Belinda, I. R. Hg, and H. Yozza, “Penerapan Analisis Cluster Ensemble dengan Metode Rock Untuk Mengelompokkan Provinsi di Indonesia Berdasarkan Indikator Kesejahteraan Rakyat,” Andalas University, Padang, 2019.
Y. Nan, K. Chai, W. Lee, and H. Chieu, “Optimizing F-measure: A tale of two approaches,” arXiv preprint arXiv:1206.4625, 2012.
J. Euzenat, “Semantic Precision and Recall for Ontology Alignment Evaluation,” IJCAI, vol. 7, pp. 348–353, 2007.
D. R. Musicant, V. Kumar, and A. Ozgur, “Optimizing F-Measure with Support Vector Machines,” in FLAIRS Conference, 2003, pp. 356–360.
T. D. Latifah, C. Suhaeni, and R. Anisa, “Segmentasi Pelanggan Susu Formula Menggunakan Cluster Ensemble Berbasis Algoritme ROCK,” Bogor Agricultural University, Bogor, 2018.
D. P. Sari, M. A. Bijaksana, and T. Suhardijanto, “Membangun synset untuk wordnet bahasa indonesia menggunakan hierarchical clustering,” Telkom University, Bandung, 2019.
L. D. Anggaraini, M. A. Bijaksana, and D. Puspandari, “Analisis Pembangunan Word Sense pada WordNet Bahasa Indonesia Menggunakan Metode Hierarchical Clustering,” Telkom University, Bandung, 2019.
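The abstract states that ROCK groups words into synonym sets using similarity and link values. As a minimal sketch of the general ROCK procedure from Guha et al. (Jaccard similarity, a neighbor threshold θ, links counted as common neighbors, and the goodness measure with f(θ) = (1 − θ)/(1 + θ)), the code below shows how those pieces fit together. It is not the authors' implementation: representing each word as a set of thesaurus entry IDs, the helper names `jaccard`, `compute_links`, and `rock`, and the toy data are all hypothetical. The appendices list experiments at thresholds 0.1 to 0.9, but this page does not say which parameter those thresholds control.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compute_links(items: list, theta: float) -> dict:
    """link(p, q) = number of neighbors p and q share.

    p and q are neighbors when jaccard(p, q) >= theta. In this sketch a point
    is not counted as its own neighbor (conventions differ between write-ups).
    """
    neighbors = [set() for _ in items]
    for i, j in combinations(range(len(items)), 2):
        if jaccard(items[i], items[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    links = {}
    for i, j in combinations(range(len(items)), 2):
        common = len(neighbors[i] & neighbors[j])
        if common:
            links[(i, j)] = common
    return links


def rock(items: list, k: int, theta: float) -> list:
    """Agglomerative ROCK-style clustering down to k clusters (or until no links remain)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    f = (1.0 - theta) / (1.0 + theta)
    e = 1.0 + 2.0 * f  # exponent in the expected-links normalization
    point_links = compute_links(items, theta)
    clusters = [[i] for i in range(len(items))]

    def cross_links(a, b):
        return sum(point_links.get((min(p, q), max(p, q)), 0) for p in a for q in b)

    def goodness(a, b):
        na, nb = len(a), len(b)
        return cross_links(a, b) / ((na + nb) ** e - na ** e - nb ** e)

    while len(clusters) > k:
        best, best_pair = 0.0, None
        for x, y in combinations(range(len(clusters)), 2):
            g = goodness(clusters[x], clusters[y])
            if g > best:
                best, best_pair = g, (x, y)
        if best_pair is None:  # no pair of clusters shares a link: stop early
            break
        x, y = best_pair  # x < y, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical toy data: each word is represented by the set of
# (made-up) thesaurus entry IDs it appears in.
words = ["besar", "agung", "raya", "kecil", "mungil", "cilik"]
features = [
    frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 3, 4}),
    frozenset({7, 8, 9}), frozenset({7, 8, 10}), frozenset({7, 9, 10}),
]
clusters = rock(features, k=2, theta=0.5)
print([[words[i] for i in c] for c in clusters])
# [['besar', 'agung', 'raya'], ['kecil', 'mungil', 'cilik']]
```

Merging stops early when no two clusters share any link, so requesting k=1 on this data still returns two clusters. The exhaustive pairwise search keeps the sketch short; the original ROCK paper uses heaps to select the best merge more efficiently.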
Downloads
Additional Files
- Appendix 1 - Test Data
- Appendix 2 - Validation Data
- Appendix 3 - F-Measure Values
- Appendix 4 - Threshold 0.1 Experiment
- Appendix 5 - Threshold 0.2 Experiment
- Appendix 6 - Threshold 0.3 Experiment
- Appendix 7 - Threshold 0.4 Experiment
- Appendix 8 - Threshold 0.5 Experiment
- Appendix 9 - Threshold 0.6 Experiment
- Appendix 10 - Threshold 0.7 Experiment
- Appendix 11 - Threshold 0.8 Experiment
- Appendix 12 - Threshold 0.9 Experiment
- Appendix 13 - Reference URL Sources
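The abstract evaluates the synonym sets with the F-Measure against a gold standard, and the appendices report F-Measure values. This page does not give the exact formula, so the sketch below assumes one common clustering variant: for each gold-standard synset, take the best F1 over all produced clusters and weight it by the gold synset's size. The function names `f1` and `clustering_f_measure` and the toy synsets are hypothetical.

```python
def f1(cluster: set, gold: set) -> float:
    """F1 between one produced cluster and one gold-standard synset."""
    overlap = len(cluster & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)  # share of the cluster that is correct
    recall = overlap / len(gold)        # share of the gold synset recovered
    return 2 * precision * recall / (precision + recall)


def clustering_f_measure(clusters: list, gold_synsets: list) -> float:
    """Size-weighted average, over gold synsets, of the best-matching cluster's F1."""
    total = sum(len(g) for g in gold_synsets)
    score = 0.0
    for gold in gold_synsets:
        best = max((f1(c, gold) for c in clusters), default=0.0)
        score += len(gold) / total * best
    return score


# Hypothetical example: the word "raya" was placed in the wrong cluster.
gold = [{"besar", "agung", "raya"}, {"kecil", "mungil", "cilik"}]
produced = [{"besar", "agung"}, {"raya", "kecil", "mungil", "cilik"}]
print(round(clustering_f_measure(produced, gold), 4))  # 0.8286
```

Other definitions (for example pairwise precision and recall over word pairs, or an unweighted mean over synsets) can give different numbers, so F-Measure scores are only comparable when computed with the same variant.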
License
The copyright of an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and distribute the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no incorrect or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors may distribute their published articles by sharing the article's link or DOI. Authors may use their articles for any lawful purpose they deem necessary without written permission from the journal, provided that the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer) is acknowledged.
The Copyright Transfer Form can be downloaded here: Copyright Transfer Form Jurnal Sisfokom (Sistem Informasi dan Komputer).
This agreement must be signed by at least one author who has obtained the consent of the co-author(s). Once the corresponding author has submitted the signed agreement, no changes to the authorship or to the order of the listed authors will be accepted. The form must bear an original signature and be sent to the Editorial office as a scanned document at sisfokom@atmaluhur.ac.id.