Building Synsets for Indonesian WordNet using ROCK (Robust Clustering Using Links) Algorithm

Mubaroq Iqbal(1*), Moch. Arif Bijaksana(2), Widi Astuti(3)

(1) Telkom University
(2) Telkom University
(3) Telkom University
(*) Corresponding Author


In the development of Indonesian WordNet, the synonym set is an important component that represents the similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. Extracting entries from the Indonesian Thesaurus yields synonym sets whose member words share a word sense. The main difference between WordNet and a dictionary is their focus: a dictionary usually describes one word at a time, while WordNet focuses on word meanings and their connections to other words. Previous research has constructed synonym sets using several approaches, namely clustering and WSD (Word Sense Disambiguation). In this article, synonym sets are produced with the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values. The resulting synonym sets will then be used for lexical database development. The main focus of this article is therefore to produce synonym sets through the clustering process and to measure their accuracy with the F-Measure, evaluated against a gold standard.
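
The similarity and link values that ROCK relies on, and the F-Measure used for evaluation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Jaccard similarity over thesaurus synonym lists, a threshold `theta`, that a word is not counted as its own neighbor, and an F-Measure based on exact synset matches against the gold standard. The word lists in `TOY_THESAURUS` are hypothetical toy data, not entries from the Indonesian Thesaurus, and all function names are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(items, theta):
    """Map each word to the words whose similarity is at least theta.

    Assumption: a word is not treated as its own neighbor.
    """
    nbrs = {word: set() for word in items}
    for p, q in combinations(items, 2):
        if jaccard(items[p], items[q]) >= theta:
            nbrs[p].add(q)
            nbrs[q].add(p)
    return nbrs


def links(nbrs):
    """link(p, q) = number of neighbors that p and q have in common."""
    return {
        frozenset((p, q)): len(nbrs[p] & nbrs[q])
        for p, q in combinations(nbrs, 2)
    }


def f_measure(predicted, gold):
    """F-Measure over synsets, counting only exact matches (an assumption)."""
    pred = {frozenset(s) for s in predicted}
    ref = {frozenset(s) for s in gold}
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: each word mapped to its thesaurus synonym list.
TOY_THESAURUS = {
    "besar": {"besar", "agung", "raya"},
    "agung": {"besar", "agung", "raya"},
    "raya": {"besar", "agung", "raya"},
    "kecil": {"kecil", "mungil"},
    "mungil": {"kecil", "mungil"},
}
```

In the full algorithm, clusters are merged greedily according to a goodness measure computed from these link counts; that step is omitted here to keep the sketch short.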


ROCK Clustering; Synonym Sets; WordNet

