Modeling Political Discourse in Indonesia’s 2024 Election Using Unsupervised Machine Learning

Authors

  • Malikhatul Ibriza, Department of Information Technology, UIN Walisongo Semarang
  • Maya Rini Handayani, Department of Information Technology, UIN Walisongo Semarang
  • Wenty Dwi Yuniarti, Department of Information Technology, UIN Walisongo Semarang
  • Khothibul Umam, Department of Information Technology, UIN Walisongo Semarang

DOI:

https://doi.org/10.32736/sisfokom.v14i2.2359

Keywords:

Text Mining, 2024 Election, K-Means, Latent Dirichlet Allocation, Principal Component Analysis

Abstract

The 2024 General Election in Indonesia generated a large volume of diverse, unstructured digital political discourse, which calls for a machine learning-based analytical approach that can process the data efficiently, objectively, and at scale. This study maps political discourse in 14,813 text records drawn from the open-source "Indonesian Election 2024" dataset on the Hugging Face platform, covering social media posts (e.g., Twitter) and online news content from January to March 2024. The research integrates three core methods: Principal Component Analysis (PCA) for dimensionality reduction, K-Means for clustering, and Latent Dirichlet Allocation (LDA) for topic extraction. This combination represents an original approach in Indonesian political discourse studies, leveraging unsupervised learning techniques to improve topic mapping efficiency relative to the single-method approaches of prior research. The analysis identified three primary clusters (electoral technical issues, candidate figures, and official agendas), with a Silhouette Score of 0.51 (a clustering quality metric) and a top topic coherence score of 0.51. The results were validated both quantitatively and qualitatively by content experts. The approach not only demonstrates strong analytical capability in uncovering thematic patterns but also offers practical applications for institutions such as the General Elections Commission (KPU), the Election Supervisory Body (Bawaslu), and the media in monitoring strategic issues and detecting potential disinformation in the lead-up to the election.
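
To make the reported Silhouette Score easier to interpret, the following is a minimal pure-Python sketch of K-Means clustering followed by the silhouette computation. It is not the authors' implementation: the nine 2-D points are invented stand-ins for PCA-reduced document vectors, the farthest-point seeding is an assumption chosen only to keep the example deterministic, and the function names are hypothetical. For each point, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster; the score is the average over all points.

```python
import math

# Hypothetical 2-D points standing in for PCA-reduced document vectors.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # group near the origin
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),   # group near (5, 5)
    (0.0, 5.0), (0.3, 5.1), (0.1, 4.8),   # group near (0, 5)
]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

labels = kmeans(points, k=3)
score = silhouette_score(points, labels)
print(f"labels = {labels}")
print(f"silhouette = {score:.2f}")
```

The score ranges from -1 to 1. Values near 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping clusters, so the toy data here scores much higher than real text data typically would.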

Published

2025-05-26

Section

Articles