Comparative Analysis of Random Forest and Support Vector Machine for Sundanese Dialect Classification Using Speech Recognition Features

Abdull Halim Anshor; Tri Ngudi Wiyatno

doi:10.32736/sisfokom.v14i2.2347

Authors

Abdull Halim Anshor, Department of Informatics Engineering, University of Pelita Bangsa
Tri Ngudi Wiyatno, Department of Industrial Engineering, University of Pelita Bangsa

Keywords:

Classification of Sundanese Dialects, Machine Learning, Random Forest, Support Vector Machine, Mel-Frequency Cepstral Coefficient

Abstract

This study compares Random Forest (RF) and Support Vector Machine (SVM) classifiers for distinguishing West and South Sundanese dialects. Features were extracted as Mel-Frequency Cepstral Coefficients (MFCCs) from a dataset of 100 recordings, and both models were evaluated on accuracy, precision, recall, and F1-score. RF achieved an accuracy of 93.33%, outperforming SVM at 73.33%. On this dataset, RF was the more reliable of the two at distinguishing dialectal features. The research contributes to regional speech recognition, supporting language preservation and improved dialectal analysis.
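The four evaluation metrics named in the abstract can be illustrated with a small worked example. The sketch below is not the authors' code: the function name `binary_metrics`, the labels `"west"` and `"south"`, the 30-recording hold-out set, and the error pattern are all assumptions made for illustration. The reported accuracies are arithmetically consistent with, for instance, 28 and 22 correct predictions out of 30 test recordings, but the abstract does not state the train/test split.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    accuracy = correct / len(y_true)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical hold-out set of 30 recordings (15 per dialect). The labels and
# the error pattern are invented for illustration, not taken from the paper.
y_true = ["west"] * 15 + ["south"] * 15
y_pred = ["west"] * 14 + ["south"] * 1 + ["south"] * 14 + ["west"] * 1

scores = binary_metrics(y_true, y_pred, positive="west")
print({name: round(value, 4) for name, value in scores.items()})
# → {'accuracy': 0.9333, 'precision': 0.9333, 'recall': 0.9333, 'f1': 0.9333}
```

With two misclassified recordings out of 30, accuracy is 28/30 = 93.33%, which matches the figure reported for RF. Precision, recall, and F1 depend on how the errors are distributed between the two dialects, so the values printed here apply only to this made-up example.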
