Improving Machine Learning Classification Performance Through a Comparison of Machine Learning Methods and Dataset Improvement

Fikri Baharuddin; Aris Tjahyanto

doi:10.32736/sisfokom.v11i1.1337

Authors

Fikri Baharuddin Institut Teknologi Sepuluh Nopember https://orcid.org/0000-0002-2495-5878
Aris Tjahyanto Institut Teknologi Sepuluh Nopember

Keywords:

Classification, StringToWordVector, Machine Learning, Exam Classification

Abstract

Classification with machine learning is a widely used approach to categorizing data. Many classification methods, also known as machine learning classification algorithms, are available. However, achieving the best classification performance requires a classifier suited to the type of dataset. The quality and quantity of the data in a dataset also affect classification performance. In this study, several attempts were made to improve classification performance on a dataset of elementary school-level Indonesian language exam questions, classified by difficulty level. These attempts consisted of improving the quality of the dataset, using the StringToWordVector filter algorithm to process the textual data, and applying several classification algorithms: naïve Bayes, Random Forest, and REPTree. Classification was performed with the WEKA tool. The experiments showed that the largest performance increase, 15%, was obtained after improving the quality of the dataset and choosing an appropriate classification method.
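The abstract names WEKA's StringToWordVector filter and a naïve Bayes classifier. As a rough illustration of what such a pipeline does (not the authors' WEKA configuration), the stdlib-only Python sketch below turns each question into word counts and classifies it with multinomial naïve Bayes. The abstract does not say which naïve Bayes variant was used, so the multinomial variant here is an assumption. The toy questions, the `easy`/`hard` labels, and the names `to_word_vector`, `MultinomialNB`, and `accuracy` are hypothetical. WEKA's actual filter also offers options such as TF-IDF weighting, stemming, and stopword handling, none of which are reproduced here.

```python
import math
import re
from collections import Counter


def to_word_vector(text):
    """Lowercased word counts for one document (a bag of words).

    Loosely analogous to WEKA's StringToWordVector with word counts enabled;
    this is NOT WEKA's implementation and has no stemming, stopwords or TF-IDF.
    """
    return Counter(re.findall(r"\w+", text.lower()))


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        vectors = [to_word_vector(t) for t in texts]
        self.vocab = set().union(*vectors)  # every word seen in training
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        # log P(class), estimated from class frequencies
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # summed word counts per class
        self.word_counts = {c: Counter() for c in self.classes}
        for vec, label in zip(vectors, labels):
            self.word_counts[label].update(vec)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # smoothed log P(word | class) over the training vocabulary
        num = self.word_counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, text):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word, count in to_word_vector(text).items():
                if word in self.vocab:  # words unseen in training are skipped
                    score += count * self._log_likelihood(word, c)
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy questions; the real dataset is in Indonesian.
train_texts = [
    "what is the capital of indonesia",
    "what color is the sky",
    "explain the main idea of the paragraph",
    "explain the implied meaning of the poem",
]
train_labels = ["easy", "easy", "hard", "hard"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("what is the color of the flag"))     # easy
print(model.predict("explain the meaning of the story"))  # hard
```

Random Forest and REPTree, the other two classifiers named in the abstract, would take the same word-count vectors as input; they are not sketched here. Comparing classifiers in this setting amounts to training each one on the same vectors and comparing a metric such as `accuracy` on held-out questions.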

Author Biographies

Fikri Baharuddin, Institut Teknologi Sepuluh Nopember

Department of Information Systems

Aris Tjahyanto, Institut Teknologi Sepuluh Nopember

Lektor Kepala (Associate Professor), Department of Information Systems
