Game and Application Purchasing Patterns on Steam using K-Means Algorithm

Salman Fauzan Fahri Aulia; Yana Aditia Gerhana; Eva Nurlatifah

doi:10.32736/sisfokom.v13i3.2214

Authors

Salman Fauzan Fahri Aulia, Faculty of Science and Technology, Universitas Islam Negeri Sunan Gunung Djati Bandung
Yana Aditia Gerhana, Faculty of Science and Technology, Universitas Islam Negeri Sunan Gunung Djati Bandung
Eva Nurlatifah, Faculty of Science and Technology, Universitas Islam Negeri Sunan Gunung Djati Bandung

Keywords:

Clustering, CRISP-DM, Game, K-Means, Purity, Silhouette Coefficient

Abstract

Online games are visual games played over the internet or a local area network (LAN). As the gaming industry has grown, platforms such as Steam have come to offer so many titles that users find it difficult to decide which game to play. This study addresses the problem by analyzing user preferences, following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The k-means algorithm groups game data with similar characteristics into clusters, helping users and developers identify the most popular game types. The dataset, sourced from Kaggle and originally collected through the Steam API and SteamSpy, contains 85,103 entries. The data are normalized before clustering to improve the accuracy of the calculations. The elbow method is used to determine the optimal number of clusters, which yields three clusters for the k-means algorithm. The clustering is evaluated with two metrics: the silhouette coefficient, which measures how close each data point lies to its own cluster relative to the nearest neighboring cluster, and precision purity, which compares cluster assignments with reference labels by scoring each match as 1 (true) and each mismatch as 0 (false). The study obtains an average silhouette coefficient of 0.345 and a precision purity value of 0.734, indicating that the k-means clustering performs better when judged by the precision purity metric. The findings show that free-to-play games are the most popular among users, while "Animation & Modelling" is the most expensive category based on price comparisons.
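The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the standard textbook definitions of purity and the silhouette coefficient, assumes Euclidean distance and at least two clusters, and runs on a made-up four-point toy dataset. The function names `purity` and `mean_silhouette` and the labels "free" and "paid" are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def purity(cluster_ids, true_labels):
    """Share of points whose reference label is the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    # Each cluster contributes the count of its most frequent reference label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


def mean_silhouette(points, cluster_ids):
    """Average of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, c in zip(points, cluster_ids):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, cluster_ids):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != c
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups, one "mislabelled" point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
cluster_ids = [0, 0, 1, 1]
true_labels = ["free", "free", "paid", "free"]

print(purity(cluster_ids, true_labels))  # 0.75 (3 of 4 points match their cluster majority)
print(mean_silhouette(points, cluster_ids))
```

On real data the pairwise loop in `mean_silhouette` is quadratic in the number of points, so a dataset of the size reported here would normally be scored with a vectorized library routine rather than this sketch.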
