Automatic Categorization of Multi Marketplace FMCGs Products using TF-IDF and PCA Features

Authors

Sri Suci Indasari, Aris Tjahyanto

DOI:

https://doi.org/10.32736/sisfokom.v12i2.1621

Keywords:

E-commerce, Fast Moving Consumer Goods, K-Means, Text Clustering, TF-IDF

Abstract

The growing use of technology, together with the rising number of internet users, has shifted the product sales ecosystem toward electronic commerce (e-commerce). A total of 73.23% of customers made purchase transactions through e-commerce, and the most purchased products were those classified as Fast Moving Consumer Goods (FMCGs). As FMCG data grow more varied and the number of marketplaces increases, the products need to be divided into specific groups. This is done by analyzing e-commerce product information, especially product names and descriptions. In this study, we propose an automatic categorization of FMCG products using data from multiple marketplaces. The text data are converted into structured data through a series of preprocessing steps, and comprehensive experiments are carried out to compare the performance of three feature extraction methods: TF-IDF, BOW, and N-Gram. All three methods are applied to the text data set and evaluated on K-Means clustering results, with PCA used to reduce the data dimensions. The results show that TF-IDF with a PCA dimension reduction value of 70, implemented in Python, gives the best results in terms of the percentage of grouped data.
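The feature extraction and grouping steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the product names are hypothetical, the smoothed IDF formula is one common variant chosen for illustration, and the PCA step is left out because it needs a linear algebra library, so K-Means runs directly on the TF-IDF vectors. The initial centroids are fixed by hand to keep the result deterministic.

```python
import math
from collections import Counter

# Hypothetical product names standing in for scraped marketplace listings.
PRODUCTS = [
    "instant noodle chicken flavor",
    "instant noodle beef flavor",
    "shampoo anti dandruff 170ml",
    "shampoo anti hairfall 170ml",
]


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant; the abstract does not state the exact weighting).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] / len(toks) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=20):
    """Plain Lloyd's K-Means with caller-chosen initial centroids (deterministic)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, vectors = tfidf_vectors(PRODUCTS)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)  # prints [0, 0, 1, 1]
```

The two noodle listings share three terms and the two shampoo listings share three others, while the groups share none, so the clusters separate cleanly. On real marketplace data the vocabulary is far larger, which is where a dimension reduction step such as PCA becomes useful before clustering.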

Author Biographies

Sri Suci Indasari, Institut Teknologi Sepuluh Nopember

Graduate Student, Information Systems Department, Sepuluh Nopember Institute of Technology

Aris Tjahyanto, Institut Teknologi Sepuluh Nopember

Lecturer, Information Systems Department, Sepuluh Nopember Institute of Technology

Published

2023-07-01

Section

Articles