Automatic Categorization of Multi Marketplace FMCGs Products using TF-IDF and PCA Features
DOI: https://doi.org/10.32736/sisfokom.v12i2.1621
Keywords: E-commerce, Fast Moving Consumer Goods, K-Means, Text Clustering, TF-IDF
Abstract
The growing use of technology, in line with the increasing number of internet users, has shifted the product sales ecosystem toward electronic commerce (e-commerce). A total of 73.23 customers made purchase transactions through e-commerce, and the most purchased products were those classified as Fast Moving Consumer Goods (FMCGs). As FMCG product data become increasingly varied and the number of marketplaces grows, these products need to be broken down into specific groups. This grouping is carried out by analyzing e-commerce product information, particularly product names and descriptions. In this study, we propose an automatic categorization of FMCG products using data from multiple marketplaces. The text data are converted into structured data through a series of preprocessing steps, and comprehensive experiments are carried out to compare the performance of three feature extraction methods: TF-IDF, Bag of Words (BOW), and N-Gram. The features produced by each method are clustered with K-Means, with Principal Component Analysis (PCA) used to reduce the data dimensions. The results show that TF-IDF features with a dimension reduction value of 70, implemented in Python, give the best results in terms of the percentage of grouped data.
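The pipeline summarized above (preprocessing, feature extraction with BOW, N-Gram, or TF-IDF, dimension reduction with PCA, then K-Means clustering) can be sketched in dependency-free Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the stopword list, the reading of "N-Gram" as unigram plus bigram counts, the smoothed and L2-normalized TF-IDF variant, PCA computed by power iteration, the function names, the toy product titles, and the interpretation of "percentage of grouped data" as the share of products per cluster are all hypothetical. On real data a library such as scikit-learn would more likely be used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list: the abstract does not list the study's preprocessing steps.
STOPWORDS = {"dan", "untuk", "dengan", "isi", "and", "with", "for"}

def preprocess(text):
    """Case-fold, keep alphanumeric tokens only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def grams(tokens, max_n):
    """All 1..max_n-grams of a token list, each joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def extract(docs, method="tfidf"):
    """Vectorize token lists as 'bow' (unigram counts), 'ngram' (uni+bigram counts) or 'tfidf'."""
    bags = [Counter(grams(d, 2 if method == "ngram" else 1)) for d in docs]
    vocab = sorted(set().union(*bags))
    X = [[b[w] for w in vocab] for b in bags]
    if method == "tfidf":
        n = len(X)
        df = [sum(1 for row in X if row[j] > 0) for j in range(len(vocab))]
        idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]  # smoothed IDF (assumed variant)
        X = [[c * w for c, w in zip(row, idf)] for row in X]
        norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in X]
        X = [[v / nrm for v in row] for row, nrm in zip(X, norms)]  # L2-normalize each row
    return X, vocab

def pca(X, k, iters=200, seed=0):
    """Project X onto its top-k principal components (power iteration with deflation)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Remove the found component so the next pass finds the next-largest one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's K-Means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in X]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_shares(labels):
    """Percentage of products per cluster (hypothetical reading of 'percentage of grouped data')."""
    counts = Counter(labels)
    return {c: 100.0 * m / len(labels) for c, m in sorted(counts.items())}

if __name__ == "__main__":
    # Hypothetical product titles, not data from the study.
    titles = [
        "Shampoo Anti Dandruff 170ml", "Shampoo Anti Dandruff 340ml",
        "Mie Instan Goreng Rasa Ayam", "Mie Instan Kuah Rasa Soto",
        "Sabun Mandi Cair 450ml", "Sabun Mandi Batang 110g",
    ]
    docs = [preprocess(t) for t in titles]
    for method in ("bow", "ngram", "tfidf"):
        X, _ = extract(docs, method)
        Z = pca(X, k=2)  # six titles allow at most five components
        labels = kmeans(Z, k=3)
        print(method, labels, cluster_shares(labels))
```

Six titles can support at most five principal components after centering, which is why the toy run uses two; on real marketplace data the number of components (the abstract reports a value of 70) and the number of clusters would be chosen from the full dataset.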
License
The copyright of an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In all cases, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) holds full publishing rights to the published articles. Authors may distribute their published articles by sharing the article's link or DOI. Authors may also use their articles for any lawful purpose they deem necessary without written permission from the journal, provided that the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer) is acknowledged.
The Copyright Transfer Form for Jurnal Sisfokom (Sistem Informasi dan Komputer) is available for download.
This agreement must be signed by at least one author who has obtained the assent of the co-author(s). Once this agreement, signed by the corresponding author, has been submitted, changes of authorship or in the order of the listed authors will not be accepted. The copyright form must bear an original signature and be sent to the editorial office as a scanned document at sisfokom@atmaluhur.ac.id.