Comparative Analysis of RAG-Based Open-Source LLMs for Indonesian Banking Customer Service Optimization Using Simulated Data
DOI:
https://doi.org/10.32736/sisfokom.v14i3.2383
Keywords:
Bank Customer Service, Large Language Model (LLM), LLM-as-a-Judge, Semantic Similarity, Retrieval-Augmented Generation (RAG)
Abstract
In the digital era, banks face challenges in delivering fast, accurate, and efficient customer service, especially for simple, frequently asked questions. This study evaluates the effectiveness of three open-source Large Language Models (LLMs), namely Gemma2-9B-Sahabat-AI, Qwen2.5-14B-Instruct, and Mistral-Nemo-Instruct, in supporting a Retrieval-Augmented Generation (RAG) question-answering system for the banking sector. Using 12,000 synthetic billing documents indexed with intfloat/multilingual-e5-large-instruct embeddings (1024 dimensions), model performance was assessed via semantic similarity metrics, LLM-as-a-Judge scores (GPT-4o-mini and Gemini 2.0 Flash), and human validation. Gemma2-9B-Sahabat-AI achieved the highest semantic similarity score (0.9627), followed by Mistral-Nemo-Instruct (0.9614) and Qwen2.5-14B-Instruct (0.9284). In the LLM-as-a-Judge evaluations, Qwen2.5 ranked highest under GPT-4o-mini (92.2), while Gemma2 led under Gemini 2.0 Flash (88.4). Human evaluators gave perfect scores on the factual questions (questions 1–10), but all models struggled with the arithmetic in question 13. Gemma2's average response time was 41 seconds, faster than Mistral's 48 seconds and Qwen2.5's 72 seconds, which supports Gemma2's balanced performance in accuracy, speed, and computational efficiency. These findings underscore the potential of locally operated open-source LLMs for banking applications, where local operation helps preserve privacy and regulatory compliance. Limitations include reliance on synthetic data, a narrow question set, and a lack of user diversity. Future research should involve broader queries, real user testing, and numeric reasoning modules to ensure robust and scalable deployment in real-world banking customer service environments.
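The abstract reports semantic similarity scores but does not state how they were computed. As a minimal sketch, assuming the score is the cosine similarity between the embedding of a model's answer and the embedding of a reference answer, averaged over the question set, the calculation could look like the following. The function names and the toy 3-dimensional vectors are hypothetical; in the study the vectors would presumably come from the 1024-dimensional embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity(pairs) -> float:
    """Average cosine similarity over (answer_vector, reference_vector) pairs."""
    scores = [cosine_similarity(answer, reference) for answer, reference in pairs]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings, not data from the study.
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical direction, similarity 1.0
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # 45 degrees apart, similarity ~0.7071
]

print(round(mean_similarity(toy_pairs), 4))
```

A score near 1.0, such as the values of 0.93 to 0.96 reported above, would under this assumption mean that the generated answers point in nearly the same direction as the reference answers in embedding space. That does not by itself guarantee factual or arithmetic correctness, which may be why the study also uses LLM judges and human validation.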
License
Copyright (c) 2025 Hendra Lijaya, Patricia Ho, Handri Santoso

This work is licensed under a Creative Commons Attribution 4.0 International License.
The copyright of an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors are allowed to distribute published articles by sharing the link or DOI of the article. Authors are allowed to use their articles for any legal purpose deemed necessary without the written permission of the journal, with notification of the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer).