Leveraging Topic Modelling to Analyze Biomedical Research Trends from the PubMed Database Using LDA Method
DOI:
https://doi.org/10.32736/sisfokom.v13i2.2117
Keywords:
Biomedical Research, Topic Modelling, LDA Method, PubMed, Text Processing
Abstract
Biomedical research has become essential to human life. However, identifying trends in health-sector research topics across a large repository is challenging. In this study, we applied topic modelling with the Latent Dirichlet Allocation (LDA) method to analyze biomedical research trends. The data comprised 7000 articles from PubMed, which were preprocessed by lowercasing, punctuation removal, tokenization, stop-word removal, and lemmatization. For validation, the LDA model was fitted on 75% and on 100% of the corpus. For each corpus size, the alpha and beta parameters were varied over the values 0.01, 0.31, 0.61, and 0.91 as well as the symmetric and asymmetric settings. With 75% of the corpus, the optimal number of topics was 7, with a coherence value of 0.52. With 100% of the corpus, the optimal number of topics was 10, with a coherence value of 0.51. The topic modelling results indicate several trending topics, including disease diagnosis, patient care, and genetic or cell research. When the biomedical topics were classified into seven categories, the Random Forest algorithm gave the best results, with accuracy, precision, and recall of 85.57%, 87.36%, and 87.58%, respectively. These results suggest that topic modelling with LDA can identify trends in biomedical research with high accuracy. This information can help stakeholders make informed decisions about the direction of future research.
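The text-processing steps named in the abstract (lowercasing, punctuation removal, tokenization, stop-word removal, lemmatization) could look roughly like the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the `STOP_WORDS` set, the `LEMMAS` lookup table, and the `preprocess` function are hypothetical stand-ins for whatever stop-word list and lemmatizer the study actually used, and the sample sentence is invented.

```python
import string

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "were", "with"}

# Assumption: a toy lemma lookup standing in for a dictionary-based lemmatizer.
LEMMAS = {
    "patients": "patient",
    "diseases": "disease",
    "cells": "cell",
    "genes": "gene",
    "treated": "treat",
}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Return the cleaned token list for one document."""
    text = text.lower()                                   # 1. lowercase
    text = text.translate(PUNCT_TO_SPACE)                 # 2. punctuation removal
    tokens = text.split()                                 # 3. whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 4. stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]             # 5. lemmatization (lookup)


sample = "Patients with genetic diseases were treated; the cells recovered."
print(preprocess(sample))
```

Token lists produced this way are the usual input for building the dictionary and bag-of-words corpus on which an LDA model is fitted.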
License
The copyright of an article accepted for publication shall be assigned to Jurnal Sisfokom (Sistem Informasi dan Komputer) and LPPM ISB Atma Luhur as the publisher of the journal. Copyright includes the right to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.
Jurnal Sisfokom (Sistem Informasi dan Komputer), LPPM ISB Atma Luhur, and the Editors make every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in Jurnal Sisfokom (Sistem Informasi dan Komputer) are the sole and exclusive responsibility of their respective authors.
Jurnal Sisfokom (Sistem Informasi dan Komputer) has full publishing rights to the published articles. Authors are allowed to distribute published articles by sharing the link or DOI of the article. Authors are allowed to use their articles for any legal purpose deemed necessary without the written permission of the journal, provided that the initial publication in Jurnal Sisfokom (Sistem Informasi dan Komputer) is acknowledged.
The Copyright Transfer Form can be downloaded here: Copyright Transfer Form Jurnal Sisfokom (Sistem Informasi dan Komputer).
This agreement is to be signed by at least one of the authors, who must have obtained the assent of the co-author(s). After the agreement signed by the corresponding author has been submitted, changes of authorship or in the order of the authors listed will not be accepted. The copyright form should bear an original signature and be sent to the Editorial Office as a scanned document at sisfokom@atmaluhur.ac.id.