Leveraging Topic Modelling to Analyze Biomedical Research Trends from the PubMed Database Using LDA Method
(1) Department of Medical Technology, Institut Teknologi Sepuluh Nopember
(*) Corresponding Author
Abstract
Biomedical research has become essential to human life. However, identifying trends in health-related research topics within a large repository remains challenging. In this study, we implemented topic modelling to analyze biomedical research trends using the Latent Dirichlet Allocation (LDA) method. Topic modelling was carried out on 7,000 PubMed articles, which were preprocessed by lowercasing, punctuation removal, tokenization, stop-word removal, and lemmatization. For validation, LDA was run on 75% and 100% of the corpus. For each corpus size, the alpha and beta parameters were varied over 0.01, 0.31, 0.61, 0.91, symmetric, and asymmetric settings. With 75% of the corpus, the optimal number of topics was 7, with a coherence value of 0.52; with 100% of the corpus, the optimal number of topics was 10, with a coherence value of 0.51. The topic modelling results show several trending topics, including disease diagnosis, patient care, and genetic or cell research. When the biomedical topics were classified into seven categories, the Random Forest algorithm gave the best accuracy, precision, and recall: 85.57%, 87.36%, and 87.58%, respectively. These results suggest that topic modelling using LDA can identify trends in biomedical research with high accuracy. This information can help stakeholders make informed decisions about the direction of future research.
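As a rough illustration of the preprocessing steps and the parameter variations described in the abstract, the following standard-library Python sketch may help. It is not the authors' implementation: the stop-word list, the lemma table, and the range of topic counts (2 to 10) are assumptions made for the example, and the dictionary lookup is only a stand-in for a real lemmatizer. The sketch enumerates candidate LDA configurations; fitting the models and scoring coherence would be done with a topic-modelling library.

```python
import itertools
import string

# Hypothetical, tiny resources for illustration only; a real pipeline
# would rely on a full stop-word list and a proper lemmatizer.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "with", "were", "was", "is"}
LEMMAS = {"patients": "patient", "cells": "cell", "genes": "gene", "treated": "treat"}


def preprocess(text):
    """Lowercase, remove punctuation, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]  # lookup stands in for lemmatization


# Values taken from the abstract.
PRIOR_VALUES = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]
CORPUS_FRACTIONS = [0.75, 1.0]
# Assumed search range; the abstract reports optima of 7 and 10 topics.
TOPIC_COUNTS = range(2, 11)


def lda_grid():
    """Yield every candidate configuration to be fitted and scored by coherence."""
    for frac, k, alpha, beta in itertools.product(
        CORPUS_FRACTIONS, TOPIC_COUNTS, PRIOR_VALUES, PRIOR_VALUES
    ):
        yield {"corpus_fraction": frac, "num_topics": k, "alpha": alpha, "beta": beta}


if __name__ == "__main__":
    print(preprocess("Patients were treated with gene therapy; cells recovered."))
    print(sum(1 for _ in lda_grid()), "configurations")
```

Under these assumptions, each configuration would be fitted on the stated fraction of the corpus, and the one with the highest coherence score would be kept.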
DOI: https://doi.org/10.32736/sisfokom.v13i2.2117