Leveraging Topic Modelling to Analyze Biomedical Research Trends from the PubMed Database Using LDA Method

Yuri Pamungkas(1*)

(1) Department of Medical Technology, Institut Teknologi Sepuluh Nopember
(*) Corresponding Author

Abstract


Biomedical research has become essential to human life. However, identifying trends in health research topics within large article repositories is challenging. In this study, we applied topic modelling with the Latent Dirichlet Allocation (LDA) method to analyze biomedical research trends. The model was built on 7000 articles from PubMed, which were preprocessed by lowercasing, punctuation removal, tokenization, stop-word removal, and lemmatization. For validation, LDA was run on 75% and 100% of the corpus. For each corpus size, the alpha and beta parameters were varied across 0.01, 0.31, 0.61, 0.91, symmetric, and asymmetric settings. With 75% of the corpus, the optimal number of topics was 7, with a coherence value of 0.52; with 100% of the corpus, the optimal number of topics was 10, with a coherence value of 0.51. The topic modelling results show several trending topics, including disease diagnosis, patient care, and genetic or cell research. When the biomedical topics were classified into seven categories, the best accuracy, precision, and recall were obtained with the Random Forest algorithm, at 85.57%, 87.36%, and 87.58%, respectively. These results suggest that LDA-based topic modelling can be used to identify trends in biomedical research with high accuracy. This information can help stakeholders make informed decisions about the direction of future research.
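
The preprocessing steps and the parameter variations named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the stop-word list and the lemma lookup table are tiny hypothetical stand-ins for a full stop-word list and a real lemmatizer, and the names preprocess, PRIOR_CANDIDATES, and GRID are invented for this example. The grid only enumerates the settings listed in the abstract; it does not train an LDA model or compute coherence.

```python
import itertools
import string

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "are", "for", "with"}

# Hypothetical lookup table standing in for a real lemmatizer (assumption).
LEMMA_TABLE = {
    "patients": "patient",
    "diseases": "disease",
    "genes": "gene",
    "cells": "cell",
}


def preprocess(text: str) -> list[str]:
    """Apply the five steps from the abstract to one document."""
    # 1. Lowercase.
    text = text.lower()
    # 2. Remove punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3. Tokenize on whitespace.
    tokens = text.split()
    # 4. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 5. Lemmatize (here: table lookup, unknown tokens pass through unchanged).
    return [LEMMA_TABLE.get(t, t) for t in tokens]


# Corpus proportions and prior settings as listed in the abstract.
CORPUS_FRACTIONS = [0.75, 1.0]
PRIOR_CANDIDATES = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]

# Every (corpus fraction, alpha, beta) combination that would be evaluated.
GRID = list(itertools.product(CORPUS_FRACTIONS, PRIOR_CANDIDATES, PRIOR_CANDIDATES))
```

For example, preprocess("The Diagnosis of Diseases in Patients: genes, and cells!") returns ["diagnosis", "disease", "patient", "gene", "cell"], and GRID holds 2 x 6 x 6 = 72 settings. In a full pipeline, each setting would be passed to an LDA implementation and scored by topic coherence.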


Keywords


Biomedical Research; Topic Modelling; LDA Method; PubMed; Text Processing






DOI: https://doi.org/10.32736/sisfokom.v13i2.2117







Creative Commons License
Jurnal Sisfokom (Sistem Informasi dan Komputer) has ISSN 2301-7988 and e-ISSN 2581-0588 which is published by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) ISB Atma Luhur under a Creative Commons Attribution-ShareAlike 4.0 International License.