Comparative Analysis: Machine Learning Algorithms for TOC Prediction in Pharmaceutical Water Treatment Systems

Authors

  • Dieki Rian Mustapa, Institut Teknologi Sepuluh Nopember
  • Aris Tjahyanto, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.32736/sisfokom.v13i2.2148

Keywords:

Machine Learning, Total Organic Carbon, Pharmaceutical Water Treatment Systems, Algorithm Comparison, Water Quality Assessment

Abstract

Water quality is crucial in pharmaceutical production, where water serves as both a solvent and a raw material. Contamination with organic compounds puts product integrity and safety at risk. Total Organic Carbon (TOC) is a key indicator of organic pollution in water, and a rise in TOC signals potential problems in the water treatment system. Predicting TOC values with machine learning therefore supports preemptive monitoring and maintenance. This study compared three machine learning algorithms, Linear Regression (RL), Random Forest (RF), and Multilayer Perceptron (MLP), for predicting TOC in pharmaceutical water treatment systems, using a dataset that covers a range of operating conditions. Each algorithm was evaluated with performance metrics such as the coefficient of determination (R-squared) and prediction accuracy. A correlation coefficient approaching 1 (100%) indicates a strong relationship between model predictions and actual target values (prediction accuracy), while a smaller Mean Absolute Error (MAE) indicates more accurate predictions. Ranked from highest to lowest correlation coefficient, the models are RF, MLP, and RL, with values of 95.04%, 93.11%, and 80.27%, respectively. The additional evaluation metrics, Root Mean Squared Error (RMSE), MAE, Relative Absolute Error (RAE), and Root Relative Squared Error (RRSE), follow the same order, with RF lowest, then MLP, then RL. RF achieved the highest TOC prediction accuracy (95%) and the lowest MAE (3.9).
This research offers insight into applying machine learning algorithms to TOC prediction in pharmaceutical water treatment, supporting informed decisions that improve water treatment systems and overall product quality.
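
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `regression_metrics` and the sample values are hypothetical, and the baseline for RAE and RRSE is assumed to be the mean of the actual values being evaluated (some tools use the training-set mean instead).

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RRSE and Pearson correlation for two sequences.

    Assumption: RAE and RRSE are measured against a baseline that always
    predicts the mean of `actual`.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")

    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]

    # Spread of the actual values around their mean (baseline error).
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if base_sq == 0 or var_p == 0:
        raise ValueError("actual and predicted values must not be constant")

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))

    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae": sum(abs_err) / base_abs,            # multiply by 100 for percent
        "rrse": math.sqrt(sum(sq_err) / base_sq),  # multiply by 100 for percent
        "correlation": cov / math.sqrt(base_sq * var_p),
    }


# Hypothetical TOC readings and model predictions, for illustration only.
actual_toc = [10, 20, 30, 40]
predicted_toc = [12, 18, 33, 39]

for name, value in regression_metrics(actual_toc, predicted_toc).items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a better model shows a correlation closer to 1 and smaller MAE, RMSE, RAE, and RRSE, which matches the ranking logic the abstract applies to RF, MLP, and RL.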

Author Biography

Dieki Rian Mustapa, Institut Teknologi Sepuluh Nopember

Sekolah Interdisiplin Manajemen dan Teknologi, Institut Teknologi Sepuluh Nopember

Published

2024-06-10

Section

Articles