Article ID: 2026019
Metal oxide nanoparticles (NPs) are extensively employed in biomedical, environmental, and industrial applications because of their unique physicochemical properties. However, concerns about their potential cytotoxicity call for accurate predictive models to assess nanoparticle safety. In this study, we present a machine learning framework for predicting the toxicity of metal oxide NPs from curated physicochemical descriptors. Data were systematically extracted from 140 peer-reviewed publications, focusing on four representative metal oxide nanoparticles (ZnO, AgO, CuO, SiO2). To ensure accessibility and consistency, the extracted data were structured using a Large Language Model (LLM) API, and the dataset was designed to be well balanced and minimally correlated. Low correlation between features (average Pearson correlation of 0.19) was prioritized to reduce redundancy and improve the interpretability of the model results. Feature selection and Principal Component Analysis (PCA) confirmed that a subset of physical descriptors effectively captured toxicity-related trends. The optimized Gradient Boosting Machine (GBM) and Support Vector Machine (SVM) models achieved predictive accuracies of 77% and 81%, respectively, with no evidence of overfitting. In addition, a synthetic dataset was generated to investigate the joint effects of core size and exposure dose on toxicity probability. Overall, this study provides a predictive framework for nanotoxicity assessment that may guide the rational design of safer nanoparticles.
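The abstract reports an average Pearson correlation of 0.19 between features but does not say how the pairwise values are aggregated. The sketch below assumes the figure is the mean of absolute correlations over all unique feature pairs; averaging signed values instead could let positive and negative correlations cancel and understate redundancy. The function names and the toy feature names in the usage block are hypothetical and are not taken from the study's dataset or code.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant feature: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_pairwise_correlation(features):
    """Average |r| over all unique pairs of feature columns.

    ASSUMPTION: this is how the 'average Pearson correlation' is aggregated.
    `features` maps a feature name to its list of values (one per sample).
    """
    names = list(features)
    pairs = list(itertools.combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two features")
    total = sum(abs(pearson(features[a], features[b])) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy descriptors, not data from the study.
    toy = {
        "core_size_nm": [10.0, 25.0, 40.0, 60.0, 80.0],
        "dose_ug_per_ml": [5.0, 50.0, 10.0, 100.0, 20.0],
        "exposure_h": [24.0, 24.0, 48.0, 48.0, 72.0],
    }
    print(round(mean_abs_pairwise_correlation(toy), 3))
```

A value near 0 indicates largely non-redundant descriptors, while values near 1 indicate that some descriptors carry nearly the same information.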