Annual Meeting of the Japanese Society of Toxicology
The 51st Annual Meeting of the Japanese Society of Toxicology
Session ID : P-65S

Poster Session
The impact of SMILES representation inconsistencies in the operation of chemical language models for QSAR prediction
*Yosuke KIKUCHI, Yasuhiro YOSHIKAI, Ayako FURUHAMA, Syumpei NEMOTO, Takashi YAMADA, Hiroyuki KUSUHARA, Tadahaya MIZUNO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Chemical language models apply natural language processing techniques to compound structures in machine learning, with SMILES strings as their principal input format. However, because databases process structures differently, the notation is inconsistent across sources, even for Canonical SMILES. This study examines how these 'dialects' affect chemical language models applied to QSAR tasks.

An initial investigation showed that SMILES representations of stereochemistry differ across databases, which could affect the application of chemical language models. To assess the influence of these inconsistencies, we implemented three pre-processing methods: a standard procedure, explicit stereoisomer assignment via 3D structure calculation, and exclusion of stereoisomeric data. A model was constructed for each method, and the three models were evaluated on the Ames test dataset in terms of translation accuracy and classification accuracy. The proposed pre-processing methods notably improved translation accuracy, whereas classification accuracy improved only marginally.
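
As a rough illustration of the third option above (exclusion of stereoisomeric data), the sketch below drops every SMILES string that carries a stereochemistry annotation. It is only a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say how exclusion was implemented, the function names `has_stereo_annotation` and `exclude_stereo_smiles` are hypothetical, and the check is purely textual. The 3D-based stereoisomer assignment would need a cheminformatics toolkit and is not attempted here.

```python
import re

# In SMILES, '@' marks a tetrahedral stereocentre (inside brackets) and
# '/' or '\' mark double-bond geometry. Assumption: a string containing
# none of these characters carries no stereochemical annotation.
_STEREO_MARKS = re.compile(r"[@/\\]")


def has_stereo_annotation(smiles: str) -> bool:
    """Return True if the SMILES string contains any stereo marker."""
    return _STEREO_MARKS.search(smiles) is not None


def exclude_stereo_smiles(smiles_list):
    """Keep only the SMILES strings without stereo markers (hypothetical helper)."""
    return [s for s in smiles_list if not has_stereo_annotation(s)]


# Illustrative strings only, not taken from the Ames dataset.
examples = [
    "CCO",              # ethanol, no stereochemistry to annotate
    "C[C@H](N)C(=O)O",  # alanine with the stereocentre specified
    "CC(N)C(=O)O",      # alanine with the stereocentre left unspecified
    "F/C=C/F",          # 1,2-difluoroethene with double-bond geometry
]

kept = exclude_stereo_smiles(examples)
print(kept)
```

The two alanine strings show the kind of 'dialect' difference at issue: both describe the same connectivity, but only one records the stereocentre, so a model reading raw strings sees two different inputs.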

These results show that (1) compound databases contain many notational inconsistencies, and (2) accounting for stereoisomerism may improve accuracy when applying chemical language models. Appropriate operation of chemical language models is expected to improve performance on QSAR tasks such as Ames mutagenicity prediction.

© 2024 The Japanese Society of Toxicology