Predicting the Right Reaction Solvents in Organic Synthesis Using Artificial Intelligence
Abstract
The right solvent is a crucial factor in achieving environmentally friendly, selective, and high-conversion chemical reactions. While artificial intelligence-based computer-aided synthesis tools can predict starting materials and reactants for synthesizing a desired product, they often cannot reliably predict reaction conditions such as the appropriate solvent. In this study, we demonstrate that data-driven machine-learning models can reliably predict the correct solvent for a broad spectrum of organic reactions. We extracted single-solvent reactions from two patent-derived datasets: Pistachio and the openly available USPTO dataset. We trained a BERT-based classifier and a random forest combined with differential reaction fingerprints, achieving a Top-3 accuracy of up to 86.88% for predicting the most commonly used solvent, as well as reliable prediction of underrepresented classes, with a macro-averaged F1 score of up to 56.87%. An uncertainty analysis revealed that the models' misclassifications can often be explained by the fact that the reaction class in question can be run in multiple solvents. To evaluate their real-world applicability, these models are currently undergoing experimental validation in a campaign that tests reactions that were successfully run in a solvent differing from the one predicted by the model. This work highlights the potential of data-driven approaches for addressing key challenges in organic synthesis and demonstrates the practical application of machine-learning models to predicting reaction solvents for more efficient and sustainable chemical synthesis.
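The two headline metrics can be illustrated with a minimal Python sketch. It assumes the standard definitions of Top-k accuracy and macro-averaged F1; the helper names, solvent labels, and rankings below are hypothetical toy values chosen for illustration and are not taken from the study's data or code.

```python
from collections import defaultdict


def top_k_accuracy(y_true, ranked_preds, k=3):
    """Fraction of samples whose true label is among the first k ranked predictions."""
    hits = sum(1 for t, ranked in zip(y_true, ranked_preds) if t in ranked[:k])
    return hits / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: true solvent per reaction and the model's ranked guesses.
y_true = ["THF", "DCM", "DMF", "THF", "toluene"]
ranked = [
    ["THF", "DCM", "DMF"],
    ["THF", "DCM", "toluene"],
    ["THF", "DCM", "toluene"],
    ["DCM", "THF", "DMF"],
    ["THF", "DCM", "DMF"],
]
top1 = [r[0] for r in ranked]  # highest-ranked solvent per reaction

top3_acc = top_k_accuracy(y_true, ranked, k=3)
top1_acc = top_k_accuracy(y_true, ranked, k=1)
f1_macro = macro_f1(y_true, top1)
print(f"Top-3 accuracy: {top3_acc:.2f}, Top-1 accuracy: {top1_acc:.2f}, macro-F1: {f1_macro:.3f}")
```

In this toy case a model that mostly guesses the most frequent solvent still reaches a moderate Top-3 accuracy while its macro-averaged F1 stays low, which is why the two metrics are informative together when solvent classes are imbalanced.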