Augmented disconnection aware retrosynthesis to facilitate user interaction
Abstract
In single-step retrosynthesis, a target molecule is broken down by considering the bonds to be changed and/or functional group interconversions. This step is then applied recursively in an automated manner to yield multi-step synthesis plans directed towards an objective, for instance the commercial availability of starting materials. Several deep-learning-based approaches to single-step retrosynthesis treat the prediction of possible disconnections as a translation task, relying on the Transformer architecture and the simplified molecular-input line-entry system (SMILES) notation. Given a target molecule, these approaches suggest the best set of precursors (i.e. reactants, and possibly other reagents) as the outcome of the translation, and can generate multiple such sets. However, the suggested precursors are limited by the training dataset, and these approaches afford the chemist little control over the site at which disconnections are made or over the overarching strategy taken by the route-finding algorithm. Herein, we build on our extension of Transformer-based models for single-step retrosynthesis by implementing augmentation strategies on the training set. These strategies enhance user-defined exploration of synthetic routes and are thus a step towards learning overall retrosynthetic strategy. By combining modern deep learning with template-based augmentation and human knowledge, we take a step towards improving decision-making strategies that statistical and machine learning algorithms cannot yet encode due to a lack of relevant training data. Ultimately, this serves to enhance a chemist’s experience by facilitating user engagement.