Guiding multistep retrosynthesis planning with continuous pathway representations
Abstract
Finding feasible multistep synthesis pathways is key to accelerating the discovery process in synthetic organic chemistry. While most recent research has focused on improving single-step retrosynthesis modeling, little attention has been paid to improving retrosynthesis strategy, that is, the art of composing predicted single-step reactions into efficient and realistic multistep synthesis routes. To address this gap, we herein introduce the notion of a “synthesis fingerprint”, a continuous embedding representing the series of reactions that constitute a synthetic route. The synthesis fingerprint is an extension of the previously proposed reaction fingerprint (rxnfp; Nature Machine Intelligence, 2021) and is defined as a normalized aggregation of all reaction fingerprints of a synthesis route. Using quantitative as well as qualitative analyses of synthesis routes extracted from the Pistachio dataset, we show that the synthesis fingerprint defines a metric in the discrete and sparse space of multistep synthesis routes, which makes it possible to compare routes. We then exploit this induced metric to guide multistep retrosynthesis planning, effectively mapping the retrosynthesis strategy problem to a geometrical optimization problem. In particular, we expand on the hypergraph-exploration strategy using the Molecular Transformer as proposed in Schwaller et al. (2020; Chemical Science). However, instead of scoring single-step reactions based on local metrics (e.g., synthesizability scores of precursors), we drive the design of multistep syntheses using information from the nearest neighbors in the synthesis-fingerprint space. Our results show that the routes proposed in this way can exhibit more realistic chemistry and a higher similarity to existing routes than routes obtained with local-score methods.
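As a rough illustration of the definition above, the following minimal sketch builds a route-level embedding from per-step reaction fingerprints and looks up the nearest reference routes. It rests on several assumptions that the abstract does not specify: the aggregation is taken to be a sum followed by L2 normalization, the distance is taken to be Euclidean, and the short toy vectors stand in for real rxnfp embeddings. The function names `synthesis_fingerprint`, `distance`, and `nearest_routes` are hypothetical and are not part of any published package.

```python
import math
from typing import List, Sequence

Vector = List[float]


def synthesis_fingerprint(reaction_fps: Sequence[Sequence[float]]) -> Vector:
    """Aggregate per-step reaction fingerprints into one route-level vector.

    Assumption: aggregation = element-wise sum, normalization = L2 norm.
    """
    if not reaction_fps:
        raise ValueError("a route needs at least one reaction fingerprint")
    dim = len(reaction_fps[0])
    total = [0.0] * dim
    for fp in reaction_fps:
        if len(fp) != dim:
            raise ValueError("all reaction fingerprints must share one dimension")
        for i, value in enumerate(fp):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total))
    if norm == 0.0:
        return total  # degenerate case: leave the zero vector unscaled
    return [x / norm for x in total]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two route fingerprints (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_routes(query: Sequence[float],
                   references: Sequence[Sequence[float]],
                   k: int = 1) -> List[int]:
    """Return the indices of the k reference routes closest to the query."""
    ranked = sorted(range(len(references)),
                    key=lambda i: distance(query, references[i]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors used purely for illustration.
    candidate = synthesis_fingerprint([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    known_routes = [
        synthesis_fingerprint([[0.0, 0.0, 1.0]]),
        synthesis_fingerprint([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 0.0]]),
    ]
    print(nearest_routes(candidate, known_routes, k=1))
```

In a planner, a lookup of this kind could score a partial route by its distance to known routes rather than by a local per-step score; how the original work combines the neighbor information is not stated in the abstract, so that step is left out here.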