Data-driven molecular design for discovery and synthesis of novel ligands: A case study on SARS-CoV-2
Abstract
Bridging systems biology and drug design, we propose a deep learning framework for de novo discovery of molecules tailored to bind to given protein targets. We exemplify our methodology on the task of designing antiviral candidates that target SARS-CoV-2-related proteins. Crucially, our framework does not require fine-tuning for specific proteins but generalizes to unseen targets, for which it proposes ligands with high predicted binding affinities. Coupling our framework with the automatic retrosynthesis prediction of IBM RXN for Chemistry, we demonstrate the feasibility of swift chemical synthesis of molecules with potential antiviral properties that were designed against a specific protein target. In particular, we synthesize an antiviral candidate designed against the host protein angiotensin converting enzyme 2 (ACE2), a surface receptor on human respiratory epithelial cells that facilitates SARS-CoV-2 cell entry through the viral spike glycoprotein. This is achieved as follows. First, we train a multimodal ligand–protein binding affinity model to predict the affinities of bioactive compounds to target proteins and couple this model with pharmacological toxicity predictors. Using this multi-objective as the reward function of a conditional molecular generator that consists of two variational autoencoders (VAEs), our framework steers the generation toward regions of chemical space with high-reward molecules. Specifically, we explore the challenging setting of generating ligands against unseen protein targets by performing a leave-one-out cross-validation on 41 SARS-CoV-2-related target proteins. Using deep reinforcement learning, we demonstrate that in 35 out of 41 cases the generation is biased toward sampling binding ligands, with an average increase of 83% compared to an unbiased VAE. The generated molecules exhibit favorable properties in terms of target binding affinity, selectivity and drug-likeness.
We use molecular retrosynthetic models to assess the synthetic accessibility of the best generated hit molecules. Finally, with this end-to-end framework, we synthesize 3-bromobenzylamine, a potential inhibitor of the host ACE2 protein, based solely on the recommendations of a molecular retrosynthesis model and a synthesis protocol prediction model. We hope that our framework can contribute to the swift discovery of de novo molecules with desired pharmacological properties.
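To make the evaluation setup concrete, the multi-objective reward and the leave-one-out protocol over protein targets can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the linear form of the reward, the weights, and the convention that both predictors return scores in [0, 1] are all hypothetical.

```python
from typing import Iterator, List, Tuple


def combined_reward(
    affinity: float,
    toxicity: float,
    affinity_weight: float = 1.0,   # hypothetical weight, not from the paper
    toxicity_weight: float = 0.5,   # hypothetical weight, not from the paper
) -> float:
    """Illustrative multi-objective reward.

    Assumes `affinity` and `toxicity` are predicted scores in [0, 1].
    High predicted affinity is rewarded; high predicted toxicity is penalized.
    """
    return affinity_weight * affinity + toxicity_weight * (1.0 - toxicity)


def leave_one_out(targets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_targets, held_out_target) pairs, one per target.

    The generator is optimized on the training targets only and then
    evaluated on the held-out target, which it has never seen.
    """
    for i, held_out in enumerate(targets):
        yield targets[:i] + targets[i + 1:], held_out


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 41 target proteins.
    targets = [f"target_{i}" for i in range(41)]
    splits = list(leave_one_out(targets))
    print(len(splits))        # one split per target: 41
    print(len(splits[0][0]))  # 40 training targets in each split
```

In each split, the held-out protein plays the role of an unseen target, so a reward computed for molecules generated against it reflects generalization rather than fitting to that protein.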