Towards accelerating small molecule drug discovery with pre-trained, late fusion multi-view models
Abstract
Foundation models have transformed how many tasks are performed and are an area of active exploration in small molecule drug discovery. Small molecule foundation models typically rely on a single representation of the molecule, such as a SMILES string fed into a text-based model. However, a molecule can be represented in many ways, including as an image, a chemically bonded graph, or a three-dimensional structure. Each representation, or ‘view’, carries different and potentially complementary information that, when combined, can yield a more accurate and robust model. Here we describe a multi-view foundation model that incorporates several pre-trained representations to achieve this goal. The model for each view has already been pre-trained on hundreds of millions of molecules. We evaluate the complementarity of these representations in embedding space. We explore multi-modal late fusion techniques and fine-tune our models on datasets covering a wide variety of downstream tasks. We find that, overall, our multi-view models can outperform models that rely on a single representation.
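To illustrate what "late fusion" means here, the sketch below assumes each view's pre-trained encoder has already produced a fixed-size embedding per molecule. It fuses those embeddings only at the end, by concatenation, and fits a downstream head. Several parts of this are illustrative assumptions and are not taken from the described model: concatenation as the fusion operator, frozen embeddings, a closed-form ridge regression head in place of fine-tuning, and the random arrays standing in for text, graph, and image embeddings (`late_fuse` and `fit_ridge` are hypothetical helpers). The synthetic target is built so that it depends on two different views, which shows how complementary information can help. These numbers are not results.

```python
import numpy as np

def late_fuse(view_embeddings):
    """Concatenate per-view embeddings (each [n_molecules, d_view]) along the feature axis."""
    n = view_embeddings[0].shape[0]
    if any(e.shape[0] != n for e in view_embeddings):
        raise ValueError("all views must embed the same molecules")
    return np.concatenate(view_embeddings, axis=1)

def _with_bias(X):
    # Append a constant column so the head can learn an intercept
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression head; the bias term is not penalized."""
    Xb = _with_bias(X)
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(w, X):
    return _with_bias(X) @ w

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(0)
n = 200
# Hypothetical stand-ins for frozen embeddings from three pre-trained views
text_emb = rng.normal(size=(n, 8))
graph_emb = rng.normal(size=(n, 6))
image_emb = rng.normal(size=(n, 4))
# Synthetic property that depends on information from two different views
y = text_emb[:, 0] + graph_emb[:, 1] + 0.1 * rng.normal(size=n)

fused = late_fuse([text_emb, graph_emb, image_emb])
w_text = fit_ridge(text_emb, y)
w_fused = fit_ridge(fused, y)
text_mse = mse(predict(w_text, text_emb), y)
fused_mse = mse(predict(w_fused, fused), y)
print(f"text-only training MSE: {text_mse:.3f}")
print(f"late-fused training MSE: {fused_mse:.3f}")
```

Concatenation is the simplest late fusion operator. The single-view head cannot see the graph-derived signal, so its error stays near the variance of that missing component. Learned weighting or attention over views are common alternatives to concatenation.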