Publication
ACS Fall 2024
Talk
Towards accelerating small molecule drug discovery with pre-trained, late fusion multi-view models
Abstract
Foundation models have transformed how many tasks are performed and are an area of active exploration in small molecule drug discovery. Small molecule foundation models typically focus on a single representation of the molecule, such as a SMILES string fed to a text-based model. However, a molecule can be represented in many ways, including as an image, a chemically bonded graph, or a three-dimensional structure. Each representation, or 'view', carries different and potentially complementary information that, when combined, can yield a more accurate and robust model. Here we describe a multi-view foundation model that incorporates several pre-trained representations to achieve this goal. Each view has been pre-trained on hundreds of millions of molecules. We evaluate the complementarity of these representations in embedding space. We explore multi-modal, late-fusion techniques and fine-tune our models on datasets covering a wide variety of downstream tasks. We find that, overall, our multi-view models can outperform models that rely on a single representation.
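The abstract does not specify the fusion mechanism, so the following is only a minimal sketch of one simple late-fusion scheme, not the authors' method. It assumes each pre-trained view has already produced a fixed embedding per molecule, concatenates those embeddings, and fits a linear regression head on the fused vector. The view names (`text`, `graph`), the toy one-dimensional "embeddings", the targets, and the helper functions (`late_fuse`, `fit_linear_head`, `predict`, `mse`) are all hypothetical; real embeddings would be high-dimensional and the head would usually be a trained neural network.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def late_fuse(view_embeddings: Dict[str, Vector], view_order: Sequence[str]) -> Vector:
    """Concatenate per-view embeddings in a fixed order (simple late fusion)."""
    fused: Vector = []
    for view in view_order:
        fused.extend(view_embeddings[view])
    return fused


def fit_linear_head(X: List[Vector], y: List[float], lr: float = 0.05, epochs: int = 2000) -> Vector:
    """Fit linear weights plus a bias by full-batch gradient descent on mean squared error."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * (n_features + 1)  # last entry is the bias term
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * err * xj / n
            grad[-1] += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w: Vector, x: Vector) -> float:
    """Linear prediction; zip stops at len(x), so the bias w[-1] is added separately."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def mse(w: Vector, X: List[Vector], y: List[float]) -> float:
    """Mean squared error of the linear head on (X, y)."""
    return sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


# Hypothetical toy data: one 1-D "embedding" per view for four molecules.
# The target depends on information from both views.
text_view = {"A": [0.0], "B": [1.0], "C": [0.0], "D": [1.0]}
graph_view = {"A": [0.0], "B": [0.0], "C": [1.0], "D": [1.0]}
targets = {"A": 0.0, "B": 1.0, "C": 1.0, "D": 2.0}

mols = ["A", "B", "C", "D"]
y = [targets[m] for m in mols]
X_text = [text_view[m] for m in mols]
X_fused = [
    late_fuse({"text": text_view[m], "graph": graph_view[m]}, ["text", "graph"])
    for m in mols
]

w_text = fit_linear_head(X_text, y)
w_fused = fit_linear_head(X_fused, y)
text_mse = mse(w_text, X_text, y)
fused_mse = mse(w_fused, X_fused, y)
print(f"text-only MSE: {text_mse:.3f}, fused MSE: {fused_mse:.3f}")
# prints "text-only MSE: 0.250, fused MSE: 0.000"
```

In this contrived example the target cannot be recovered from the text view alone, so the single-view head plateaus at an error of 0.25 while the fused head fits exactly. This illustrates, in the simplest possible setting, why complementary views can help; it says nothing about the magnitude of gains on real molecular benchmarks.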