Publication
ACS Fall 2024
Talk

Multimodal Molecular Representation Learning for Small Molecule Drug Discovery - Pretraining and Early Fusion Architectures

Abstract

Foundation models have shown great potential in drug discovery tasks such as molecular property prediction and conformation generation by learning latent representations of molecules through self-supervision. However, learning useful, generalizable latent representations is difficult due to the wide range of prediction targets and the vast, structurally heterogeneous chemical space. One way to learn a richer representation is to exploit multimodality: a molecule can be represented as a line encoding (e.g., SMILES), a chemically bonded graph, an image, or a 3D structure, and each representation carries potentially complementary information. In this work, we describe architectures and pretraining tasks for a foundation model that learns from these different views of a molecule by maximizing the mutual information between modalities. By combining multiple modalities through novel pretraining methods, our model achieves promising results on a range of molecular prediction and generation tasks, highlighting the potential of our approach.
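
The mutual-information maximization mentioned above is commonly implemented with an InfoNCE-style contrastive objective: embeddings of the same molecule from two modality encoders (say, a SMILES encoder and a graph encoder) are pulled together, while embeddings of different molecules in the batch act as negatives. As an illustration only (not the exact objective used in the talk), a minimal NumPy sketch, where `z_a` and `z_b` are hypothetical batched embeddings from two modality encoders:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss between two batches of modality embeddings.

    z_a, z_b: arrays of shape (batch, dim); row i of each is assumed
    to embed the same molecule (a positive pair).
    """
    # L2-normalize so similarities are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # Pairwise similarity matrix; diagonal entries are the positives
    logits = z_a @ z_b.T / temperature
    # Cross-entropy of each row against its matching column (stable log-softmax)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Matched pairs give a lower loss than deliberately mismatched ones
print(info_nce(z, z) < info_nce(z, z[::-1]))
```

Minimizing this loss is a lower-bound surrogate for maximizing the mutual information between the two views, which is what makes it a natural fit for multimodal pretraining.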