Interpretable latent representations for molecular property prediction models
Abstract
Latent representations in molecular property prediction models are the compressed, abstract feature representations of chemical compounds learned by artificial intelligence (AI) techniques. These representations capture the essential information about molecular structure and properties that is relevant to the prediction task while discarding irrelevant details. They are often used as input features to downstream machine learning models for tasks such as compound similarity analysis, virtual screening, and de novo drug design. However, generating a good molecular representation with AI is challenging: the representations should not only improve the efficacy of downstream tasks but also be interpretable enough to reveal the underlying mechanisms of molecular activity. In this work, we aim to learn interpretable representations that domain experts involved in different drug discovery tasks can use to generate actionable insights. Specifically, the lower-dimensional representations obtained from any AI-based large-scale foundation model are further analyzed using state-of-the-art Explainable AI (XAI) techniques to generate explanations related to a particular task. An initial analysis, in which latent representations obtained from a property prediction model were interpreted using a concept bottleneck framework with MACCS fingerprints as priors, produced useful explanations and promising results.
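To make the approach concrete, the following is a minimal sketch, not the paper's exact pipeline, of interpreting a latent space through a concept bottleneck with MACCS keys as the concept priors. It assumes precomputed inputs that are not defined in the abstract: `latents` (an N x d array of embeddings from a pretrained property prediction model), a matching list of `smiles` strings, and binary property labels `y`. The two-stage linear probing shown here is one simple instantiation of the concept bottleneck idea.

```python
# Sketch: concept-bottleneck probing of a latent space with MACCS keys.
# Assumed inputs (hypothetical, not from the paper): latents (N, d), smiles (N), y (N, binary).
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

def maccs_matrix(smiles_list):
    """Return an (N, 167) 0/1 matrix of MACCS keys; each bit is a named substructure concept."""
    rows = []
    for smi in smiles_list:
        fp = MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles(smi))
        arr = np.zeros((0,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)  # canonical RDKit bit-vector -> numpy conversion
        rows.append(arr)
    return np.stack(rows)

C = maccs_matrix(smiles)
keep = C.std(axis=0) > 0  # drop MACCS bits that never vary in this dataset

# Stage 1: linear probes from the latent space onto each MACCS concept.
# High probe accuracy on a bit suggests the embedding encodes that substructure.
concept_probe = MultiOutputClassifier(LogisticRegression(max_iter=1000))
concept_probe.fit(latents, C[:, keep])

# Stage 2: predict the property from the probed concepts alone (the bottleneck),
# so the label model's coefficients attribute the prediction to named MACCS keys.
C_hat = np.column_stack([est.predict_proba(latents)[:, 1]
                         for est in concept_probe.estimators_])
label_model = LogisticRegression(max_iter=1000).fit(C_hat, y)

# Rank concepts by coefficient magnitude and map back to original MACCS indices.
top = np.argsort(-np.abs(label_model.coef_[0]))[:10]
print("Most influential MACCS key indices:", np.flatnonzero(keep)[top])
```

Because the final property prediction flows only through the MACCS concepts, each coefficient of the label model can be read as the contribution of a named substructure, which is what makes such explanations actionable for a domain expert.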