Accurate multi-endpoint molecular toxicity predictions in humans with contrastive explanations
Abstract
Explainable machine learning (XML) for molecular toxicity prediction is a promising approach for efficient drug development and drug safety. A predictive ML model of toxicity can reduce experimental cost and time, and can mitigate ethical concerns by substantially reducing animal and human testing. In this work, we use a deep learning framework to model in vitro, in vivo and human toxicity data simultaneously. We consider two input representations of molecules: Morgan fingerprints and pretrained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including human endpoints, as indicated by the area under the receiver operating characteristic curve and balanced accuracy. To build confidence in the model’s predictions and explain them, we adapt a post-hoc contrastive explanation method that returns pertinent positive and pertinent negative features. The resulting pertinent features correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael acceptors. Toxicophore recovery by pertinent feature analysis is higher for in vitro and in vivo endpoints than for human endpoints, and indeed uncovers a bias in known toxicophore data towards in vitro and in vivo experimental data, thus demonstrating the efficacy of our proposed approach.
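For readers unfamiliar with the two evaluation metrics named above, the following minimal sketch shows how each can be computed for a single binary toxicity endpoint. It is an illustration only, not the evaluation code used in this work: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names `balanced_accuracy` and `roc_auc` are hypothetical helpers written for this example.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = toxic)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    toxic molecule receives a higher score than a random non-toxic one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data for one endpoint (illustrative values, not from the paper).
y_true = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(roc_auc(y_true, scores))            # 0.875
print(balanced_accuracy(y_true, y_pred))  # 0.625
```

Balanced accuracy averages the per-class recall, so it is not inflated by the non-toxic majority class that is typical of toxicity datasets. In a multi-task setting, one plausible choice is to compute both metrics separately for each endpoint, using only the molecules that have a label for that endpoint.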