Learning a molecular language for protein interactions is crucial for advancing drug discovery. Foundation models trained on diverse biomedical data, such as antibody-antigen and small molecule-protein interactions, are transforming this field. Unlike traditional computational approaches, they both widen the search space for novel molecules and narrow it by eliminating unsuitable candidates, capturing the detailed nuances of molecular structure and dynamics.
IBM Research biomedical foundation model (BMFM) technologies leverage multi-modal data, including drug-like small molecules and proteins (covering more than a billion molecules in total), as well as single-cell RNA sequencing and other biomedical data.
Our research team has a diverse range of expertise, including computational chemistry, medicinal chemistry, artificial intelligence, computational biology, physical sciences, and biomedical informatics.
Our BMFM technologies currently cover the following three domains:
Target discovery models learn representations of DNA, bulk RNA, single-cell RNA expression data, and other cell-level signaling information for the identification of novel diagnostic and therapeutic targets, enabling tasks such as cell type annotation and classification, gene perturbation prediction, disease state prediction, splice variant prediction, promoter region prediction, and treatment response prediction.
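To make one such downstream task concrete, the sketch below frames cell type annotation as fine-tuning a lightweight classification head on embeddings produced by a pretrained single-cell expression encoder. The encoder, dimensions, and class count here are illustrative placeholders, not the actual BMFM target discovery model.

```python
# Minimal, hypothetical sketch: cell type annotation as a classification head
# fine-tuned on frozen embeddings from a pretrained single-cell encoder.
import torch
import torch.nn as nn

class CellEncoder(nn.Module):
    """Stand-in for a pretrained single-cell RNA expression encoder."""
    def __init__(self, n_genes: int = 2000, d_model: int = 256):
        super().__init__()
        self.proj = nn.Linear(n_genes, d_model)

    def forward(self, expression: torch.Tensor) -> torch.Tensor:
        # expression: (batch, n_genes) normalized counts -> (batch, d_model) embedding
        return self.proj(expression)

class CellTypeClassifier(nn.Module):
    """Classification head trained on top of the frozen foundation-model backbone."""
    def __init__(self, encoder: CellEncoder, n_cell_types: int, d_model: int = 256):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the pretrained backbone
            p.requires_grad = False
        self.head = nn.Linear(d_model, n_cell_types)

    def forward(self, expression: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(expression))

# Toy usage with random tensors standing in for a labeled scRNA-seq dataset.
model = CellTypeClassifier(CellEncoder(), n_cell_types=10)
logits = model(torch.randn(8, 2000))  # 8 cells, 2000 genes
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
```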
Biologics discovery models focus on the discovery of biologic therapeutics, with the goal of leveraging large-scale representations of protein sequences, structures, and dynamics for downstream tasks associated with multiple biologics modalities. These models produce unified representations of biological molecular entities, integrating data such as protein sequences, protein complex structures, and protein-protein complex binding free energies into a single framework, and can serve as the basis for diverse downstream tasks in therapeutic design, including candidate generation and assessment, across antibody, TCR, vaccine, and other modalities.
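The sketch below illustrates the unified-representation idea in its simplest form: embeddings from a protein sequence encoder and features from a complex structure encoder are fused into a single vector, from which a small head predicts a protein-protein binding free energy. All module names, fusion choices, and dimensions are assumptions made for illustration, not the BMFM biologics architecture.

```python
# Minimal, hypothetical sketch: fuse sequence and structure representations of a
# protein-protein complex and regress a binding free energy from the fused vector.
import torch
import torch.nn as nn

class FusionBindingModel(nn.Module):
    def __init__(self, d_seq: int = 512, d_struct: int = 128, d_fused: int = 256):
        super().__init__()
        self.seq_proj = nn.Linear(d_seq, d_fused)       # sequence embedding -> fused space
        self.struct_proj = nn.Linear(d_struct, d_fused)  # complex-structure features -> fused space
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(d_fused, 1))  # fused vector -> binding free energy

    def forward(self, seq_emb: torch.Tensor, struct_feat: torch.Tensor) -> torch.Tensor:
        fused = self.seq_proj(seq_emb) + self.struct_proj(struct_feat)  # simple additive fusion
        return self.head(fused).squeeze(-1)

# Toy usage: 4 complexes with precomputed sequence embeddings and structure features.
model = FusionBindingModel()
delta_g = model(torch.randn(4, 512), torch.randn(4, 128))
```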
Small molecule models can address a wide variety of downstream predictive and generative tasks. These models are trained on multiple representations of small-molecule data to learn rich low-dimensional representations of biochemical entities relevant to drug discovery, enabling tasks such as property and affinity prediction, multi-modal late-fusion prediction, and scaffold-based generation. Predictive models are transformer models pretrained on multiple views (i.e., modalities) of small-molecule data and learn rich latent representations by maximizing mutual information across the different views of each molecule. Generative models learn, via diffusive denoising networks, to transform input molecules into mutant output molecules together with a cognate property embedding of the mutant. Given a set of desired properties and a template molecule (3D structure), a set of designer molecules (3D structures) can be obtained.
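The multi-view pretraining objective described above can be illustrated with a generic InfoNCE-style contrastive loss, which maximizes a lower bound on the mutual information between two views of the same molecule (for example, a SMILES view and a graph view). This is a minimal sketch of the general technique under that assumption, not the exact BMFM loss.

```python
# Minimal, hypothetical sketch of a cross-view InfoNCE objective for molecules.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """view_a, view_b: (batch, dim) embeddings of the same molecules under two views."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                     # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # matching pairs lie on the diagonal
    # Symmetrized cross-entropy: each view must identify its paired view among the batch.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings standing in for two molecular encoders' outputs.
loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```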
Our team has been developing a broad suite of models and architectures, and we have open-sourced two models, with model weights available on Hugging Face and code on GitHub.
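As an illustration of how open-sourced checkpoints on the Hugging Face Hub are typically consumed, the sketch below uses the transformers library. The model identifier is a placeholder, not one of our actual repository names, and should be replaced with the identifier from our Hugging Face listing.

```python
# Minimal usage sketch for loading a checkpoint from the Hugging Face Hub.
from transformers import AutoModel, AutoTokenizer

model_id = "ibm/example-bmfm-checkpoint"  # placeholder identifier, not a real repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("CCO", return_tensors="pt")   # e.g. a SMILES string for a small-molecule model
embeddings = model(**inputs).last_hidden_state   # token-level latent representations
```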
For scientific journal publications, please see our Publications section below.