ISMB 2024
Poster
Cross Dataset Verification of Cell Type Annotation using a Transcriptomic Biomedical Foundation Model: Inflammatory Bowel Disease Use Case
Abstract
Computational cell type annotation is an essential task for efficiently understanding single-cell sequencing data. With advances in machine learning, accurate cell type annotation has become possible by training cell-type classification models. Transcriptomic foundation models, trained on large public sequencing datasets using variants of BERT, have been developed to learn gene expression representations. Such models achieve high annotation performance because the learned representations retain essential information from large genetic repositories. Subsequent re-training then allows relatively small datasets with cell labels to be used to effectively predict cell types in unannotated datasets. This study presents a cell annotation pipeline for Inflammatory Bowel Disease (IBD) using a Biomedical Foundation Model (BMFM) based on scBERT (a variant of BERT). A model is first pre-trained on large-scale transcriptomic data and then re-trained on a single IBD dataset to predict its cell types. The re-trained model can subsequently be used to annotate new IBD transcriptomic datasets. Using this pipeline, we examined how accurately cell types are annotated. Starting from a model pre-trained on PanglaoDB (1 million cells across various conditions), we re-trained it on the cell types of one IBD dataset, SCP259 (360K cells), and used it to annotate another IBD dataset, SCP1884 (700K cells), yielding a cell-type mapping from SCP259 to SCP1884. Examination of this mapping showed high concordance: 69% of SCP1884 cell types (47 of 68) were predicted as a corresponding SCP259 cell type across three lineages (epithelial, immune, and stromal). Our results indicate that the BMFM-based annotation approach can effectively help characterize the large variation among IBD cells.
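The abstract does not say how the SCP259-to-SCP1884 mapping or the "corresponding" cell types were derived. The sketch below is one plausible way to compute such a mapping and a concordance score, assuming that each SCP1884 cell type is mapped to the SCP259 label the re-trained model most often assigns to its cells (a majority vote), and that acceptable correspondences come from a separately curated table. The functions `majority_mapping` and `concordance`, and all cell-type labels, are hypothetical illustrations rather than the authors' code; the model itself (BMFM/scBERT) is not shown, and the sketch starts from per-cell predicted labels.

```python
from collections import Counter, defaultdict


def majority_mapping(true_labels, predicted_labels):
    """Map each reference cell type (e.g. an SCP1884 annotation) to the
    predicted cell type (e.g. an SCP259 label) most often assigned to its cells.

    Hypothetical helper: majority voting is an assumed mapping rule.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    counts = defaultdict(Counter)
    for true, pred in zip(true_labels, predicted_labels):
        counts[true][pred] += 1
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return {true: c.most_common(1)[0][0] for true, c in counts.items()}


def concordance(mapping, corresponding):
    """Return (hits, total, fraction) of reference cell types whose mapped
    label appears in `corresponding[reference_type]` (a set of labels).

    Hypothetical helper: the correspondence table is assumed to be curated
    externally, e.g. by matching cell types within the same lineage.
    """
    if not mapping:
        raise ValueError("mapping is empty")
    hits = sum(1 for true, pred in mapping.items()
               if pred in corresponding.get(true, set()))
    return hits, len(mapping), hits / len(mapping)


# Toy, made-up labels: one entry per cell.
true_labels = ["Enterocyte", "Enterocyte", "Tuft", "CD8 T", "CD8 T", "CD8 T", "Fibroblast"]
pred_labels = ["Enterocytes", "Enterocytes", "Enteroendocrine",
               "CD8+ IELs", "CD8+ IELs", "CD8+ LP", "WNT2B+ Fibroblasts"]

# Hypothetical curated table of acceptable reference -> predicted matches.
corresponding = {
    "Enterocyte": {"Enterocytes"},
    "Tuft": {"Tuft"},
    "CD8 T": {"CD8+ IELs", "CD8+ LP"},
    "Fibroblast": {"WNT2B+ Fibroblasts"},
}

mapping = majority_mapping(true_labels, pred_labels)
hits, total, frac = concordance(mapping, corresponding)
print(hits, total, f"{frac:.0%}")   # 3 4 75%

# The abstract's reported figure: 47 of 68 SCP1884 cell types.
print(f"{47 / 68:.0%}")             # 69%
```

On real data the per-cell labels would come from the re-trained model's predictions on SCP1884 alongside SCP1884's own annotations. A majority vote is only one design choice; a full contingency table would also show how each SCP1884 type is split across several SCP259 labels.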