Publication
ISMB 2024
Poster

Effective In-Silico Gene Perturbation by Machine Learning Model Interpretation for Immunotherapies

Abstract

Understanding T-cell function is a central problem in cancer immunotherapy. Recent advances in single-cell profiling technology have enabled the large-scale accumulation of single-cell RNA sequencing data, and modern Machine Learning (ML) methods offer new approaches to predicting complex T-cell functions. The PRF1 gene encodes perforin, which is highly relevant to T cells that actively contribute to cytolysis. Our study aims to identify transcription factors (TFs) that regulate PRF1 expression using an ML model, and thereby to contribute to T-cell engineering. The proposed pipeline consists of three main steps: gene expression prediction with a Gradient Boosting model, SHAP-based model explanation to find key TFs, and observation of in-silico gene perturbations. We implemented the method on a public Tumor Infiltrating Lymphocyte (TIL) dataset (GSE156728). The experimental results identify key TFs and demonstrate the effectiveness of the proposed method, which obtains the expected PRF1 values through counterfactual inference. SHAP values obtained from the trained Gradient Boosting model provide cell-by-cell attributions describing the contribution of each TF to the prediction of the target gene. From this SHAP-based analysis, we identified the top five TFs for PRF1 prediction: ZEB2, BHLHE40, FOS, JUNB and TCF7. In perturbation experiments with these TFs, ZEB2 produced the largest increase in predicted PRF1 (about 19% above the originally predicted value), and TCF7 produced the largest reduction (about 17%).
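The three pipeline steps can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions. The data are synthetic rather than GSE156728, and the toy relationship between TFs and the target is arbitrary. Attributions are exact Shapley values computed by brute-force enumeration over five features, standing in for the SHAP library. The perturbation sets a TF to zero (a simulated knock-out) and reports the percent change in the mean prediction, which may differ from how the poster defines its perturbation and its percentages. The helper names `make_toy_data`, `shapley_values` and `perturb` are hypothetical.

```python
import itertools
import math

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

TFS = ["ZEB2", "BHLHE40", "FOS", "JUNB", "TCF7"]  # TF names from the abstract


def make_toy_data(n_cells=400, seed=0):
    """Synthetic cells x TFs matrix and target (assumption: not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.gamma(shape=2.0, scale=1.0, size=(n_cells, len(TFS)))
    # Arbitrary toy relationship, chosen only so the example has structure.
    noise = rng.normal(0.0, 0.1, n_cells)
    y = 1.0 + 0.8 * X[:, 0] + 0.2 * X[:, 1] - 0.6 * X[:, 4] + noise
    return X, y


def shapley_values(predict, x, background):
    """Exact Shapley values for one cell by enumerating all TF coalitions."""
    n = len(x)

    def value(subset):
        # TFs in `subset` take the cell's values; the rest keep background values.
        Z = background.copy()
        idx = np.array(subset, dtype=int)
        Z[:, idx] = x[idx]
        return predict(Z).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def perturb(model, X, tf_index, new_value):
    """Percent change in the mean prediction after fixing one TF to new_value."""
    X_cf = X.copy()
    X_cf[:, tf_index] = new_value  # counterfactual expression matrix
    base = model.predict(X).mean()
    return 100.0 * (model.predict(X_cf).mean() - base) / base


# Step 1: predict the target gene from TF expression.
X, y = make_toy_data()
model = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)
model.fit(X, y)

# Step 2: per-cell attributions, then rank TFs by mean absolute attribution.
background = X[:50]
cells = X[50:70]
phi = np.array([shapley_values(model.predict, x, background) for x in cells])
importance = np.abs(phi).mean(axis=0)
ranking = [TFS[i] for i in np.argsort(importance)[::-1]]

# Step 3: in-silico knock-out of each TF, one at a time.
ko_effect = {tf: perturb(model, X, i, 0.0) for i, tf in enumerate(TFS)}

print("TF ranking:", ranking)
for tf, pct in ko_effect.items():
    print(f"{tf}: {pct:+.1f}% change in mean predicted target")
```

Brute-force enumeration is only practical for a handful of features. With a realistic number of TFs, a tree-specific explainer such as the SHAP library's `TreeExplainer` would be the usual choice.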