Predicting protein phosphorylation from gene expression: Top methods from the IMPROVER Species Translation Challenge
Abstract
Motivation: Using gene expression to infer changes in protein phosphorylation levels induced in cells by various stimuli is an outstanding problem. The intra-species protein phosphorylation challenge organized by the IMPROVER consortium provided a framework for identifying the best approaches to this problem.
Results: Rat lung epithelial cells were treated with 52 stimuli, and gene expression and phosphorylation levels were measured. Competing teams used gene expression data from 26 stimuli to develop protein phosphorylation prediction models and were ranked by prediction performance on the remaining 26 stimuli. Three teams tied for first place in this challenge, each achieving a balanced accuracy of about 70%, which indicates that gene expression is only moderately predictive of protein phosphorylation. Despite the similar performance, the approaches used by these three teams, described in detail in this article, differed, with the average number of predictor genes per phosphoprotein ranging from 3 to 124 across teams. Nevertheless, the teams' gene signatures overlapped significantly for the majority of the proteins considered, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were enriched in the union of the three teams' predictor genes for multiple proteins.
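For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of sensitivity and specificity, which keeps a predictor from scoring well simply by always calling the majority class. The following is a minimal sketch of that standard definition, not the challenge's scoring code; the function name and the toy labels (1 = phosphorylation change, 0 = no change) are hypothetical and chosen only for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical labels for eight stimuli and one phosphoprotein (illustration only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
score = balanced_accuracy(y_true, y_pred)
print(f"balanced accuracy = {score:.3f}")
```

Under this definition a random or constant predictor scores about 50%, so a value of about 70% sits well above chance but far from perfect prediction.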