Informative prediction based on ordinal questionnaire data
Abstract
Supporting human decision making is a major goal of data mining. The more critical the decision, the more interpretability is required of the predictive model. This paper proposes a new framework for building a fully interpretable predictive model for questionnaire data while maintaining high prediction accuracy with regard to the final outcome. Such a model has applications in project risk assessment, health care, sentiment analysis, and presumably any real-world application that relies on questionnaire data for informative and accurate prediction. Our framework is inspired by models in Item Response Theory (IRT), which were originally developed in psychometrics for standardized tests such as the SAT. We first extend these models, which are essentially unsupervised, to the supervised setting. We then derive a distance metric from the trained model to define the informativeness of individual question items. On real-world questionnaire data obtained from information technology projects, we demonstrate the power of this approach in terms of both interpretability and predictive accuracy. To the best of our knowledge, this is the first work that leverages the IRT framework to provide informative and accurate prediction on ordinal questionnaire data.