Data-enhanced predictive modeling for sales targeting
Abstract
We describe and analyze data-enhanced predictive modeling (DEM). The term "enhanced" refers to the case where the data used for modeling are sampled not from the true target population but from a closely related alternative population, from which much larger samples are available. Training on the alternative population introduces bias, because its distribution differs from that of the target, but the larger sample reduces the variance of the fitted model. This "bias-variance" tradeoff implies that, in some cases, DEM can improve predictive performance on the true target population. We analyze this tradeoff theoretically for the case of linear regression. We then illustrate DEM on a sales targeting problem for a set of software products. The "correct" learning problem is to differentiate non-customers from newly acquired customers, but newly acquired customers are scarce. We show that better prediction models can be built by using more flexible definitions of interesting targets, which yield larger learning samples.
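As an informal illustration of the tradeoff described above, the following minimal simulation compares a one-variable, no-intercept linear regression fitted on a small target sample with one fitted on a much larger sample from a shifted alternative population. It is a sketch under stated assumptions, not the analysis or the data of this paper: the function names, the sample sizes, the noise level, and the `shift` parameter (the difference between the alternative and target slopes) are all hypothetical choices made for the example.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sample(rng, n, slope, noise_sd):
    """Draw n points with x ~ N(0, 1) and y = slope * x + Gaussian noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def excess_risk(n_target, n_alt, shift, true_slope=1.0, noise_sd=1.0,
                reps=200, seed=0):
    """Average excess prediction risk on the target population.

    Returns a pair (target_only, alternative_only). Because x ~ N(0, 1),
    the excess risk of a fitted slope b equals (b - true_slope) ** 2.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    err_target = 0.0
    err_alt = 0.0
    for _ in range(reps):
        # Small sample from the true target population (unbiased, high variance).
        b_target = fit_slope(*sample(rng, n_target, true_slope, noise_sd))
        # Large sample from the alternative population (biased, low variance).
        b_alt = fit_slope(*sample(rng, n_alt, true_slope + shift, noise_sd))
        err_target += (b_target - true_slope) ** 2
        err_alt += (b_alt - true_slope) ** 2
    return err_target / reps, err_alt / reps


# Closely related alternative population: small shift in the slope.
small_shift = excess_risk(n_target=10, n_alt=1000, shift=0.1)
# Poorly related alternative population: large shift in the slope.
large_shift = excess_risk(n_target=10, n_alt=1000, shift=1.0)

print("small shift (target-only, alternative):", small_shift)
print("large shift (target-only, alternative):", large_shift)
```

In this toy setting the alternative-population fit has the lower error on the target when the shift is small, and the target-only fit has the lower error when the shift is large, which is the qualitative behavior the tradeoff predicts.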