Martin Bichler, Jayant Kalagnanam, et al.
IBM Systems Journal
We describe and analyze the idea of data-enhanced predictive modeling (DEM). The term "enhanced" here refers to the case in which the data used for modeling are sampled not from the true target population but from an alternative, closely related population from which much larger samples are available. This leads to a bias-variance tradeoff: the alternative sample introduces bias but reduces variance, so in some cases DEM can improve predictive performance on the true target population. We theoretically analyze this tradeoff for the case of linear regression. We illustrate DEM on a problem of sales targeting for a set of software products. The "correct" learning problem is to differentiate non-customers from newly acquired customers. The latter, however, are scarce. We illustrate how better prediction models can be built by using more flexible definitions of interesting targets, which yield larger learning samples.
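The bias-variance tradeoff for linear regression can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the analysis from the paper: the function name `simulate_dem`, the coefficient vectors, the sample sizes, and the additive `bias` shift between the two populations are all hypothetical choices made for illustration. It compares ordinary least squares fitted on a small sample from the target population against ordinary least squares fitted on a much larger sample from a related population whose coefficients are shifted.

```python
import numpy as np


def simulate_dem(n_target=20, n_alt=2000, bias=0.1, noise=1.0,
                 n_trials=200, seed=0):
    """Average excess squared error on the target population for two fits.

    Returns (error of the target-only fit, error of the alternative-sample fit).
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    beta_target = np.array([1.0, -2.0, 0.5])  # hypothetical true coefficients
    beta_alt = beta_target + bias             # related population: shifted coefficients
    d = beta_target.size

    # Fixed evaluation points drawn from the target population.
    X_test = rng.normal(size=(5000, d))
    y_test_mean = X_test @ beta_target

    err_target, err_alt = [], []
    for _ in range(n_trials):
        # Small, unbiased sample from the target population.
        X_t = rng.normal(size=(n_target, d))
        y_t = X_t @ beta_target + noise * rng.normal(size=n_target)
        # Large, biased sample from the alternative population.
        X_a = rng.normal(size=(n_alt, d))
        y_a = X_a @ beta_alt + noise * rng.normal(size=n_alt)

        # Ordinary least squares on each sample.
        b_t, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
        b_a, *_ = np.linalg.lstsq(X_a, y_a, rcond=None)

        # Excess error relative to the noise-free target response.
        err_target.append(np.mean((X_test @ b_t - y_test_mean) ** 2))
        err_alt.append(np.mean((X_test @ b_a - y_test_mean) ** 2))

    return float(np.mean(err_target)), float(np.mean(err_alt))


if __name__ == "__main__":
    for shift in (0.1, 1.0):
        e_target, e_alt = simulate_dem(bias=shift)
        print(f"bias={shift}: target-only={e_target:.3f}, alternative={e_alt:.3f}")
```

In this setup a small shift lets the larger alternative sample win, because its variance reduction outweighs the bias it introduces. A large shift reverses the ordering. Where the crossover falls depends entirely on the assumed sample sizes, noise level, and shift.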
Dan Zhang, Yan Liu, et al.
NeurIPS 2011
Aurélie C. Lozano, Naoki Abe, et al.
KDD 2009
Claudia Perlich, Saharon Rosset, et al.
KDD 2007