Abstract
Unlike traditional supervised learning, in many settings only partial feedback is available. Such settings encompass a wide variety of applications including pricing, online marketing and precision medicine. We approach this task as a domain adaptation problem and propose a self-training algorithm which imputes outcomes with finite discrete values for finite unseen actions in the observational data to simulate a randomized trial. We offer a theoretical motivation for this approach by providing an upper bound on the generalization error defined on a randomized trial under the self-training objective. We empirically demonstrate the effectiveness of the proposed algorithms.