Low-rank sparse feature selection for patient similarity learning
Abstract
Comparing and identifying similar patients is a fundamental task in medical domains -An efficient technique can, for example, help doctors to track patient cohorts, compare the effectiveness of treatments, or predict medical outcomes. The goal of patient similarity learning is to derive a clinically meaningful measure to evaluate the similarity amongst patients represented by their key clinical indicators. However, it is challenging to learn such similarity, as medical data are usually high dimensional, heterogeneous, and complex. In addition, a desirable patient similarity is dependent on particular clinical settings, which implies supervised learning scheme is more useful in medical domains. To address these, in this paper we present a novel similarity learning approach formulated as the generalized Mahalanobis similarity function with pairwise constraints. Considering there always exists some features non-discriminative and contains redundant information, we encode a low-rank structure to our similarity function to perform feature selection. We evaluate the proposed model on both UCI benchmarks and a real clinical dataset for several medical tasks, including patient retrieval, classification, and cohort discovery. The results show that our similarity model significantly outperforms many state-of-The-Art baselines, and is effective at removing noisy or redundant features.