Outlier detection by active learning

Naoki Abe; Bianca Zadrozny; John Langford

doi:10.1145/1150402.1150459

KDD 2006

Conference paper

20 Aug 2006

Abstract

Most existing approaches to outlier detection are based on density estimation methods. These methods have two notable issues: they offer little explanation for why a point is flagged as an outlier, and their computational requirements are relatively high. In this paper, we present a novel approach to outlier detection based on classification that attempts to address both issues. Our approach rests on two key ideas. First, we present a simple reduction of outlier detection to classification: a classifier is trained on a labeled data set that contains artificially generated examples playing the role of potential outliers. Second, once the task has been reduced to classification, we apply a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach on a number of data sets and find that it is superior to other methods that use the same reduction to classification but rely on standard classification methods. We also show that it is competitive with state-of-the-art outlier detection methods in the literature based on density estimation, while significantly reducing computational cost and improving explanatory power. Copyright 2006 ACM.
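A minimal Python sketch of the two ideas may help make the reduction concrete. It assumes that the artificial "potential outliers" are drawn uniformly from the bounding box of the real data, uses a 1-nearest-neighbour learner as a stand-in base classifier, and weights each round's training sample toward examples on which the current ensemble's votes are split (a query-by-committee style rule). The names make_reduced_dataset, NearestNeighbor, active_outlier and outlier_score, the sampling distribution, and the weighting rule are all illustrative assumptions, not the paper's exact procedure.

```python
import math
import random


def make_reduced_dataset(real, n_synthetic, rng):
    """Reduction step: label real points 0 and artificial points 1.

    Assumption: artificial "potential outliers" are drawn uniformly from the
    axis-aligned bounding box of the real data.
    """
    dims = len(real[0])
    lows = [min(p[j] for p in real) for j in range(dims)]
    highs = [max(p[j] for p in real) for j in range(dims)]
    synthetic = [
        tuple(rng.uniform(lows[j], highs[j]) for j in range(dims))
        for _ in range(n_synthetic)
    ]
    return [(tuple(p), 0) for p in real] + [(q, 1) for q in synthetic]


class NearestNeighbor:
    """Hypothetical stand-in base learner: 1-nearest-neighbour on a subset."""

    def __init__(self, examples):
        self.examples = examples

    def predict(self, x):
        # Return the label (0 = real, 1 = synthetic) of the closest example.
        _, label = min(self.examples, key=lambda e: math.dist(e[0], x))
        return label


def vote_fraction(ensemble, x):
    """Fraction of ensemble members that label x as synthetic (outlier-like)."""
    return sum(m.predict(x) for m in ensemble) / len(ensemble)


def active_outlier(real, n_synthetic=200, rounds=10, sample_size=60,
                   floor=0.1, seed=0):
    """Selective sampling over the reduced problem (query-by-committee style).

    Assumption: each round samples training examples with weight
    floor + (1 - |2f - 1|), where f is the current ensemble's vote fraction,
    so examples with split votes are favoured. The paper's rule may differ.
    """
    rng = random.Random(seed)
    pool = make_reduced_dataset(real, n_synthetic, rng)
    ensemble = []
    for _ in range(rounds):
        if ensemble:
            weights = [floor + 1.0 - abs(2.0 * vote_fraction(ensemble, x) - 1.0)
                       for x, _ in pool]
        else:
            weights = [1.0] * len(pool)  # first round: uniform sample
        subset = rng.choices(pool, weights=weights, k=sample_size)
        ensemble.append(NearestNeighbor(subset))
    return ensemble


def outlier_score(ensemble, x):
    """Score in [0, 1]; higher means the ensemble more often calls x synthetic."""
    return vote_fraction(ensemble, x)


if __name__ == "__main__":
    gen = random.Random(1)
    real_points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)]
    model = active_outlier(real_points)
    for point in [(0.0, 0.0), (2.5, 2.5)]:
        print(point, outlier_score(model, point))
```

Under these assumptions, points are ranked by outlier_score: the more often the ensemble mistakes a real point for an artificial one, the more outlier-like it is. The 1-NN learner is used only for brevity and does not by itself provide rule-like explanations for flagged points.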