Publication
SoCC 2023
Conference paper

Multivariate Anomaly Detection with Domain Clustering

Abstract

Existing time-series anomaly detection (AD) pipelines for large-scale cloud monitoring commonly train a separate model for each cloud service or infrastructure component. However, as thousands of services and components generate ever more data, there is an untapped opportunity to detect key performance indicator (KPI) anomalies more effectively by exploiting that abundance. In this paper, we propose MADDoC, an unsupervised transfer learning framework for reconstruction-based anomaly detection on multivariate time-series data. We show how to efficiently leverage the KPIs available in cloud infrastructure monitoring to generalize unsupervised time-series AD across infrastructure components. Compared to state-of-the-art approaches that rely on isolated component-wise training, MADDoC achieves superior Precision and F1 scores on public and internal time-series AD datasets by first learning a strong reconstruction backbone on time-series data from many components and then fine-tuning it to a specific component. Moreover, MADDoC reduces model training cost by 60% to 75% when monitoring thousands of storage infrastructure components. Finally, the framework overcomes the trade-off between training efficiency and AD performance that limited previous AD transfer learning approaches.
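The pretrain-then-fine-tune idea from the abstract can be illustrated with a minimal sketch. This is not MADDoC's architecture: it swaps in a tiny linear autoencoder trained by gradient descent, and the synthetic KPI data, the helper names (`make_mix`, `sample`, `train`, `reconstruction_error`), the step counts, and the 99th-percentile threshold are all assumptions made for illustration. A backbone is fitted on data pooled from many components, briefly fine-tuned on one target component, and then used to flag points with a high reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
N_KPIS, N_LATENT = 4, 2

# Assumed shared structure: KPI 0 tracks KPI 1, and KPI 2 tracks KPI 3.
SHARED_MIX = np.array([[1.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])

def make_mix():
    """Per-component mixing matrix: shared structure plus a small deviation."""
    return SHARED_MIX + 0.1 * rng.standard_normal(SHARED_MIX.shape)

def sample(mix, n_samples):
    """Draw synthetic multivariate KPI readings for one component."""
    latent = rng.standard_normal((n_samples, N_LATENT))
    return latent @ mix + 0.05 * rng.standard_normal((n_samples, N_KPIS))

def train(X, w_enc, w_dec, steps, lr=0.05):
    """Gradient descent on the mean squared reconstruction error."""
    n = X.shape[0]
    for _ in range(steps):
        Z = X @ w_enc                          # encode
        R = Z @ w_dec - X                      # reconstruction residual
        grad_dec = Z.T @ R / n
        grad_enc = X.T @ (R @ w_dec.T) / n
        w_enc = w_enc - lr * grad_enc
        w_dec = w_dec - lr * grad_dec
    return w_enc, w_dec

def reconstruction_error(X, w_enc, w_dec):
    """Per-sample mean squared reconstruction error (the anomaly score)."""
    return np.mean((X @ w_enc @ w_dec - X) ** 2, axis=1)

# 1) Pretrain one backbone on data pooled across many components.
pooled = np.vstack([sample(make_mix(), 500) for _ in range(20)])
w_enc = 0.1 * rng.standard_normal((N_KPIS, N_LATENT))
w_dec = 0.1 * rng.standard_normal((N_LATENT, N_KPIS))
w_enc, w_dec = train(pooled, w_enc, w_dec, steps=300)

# 2) Fine-tune briefly on a single target component with little data.
target_mix = make_mix()
target_train = sample(target_mix, 200)
w_enc, w_dec = train(target_train, w_enc, w_dec, steps=50)

# 3) Threshold on the target's own (assumed normal) training errors.
threshold = np.quantile(reconstruction_error(target_train, w_enc, w_dec), 0.99)

# 4) Score new data; the last row breaks the usual KPI correlations.
anomaly = np.array([[10.0, -10.0, 10.0, -10.0]])
target_test = np.vstack([sample(target_mix, 50), anomaly])
scores = reconstruction_error(target_test, w_enc, w_dec)
flags = scores > threshold
```

In this sketch the fine-tuning step is much shorter than pretraining, which mirrors the cost argument in the abstract: the expensive fit is shared across components, and each component only pays for a short adaptation.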
