SoCC 2023
Conference paper
Multivariate Anomaly Detection with Domain Clustering
Abstract
Existing time-series anomaly detection (AD) pipelines for cloud monitoring at scale commonly train models in isolation for each cloud service or infrastructure component. However, as thousands of services and components generate ever more data, there is an untapped opportunity to detect key performance indicator (KPI) anomalies more effectively by capitalizing on this abundance of data. In this paper, we propose MADDoC, an unsupervised transfer learning framework for reconstruction-based anomaly detection on multivariate time-series data. We show how to efficiently leverage the KPIs available in cloud infrastructure monitoring to generalize unsupervised time-series AD across infrastructure components. MADDoC first learns a strong reconstruction backbone on time-series data from many components and then fine-tunes it to a specific component. Compared with state-of-the-art approaches that rely on isolated, component-wise training, it achieves higher precision and F1 scores on public and internal time-series AD datasets. Moreover, MADDoC substantially reduces model training cost, by 60% to 75% when monitoring thousands of storage infrastructure components. Finally, the framework overcomes the trade-off between training efficiency and AD performance found in previous AD transfer learning approaches.
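The general pattern the abstract describes is: pretrain one reconstruction backbone on KPI data pooled across many components, fine-tune a copy on a single component, and flag points with large reconstruction error. A toy sketch can illustrate that pattern. It is not MADDoC's model or training procedure. It assumes a tied-weight linear autoencoder with a one-dimensional latent code, synthetic components with three correlated KPIs, squared reconstruction error as the anomaly score, and a threshold set to the largest score seen on the target's (assumed normal) training data. `LinearAutoencoder`, `make_component`, and all parameter values are hypothetical.

```python
import copy
import random
from typing import List

Vector = List[float]


class LinearAutoencoder:
    """Toy tied-weight linear autoencoder (hypothetical): z = w.x, x_hat = z * w."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    def reconstruct(self, x: Vector) -> Vector:
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return [z * wi for wi in self.w]

    def score(self, x: Vector) -> float:
        """Anomaly score: squared reconstruction error."""
        return sum((a - b) ** 2 for a, b in zip(x, self.reconstruct(x)))

    def fit(self, data: List[Vector], epochs: int, lr: float) -> "LinearAutoencoder":
        """Plain SGD on ||x - (w.x) w||^2, one sample at a time."""
        for _ in range(epochs):
            for x in data:
                z = sum(wi * xi for wi, xi in zip(self.w, x))
                r = [xi - z * wi for xi, wi in zip(x, self.w)]  # residual x - x_hat
                r_dot_w = sum(ri * wi for ri, wi in zip(r, self.w))
                # d/dw_j ||x - z w||^2 = -2 (x_j (r.w) + z r_j)
                grad = [-2.0 * (xj * r_dot_w + z * rj) for xj, rj in zip(x, r)]
                self.w = [wj - lr * gj for wj, gj in zip(self.w, grad)]
        return self


def make_component(n: int, noise: float, rng: random.Random) -> List[Vector]:
    """Hypothetical component: 3 KPIs driven by one shared signal plus noise."""
    samples = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        samples.append([t + rng.gauss(0.0, noise) for _ in range(3)])
    return samples


rng = random.Random(42)

# 1) Pretrain a single shared backbone on data pooled from 20 components.
pooled = [x for _ in range(20) for x in make_component(50, 0.05, rng)]
backbone = LinearAutoencoder(dim=3).fit(pooled, epochs=5, lr=0.01)

# 2) Fine-tune a copy on a small sample from one target component.
target_train = make_component(30, 0.05, rng)
model = copy.deepcopy(backbone).fit(target_train, epochs=3, lr=0.005)

# 3) Threshold: largest score observed on the target's training data.
threshold = max(model.score(x) for x in target_train)

normal_point = [0.5, 0.5, 0.5]        # KPIs move together, as in training
anomalous_point = [1.0, -1.0, 1.0]    # breaks the usual KPI correlation
print(f"threshold={threshold:.4f}")
print(f"normal score={model.score(normal_point):.4f}")
print(f"anomalous score={model.score(anomalous_point):.4f}")
```

Because the fine-tuning step starts from the pretrained weights, it needs only a few passes over a small target sample. This is the intuition behind the training-cost savings the abstract reports, although the paper's actual architecture, clustering step, and training schedule are not shown here.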