Spatio-temporal patterns in network events
Abstract
Operational networks typically generate massive monitoring data that consist of local (in both space and time) observations of the status of the networks. It is often hypothesized that such data exhibit both spatial and temporal correlation based on the underlying network topology and time of occurrence; identifying such correlation patterns offers valuable insights into global network phenomena (e.g., fault cascading in communication networks). In this paper, we introduce a new class of models suitable for learning, indexing, and identifying spatio-temporal patterns in network monitoring data. We exemplify our techniques with the application of fault diagnosis in enterprise networks. We show how they can help network management systems (NMSes) efficiently detect and localize potential faults (e.g., failure of routing protocols or network equipment) by analyzing massive operational event streams (e.g., alerts, alarms, and metrics). We provide results from extensive experimental studies over real network event and topology datasets to explore the efficacy of our solution. © 2010 ACM.