Non-intrusive, out-of-band and out-of-the-box systems monitoring in the cloud
Abstract
The dramatic proliferation of virtual machines (VMs) in datacenters and the highly dynamic and transient nature of VM provisioning have revolutionized datacenter operations. However, the management of these environments is still carried out using re-purposed versions of traditional agents, originally developed for managing physical systems, or, more recently, via newer virtualization-aware alternatives that require guest cooperation and accessibility. We show that these existing approaches are a poor match for monitoring and managing (virtual) systems in the cloud due to their dependence on guest cooperation and operational health, and their growing lifecycle management overheads in the cloud. In this work, we first present Near Field Monitoring (NFM), our non-intrusive, out-of-band cloud monitoring and analytics approach, which is designed around cloud operation principles and addresses the limitations of existing techniques. NFM decouples system execution from monitoring and analytics functions by pushing monitoring out of the target systems' scope. By leveraging and extending VM introspection techniques, our framework provides simple, standard interfaces to monitor running systems in the cloud that require no guest cooperation or modification, and have minimal effect on guest execution. By decoupling monitoring and analytics from target system context, NFM provides "always-on" monitoring, even when the target system is unresponsive. NFM also works "out-of-the-box" for any cloud instance as it eliminates any need for installing and maintaining agents or hooks in the monitored systems. We describe the end-to-end implementation of our framework with two real-system prototypes based on two virtualization platforms. We discuss the new cloud analytics opportunities enabled by our decoupled execution, monitoring and analytics architecture. We present four applications that are built on top of our framework and show their use for across-time and across-system analytics.
Copyright © 2014 ACM.