ImageElves: Rapid and reliable system updates in the cloud
Abstract
Virtualization has significantly reduced the cost of creating a new virtual machine, and cheap storage allows VMs to be turned down when unused. This has led to a rapid proliferation of virtual machine images, both active and dormant, in the data center. System management technologies have not been able to keep pace with this growth, and the management cost of keeping all virtual machine images, active as well as dormant, updated is significant. In this work, we present ImageElves, a system to rapidly, reliably and automatically propagate updates (e.g., patches, software installs, compliance checks) in a data center. ImageElves analyzes all target images and creates reliable image patches using a very small number of online updates. Traditionally, updates are applied by taking the application offline, applying updates, and then restoring the application, a process that is unreliable and has an unpredictable downtime. With ImageElves, we propose a two-phase process. In the first phase, images are analyzed to create an update signature and update manifest. In the second phase, downtime is taken and the manifest is applied offline on virtual images in a parallel, reliable and automated manner. This has two main advantages: (i) updates can also be applied to VMs that are already dormant, and (ii) all updates following this process are guaranteed to work reliably, leading to reduced and predictable downtimes. ImageElves uses three key ideas: (i) a novel per-update profiling mechanism to divide VMs into equivalence classes, (ii) a background logging mechanism to convert updates on live instances into patches for dormant images, and (iii) a cross-difference mechanism to filter system-specific or random information (e.g., host name, IP address) while creating equivalence classes.
We evaluated the ability of ImageElves to speed up a mix of popular system management activities and observed up to 80% smaller update times for active instances and up to 90% reduction in update time for dormant instances. © 2013 IEEE.
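The grouping of VMs into equivalence classes, with system-specific information filtered out, can be illustrated with a minimal sketch. It is not the ImageElves implementation: the abstract does not specify how signatures or the cross-difference are computed, so the sketch assumes that an update's effect on a VM is available as a mapping from file path to new content, and it substitutes a simple placeholder replacement of known host values (host name, IP address) for the cross-difference mechanism. All names (`normalize`, `update_signature`, `equivalence_classes`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(text, host_values):
    """Replace host-specific values (e.g. hostname, IP) with placeholders.

    Assumption: a crude stand-in for the cross-difference filter.
    """
    for key in sorted(host_values):
        text = text.replace(host_values[key], "<" + key + ">")
    return text


def update_signature(changes, host_values):
    """Hash the normalized file changes that an update produced on one VM."""
    digest = hashlib.sha256()
    for path in sorted(changes):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(normalize(changes[path], host_values).encode())
        digest.update(b"\0")
    return digest.hexdigest()


def equivalence_classes(vms):
    """Group VM names whose normalized update signatures are identical."""
    classes = defaultdict(list)
    for name, (changes, host_values) in vms.items():
        classes[update_signature(changes, host_values)].append(name)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical data: three VMs and the file each one changed under an update.
example_vms = {
    "vm-a": ({"/etc/app.conf": "listen=10.0.0.1\nname=alpha\nversion=2.1\n"},
             {"ip": "10.0.0.1", "hostname": "alpha"}),
    "vm-b": ({"/etc/app.conf": "listen=10.0.0.2\nname=beta\nversion=2.1\n"},
             {"ip": "10.0.0.2", "hostname": "beta"}),
    "vm-c": ({"/etc/app.conf": "listen=10.0.0.3\nname=gamma\nversion=2.0\n"},
             {"ip": "10.0.0.3", "hostname": "gamma"}),
}

print(equivalence_classes(example_vms))
```

In this example, vm-a and vm-b differ only in host name and IP address, so they fall into the same class once those values are masked, while vm-c ends up with a different version string and forms its own class. Under the two-phase process described above, one online update per class would then suffice to build a patch for every image in that class.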