VM-μcheckpoint: Design, modeling, and assessment of lightweight in-memory VM checkpointing
Abstract
Checkpointing and rollback techniques enhance reliability and availability of virtual machines and their hosted IT services. This paper proposes VM-μCheckpoint, a light-weight pure-software mechanism for high-frequency checkpointing and rapid recovery for VMs. Compared with existing techniques of VM checkpointing, VM-μCheckpoint tries to minimize checkpoint overhead and speed up recovery by means of copy-on-write, dirty-page prediction and in-place recovery, as well as saving incremental checkpoints in volatile memory. Moreover, VM-μCheckpoint deals with the issue that latency in error detection potentially results in corrupted checkpoints, particularly when checkpointing frequency is high. We also constructed Markov models to study the availability improvements provided by VM-μCheckpoint (from 99 to 99.98 percent on reasonably reliable hypervisors). We designed and implemented VM-μCheckpoint in the Xen VMM. The evaluation results demonstrate that VM-μCheckpoint incurs an average of 6.3 percent overhead (in terms of program execution time) for 50 ms checkpoint intervals when executing the SPEC CINT 2006 benchmark. Error injection experiments demonstrate that VM-μCheckpoint, combined with error detection techniques in RMK, provides high coverage of recovery.