Overdriver: Handling memory overload in an oversubscribed cloud
Abstract
With the intense competition between cloud providers, oversubscription is increasingly important to maintain profitability. Oversubscribing physical resources is not without consequences: it increases the likelihood of overload. Memory overload is particularly damaging. Contrary to traditional views, we analyze current data center logs and realistic Web workloads to show that overload is largely transient: up to 88.1% of overloads last for less than 2 minutes. Regarding overload as a continuum that includes both transient and sustained overloads of various durations points us to consider mitigation approaches also as a continuum, complete with tradeoffs with respect to application performance and data center overhead. In particular, heavyweight techniques, like VM migration, are better suited to sustained overloads, whereas lightweight approaches, like network memory, are better suited to transient overloads. We present Overdriver, a system that adaptively takes advantage of these tradeoffs, mitigating all overloads within 8% of well-provisioned performance. Furthermore, under reasonable oversubscription ratios, where transient overload constitutes the vast majority of overloads, Overdriver requires 15% of the excess space and generates a factor of four less network traffic than a migration-only approach. Copyright © 2011 ACM.