ExaPlan Archive: Data placement and provisioning for large storage systems with archival tiers
Abstract
Many important big data use cases do not require data to be instantly available. Examples include video recordings in the TV and film industry, surveillance videos, and data from scientific experiments. Archiving such data to high-latency storage media, such as tape and optical disk libraries, results in significant cost savings. In this context, data is accessed by first staging it to low-latency media. However, archiving and staging operations incur additional device and bandwidth costs for both the active and archival tiers, and might affect user data access performance. For instance, in terms of cost and performance, it is often suboptimal to archive all the data. This paper presents ExaPlan Archive, a scheme that determines the data placement and the number of devices required in each tier of a multitiered storage system comprising archival and active tiers, so as to minimize the latency of the active tiers under budget and staging-time constraints. The efficiency of the proposed optimized archiving scheme is compared with that of an existing scheme that optimizes multitier storage with only direct-access tiers. The two schemes are evaluated using a staging workload from the LOFAR radio telescope's long-term archive for astronomical observation data.