ECS2: A Fast Erasure Coding Library for GPU-Accelerated Storage Systems with Parallel Direct IO
Abstract
As data volume keeps increasing at a rapid rate, there is an urgent need for large, reliable, and cost-effective storage systems. Erasure coding has drawn increasing attention because it ensures data reliability with higher storage efficiency than replication, and it has been widely adopted in many distributed and large-scale storage systems, such as Azure cloud storage and HDFS. However, the storage efficiency of erasure coding comes at the price of higher computational complexity. While many studies have shown that coding computations can be significantly accelerated using GPUs, the overhead of transferring data between storage devices and GPUs becomes a new performance bottleneck. In this work, we design and implement ECS2, a fast erasure coding library for GPU-accelerated storage that lets users enhance their data protection with transparent IO performance and a file-system-like programming interface. By taking advantage of the latest GPUDirect technology supported on NVIDIA GPUs, our library removes the CPU and the host-memory copy from the IO path, so that both the computing and the IO overhead of coding are minimized. Using a synthetic IO workload based on a real storage system trace, we show that GPUDirect technology reduces IO latency by 10% to 20% and improves the overall IO throughput of a storage system by up to 70%.
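To make the storage-efficiency argument concrete, the sketch below compares the extra raw capacity needed by 3-way replication with that of a Reed-Solomon code. The RS(6,3) parameters (6 data blocks, 3 parity blocks) and the helper name `storage_overhead` are illustrative assumptions chosen for this example; they are not taken from ECS2.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra raw capacity required, as a fraction of the user data size."""
    return redundant_units / data_units


# 3-way replication: 1 original copy + 2 extra copies; survives 2 lost copies.
replication = storage_overhead(1, 2)

# Reed-Solomon RS(6,3) (illustrative choice): 6 data + 3 parity blocks;
# any 3 of the 9 blocks can be lost and the data is still recoverable.
rs_6_3 = storage_overhead(6, 3)

print(f"3-way replication overhead: {replication:.0%}")  # 200%
print(f"RS(6,3) overhead: {rs_6_3:.0%}")                 # 50%
```

Replication tolerates two failures at 200% extra capacity, whereas RS(6,3) tolerates three failures at only 50%; the price is the encode and decode arithmetic on every write and recovery, which is the computation that ECS2 offloads to the GPU.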