A two-level checkpointing framework for RTM/FWI for GPUs in heterogeneous memory systems
Abstract
Reverse Time Migration (RTM) and Full Waveform Inversion (FWI) are among the most critical and computationally intensive algorithms in the seismic processing workflow. Both cross-correlate the forward and adjoint states at matching time steps and therefore require the forward states to be saved in memory. Checkpointing trades memory usage for additional data movement and computation. The increased data movement is especially detrimental to the performance of Graphics Processing Units (GPUs), where data transfers are much slower than computation. Moreover, limited GPU memory necessitates more frequent transfers, and effective GPU utilization drops because the GPU waits for each data copy to finish before resuming computation. This lowers the effective performance of GPUs when solving the adjoint problem and increases the time-to-solution of RTM/FWI workflows. We propose a two-level checkpointing formulation for GPUs that uses asynchronous compute and Non-Volatile Memory Express (NVMe) storage to hide all data movement overhead, enabling continuous GPU usage without waiting for data transfers. The parameters of the checkpointing formulation are derived from bandwidth and throughput values, so they generalize to multiple systems and to any RTM/FWI formulation. Implementing optimized data transfer approaches leads to shorter compute time and increased GPU utilization. We demonstrate our results using an acoustic RTM formulation.
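As a rough illustration of how bandwidth and throughput values could determine a checkpointing parameter, the sketch below estimates the smallest number of time steps between checkpoints for which an asynchronous snapshot copy finishes before the next one starts. It is not the formulation proposed in this work: the function name `min_hidden_interval`, the simple overlap model (one transfer in flight at a time, constant step time), and all numeric values are assumptions chosen for illustration.

```python
import math


def min_hidden_interval(snapshot_bytes, bandwidth_bytes_per_s, step_time_s):
    """Smallest checkpoint interval (in time steps) that hides one transfer.

    Assumed model (hypothetical, for illustration only): a snapshot copy runs
    asynchronously while the GPU keeps computing, and the copy must complete
    before the next snapshot is taken.
    """
    if snapshot_bytes <= 0 or bandwidth_bytes_per_s <= 0 or step_time_s <= 0:
        raise ValueError("all inputs must be positive")
    transfer_time_s = snapshot_bytes / bandwidth_bytes_per_s
    # The transfer is hidden when k * step_time_s >= transfer_time_s.
    return max(1, math.ceil(transfer_time_s / step_time_s))


# Illustrative numbers only (not measurements from this work).
snapshot = 3e9        # bytes per saved wavefield snapshot
step_time = 0.125     # seconds of GPU compute per time step
pcie_bw = 16e9        # bytes/s, GPU to host memory (first level)
nvme_bw = 2e9         # bytes/s, host memory to NVMe (second level)

k_host = min_hidden_interval(snapshot, pcie_bw, step_time)
k_nvme = min_hidden_interval(snapshot, nvme_bw, step_time)
print("GPU-to-host interval:", k_host)
print("host-to-NVMe interval:", k_nvme)
```

Under this simplified model, the slower storage level needs a longer interval between checkpoints than the faster one, which is one way to see why a two-level scheme treats the two memory tiers with different parameters.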