Seung Won Min, Vikram Sharma Mailthody, et al.
VLDB
Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integrations between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have inadvertent effects on the achieved bandwidths depending on applications and data access patterns. To provide the most efficient communications between CPUs and accelerators, understanding the data transaction behaviors and selecting the right I/O cache coherence method is essential. In this paper, we use Xilinx Zynq UltraScale+ as the SoC platform to show how certain I/O cache coherence method can perform better or worse in different situations, ultimately affecting the overall accelerator performances as well. Based on our analysis, we further explore possible software and hardware modifications to improve the I/O performances with different I/O cache coherence options. With our proposed modifications, the overall performance of SoC design can be averagely improved by 20%.
Seung Won Min, Vikram Sharma Mailthody, et al.
VLDB
Sitao Huang, Carl Pearson, et al.
HPEC 2019
Seung Won Min, Kun Wu, et al.
VLDB
Ashutosh Dhar, Sitao Huang, et al.
ISVLSI 2019