Filtering failure logs for a BlueGene/L prototype
Yinglung Liang, Yanyong Zhang, et al.
DSN 2005
This article introduces YaConv, a new algorithm that computes convolution using GEMM microkernels from a Basic Linear Algebra Subprograms (BLAS) library and is efficient on multiple CPU architectures. Previous approaches either create a copy of each image element for each filter element or reload these elements into cache for each GEMM call, leading to redundant instances of the image elements in cache. Instead, YaConv loads each image element into the cache once and maximizes the reuse of these elements. The output image is computed by scattering the results of the GEMM microkernel calls to the correct locations in the output image. The main advantage of this new algorithm, which leads to better performance than the existing im2col approach on several architectures, is a more efficient use of the memory hierarchy. The experimental evaluation on convolutional layers from PyTorch, along with a parameterized study, indicates an average 24% speedup over im2col convolution. The increased performance results from a 3× reduction in L3 cache accesses and 2× fewer branch instructions.
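To make the duplication in the im2col baseline concrete, the following is a minimal pure-Python sketch of im2col convolution. It is not YaConv and not any library's implementation. It assumes a single-channel image, stride 1, no padding, and the unflipped (cross-correlation) convention common in deep learning frameworks. The function names `im2col`, `conv2d_im2col`, and `duplication_factor` are hypothetical, chosen only for illustration.

```python
def im2col(image, fh, fw):
    """Flatten every fh x fw patch of `image` into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - fh + 1):
        for j in range(w - fw + 1):
            # Each patch copies its elements, so interior image elements
            # end up stored once per filter element that touches them.
            rows.append([image[i + di][j + dj]
                         for di in range(fh) for dj in range(fw)])
    return rows


def conv2d_im2col(image, kernel):
    """Convolve by multiplying the patch matrix with the flattened filter."""
    fh, fw = len(kernel), len(kernel[0])
    flat_kernel = [k for row in kernel for k in row]
    patches = im2col(image, fh, fw)
    out_w = len(image[0]) - fw + 1
    # This matrix-vector product stands in for the GEMM call.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


def duplication_factor(image, fh, fw):
    """Ratio of elements stored in the patch matrix to elements in the image."""
    patches = im2col(image, fh, fw)
    stored = sum(len(p) for p in patches)
    return stored / (len(image) * len(image[0]))


if __name__ == "__main__":
    image = [[4 * i + j + 1 for j in range(4)] for i in range(4)]  # values 1..16
    kernel = [[1, 0], [0, -1]]
    print(conv2d_im2col(image, kernel))
    print(duplication_factor(image, 2, 2))
```

For a 4×4 image and a 2×2 filter, the patch matrix already holds 36 values for 16 image elements. That redundancy is what the abstract describes as creating a copy of each image element for each filter element.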