Combinatorial sketching for finite programs
Armando Solar-Lezama, Liviu Tancau, et al.
ASPLOS 2006
Because of their memory-intensive workloads and erratic access patterns, irregular graph algorithms are notoriously hard to implement and optimize for high performance on distributed-memory systems. Although the recently proposed partitioned global address space (PGAS) paradigm improves ease of programming, no high-performance PGAS implementation of large-scale graph analysis is known. We present the first fast PGAS implementation of graph algorithms for the connected components and minimum spanning tree problems. By improving memory access locality, our implementation exhibits much better communication efficiency and cache performance than the naive implementation on a cluster of SMPs. With additional algorithmic and PGAS-specific optimizations, our implementation achieves significant speedups over both the best sequential implementation and the best single-node SMP implementation for large, sparse graphs with more than a billion edges. © 2010 IEEE.
Guojing Cong, David A. Bader
Journal of Parallel and Distributed Computing
Jianpeng Cheng, Siva Reddy, et al.
ACL 2017
Guojing Cong, Konstantin Makarychev
IPDPS 2011