Reducing memory ordering overheads in software transactional memory
Abstract
Most research into high-performance software transactional memory (STM) assumes that transactions will run on a processor with a relatively strict memory model, such as Total Store Ordering (TSO). To execute these algorithms correctly on processors with relaxed memory models, explicit fence instructions may be required on every transactional access, and neither the processor nor the compiler may be able to safely reorder transactional reads. The overheads of fence instructions and read serialization are a significant but unstudied source of latency for STM, with impact on the tradeoffs among different STM systems and on the optimizations that may be possible for any given system. Straightforward ports of STM runtimes from strict to relaxed machines may fail to realize the latter's performance potential. We explore the implementation of STM for machines with relaxed memory consistency using two recent high-performance STM systems. We propose compiler optimizations that can safely eliminate many fence instructions. Using these techniques, we obtain a reduction of up to 89% in the number of fences, and 20% in per-transaction latency, for common transactional benchmarks. © 2009 IEEE.