Journal of Optical Communications and Networking
Toward higher-radix switches with co-packaged optics for improved network locality in data center and HPC networks [Invited]
Abstract
We study the network locality improvements that can be achieved by using co-packaged optics in data center and high-performance computing (HPC) networks. The increased escape bandwidth offered by co-packaged optics can enable switches with speeds of 51.2 Tb/s and beyond. From a network architecture perspective, the key advantages of introducing co-packaged optics at the switch points include the implementation of large-scale topologies of 12,000 end points with 4× higher bisection bandwidth and the reduction of the required number of switches by >40% compared with state-of-the-art approaches. From a network operation perspective, improved network locality and faster operation can be achieved, since the higher-radix switches can mitigate the impact of network contention. Placing applications under fewer leaf switches reduces the number of packets that cross the spine switches in a leaf-spine topology. The proposed scheme is evaluated via discrete-event simulations: we initially evaluate the network locality properties of the system by using virtual-machine traces from a production data center, and we subsequently quantify the performance improvements by simulating an all-to-all pattern for a variety of message sizes over a number of nodes. The results suggest that co-packaged optics are a promising solution for keeping up with bandwidth scaling in future networks. The virtual-machine analysis shows that large-scale applications can be placed under up to 50% fewer first-level switches, while the network analysis shows speedups of up to 7.1, which translates to execution time reductions of up to 26% and 42.7% for applications with communication ratios of 0.3 and 0.5, respectively.
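The relationship between the reported network speedup and the end-to-end execution time reductions follows an Amdahl's-law style argument: only the communication fraction of an application's runtime benefits from the faster network. The sketch below (an illustrative assumption, not code from the paper) applies a uniform speedup of 7.1 to communication fractions of 0.3 and 0.5, which yields reductions of roughly 25.8% and 43.0%, broadly in line with the abstract's 26% and 42.7% figures; the paper's exact per-scenario speedups evidently differ slightly.

```python
def execution_time_reduction(comm_ratio: float, speedup: float) -> float:
    """Fraction of total execution time saved when only the communication
    portion (comm_ratio of the runtime) is accelerated by `speedup`.

    New total time = (1 - comm_ratio) + comm_ratio / speedup,
    so the saving is comm_ratio * (1 - 1/speedup).
    """
    return comm_ratio * (1.0 - 1.0 / speedup)

# Assumed uniform network speedup of 7.1 (the abstract's maximum).
for f in (0.3, 0.5):
    r = execution_time_reduction(f, 7.1)
    print(f"communication ratio {f}: ~{r:.1%} execution time reduction")
```

This simple model makes the abstract's point concrete: the more communication-bound an application is, the more of the network-level speedup shows up in its end-to-end runtime.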