Unbiased Quantized Congestion Notification for Scalable Server Fabrics
Abstract
Ethernet is the predominant layer-2 networking technology in the datacenter, and it is evolving into an economical alternative for high-performance computing clusters. Ethernet traditionally drops packets under congestion, but IEEE introduced lossless traffic classes to enable the convergence of storage and IP networks. Losslessness is a simple, well-known concept, but its adoption in datacenters is hampered by fear of the saturation trees that congestion can spread through a lossless fabric. In this article, the authors aim to accelerate the deployment of Quantized Congestion Notification (QCN), the IEEE 802.1Qau congestion-management standard. First, they eliminate QCN's intrinsic unfairness under typical fan-in scenarios by placing the congestion points at switch input buffers rather than at outputs, as standard QCN does. They then show that QCN at input buffers cannot always discriminate between culprit and victim flows. To overcome this limitation, they propose a novel QCN-compatible marking scheme called "occupancy sampling." They have implemented these methods in a server-rack fabric with 640 100-Gigabit Ethernet ports.
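To make the two mechanisms named above more concrete, the following Python sketch shows the congestion-point feedback that standard QCN computes, Fb = -(Qoff + w·Qdelta) with w = 2 and a 6-bit quantized result, together with one possible reading of occupancy sampling. It is an illustration under stated assumptions, not the authors' implementation: the clamp value `fb_max`, the linear 6-bit scaling, and the helper names `qcn_feedback` and `occupancy_sample` are hypothetical, and occupancy sampling is assumed here to mean notifying the source of a frame picked at random from those already buffered, so that a flow's chance of being notified tracks its share of the buffer rather than its recent arrivals.

```python
import random

W = 2        # weight on queue growth, the value used by standard QCN
FB_BITS = 6  # QCN carries a 6-bit quantized feedback value

def qcn_feedback(q_len, q_old, q_eq, fb_max):
    """Quantized congestion measure at a congestion point (0 = send no CNM).

    Fb = -(Qoff + W * Qdelta), with Qoff = Q - Qeq and Qdelta = Q - Qold.
    ASSUMPTION: the clamp at fb_max and the linear 6-bit scaling are
    illustrative choices, not taken from the article.
    """
    q_off = q_len - q_eq          # distance above the equilibrium setpoint
    q_delta = q_len - q_old       # growth since the previous sample
    fb = -(q_off + W * q_delta)
    if fb >= 0:                   # queue at or below target and not growing
        return 0
    mag = min(-fb, fb_max)        # clamp the congestion magnitude
    levels = (1 << FB_BITS) - 1   # 63 levels above zero
    return max(1, mag * levels // fb_max)

def occupancy_sample(buffered_flows, rng):
    """HYPOTHETICAL occupancy sampling: notify the flow that owns a randomly
    chosen buffered frame, so each flow is picked in proportion to its
    share of the buffer rather than its share of recent arrivals."""
    return rng.choice(buffered_flows)

if __name__ == "__main__":
    print(qcn_feedback(40, 30, 26, fb_max=130))  # prints 16
    rng = random.Random(1)
    buffer = ["culprit"] * 9 + ["victim"]        # culprit holds 90% of the buffer
    picks = [occupancy_sample(buffer, rng) for _ in range(10_000)]
    print(picks.count("culprit") / len(picks))   # close to 0.9
```

In the usage example, `fb_max=130` corresponds to an assumed clamp of (1 + 2w)·Qeq with Qeq = 26, and queue lengths are in arbitrary but consistent units. Sampling from buffer contents rather than from arriving frames is what lets a flow occupying most of a congested buffer receive most of the notifications, which is the culprit/victim discrimination the abstract attributes to occupancy sampling.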