FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
Abstract
Federated learning (FL) involves training a model across a massive number of distributed devices while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, and introduces new challenges, including the straggler problem and the communication bottleneck. To address these issues, we present FedAT, a novel federated learning system with asynchronous tiers. FedAT synergistically combines synchronous intra-tier training with asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect while improving test accuracy. FedAT uses a weighted aggregation heuristic that balances training across clients to further improve accuracy. FedAT compresses both uplink and downlink communications using an efficient compression algorithm, minimizing the communication cost. Results show that FedAT improves prediction performance by up to 21.09% and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.
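To make the cross-tier weighted aggregation mentioned above concrete, the following Python sketch shows one plausible form of the heuristic: tiers that push updates less often (slower tiers) receive proportionally larger weights, so their clients are not drowned out by faster tiers. The function name aggregate_tiers and the inverse-frequency weighting rule are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def aggregate_tiers(tier_models, tier_update_counts):
        """Cross-tier weighted aggregation (illustrative sketch only).

        Faster tiers complete synchronous rounds more often; to keep
        training balanced across clients, slower tiers are weighted
        more heavily. The inverse-frequency rule below is an assumed
        stand-in for FedAT's actual heuristic.
        """
        counts = np.asarray(tier_update_counts, dtype=float)
        inv = 1.0 / (counts + 1.0)        # slower tier -> larger weight
        weights = inv / inv.sum()         # normalize weights to sum to 1
        return sum(w * m for w, m in zip(weights, tier_models))

    # Example: three tiers; tier 0 (fastest) has completed the most
    # synchronous rounds and therefore receives the smallest weight.
    tier_models = [np.ones(4) * v for v in (1.0, 2.0, 3.0)]
    global_model = aggregate_tiers(tier_models, tier_update_counts=[10, 4, 1])
    print(global_model)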