Flash: Fast Model Adaptation in ML-Centric Cloud Platforms
Haoran Qiu, Weichao Mao, et al.
MLSys 2024
Although post-quantum cryptographic schemes are growing in popularity, realizing practical implementations for real-world applications remains a major challenge. A key bottleneck in such schemes is the fetching and processing of large polynomials in the Number Theoretic Transform (NTT), which makes non-von Neumann paradigms, such as near-memory processing, a viable option. We therefore propose a novel near-DRAM NTT accelerator design called Dramaton. Additionally, we introduce a conflict-free mapping algorithm that enables Dramaton to process large NTTs with minimal hardware overhead using a fixed-permutation network. Dramaton achieves a 5-207× speedup in latency over the state of the art and a 97× improvement in energy-delay product (EDP) over a recent near-memory NTT accelerator.