PaTH Attention: Position Encoding via Accumulating Householder Transformations. Songlin Yang, Yikang Shen, et al. NeurIPS 2025.
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study. Shawn Tan, Songlin Yang, et al. ICLR 2025.