Chen Xiong, Xiangyu Qi, et al. "Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks." ACL 2025.
Xiangyu Qi, Yi Zeng, et al. "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!" ICLR 2024.