An invisible watermark to keep tabs on tabular data. Research blog by Kim Martineau, 19 May 2025. Topics: Adversarial Robustness and Privacy, AI, Generative AI, Trustworthy Generation.
Teaching AI models to improve themselves. Research blog by Peter Hess, 14 Aug 2024. Topics: AI, Computer Science, Explainable AI, Generative AI, Natural Language Processing, Trustworthy AI, Trustworthy Generation.
What is retrieval-augmented generation? Explainer by Kim Martineau, 22 Aug 2023. Topics: AI, Explainable AI, Generative AI, Natural Language Processing, Trustworthy Generation.
Accelerating molecular optimization with AI. Deep dive by Payel Das, Samuel Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, and Pin-Yu Chen, 08 Feb 2022. Topics: Accelerated Discovery, Generative AI, Healthcare, Materials Discovery, Trustworthy AI, Trustworthy Generation.
Miriam Rateike, Brian Mboya, et al. "3rd TrustAI Workshop: Building Public Awareness and Engagement." DLI 2025.
Chen Xiong, Xiangyu Qi, et al. "Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks." ACL 2025.
Megh Thakkar, Quentin Fournier, et al. "Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs." ACL 2025.
George Kour, Itay Nakash, et al. "Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models." ACL 2025.
Matthew Riemer, Zahra Ashktorab, et al. "Position: Theory of Mind Benchmarks are Broken for Large Language Models." ICML 2025.
Hongli Zhan, Muneeza Azmat, et al. "SPRI: Aligning Large Language Models with Context-Situated Principles." ICML 2025.