Slicing Mutual Information Generalization Bounds for Neural Networks. Kimia Nadjahi, Kristjan Greenewald, et al. ICML 2024.
What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks. Irene Ko, Pin-Yu Chen, et al. ICML 2024.
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts. Zhi-yi Chin, Chieh-ming Jiang, et al. ICML 2024.
CharED: Character-wise Ensemble Decoding for Large Language Models. Kevin Gu, Eva Tuecke, et al. ICML 2024.
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs. Swanand Ravindra Kadhe, Farhan Ahmed, et al. ICML 2024.
Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation. Tomas Bueno Momcilovic, Beat Buesser, et al. xAI 2024.
AUTOLYCUS: Exploiting Explainable Artificial Intelligence (XAI) for Model Extraction Attacks against Interpretable Models. Abdullah Caglar Oksuz, Anisa Halimi, et al. PETS 2024.
Identifying Homogeneous and Interpretable Groups for Conformal Prediction. Natalia Martinez Gil, Dhaval Patel, et al. UAI 2024.
Exploring Vulnerabilities in LLMs: A Red Teaming Approach to Evaluate Social Bias. Yuya Jeremy Ong, Jay Pankaj Gala, et al. IEEE CISOSE 2024.
Quantifying Representation Reliability in Self-Supervised Learning Models. Young Jin Park, Hao Wang, et al. UAI 2024.