Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Jie Ren, Zhenwei Dai, et al.
NeurIPS 2025
Predicting chemical hazard indicators for substances of concern (SoCs), such as their persistence, bioaccumulation, and toxicity (PBT), is a critical task in environmental science and chemical regulatory compliance. Existing approaches rely heavily on molecular structural representations such as SMILES, which are often unavailable in early-stage assessments or legacy documentation, and which can be inadequate for representing the structural diversity of compounds encountered in regulatory tasks. This paper addresses the challenge of estimating PBT properties from partial, noisy, and unstructured natural language descriptions of SoCs, such as their physical appearance, melting point, industrial use, and other general characteristics. We propose a new framework that leverages the generalization capabilities of Large Language Models (LLMs) to infer PBT profiles from these textual descriptions. Our key contributions are the first dataset of natural language descriptions paired with PBT hazard categories and a fine-tuned LLM pipeline that generates hazard assessments from such descriptions. Experimental results show that our approach achieves performance competitive with structure-based models, enabling early hazard screening in low-data or incomplete-data scenarios.
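The abstract does not give the prompt format, label scheme, or model used, so the following is only a minimal sketch under stated assumptions. It illustrates how a partial, unstructured description could be paired with PBT labels as a fine-tuning example, and how a model's answer could be parsed back into per-property labels. The field names, the binary yes/no label per property, the `key=value` answer format, and every function name (`build_prompt`, `parse_answer`, `make_training_example`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical label set: a binary yes/no judgement per PBT property is assumed,
# since the abstract does not name the actual hazard categories.
PBT_PROPERTIES = ("persistent", "bioaccumulative", "toxic")


@dataclass
class SubstanceDescription:
    """Partial description of a substance of concern; every field may be missing."""
    appearance: Optional[str] = None
    melting_point_c: Optional[float] = None
    industrial_use: Optional[str] = None
    other: Optional[str] = None


def build_prompt(desc: SubstanceDescription) -> str:
    """Serialize only the fields that are present, so missing data is simply omitted."""
    parts = []
    if desc.appearance:
        parts.append(f"Appearance: {desc.appearance}")
    if desc.melting_point_c is not None:
        parts.append(f"Melting point: {desc.melting_point_c} °C")
    if desc.industrial_use:
        parts.append(f"Industrial use: {desc.industrial_use}")
    if desc.other:
        parts.append(f"Other: {desc.other}")
    body = "\n".join(parts) if parts else "No description available."
    return (
        "Assess the substance below for persistence, bioaccumulation, and toxicity.\n"
        f"{body}\n"
        "Answer as: persistent=<yes|no>; bioaccumulative=<yes|no>; toxic=<yes|no>"
    )


def make_training_example(desc: SubstanceDescription, labels: Dict[str, bool]) -> Dict[str, str]:
    """Pair a serialized description with its target answer string (assumed format)."""
    target = "; ".join(f"{p}={'yes' if labels[p] else 'no'}" for p in PBT_PROPERTIES)
    return {"prompt": build_prompt(desc), "completion": target}


def parse_answer(text: str) -> Dict[str, Optional[bool]]:
    """Parse a model answer; any property the answer does not state maps to None."""
    result: Dict[str, Optional[bool]] = {p: None for p in PBT_PROPERTIES}
    for chunk in text.lower().split(";"):
        key, sep, value = chunk.partition("=")
        key, value = key.strip(), value.strip()
        if sep and key in result and value in ("yes", "no"):
            result[key] = value == "yes"
    return result


if __name__ == "__main__":
    desc = SubstanceDescription(appearance="colourless viscous liquid", industrial_use="plasticizer")
    example = make_training_example(desc, {"persistent": True, "bioaccumulative": False, "toxic": True})
    print(example["prompt"])
    print(parse_answer(example["completion"]))
```

Omitting absent fields, rather than filling them with placeholders, keeps each prompt limited to what is actually known about a substance; returning `None` for unstated properties lets a downstream screen distinguish "no answer" from a negative assessment.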
Jie Ren, Zhenwei Dai, et al.
NeurIPS 2025
Tian Gao, Amit Dhurandhar, et al.
NeurIPS 2025
Vidushi Sharma, Andy Tek, et al.
NeurIPS 2025
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010