A Framework for Toxic PFAS Replacement based on GFlowNet and Chemical Foundation Model
Abstract
Per- and polyfluoroalkyl substances (PFAS) are a broad class of molecules used in almost every sector of industry and consumer goods. PFAS exhibit highly desirable properties, such as high durability, water repellency, and high acidity, that are difficult to match. As a side effect, PFAS persist in the environment and have detrimental effects on human health. Epidemiological research has linked PFAS exposure to chronic health conditions, including dyslipidemia, cardiometabolic disorders, liver damage, and hypercholesterolemia. Recently, public health agencies have significantly strengthened regulations on the use of PFAS. Alternatives are therefore needed to maintain the pace of technological development in the many areas that have traditionally relied on PFAS. To support the discovery of alternatives, we introduce MatGFN-PFAS, an AI system that generates PFAS replacements. We build MatGFN-PFAS using Generative Flow Networks (GFlowNets) for generation and a Chemical Language Model (MolFormer) for property prediction. We evaluate MatGFN-PFAS by exploring potential replacements for PFAS superacids, defined as molecules with negative pKa, which are critical for the semiconductor industry. Eliminating PFAS superacids entirely as a class may be challenging because of the strong constraints on their functional performance. The proposed approach accounts for this possibility by also enabling the generation of safer PFAS superacids. We evaluate two design strategies: 1) using Tversky similarity to design molecules that are similar to a target PFAS but have lower toxicity, and 2) directly generating molecules with negative pKa and low toxicity. For the query SMILES CC1CC(CC(F)(F)C(F)(F)OC(F)(F)C(F)(F)S(=O)(=O)O)OC1=O, MatGFN-PFAS generated a candidate with very low toxicity (LD50 = 7304.23), strong acidity (pKa = -1.92), and a high similarity score (89.32%) to the query molecule.
The results demonstrate that MatGFN-PFAS consistently generates replacement molecules that satisfy all of the aforementioned constraints. The resulting datasets for each studied molecule are available at anonymized.
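To make the first design strategy concrete, the following is a minimal sketch of the Tversky similarity index computed over two sets of molecular features. It is an illustration only, not the MatGFN-PFAS implementation: the function name `tversky_similarity`, the weights `alpha` and `beta`, and the toy feature labels are assumptions, since the abstract does not state which fingerprint or weights are used.

```python
def tversky_similarity(query_features, candidate_features, alpha=0.5, beta=0.5):
    """Tversky index between two feature sets.

    S(Q, C) = |Q & C| / (|Q & C| + alpha * |Q - C| + beta * |C - Q|)

    alpha weights features found only in the query, beta weights features
    found only in the candidate. alpha = beta = 1 gives the Tanimoto
    coefficient; alpha = beta = 0.5 gives the Dice coefficient.
    The default weights here are assumptions chosen for illustration.
    """
    query = set(query_features)
    candidate = set(candidate_features)

    common = len(query & candidate)
    only_query = len(query - candidate)
    only_candidate = len(candidate - query)

    denominator = common + alpha * only_query + beta * only_candidate
    if denominator == 0:
        # Both sets are empty (or all weights vanish): define similarity as 0.
        return 0.0
    return common / denominator


# Hypothetical substructure labels standing in for fingerprint bits.
query_bits = {"sulfonic_acid", "CF2", "ether", "lactone"}
candidate_bits = {"sulfonic_acid", "CF2", "ether", "ester"}

# Symmetric weights (Tanimoto-like) versus weights that only penalise
# query features missing from the candidate.
symmetric = tversky_similarity(query_bits, candidate_bits, alpha=1.0, beta=1.0)
asymmetric = tversky_similarity(query_bits, candidate_bits, alpha=1.0, beta=0.0)
```

Unequal weights make the measure asymmetric: with `beta = 0`, a candidate is not penalised for features the query lacks, which is one reason an asymmetric index can suit a search for close analogues of a fixed target. In practice the feature sets would come from a molecular fingerprint rather than hand-written labels.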