MatGFN-PFAS: An AI-driven approach for toxic PFAS replacement
Abstract
Per- and polyfluoroalkyl substances (PFAS) are a ubiquitous class of compounds used across a wide range of industries and consumer products. They are chosen for their exceptional properties, including durability, water repellency, and high acidity. However, many PFAS compounds are toxic. Epidemiological studies have linked PFAS exposure to a range of chronic health conditions, including dyslipidemia, cardiometabolic disorders, liver damage, and hypercholesterolemia. Furthermore, because PFAS compounds do not degrade, they can accumulate in vital organs. In response, public health authorities are actively regulating the use of PFAS, prompting the search for viable alternatives. To facilitate the discovery of safer alternatives, we introduce MatGFN-PFAS, an artificial intelligence (AI) system for generating non-toxic PFAS substitutes. MatGFN-PFAS uses Generative Flow Networks (GFlowNets) for molecular generation and a Chemical Language Model (MolFormer) for property prediction. We evaluate MatGFN-PFAS by exploring potential replacements for PFAS superacids, defined as molecules with negative pKa, which are critical for the semiconductor industry. Eliminating PFAS superacids entirely as a class may be challenging because of the strong constraints on their functional performance. We evaluate two design strategies: 1) using Tversky similarity to design molecules that resemble target PFAS compounds but have reduced toxicity, and 2) directly generating molecules with negative pKa values while ensuring low toxicity.
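For readers unfamiliar with the Tversky similarity used in the first design strategy, the following minimal sketch illustrates the index on sets of molecular features. It is an illustration under stated assumptions, not the MatGFN-PFAS implementation: the integer feature identifiers, the function name `tversky`, and the weights `alpha` and `beta` are hypothetical, and the abstract does not state which fingerprint or weights are used.

```python
from typing import Iterable


def tversky(candidate: Iterable[int], target: Iterable[int],
            alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky index between two feature sets.

    S = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)
    where A is the candidate's features and B is the target's features.
    alpha = beta = 1 gives Tanimoto; alpha = beta = 0.5 gives Dice.
    """
    a, b = set(candidate), set(target)
    common = len(a & b)
    only_candidate = len(a - b)   # features the candidate has but the target lacks
    only_target = len(b - a)      # features the target has but the candidate lacks
    denom = common + alpha * only_candidate + beta * only_target
    return common / denom if denom else 0.0


# Hypothetical feature IDs (e.g. "on" bits of a fingerprint); illustrative only.
target_pfas = {1, 2, 3, 4}
candidate = {1, 2, 3}

# Asymmetric weighting: penalize only features that are absent from the target.
sim_asym = tversky(candidate, target_pfas, alpha=1.0, beta=0.0)
# Symmetric weighting (Tanimoto).
sim_tanimoto = tversky(candidate, target_pfas, alpha=1.0, beta=1.0)
print(sim_asym, sim_tanimoto)
```

The asymmetry is the reason to prefer Tversky over Tanimoto in a setting like this one: by choosing unequal weights, a candidate can be rewarded for retaining the target's features without being penalized equally for the features it drops or adds.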
To evaluate our proposed approach, we selected a set of 6 PFAS SMILES whose LD50 measurements fall between 50 and 500 mg/kg (EPA moderate toxicity class) and that contain the following SMARTS substructure: [F,O,CX4]C(F)(F)OC(F)(F)[F,O,CX4]. The resulting datasets for each studied molecule are available at https://ibm.box.com/v/MatGFN-PFAS-generated-datasets.