Accelerate Polymer Membrane Discovery for CO2 Separation with Graph-Based Generative Model and Molecular Dynamics Simulation
Abstract
Polymer membranes are promising materials for exhaust gas separation in energy production and transportation. Separating carbon dioxide (CO2) from flue gases could help mitigate climate change, which is mainly driven by greenhouse gas emissions. Because of the trade-off between a membrane's CO2 permeability and its CO2/nitrogen (N2) selectivity, and because of the harsh operating conditions, only very few polymers are actually used at industrial scale. In this study, considering the complexity of the process conditions and the complicated interactions that occur when flue gases permeate the membrane, we develop a simplified end-to-end framework and target homopolymer membranes first. Using less data-hungry artificial intelligence algorithms together with physical validation, we observe trends similar to the literature values. We discuss the results and the challenges, which include the small size of the dataset, the efficiency of molecule generation, and validation with molecular dynamics (MD).
The dataset is mostly taken from the review article [1], and the polymers are converted to their corresponding SMILES strings. The goal is to find membrane materials that outperform Robeson's 2008 upper bound; none of the 150 selected samples lies above this bound. Given the small dataset, we tested three linear models (Lasso, Ridge, and elastic net) and three non-linear models (random forest, kernel ridge, and support vector machine). The best hyperparameter sets are determined automatically by grid search. For the linear models, we selected important features using the Lasso penalty available in the scikit-learn library. For the non-linear models, we combined greedy search with local search in an attempt to escape local optima of the R^2 score. This improves the final cross-validation (CV) scores by at least 0.1, and the random forest model is finally used for structure generation.
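As a rough illustration of the selection criterion, the sketch below checks whether a membrane lies above Robeson's 2008 CO2/N2 upper bound, written in the usual form P = k * alpha^n. The parameter values (k = 30,967,000 Barrer, n = -2.888) are quoted here as an assumption and should be verified against Robeson's paper; the function names are hypothetical and are not part of the framework described in this abstract.

```python
# Assumed Robeson (2008) CO2/N2 upper-bound parameters for P = k * alpha**n,
# with P the CO2 permeability in Barrer and alpha the CO2/N2 selectivity.
# These values are an assumption; verify them against the original paper.
ROBESON_2008_K = 30_967_000.0  # Barrer
ROBESON_2008_N = -2.888        # dimensionless slope


def upper_bound_selectivity(co2_permeability_barrer: float) -> float:
    """Return the CO2/N2 selectivity on the upper-bound line at this permeability."""
    if co2_permeability_barrer <= 0:
        raise ValueError("permeability must be positive")
    # Invert P = k * alpha**n to obtain alpha = (P / k)**(1 / n).
    return (co2_permeability_barrer / ROBESON_2008_K) ** (1.0 / ROBESON_2008_N)


def exceeds_upper_bound(co2_permeability_barrer: float, selectivity: float) -> bool:
    """True if the (permeability, selectivity) point lies above the bound."""
    return selectivity > upper_bound_selectivity(co2_permeability_barrer)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(round(upper_bound_selectivity(1000.0), 1))
    print(exceeds_upper_bound(1000.0, 50.0))
```

Under these assumed parameters, a membrane with a CO2 permeability of 1000 Barrer would need a CO2/N2 selectivity of roughly 36 to reach the bound.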
In the generation step, our strategy is to use prescribed substructures embedded in the samples close to the upper bound rather than structures determined by an exhaustive generation algorithm. The newly generated SMILES are first clustered by Murcko scaffold, and those predicted to lie above the bound are selected for MD validation. The newly generated structures lack clear atomic charge and bond type information. The purpose of the MD validation is to observe trends efficiently rather than to reproduce the literature values. An automated process is developed to convert the generated SMILES into polymer slabs in MD input format. Since most experimental observations are performed at constant temperature and pressure, we implement the permeability calculation with an isothermal-isobaric (NPT) based methodology in which a constant pressure difference is maintained during gas injection. Based on benchmark results for the available options, we selected polymer slabs with 800 heavy atoms per chain, a thickness of 6 nm, and the DREIDING force field. The benchmark results show that the predicted and calculated CO2 permeabilities are of the same order of magnitude. The selectivity, in contrast, is less comparable to the original values, because the N2 permeability is usually 10 to 40 times smaller and a sufficiently long simulation time (e.g., 100 ns) is necessary. The overall fitting results are encouraging, considering the limitations of the training dataset and the membrane formation method. The automated process makes it possible to focus innovation on structure generation and MD methodology. Based on these results, we are now analyzing the newly generated structures with the highest predicted CO2 permeability. Including additional physical parameters such as adsorption energy, free volume, and radius of gyration in the optimization could help further improve our material discovery results for polymer separation membranes.
[1] M. Songolzadeh et al., The Scientific World Journal, Vol. 2014, Article ID 828131
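To make the permeability calculation mentioned in the abstract more concrete, the sketch below converts a count of gas molecules that crossed a slab during a constant-pressure-difference simulation into a permeability in Barrer, using the generic steady-state definition P = J * L / Δp. The abstract does not give the exact expression used in this work, so the formula, the function names, and the input values are all assumptions made for illustration. The conversion 1 Barrer ≈ 3.35e-16 mol·m/(m²·s·Pa) is approximate.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
# Approximate value of 1 Barrer in SI units, mol*m / (m^2 * s * Pa).
BARRER_IN_SI = 3.35e-16


def permeability_barrer(n_permeated: float, area_nm2: float, thickness_nm: float,
                        time_ns: float, delta_p_bar: float) -> float:
    """Estimate permeability (Barrer) from a permeation count in an MD run.

    Assumes steady-state permeation: P = flux * thickness / pressure difference.
    """
    if min(area_nm2, thickness_nm, time_ns, delta_p_bar) <= 0:
        raise ValueError("area, thickness, time and pressure difference must be positive")
    moles = n_permeated / AVOGADRO
    area_m2 = area_nm2 * 1e-18
    thickness_m = thickness_nm * 1e-9
    time_s = time_ns * 1e-9
    delta_p_pa = delta_p_bar * 1e5
    flux = moles / (area_m2 * time_s)  # mol / (m^2 * s)
    return flux * thickness_m / delta_p_pa / BARRER_IN_SI


def selectivity(p_co2_barrer: float, p_n2_barrer: float) -> float:
    """Ideal CO2/N2 selectivity as the ratio of the two permeabilities."""
    return p_co2_barrer / p_n2_barrer


if __name__ == "__main__":
    # Hypothetical inputs: 10 molecules through a 25 nm^2, 6 nm thick slab
    # in 100 ns at a 10 bar pressure difference.
    print(permeability_barrer(10, 25.0, 6.0, 100.0, 10.0))
```

Because the N2 count in the same simulation window is expected to be much smaller than the CO2 count, the ratio returned by `selectivity` is far noisier than the CO2 permeability, which is consistent with the need for long simulation times noted above.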