Publication
ACS Fall 2021
Talk
POCSTagger: Identifying part-of-chemical-speech with transformers
Abstract
In the quest to build better automatic retrosynthetic tools, the ability to interface artificial intelligence models with more traditional computational chemistry software is of paramount importance. Language-based models for retrosynthesis, like the ones in IBM RXN for Chemistry, output sequences of retrosynthetic steps represented as reaction SMILES. Constructing reaction networks with atomistic modelling schemes requires knowing the role of each molecule in a reaction equation: the solvent needs to be treated either explicitly or implicitly, and catalysts often undergo specific preprocessing or transformations in computational chemistry tools. Manual labeling is laborious and cannot be done for all predicted routes. Therefore, an automated process is required. In this context, we developed a part-of-speech tagging AI model based on the BERT transformer architecture used in Natural Language Processing. After pretraining on reaction patent data and subsequently adding a classification layer to the network, our models accurately predict the roles of the different components involved in chemical reactions. This work brings us one step closer to automated validation of synthesis routes.
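To make the input format concrete, here is a minimal sketch of how a reaction SMILES string can be broken into the individual components that a role tagger would label. It relies only on the standard reaction SMILES layout, `reactants>agents>products`, with `.` separating molecules. The names `Component` and `split_reaction_smiles` are hypothetical, the esterification example is illustrative and not taken from the talk, and this is not the authors' model or preprocessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    smiles: str    # SMILES of one molecule (or one fragment of a salt)
    position: str  # where it sits relative to the arrows: "left", "middle", "right"


def split_reaction_smiles(rxn: str) -> List[Component]:
    """Split 'reactants>agents>products' into individual components.

    Assumption: plain reaction SMILES without extensions such as
    fragment-grouping suffixes. Splitting on '.' also separates the
    ions of a salt, which a real pipeline may want to regroup.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got {len(parts) - 1}")

    components = []
    for position, block in zip(("left", "middle", "right"), parts):
        for smiles in block.split("."):
            if smiles:  # skip empty blocks, e.g. no agents in 'A>>B'
                components.append(Component(smiles, position))
    return components


if __name__ == "__main__":
    # Acetic acid + ethanol, sulfuric acid over the arrow, ethyl acetate + water.
    rxn = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC.O"
    for comp in split_reaction_smiles(rxn):
        print(f"{comp.position:<6} {comp.smiles}")
```

Position in the string is only a coarse hint: it does not say whether a component is a solvent, a catalyst, or a reagent, which is the kind of role the tagging model described above is meant to predict for each component.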