NLPeople at L+M-24 Shared Task: An Ensembled Approach for Molecule Captioning from SMILES
Abstract
This paper presents our submission to the Molecular Captioning track of the Language + Molecules 2024 (L+M-24) Shared Task. The task is to generate captions that describe the properties of molecules given in SMILES format. We decompose the problem of generating captions from SMILES by first treating it as a classification problem, in which we predict the molecule's properties. Molecules whose properties can be predicted with high accuracy obtain high translation-metric scores when captioned by LLMs, while the others obtain low scores. We then use the predicted properties to select among the captions generated by different types of LLMs, and use the selected caption as the final output. Relative to the baseline, our submission achieved an overall score increase of 15.21 on the dev set and 12.30 on the evaluation set, computed from translation metrics and property metrics.
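The selection step can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the rule used here (pick the candidate caption that mentions the largest number of predicted properties) is an assumption, and the function name `select_caption`, the model names, and the example properties are hypothetical.

```python
from typing import Dict, Set


def select_caption(candidates: Dict[str, str], predicted_properties: Set[str]) -> str:
    """Return the candidate caption that mentions the most predicted properties.

    Assumption: property overlap is used as the selection criterion;
    the actual rule in the submitted system may differ.
    """

    def score(caption: str) -> int:
        # Count predicted properties that appear verbatim (case-insensitive).
        text = caption.lower()
        return sum(1 for prop in predicted_properties if prop.lower() in text)

    # max() returns the first best candidate, so ties follow insertion order.
    best_model = max(candidates, key=lambda name: score(candidates[name]))
    return candidates[best_model]


# Hypothetical captions from two LLMs for the same SMILES string.
candidates = {
    "model_a": "The molecule is an anti-inflammatory agent.",
    "model_b": "The molecule is a kinase inhibitor and is anti-inflammatory.",
}
predicted = {"kinase inhibitor", "anti-inflammatory"}
chosen = select_caption(candidates, predicted)
```

In this example `chosen` is the caption from `model_b`, since it mentions both predicted properties while the `model_a` caption mentions only one.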