Disambiguation of product expense data for carbon emission estimation
Abstract
Estimating the carbon emissions embodied in supply chain products is imperative for understanding their climate impact and for guiding climate action. The US Environmentally Extended Input-Output (USEEIO) model provides a carbon emission factor per dollar spent for a set of industry sectors. Emissions can therefore be estimated by classifying each product transaction into one of the predefined USEEIO industry classes. However, the acronyms and the few words present in transaction data make it challenging to map an expense record to an industry sector. To address this challenge, we propose incorporating novel enterprise-specific embeddings, which leverage enterprise contextual data, into an NLP foundation model. This improves the disambiguation of industry abbreviations, and the results show improved accuracy, comparable to annotation by domain experts.
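
The spend-based estimate described above reduces to multiplying each transaction's spend by the emission factor of its assigned sector. The following is a minimal sketch of that arithmetic only, assuming each transaction has already been classified. The sector names, transaction descriptions, and factor values are hypothetical placeholders chosen for illustration; they are not actual USEEIO figures, and the names `Transaction` and `estimate_emissions` are not from this work.

```python
from dataclasses import dataclass

# Hypothetical placeholder factors in kg CO2e per USD spent.
# These are NOT real USEEIO values; they only illustrate the calculation.
EMISSION_FACTORS = {
    "Computer manufacturing": 0.5,
    "Air transportation": 1.0,
}


@dataclass
class Transaction:
    description: str  # short, acronym-heavy expense text
    spend_usd: float  # amount spent in US dollars
    sector: str       # assumed to be assigned already by a classifier


def estimate_emissions(transactions):
    """Sum spend * per-dollar emission factor over all transactions."""
    total_kg = 0.0
    for t in transactions:
        total_kg += t.spend_usd * EMISSION_FACTORS[t.sector]
    return total_kg


# Illustrative records; descriptions and amounts are made up.
transactions = [
    Transaction("LAPTOP PURCH", 1000.0, "Computer manufacturing"),
    Transaction("FLT NYC-SFO", 400.0, "Air transportation"),
]

print(estimate_emissions(transactions))  # 1000*0.5 + 400*1.0 = 900.0
```

Because the estimate depends entirely on the `sector` field, a misclassified transaction carries the wrong factor into the total, which is why disambiguating short, acronym-heavy descriptions matters.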