Publication
DSMM 2019
Conference paper
Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER
Abstract
The 2019 FEIII CALI data challenge aims at linking different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision and high-recall pair-wise ER model that classifies small business entity pairs into matches and non-matches. Realistic ER tasks usually involve a pipeline of labor-intensive and error-prone tasks, such as data preprocessing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and a carefully designed user interface, SystemER can learn high-quality explainable ER algorithms with low human effort, while achieving high accuracy on the datasets provided by the FEIII CALI data challenge.
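To make the pair-wise formulation concrete, the following is a minimal sketch of a human-readable matching rule over two business records. It is an illustration only and not SystemER's rule language or learned model: the `Business` fields, the token-level Jaccard similarity, and the 0.5 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    # Hypothetical record layout; the challenge datasets may use other fields.
    name: str
    zip_code: str


def tokens(text: str) -> set:
    # Lowercase, replace punctuation with spaces, and split into a token set.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def jaccard(a: set, b: set) -> float:
    # Share of tokens common to both sets; defined as 0.0 when both are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_match(x: Business, y: Business, threshold: float = 0.5) -> bool:
    # Explainable rule (assumed): same ZIP code AND sufficiently similar names.
    same_zip = x.zip_code == y.zip_code
    similar_name = jaccard(tokens(x.name), tokens(y.name)) >= threshold
    return same_zip and similar_name


a = Business("Acme Plumbing Inc.", "94110")
b = Business("ACME Plumbing", "94110")
print(is_match(a, b))  # True: same ZIP, name Jaccard = 2/3
```

A rule of this shape is explainable in the sense that each condition can be read and checked by a person. In a learning-based setting, the predicates and thresholds would be learned from labeled pairs rather than fixed by hand as they are here.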