IDMU: Impact Driven Machine Unlearning
Abstract
Enterprise organizations hold large amounts of data that are used by multiple Machine Learning (ML) models across various software frameworks. These models surface trends and insights from the data that help enterprises define business rules around their processes. However, removing certain aspects of this data from the datasets can influence the business rules and policies in place. When a user requests that data be removed, the affected models may need to be retrained; this process is called Machine Unlearning (MU). Recent research on MU has proposed different methods for retraining machine learning models, but there is a lack of work on removing specific aspects of data and quantifying the impact of that removal on the models. This paper proposes a novel methodology, IDMU (Impact Driven Machine Unlearning), that quantifies the impact of data removal requests while performing MU. Our method provides recommendations for data removal requests, factoring in the underlying features of the data. The results from the industrial application and evaluation of our method on a financial services dataset are encouraging: IDMU achieved a mean MAPE (mean absolute percentage error) of 10.25% over a set of 120 data removal requests. By factoring in the urgency and impact of data removal requests, it also saved ~1900 hours of model retraining time over a period of three years.
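The abstract reports IDMU's accuracy as a mean MAPE but does not state which quantity is being predicted. As a reference for the metric itself, the following is a minimal sketch of the standard MAPE formula, MAPE = (100 / n) · Σ |(aᵢ − pᵢ) / aᵢ|. The function name `mape` and the sample "impact" values are hypothetical illustrations and are not taken from the IDMU evaluation.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent.

    MAPE = (100 / n) * sum(|(a_i - p_i) / a_i|) over n paired observations.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        # MAPE is undefined when any actual value is zero (division by zero).
        raise ValueError("MAPE is undefined for zero actual values")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical values: measured vs. estimated impact for two removal requests.
measured_impact = [100.0, 200.0]
estimated_impact = [90.0, 220.0]
print(f"{mape(measured_impact, estimated_impact):.2f}%")  # prints 10.00%
```

Because each error is divided by its own actual value, MAPE is scale-free, which makes it convenient for averaging across requests whose impacts differ in magnitude; the trade-off is that it is undefined when an actual value is zero.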