Fast and Automatic Visual Label Conflict Resolution
Abstract
Even with the rise of unsupervised learning and weak supervision techniques, human-labeled data is still a necessary part of machine learning pipelines in many real-world contexts and applications. Producing such data often involves crowdworkers performing the laborious task of labeling large amounts of data. Because this process is largely asynchronous, it can lead to conflict among workers: individual labelers may submit labels that disagree with one another for the same data item. When such noisy data is fed to a machine learning model, the accuracy and performance of the overall system on test data can suffer. One popular workaround is to discard every data item with conflicting labels. This, however, wastes expensive, human-supplied data. Moreover, data points with conflicting labels are often the very points that are crucial in determining the model's decision boundaries. Another possibility is to automate conflict resolution. However, given that humans themselves disagree on these items, state-of-the-art models cannot be expected to solve the problem reliably. In practice, therefore, a human must step in and resolve the conflict. Because conflict resolution is a non-trivial task, it requires the assistance of expensive subject matter experts (SMEs). To use SMEs' time more efficiently, we propose an intelligent approach to label conflict resolution that automatically re-ranks conflicts so that those with the most missing information useful to the model are displayed first. The approach also provides ML assistance to auto-resolve easy conflicts, and explanations that justify its decisions and improve explainability.
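As a rough illustration of the triage idea (auto-resolve easy conflicts, then queue the remaining ones for SMEs with the most informative first), the following is a minimal sketch under stated assumptions. It assumes that "most missing information useful to the model" can be approximated by the entropy of a model's predicted class distribution (a standard uncertainty-sampling heuristic), and that a conflict is "easy" when the model is confident and its top label matches at least one worker's label. The `Conflict` class, the `triage` function, the `auto_threshold` value of 0.9, and the example data are all hypothetical and are not necessarily the approach described in this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Conflict:
    item_id: str
    crowd_labels: list[str]        # labels submitted by different workers (they disagree)
    model_probs: dict[str, float]  # hypothetical model's predicted class probabilities


def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def triage(conflicts: list[Conflict], auto_threshold: float = 0.9):
    """Split conflicts into auto-resolved items and an SME queue.

    Assumption: a conflict is "easy" if the model's top label has probability
    >= auto_threshold AND at least one worker chose that label.
    """
    auto_resolved: dict[str, str] = {}
    sme_queue: list[Conflict] = []
    for c in conflicts:
        top_label, top_p = max(c.model_probs.items(), key=lambda kv: kv[1])
        if top_p >= auto_threshold and top_label in c.crowd_labels:
            auto_resolved[c.item_id] = top_label
        else:
            sme_queue.append(c)
    # Assumption: higher entropy = more information the model is missing,
    # so the most uncertain conflicts are shown to the SME first.
    sme_queue.sort(key=lambda c: entropy(c.model_probs), reverse=True)
    return auto_resolved, [c.item_id for c in sme_queue]


# Hypothetical example: three items on which two workers disagreed.
example = [
    Conflict("a", ["cat", "dog"], {"cat": 0.95, "dog": 0.05}),
    Conflict("b", ["cat", "dog"], {"cat": 0.50, "dog": 0.50}),
    Conflict("c", ["cat", "dog"], {"cat": 0.70, "dog": 0.30}),
]
auto, queue = triage(example)
print(auto)   # {'a': 'cat'}
print(queue)  # ['b', 'c']
```

Here item "a" is auto-resolved because the model is confident and agrees with one worker, while "b" (entropy 1.0 bit) is ranked ahead of "c" (about 0.88 bits). A real system would likely also weigh worker reliability and the cost of SME review; the threshold is a tunable trade-off between SME workload and the risk of accepting a wrong automatic resolution.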