Publication
IJCAI 2024
Demo paper

Probabilistic feature matching for fast scalable visual prompting

Abstract

In this work, we propose a novel framework for image segmentation guided by visual prompting which leverages the power of vision foundation models. Inspired by recent advancements in computer vision, our approach integrates multiple large-scale pretrained models to address the challenges of segmentation tasks with limited and sparsely annotated data interactively provided by a user. Our method combines a frozen feature extraction backbone with a scalable and efficient probabilistic feature correspondence (soft matching) procedure derived from Optimal Transport to couple pixels between reference and target images. Moreover, a pretrained segmentation model is harnessed to translate user scribbles into reference masks and matched target pixels into output target segmentation masks. This results in a framework that we name Softmatcher, a versatile and fast training-free architecture for image segmentation by visual prompting. We demonstrate the efficiency and scalability of Softmatcher for real-time interactive image segmentation by visual prompting and showcase it in diverse visual domains including technical visual inspection use cases.