Visual Prompting


Using AI to build computer vision models within minutes, through quick intuitive prompts, and with just a few images

Overview

Visual Prompting is an innovative Artificial Intelligence (AI) approach for computer vision that enables users to transform an unlabelled dataset into a powerful segmentation model, with just a few clicks.

In the field of AI, prompting generally refers to the action of providing textual input to an AI model, so that the model performs a certain task. For example, we can prompt a chatbot to write an essay about a specific topic, or to generate a picture that corresponds to a given description.

Visual Prompting extends this idea to image-to-image applications. In this scenario, users prompt an AI model by simply clicking or quickly scribbling over a few examples of their objects of interest. The AI model then extends these prompts to segment each object as a whole. Once the user confirms that the segmentation is correct, the model can detect similar objects in other images.
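To make the click-to-mask interaction concrete, here is a minimal toy sketch. It is not the model described on this page: it swaps in plain region growing (flood fill on pixel intensity) as a stand-in for the step where a sparse prompt is extended to a whole object. The function name grow_from_clicks, the tolerance parameter, and the tiny synthetic image are all assumptions made for illustration.

```python
from collections import deque


def grow_from_clicks(image, clicks, tolerance=0.1):
    """Toy stand-in for prompt extension: flood fill from clicked pixels.

    image  : list of rows, each a list of grayscale floats
    clicks : list of (row, col) pixels the user clicked on
    A neighbouring pixel joins the mask when its intensity is within
    `tolerance` of the intensity of the click it was reached from.
    """
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    for seed_row, seed_col in clicks:
        seed_value = image[seed_row][seed_col]
        mask[seed_row][seed_col] = True
        queue = deque([(seed_row, seed_col)])
        while queue:
            row, col = queue.popleft()
            # Visit the four direct neighbours of the current pixel.
            for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n_row, n_col = row + d_row, col + d_col
                if not (0 <= n_row < height and 0 <= n_col < width):
                    continue
                if mask[n_row][n_col]:
                    continue
                if abs(image[n_row][n_col] - seed_value) <= tolerance:
                    mask[n_row][n_col] = True
                    queue.append((n_row, n_col))
    return mask


# Synthetic 6x6 image: dark background with one bright 3x3 object.
image = [[0.0] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(2, 5):
        image[r][c] = 1.0

# A single click inside the object is extended to the whole object.
mask = grow_from_clicks(image, clicks=[(2, 3)])
object_pixels = sum(cell for row in mask for cell in row)
print(object_pixels)
```

A learned segmentation model would rely on much richer cues than raw intensity, but the contract is the same: a handful of clicked pixels go in, a full object mask comes out for the user to confirm.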

Visual Prompting Lab (VPL) – IBM Research Service

Visual Prompting Lab (VPL) is a web service developed by IBM Research where users can swiftly upload a few images, prompt the objects they are looking for, and within minutes build an AI model that can detect these objects. Behind the VPL service, IBM Research has built a novel pipeline that combines multiple Large Vision Models (LVMs) to automate segmentation tasks from limited, sparsely annotated data that a user provides interactively.

Visual Prompting Lab service applied to car inspection, satellite images, and concrete surface inspection. The user input and model output are shown in real time (i.e., the video is not accelerated).

Visual Prompting for Enterprise Visual Inspection

Visual Prompting is a paradigm shift in the field of computer vision: being able to build an accurate segmentation model in just a few seconds of work was unthinkable a few years ago.

The benefit of this technology lies in its application to technical domains, where the available data is usually limited and new models are needed on a daily basis. We have already successfully applied Visual Prompting to visual inspection across multiple applications, such as discrete manufacturing, quality control, and infrastructure monitoring.

In addition, Visual Prompting can be applied as a pre-processing step to boost the performance of downstream tasks, by isolating the actual area of inspection from the background. For instance, in the case of unsupervised anomaly detection, we observed an average 3x increase in accuracy (measured as precision), along with a significant reduction in false positives.

Qualitative visualization of Visual Prompting accuracy improvement when applied as a pre-processing step for unsupervised anomaly detection on IBM System Z pinboards.
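The pre-processing idea described above can be illustrated with a small sketch: anomaly detections that fall outside the segmented area of inspection are discarded, which removes background false positives and raises precision. Everything here is an assumption made for illustration, including the pixel coordinates, the precision helper, and the synthetic detections. The numbers are invented and do not reproduce the measurement reported above.

```python
def precision(flagged, true_defects):
    """Fraction of flagged pixels that are real defects (TP / (TP + FP))."""
    if not flagged:
        return 0.0
    return len(flagged & true_defects) / len(flagged)


# Hypothetical area of inspection, e.g. a mask produced by a segmentation
# model, expressed as a set of (row, col) pixel coordinates.
inspection_area = {(r, c) for r in range(1, 3) for c in range(1, 3)}

# Synthetic ground truth: one real defect, located on the inspected part.
true_defects = {(1, 1)}

# Synthetic output of an anomaly detector run on the full image:
# the real defect plus three false alarms triggered by the background.
flagged = {(1, 1), (0, 3), (3, 0), (3, 3)}

# Pre-processing step: keep only detections inside the area of inspection.
flagged_in_area = flagged & inspection_area

precision_full_image = precision(flagged, true_defects)
precision_masked = precision(flagged_in_area, true_defects)
print(precision_full_image, precision_masked)
```

In this toy case the three background false alarms are removed by the mask, so every remaining detection is a true defect. Real gains depend on how much of the detector's error comes from the background.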

Try Visual Prompting Lab

Our team at IBM Research is looking for prospective customers who wish to try our technology. Submit your request and we will contact you soon.