- What is red teaming for generative AI? Explainer by Kim Martineau, 11 Apr 2024. Tags: Adversarial Robustness and Privacy; AI; AI Testing; Fairness, Accountability, Transparency; Foundation Models; Natural Language Processing; Security; Trustworthy AI
- An open-source toolkit for debugging AI models of all data types. Technical note by Kevin Eykholt and Taesung Lee, 08 Sep 2023. Tags: Adversarial Robustness and Privacy; AI Testing; Data and AI Security
- AI diffusion models can be tricked into generating manipulated images. News by Kim Martineau, 05 Jun 2023. Tags: AI; AI Testing; Data and AI Security; Foundation Models; Generative AI; Security
- DOFramework: A testing framework for decision optimization model learners. Technical note by Orit Davidovich, 02 Feb 2023. Tags: AI; AI Testing; Mathematical Sciences
- Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. Itay Nakash, George Kour, et al. NAACL 2025
- Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. NAACL 2025
- ASTER: Natural and Multi-language Unit Test Generation with LLMs. Rangeet Pan, Myeongsoo Kim, et al. ICSE 2025
- Workshop on Neuro-Symbolic Software Engineering. Christian Medeiros Adriano, Sona Ghahremani, et al. ICSE 2025
- Combinatorial Test Design Model Creation using Large Language Models. Debbie Furman, Eitan Farchi, et al. IWCT 2025
- Evolution of catalysis at IBM: From microelectronics to biomedicine to sustainability with AI-driven innovation. James Hedrick, Tim Erdmann, et al. ACS Spring 2025