AI Testing
We’re designing tools to help ensure that AI systems are trustworthy and reliable, and that they can optimize business processes. We create tests that simulate real-life scenarios and localize faults in AI systems. We’re also working on automating the testing, debugging, and repair of AI models across a wide range of scenarios.
Our work
Tiny benchmarks for large language models
- News
- Kim Martineau
What is red teaming for generative AI?
- Explainer
- Kim Martineau
An open-source toolkit for debugging AI models of all data types
- Technical note
- Kevin Eykholt and Taesung Lee
AI diffusion models can be tricked into generating manipulated images
- News
- Kim Martineau
DOFramework: A testing framework for decision optimization model learners
- Technical note
- Orit Davidovich
Managing the risk in AI: Spotting the “unknown unknowns”
- Research
- Orna Raz, Sam Ackerman, and Marcel Zalmanovici
- 5 minute read
IBM researchers check AI bias with counterfactual text
- Research
- Inkit Padhi, Nishtha Madaan, Naveen Panwar, and Diptikalyan Saha
- 5 minute read
Publications
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking
- Gabriel Rioux
- Apoorva Nitsure
- et al.
- 2024
- NeurIPS 2024
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
- Samuel Ackerman
- Ella Rabinovich
- et al.
- 2024
- EMNLP 2024
Towards a Benchmark for Causal Business Process Reasoning with LLMs
- 2024
- BPM 2024
Why Don’t Prompt-Based Fairness Metrics Correlate?
- Abdelrahman Zayed
- Gonçalo Mordido
- et al.
- 2024
- ACL 2024
Data Contamination Report from the 2024 CONDA Shared Task
- Oscar Sainz
- Iker García-Ferrero
- et al.
- 2024
- ACL 2024
Risk Aware Benchmarking of Large Language Models
- Apoorva Nitsure
- Youssef Mroueh
- et al.
- 2024
- ICML 2024