Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes. Gal Amram, Ora Nova Fandina, et al. ASE 2025.
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. Noy Sternlicht, Ariel Gera, et al. EMNLP 2025.
Agentic Process Observability: Discovering Behavioral Variability. Fabiana Fournier, Lior Limonad, et al. ECAI 2025.
Exposing AI Bias by Crowdsourcing: Democratizing Critique of Large Language Models. Hangzhi Guo, Pranav Venkit, et al. AIES 2025.
The NorthPole Validator: A Cycle-Accurate Simulator for HW/SW Codesign of a Prescheduled Neural Inference Accelerator. Alexander Andreopoulos, Michael Debole, et al. HPEC 2025.
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty? Giacomo Camposampiero, Michael Hersche, et al. NeSy 2025.
StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation. Satyananda Kashyap, Sola Shirai, et al. VLDB 2025.
Evaluating LLM-based Agents: Foundations, Best Practices and Open Challenges. Roy Bar-Haim, Arman Cohan, et al. IJCAI 2025.
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models. George Kour, Itay Nakash, et al. ACL 2025.