ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks. Saurabh Jha, Rohan Arora, et al. 2025. ICML 2025.
CodeSift: An LLM-Based Reference-Less Framework for Automatic Code Validation. Pooja Aggarwal, Oishik Chatterjee, et al. 2024. CLOUD 2024.