Fine-tuning for Extreme Event Prediction: Are Ensemble Methods All You Need? Imran Nasim, Joao Lucas de Sousa Almeida. 2025. KDD 2025.
Conceptual Diagnostics for Knowledge Graphs and Large Language Models. Rosario Uceda-Sosa, Maria Chang, et al. 2025. ACL 2025.
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages. Mehant Kammakomati, Sameer Pimparkhede, et al. 2025. ACL 2025.
BI-Bench : A Comprehensive Benchmark Dataset and Unsupervised Evaluation for BI Systems. Ankush Gupta, Aniya Aggarwal, et al. 2025. ACL 2025.
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies. Massimiliano Pronesti, Joao Bettencourt-Silva, et al. 2025. ACL 2025.
Multi-Sense Embeddings for Language Models and Knowledge Distillation. Qitong Wang, Mohammed Zaki, et al. 2025. ACL 2025.
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning. Zheyuan Zhang, Yiyang Li, et al. 2025. ACL 2025.
R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory. Tenghao Huang, Kinjal Basu, et al. 2025. ACL 2025.