Zhiyuan He, Yijun Yang, et al.
ICML 2024
The increased use of personal assistants has made question answering a common mode of user-system interaction. In these systems, implicit feedback, such as a user clicking on a link provided by the QA system, is easy to observe but can be noisy. Explicit feedback on a response, by contrast, is rare but more valuable. To address this trade-off, this paper proposes a new stochastic multi-armed bandit model that accounts for both types of feedback: noisy implicit rewards and sparse explicit rewards. The model is studied in both the classical and contextual bandit settings, and an efficient algorithm based on the UCB framework is proposed and analyzed. The algorithm is evaluated empirically on various reward distributions and on a real-world dataset and application.
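To make the setting concrete, the following is a minimal sketch of a standard UCB1-style policy fed by two feedback channels. It is not the algorithm proposed in the paper, whose estimator and confidence bounds are not given in this abstract. The class name `TwoFeedbackUCB`, the function `simulate`, the parameters `p_explicit` and `flip`, and the rule "use the explicit reward when it is observed, otherwise the implicit one" are all assumptions made for illustration.

```python
import math
import random


class TwoFeedbackUCB:
    """UCB1-style index policy (illustrative only, not the paper's algorithm)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of pulls per arm
        self.means = [0.0] * n_arms  # running mean of the reward used per arm
        self.t = 0                   # total number of updates so far

    def select(self):
        # Play every arm once before using the confidence bonus.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm

        def index(arm):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return self.means[arm] + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, implicit, explicit=None):
        # Assumption: prefer the explicit reward when it is observed,
        # otherwise fall back to the noisy implicit signal.
        reward = implicit if explicit is None else explicit
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_means, horizon=2000, p_explicit=0.1, flip=0.2, seed=0):
    """Hypothetical environment: Bernoulli rewards, implicit feedback is the
    true reward flipped with probability `flip`, and explicit feedback is
    revealed with probability `p_explicit`."""
    rng = random.Random(seed)
    policy = TwoFeedbackUCB(len(true_means))
    for _ in range(horizon):
        arm = policy.select()
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        implicit = 1.0 - reward if rng.random() < flip else reward
        explicit = reward if rng.random() < p_explicit else None
        policy.update(arm, implicit, explicit)
    return policy.counts
```

Calling `simulate([0.2, 0.8])` returns the number of pulls per arm. Mixing the two signals into one running mean, as done here, biases the estimate toward the noisy channel; a model that treats the two feedback types separately, as the abstract describes, is meant to handle exactly that.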
Hazar Yueksel, Ramon Bertran, et al.
MLSys 2020
Kush Varshney, Miao Liu, et al.
ICASSP 2025
Megh Thakkar, Quentin Fournier, et al.
ACL 2024