Vague Preference Policy Learning for Conversational Recommendation

Gangyi Zhang; Chongming Gao; Wenqiang Lei; Xiaojie Guo; Shijun Li; Hongshen Chen; Zhuozhi Ding; Sulong Xu; Lingfei Wu

doi:10.1145/3717831

ACM TOIS

Paper

09 May 2025

Abstract

Conversational Recommendation Systems (CRS) effectively address information asymmetry by dynamically eliciting user preferences through multi-turn interactions. However, existing CRS methods commonly assume that users have clear, definite preferences for one or more target items. This assumption can lead to over-trusting user feedback: accepts and rejects are treated as definitive signals for filtering items and shrinking the candidate space, which can cause over-filtering and exclude relevant alternatives. In reality, users often have vague preferences, lacking well-defined inclinations for certain attribute types (e.g., color, pattern), and their decisions during interactions are rarely binary. Instead, users’ choices are relative, reflecting a range of preferences rather than strict likes or dislikes. To address this issue, we introduce a novel scenario called Vague Preference Multi-Round Conversational Recommendation (VPMCR), which employs a soft estimation mechanism that assigns non-zero confidence scores to all candidate items, accommodating users’ vague and dynamic preferences while mitigating over-filtering. For the VPMCR setting, we propose a solution called Vague Preference Policy Learning (VPPL), which consists of two main components: Ambiguity-Aware Soft Estimation (ASE) and Dynamism-Aware Policy Learning (DPL). ASE accommodates the ambiguity in user preferences by estimating preference scores for both direct and inferred preferences, using a choice-based approach and a time-aware preference decay strategy. DPL implements a policy learning framework that uses the preference distribution from ASE to guide the conversation, deciding whether to recommend items or query attributes while adapting to changes in users’ preferences. Extensive experiments on diverse datasets demonstrate the effectiveness of VPPL in the VPMCR scenario, where it outperforms existing methods and sets a new benchmark for CRS research.
Our work represents a significant advancement in accommodating the inherent ambiguity and relative decision-making processes exhibited by users, improving the overall performance and applicability of CRS in real-world settings.
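
To make the contrast between hard filtering and soft estimation concrete, the following Python sketch compares the two on a toy catalog. It is not the authors' ASE formulation: the turn representation, the function names `hard_filter` and `soft_scores`, the per-turn decay factor `gamma`, the down-weighting of non-selected attribute values by `reject_weight`, and the softmax normalization are all illustrative assumptions. It only shows how choice-based, time-decayed evidence can keep every candidate at a non-zero confidence instead of discarding it.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical turn record: (attribute values the user selected,
#                            attribute values shown but not selected)
Turn = Tuple[Set[str], Set[str]]


def hard_filter(items: Dict[str, Set[str]], turns: List[Turn]) -> List[str]:
    """Treat every accept/reject as definitive, as conventional CRS methods assume."""
    kept = []
    for name, attrs in items.items():
        # An item survives only if it has all selected values and no non-selected ones.
        if all(sel <= attrs and not (rej & attrs) for sel, rej in turns):
            kept.append(name)
    return kept


def soft_scores(
    items: Dict[str, Set[str]],
    turns: List[Turn],
    gamma: float = 0.8,          # assumed time-decay factor per turn (hypothetical)
    reject_weight: float = 0.5,  # assumed penalty for a non-selected value (hypothetical)
) -> Dict[str, float]:
    """Assign every item a non-zero confidence (illustrative, not the paper's ASE)."""
    last = len(turns) - 1
    logits: Dict[str, float] = {}
    for name, attrs in items.items():
        score = 0.0
        for t, (sel, rej) in enumerate(turns):
            decay = gamma ** (last - t)  # older turns contribute less
            # Relative (choice-based) evidence: selected values add, unselected ones
            # subtract a smaller amount instead of eliminating the item.
            score += decay * (len(sel & attrs) - reject_weight * len(rej & attrs))
        logits[name] = score
    # Softmax keeps every probability strictly positive.
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


# Toy catalog and conversation (hypothetical data for illustration only).
ITEMS = {
    "shirt_a": {"red", "striped"},
    "shirt_b": {"blue", "striped"},
    "shirt_c": {"red", "plain"},
}
TURNS = [({"red"}, {"blue"}), ({"plain"}, {"striped"})]

if __name__ == "__main__":
    print(hard_filter(ITEMS, TURNS))
    for name, p in sorted(soft_scores(ITEMS, TURNS).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

In this toy run, hard filtering keeps only `shirt_c` because the user did not pick "striped" in the second turn, whereas the soft scores still rank `shirt_c` first but leave `shirt_a` (and, with a lower score, `shirt_b`) with a non-zero confidence that later turns could raise again.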