Megh Thakkar, Quentin Fournier, et al.
ACL 2025
Natural language text generation has improved significantly with the advent of pre-trained language models. Using such language models to predict personal data entities in place of redacted spans in text could help generate synthetic datasets. To address privacy and ethical concerns with such datasets, we need to ensure that the masked entity predictions are also fair and controlled by application-specific constraints. We introduce new ways to inject hard constraints and knowledge into the language models that address such concerns and also improve performance on this task.
Thomas Bohnstingl, Ayush Garg, et al.
ICASSP 2022
Amar Prakash Azad, Supriyo Ghosh, et al.
IAAI 2022
Jihun Yun, Aurelie Lozano, et al.
NeurIPS 2021