Publication
ICWS 2024
Workshop paper

Towards Collecting Royalties for Copyrighted Data for Generative Models

Abstract

Handling copyrighted data in the context of generative models has become an important issue for content creators, publishers, organizations training generative models, and those who deploy generative models for particular applications. Copyright holders want to ensure that they are fairly compensated for their work, while users of training data and models do not want to expose themselves to litigation. However, traditional bulk-licensing models for data are a poor fit for model training. In this paper, we discuss why a traditional data license is not always a good fit, how data is used in the life-cycle of generative models, and what impact data has on model output. This discussion can serve as a foundation for a pay-per-(model) use compensation scheme based on how data contributes to a model's output. Compensating copyright holders in this way reduces risk for model trainers, avoids large upfront investments, and encourages a lively data ecosystem in which the creation and distribution of original work is incentivized and fairly compensated.