In less than a year, generative AI has transformed notions of what computers might be capable of. Now, we’re in the midst of a heated debate about what role computation will play in the future of art and the creative industries. There is already a lengthy docket of litigation aimed at generative AI companies, and most of these cases are focused on stopping the training of generative AI on copyrighted works without a license.
No doubt, the training of generative AI can implicate copyright, and it’s an important locus of policy debate. Indeed, even at this initial stage of technology development and litigation, copyright is being exhaustively debated.
We don’t focus on that in this article. Many scholars, developers, and others have elaborated on why copyright should permit training of generative AI on copyrighted works without a license, and have responded to common misconceptions about copyright – such as the notion that copying, or making use of, a copyrighted work without permission is per se infringement. We find these proponents’ arguments to be quite compelling, but we acknowledge that others view the matter differently, and one cannot make a fully categorical argument about how existing law applies in all cases (even if one just focuses on a single jurisdiction). In any case, while we link to those arguments above, we won’t delve deeply into them here.
Instead, our goal in this brief piece is to engage with concerns about generative AI’s impact on creativity without being bound by the contours of copyright. Copyright is too often treated as a ‘hammer’ for many different ‘nails’ – the singular instrument for addressing a variety of different economic, cultural, and other concerns. As a result, the current debate can obstruct alternative ways of thinking that may be more suitable or practical.
We focus here on three particular concerns with the training of generative AI, and highlight alternative measures to copyright that can help address them. We admittedly will simplify aspects of the debate and ignore others entirely in order to help broaden (but not resolve) the frame for envisioning solutions. Our hope is that this approach to an incredibly complex, fast-moving set of questions may point more clearly toward constructive paths forward.
Generative AI Is Unfair Because It Uses Content Without Permission
Many stakeholders feel that training generative AI on existing works without permission is per se wrong. While this has been framed as a copyright issue, it is not only that. Adobe’s generative AI system was trained on content with a clear license from the copyright holders, but creators have still objected that they didn’t anticipate this specific use when they agreed to the license. Even fan fiction authors – who often build on copyrighted works without permission – have raised concerns about generative AI trained on their works.
This speaks to a feeling that training AI on a work breaks an implicit social contract, if not the law. One way this is sometimes framed is “it’s not just about copyright, it’s about consent.”
On the one hand, this framing doesn’t help resolve the debate – it just shifts the terms. Debating the bounds of copyright means addressing whether and to what extent rights holders can demand consent for certain uses under the law, and there are many uses for which copyright does not typically require consent (e.g., reading a book, making a parody). Invoking consent by itself does not determine whether and how to sustain such uses or draw different lines, whether via copyright, other rights, or other areas of the law. For instance, the idea of crafting new “data rights” related to AI training still requires reckoning with tradeoffs, including how such requirements might impede other creators and people who benefit from generative AI tools.
On the other hand, the broader framing around consent opens the door to other types of mechanisms that might help address the underlying concern. Norms and technical standards can also help people define, signal, and respect each other’s preferences. These mechanisms still come with tradeoffs, but norms and standards may be able to evolve with, and be tailored to, different uses and circumstances in ways to which law is not well suited.
It can be easy to overlook the many ways in which creative endeavors and industries are regulated today as much through ‘copy-norms’ as by formal rights. For instance, norms around attribution and plagiarism play a critical role for a range of everyday creators and innovators, even where the law does not necessarily require attribution. Fashion and cuisine (that is, food recipes and restaurants) have thrived in an environment where lots of copying is permitted under the law; at the same time, norms can still shape behavior in these areas, even if they are contested and continue to evolve over time.
One particularly pertinent institutionalized norm in the context of AI is the robots.txt standard, which is widely used by website operators to communicate whether and how they want their sites to be accessed (‘crawled’) by automated mechanisms. While not instantiated by or explicitly required by law, commercial search engines and others broadly comply with the standard. OpenAI has recently explained how robots.txt can be used to disallow its web crawler, GPTBot, from accessing content for AI training purposes. Spawning.ai is developing a standard specifically aimed at signaling whether a site consents to AI training, and a group of publishers within the World Wide Web Consortium (W3C) is working along similar lines.
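As a rough illustration of how this kind of signal works in practice, the sketch below uses Python's standard-library robots.txt parser to check a crawler's access against a site's stated preferences. The robots.txt contents, the example.com URL, and the "ExampleSearchBot" name are hypothetical and chosen only for illustration; GPTBot is the crawler name mentioned above. Note that the check only reports what the site has asked for. Honoring it remains a voluntary choice by whoever runs the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: it asks GPTBot to stay
# out entirely while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The same page, checked for two different crawlers.
gptbot_allowed = may_crawl("GPTBot", "https://example.com/gallery/")
search_allowed = may_crawl("ExampleSearchBot", "https://example.com/gallery/")
```

Under these assumed rules, the GPTBot check comes back negative and the other crawler's check comes back positive, which is the kind of per-crawler distinction that lets a site stay visible to search while opting out of AI training.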
More generally, generative AI companies are taking steps to provide creators with more choice and control. Spawning.ai is also building tools for creators and rights holders to signal their preferences, upload that content to a database, and then make those signals available to third parties, and Stability AI has already incorporated these signals in training its Stable Diffusion image generation tool. Meanwhile, Google kicked off an effort to investigate robots.txt-like solutions. Relatedly, OpenAI announced that it will take steps to limit the generation of content in the style of a living creator, and Microsoft provides a way for creators to limit such outputs through its Image Creator with Bing tool.
Meanwhile, communities of creators and fans are also developing norms appropriate for their own contexts. For instance, Kickstarter will be asking developers of AI tools to describe how they manage consent for use of content. That way, users of the site can factor that information into their decision about whether or not to back a project.
Again, the emergence of new norms and supporting tools does not mean that everyone will be satisfied or that we can avoid difficult tradeoffs. But they can provide a practical path forward for reconciling different interests.