Generative artificial intelligence has been hailed for its potential to transform creativity, especially by lowering the barriers to content creation. While the creative potential of generative AI tools has often been highlighted, their popularity raises questions about intellectual property and copyright protection.
Generative AI tools such as ChatGPT are powered by foundation models: AI models trained on vast quantities of data. These models are trained on billions of pieces of data taken from text or images scraped from the internet.
Generative AI applies machine learning methods such as deep learning and transfer learning to these vast repositories of data to learn the relationships among the pieces of data – for instance, which words tend to follow other words. This allows generative AI to perform a broad range of tasks that can mimic cognition and reasoning.
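To make the idea of learning "which words tend to follow other words" concrete, here is a deliberately tiny sketch in Python. It is an illustration only: real generative AI tools rely on large neural networks rather than simple word counts, and the function names and sample sentence below are hypothetical, chosen just for this example.

```python
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count, for each word, how often every other word directly follows it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    # Pair each word with the word immediately after it.
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def most_likely_next(followers, word):
    """Return the word seen most often after `word`, or None if `word` was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy "training data"; real systems use billions of examples.
corpus = "the cat sat on the mat and the cat slept"
model = build_bigram_counts(corpus)
print(most_likely_next(model, "the"))  # prints "cat"
```

In this toy corpus, "the" is followed by "cat" twice and by "mat" once, so the sketch predicts "cat". The point is only that the model stores statistics about word associations; how closely a much larger model's output can end up resembling specific training material is the question at the center of the copyright debate.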
One problem is that output from an AI tool can be very similar to copyright-protected materials. Leaving aside how generative models are trained, the challenge that widespread use of generative AI poses is how individuals and companies could be held liable when generative AI outputs infringe on copyright protections.
When prompts result in copyright violations
Researchers and journalists have raised the possibility that, through selective prompting strategies, people can end up creating text, images or video that violates copyright law. Typically, generative AI tools output an image, text or video but do not provide any warning about potential infringement. This raises the question of how to ensure that users of generative AI tools do not unknowingly end up infringing copyright.
The legal argument advanced by generative AI companies is that training AI on copyrighted works is not an infringement of copyright, since these models do not copy the training data; rather, they are designed to learn the associations between the elements of writings and images, such as words and pixels. AI companies, including Stability AI, maker of the image generator Stable Diffusion, contend that an output image provided in response to a particular text prompt is not likely to be a close match for any specific image in the training data.
Builders of generative AI tools have argued that prompts do not reproduce the training data, which should protect them from claims of copyright violation. Some audit studies have shown, though, that end users of generative AI can issue prompts that result in copyright violations by producing works that closely resemble copyright-protected content.
Establishing infringement requires detecting a close resemblance between expressive elements of a stylistically similar work and original expression in particular works by that…