Developers and business people need to better understand the legal implications of generative AI, a new study from Stanford University urges. That's because a wave of court cases may be coming: the code, language, and images generated through AI may be based on copyrighted material.
The question is, who gets the credit for generative AI output? The study suggests this is still a hazy area.
Also: Who owns the code? If ChatGPT's AI helps write your app, does it still belong to you?
Generative AI enables developers and business users to generate code and narratives at the push of a button. The bad news is that "most of the words and images in datasets behind artificially intelligent agents like ChatGPT and DALL-E are copyrighted," the paper's authors point out. "Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation."
The legal doctrine that provides cover for the use of limited segments of code, narratives, and images is "fair use." However, the study's authors assert, "If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. Fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use."
Also: Is humanity really doomed? Consider AI's Achilles heel
AI and machine learning practitioners "aren't necessarily aware of the nuances of fair use," Peter Henderson, a co-author of the paper, pointed out in a related interview. "At the same time, the courts have ruled that certain high-profile real-world examples are not protected fair use, yet those very same examples look like things AI is putting out. There's uncertainty about how lawsuits will come out in this area."
The co-authors predict an upcoming wave of lawsuits stemming from generative AI's uncompensated use of code and content. If courts eventually rule that this use does not meet the criteria of fair use, the decision "could dramatically curtail how generative AI is trained and used. As AI tools continue to advance in capabilities and scale, they challenge the traditional understanding of fair use, which has been well-defined for news reporting, art, teaching, and more. New AI tools -- both their capability and scale -- complicate this definition."
Also: Serving Generative AI just got a lot easier with OctoML's OctoAI
The IP and copyright implications for code also need to be weighed. "Like in natural language text cases, in software cases, literal infringement -- verbatim copying -- is unlikely to be fair use when it comprises a large portion of the code base," the co-authors explain. "And when the amount copied is small, the overall product is sufficiently different from the original one, or the code is sufficiently transformative, then fair use may be indicated under current standards."
Such standards may be more permissive than those for text or music. "Functional aspects of code are not protected by copyright, meaning that copying larger segments of code verbatim might be allowed in cases where the same level of similarity would not be permissible for text or music. Nonetheless, for software generated by foundation models, the more the generated content can be transformed from the original structure, sequence, and organization, the better."
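The study stops short of prescribing tooling, but the underlying check is straightforward to sketch. The Python snippet below (an illustration, not anything from the paper) estimates what fraction of a generated file is copied verbatim from a licensed original by comparing token n-grams; the crude tokenizer, window size, and 0.5 cutoff are all assumptions chosen for demonstration.

import re

def tokens(source: str) -> list:
    # Crude lexer: identifiers, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def verbatim_overlap(generated: str, original: str, n: int = 8) -> float:
    """Fraction of n-token windows in `generated` that appear
    verbatim in `original`; 1.0 means every window is copied."""
    gen, orig = tokens(generated), tokens(original)
    if len(gen) < n or len(orig) < n:
        return 0.0
    orig_grams = {tuple(orig[i:i + n]) for i in range(len(orig) - n + 1)}
    windows = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
    return sum(w in orig_grams for w in windows) / len(windows)

licensed = "def area(w, h):\n    return w * h\n"
generated = "def area(w, h):\n    return w * h  # computed\n"

# The 0.5 cutoff is an arbitrary illustration, not a legal standard.
if verbatim_overlap(generated, licensed, n=4) > 0.5:
    print("Review needed: substantial verbatim copying detected")

A check like this only catches literal copying; it says nothing about whether the structure, sequence, and organization of the code have been transformed enough, which is where the harder judgment lies.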
Also: If you use AI-generated code, what's your liability exposure?
The report's co-authors recommend establishing technical guardrails. "Install fair use filters that try to determine when the generated work -- a chapter in the style of J.K. Rowling, for instance, or a song reminiscent of Taylor Swift -- is a little too much like the original and begins to infringe on fair use."
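The paper doesn't specify how such a filter would be built, but a bare-bones version might compare candidate output against a corpus of protected works and hold back anything that shares long verbatim runs. In this Python sketch, the n-gram length and the 5% overlap threshold are invented for illustration; they are heuristics, not legal standards.

import re

def ngrams(text: str, n: int) -> set:
    # Word-level n-grams, case-folded and stripped of punctuation.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def fair_use_filter(candidate: str, protected_works: list, n: int = 10,
                    max_overlap: float = 0.05) -> bool:
    """Return True if `candidate` passes this heuristic: no protected
    work shares more than `max_overlap` of its n-word sequences."""
    cand = ngrams(candidate, n)
    if not cand:
        return True
    for work in protected_works:
        if len(cand & ngrams(work, n)) / len(cand) > max_overlap:
            return False  # too close to the original; hold for review
    return True

corpus = ["It was the best of times, it was the worst of times ..."]
draft = "It was the best of times, it was the worst of times, truly."
print(fair_use_filter(draft, corpus, n=6))  # False: long verbatim run

A real filter would also need to catch paraphrase and stylistic imitation, not just verbatim text -- and that is exactly the territory the courts have yet to map.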