AI models trained on a library of copyrighted books

Aug 22, 2023 Hi-network.com

An investigation by The Atlantic has revealed that popular generative AI models, including Meta's open source Llama, were trained in part on pirated copies of books by leading authors. Other affected models include BloombergGPT and GPT-J from the nonprofit EleutherAI. The pirated collection, approximately 170,000 titles published within the past 20 years, was part of a larger dataset called the Pile, which was freely available online until recently. Authors whose works were copied without permission include Stephen King, Margaret Atwood, Haruki Murakami, and Jonathan Franzen. Sarah Silverman and two other authors have already sued Meta and OpenAI for copyright infringement.

The person who released the dataset said the goal was to give others "OpenAI-grade training data." Some developers may argue fair use, while others may not have known they were using copyrighted material. The legal status of training AI models on copyrighted data remains unresolved. Meanwhile, EleutherAI is working on a version of the Pile that contains only documents licensed for such use.