Re: It doesn't store the original, just 'interesting' features of the original
The complaint makes exactly this point: not only did OpenAI use unlicensed copyrighted works to train the model, but the model also stores substantial amounts of those works, which it uses to generate responses.
https://www.courtlistener.com/docket/67810584/authors-guild-v-openai-inc/
88. Until very recently, ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy, suggesting that the underlying LLM must have ingested these books in their entireties during its “training.”
89. Now, however, ChatGPT generally responds to such prompts with the statement, “I can’t provide verbatim excerpts from copyrighted texts.” Thus, while ChatGPT previously provided such excerpts and in principle retains the capacity to do so, it has been restrained from doing so, if only temporarily, by its programmers.
90. In light of its timing, this apparent revision of ChatGPT’s output rules is likely a response to the type of activism on behalf of authors exemplified by the Open Letter addressed to OpenAI and other companies by Plaintiff The Authors Guild, which is discussed further below.
91. Instead of “verbatim excerpts,” ChatGPT now offers to produce a summary of the copyrighted book,