That is pretty much how I thought it would be 'reading' the training material. Nowhere in the AI's 'brain' is an exact replica of the original work, just an essence.
To be perfectly frank, this sort of gloss is not terribly meaningful. It's far enough from any technically accurate or precise account of how transformer models work that it's not much use for drawing conclusions, practical or legal.
Is there, in the model, a sequence of bits that corresponds to the text of a given novel-length work, in some encoding that the model can reasonably be held to have an algorithm for decoding into, say, Unicode?[1] It's true that's unlikely.[2]
However, particularly for works that the model has seen often enough in the training set to somewhat overfit on, it's entirely possible that there are positions and gradients in the parameter space – which is very high-dimensional, after all – that reproduce substantial parts of a given work, and possibly all of it.
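To be concrete about what "reproduce" would mean in practice: probing for this kind of memorization is cheap. A minimal sketch, assuming the HuggingFace transformers library, with "gpt2" and a Dickens opener purely as stand-ins for whatever model and passage you actually want to test:

    # Sketch: probe a causal LM for verbatim memorization of a passage.
    # "gpt2" and the prompt are placeholders, not claims about any model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "It was the best of times, it was the worst of times,"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: if the model has overfit on the passage, the
    # continuation can track the source text token for token.
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Whether any given model actually completes the passage verbatim is an empirical question; the point is that the test is trivial to run.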
Any CTT-compatible computation can be reduced to some form of compression (just as it can be reduced to Boolean algebra, or the operation of a Turing machine or a Post machine, etc.). What you refer to as "essence" would be better called "information entropy", and LLMs (crude though unidirectional transformer stacks are) are capable of storing quite a lot of it – how much depends on how large the model is, the pre-compression parameter precision, how much compression is done, and so on. For any given input in the training set (assuming it's much smaller than the model), there is no guarantee that some of its information entropy goes uncaptured; the model may well absorb all of it. And, of course, the output doesn't have to be complete, or bit-for-bit exact, to be infringing in the legal sense. An ALL-SHOUTY copy of the first half of A Game of Thrones with Ned Stark referred to as "POOR LITTLE NEDDY" throughout[3] would still be viewed dimly by the court.
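If you want to put a crude number on that entropy, the usual trick is to use a general-purpose compressor as an upper-bound estimator. A throwaway sketch, with zlib's DEFLATE standing in for whatever compressor you prefer:

    # Sketch: upper-bound the information entropy of a text by compressing it.
    # DEFLATE won't reach the true entropy, but the compressed size is a
    # serviceable ceiling for back-of-the-envelope purposes.
    import zlib

    def entropy_upper_bound_bits(text: str) -> int:
        return len(zlib.compress(text.encode("utf-8"), level=9)) * 8

    sample = "Call me Ishmael. " * 1000  # repetitive, so it compresses well
    print(entropy_upper_bound_bits(sample), "bits for", len(sample), "chars")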
And this last points to the real crux, which is that copyright law (i.e. Title 17) in the US, and the courts adjudicating upon it, are unlikely to care much about what is "stored" by an LLM and how it is represented. They're going to care about actual and plausible effects. Will LLMs have a chilling effect on creator revenues, and if so to what extent is that an actionable harm under the law? Can the LLM guardrails against reproducing portions of copyrighted works plausibly be bypassed, now or in the future, and how infringing would the output be? Is substantial information from copyrighted works incorporated (in any representation) in the models, and if so is that incorporation transformative or otherwise allowed under Title 17?
[1] It should be obvious that, trivially, a given LLM contains a bit-sequence corresponding to any given extant novel under some arbitrary encoding, because LLMs are large enough to represent any single novel and you can just invent such an encoding on the spot. Thus we have to distinguish between arbitrary encodings and reasonably plausible ones.
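A deliberately degenerate example, in Python for concreteness: the "decoder" below works for any model bits whatsoever, because it smuggles the novel in itself, which is exactly why arbitrary encodings prove nothing.

    # Sketch: a degenerate "encoding" under which ANY model "contains" a novel.
    NOVEL = "Call me Ishmael. ..."  # stand-in for the full text

    def decode(model_bits: bytes) -> str:
        # Ignores its input entirely; the "model" plays no role.
        return NOVEL

    print(decode(b"\x00" * 1024) == NOVEL)  # True for any bit-sequence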
[2] Not impossible, though, given the size of these models, for some relatively small set of works, particularly given the low information density of natural languages. Model compression would tend to eliminate these, but if you figure that, say, Moby-Dick has around 2^22 bits of entropy – quick estimate by deflating the plaintext version from Project Gutenberg – and a GPT-3-class LLM weighs in at around, oh, 2^33 bits, then if those bits were evenly and randomly distributed (they're not, but let's pretend for a moment) you'd have around a 1-in-2048 chance of finding a target bitstring with the right information. Assuming I got the arithmetic right. Of course you'd need to decompress it, so that's not really a fair estimate.
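For anyone who wants to check that arithmetic, here it is as runnable Python; it assumes the Gutenberg plaintext is saved locally as moby_dick.txt, and the 2^33 model figure is the same rough guess as above.

    # Sketch of the back-of-the-envelope estimate. Assumes the Project
    # Gutenberg plaintext of Moby-Dick saved locally as "moby_dick.txt".
    import math, zlib

    with open("moby_dick.txt", "rb") as f:
        novel_bits = len(zlib.compress(f.read(), level=9)) * 8

    model_bits = 2 ** 33  # rough GPT-3-class figure used above

    print(f"novel ~2^{math.log2(novel_bits):.1f} bits")  # roughly 2^22
    print(f"model  2^{math.log2(model_bits):.0f} bits")
    print(f"odds  1 in {model_bits // novel_bits}")      # roughly 1 in 2048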
[3] Actually, does Poor Little Neddy survive to the halfway point? I don't remember.