If a person studies a text and then writes an article on the same subject, using the same wording and making the same points, it’s plagiarism whether or not they made an exact copy. Surely the same should apply to LLMs, which train on the data and then inadvertently reproduce it? The law has already established that the process used to make the new work doesn’t matter; what matters is how close the new work is to the original.