And basically, I can. I can quote parts of it, I can give it to a friend to read, I can rip out a page and tape it to the wall, I can teach my kid how to read with it.
These are things you’re allowed to do with your copy of the book. But you are not allowed to, for example, create a copy of it and give that to a friend, or create a play or a movie out of it. You don’t own the story; you own a copy of it on a specific medium.
As to why it’s unethical, see my comment here.
I agree, the ownership is not absolute.
However, just as a person does not own the work of an author, authors do not own words, grammar, sentences or even their own style. Similarly, they do not own the names of the characters in their books or the universe in which the plot takes place. They do not even “own” their own name.
So the only remaining question is whether AI is allowed to “read” a book. In the future, authors might prohibit it, but hey, we’ll just end up with a slightly more archaic-speaking GPT over time because it won’t train on new releases. And that’s fine by me.
I think that in the end it should be a matter of licensing. The author might give you the right to train a model on it if you pay them for it, just like you’d have to get permission if you want to turn their work into a play or a show.
I don’t think the argument (not yours, but often seen in discussions like these) that “humans can be inspired by a work, so a computer should be allowed to be as well” holds up. It would take a human much more time to make a style their own, as well as to recreate large amounts of it. For an AI model, the same takes minutes and seconds, respectively. So any comparison is moot, imho.
But the thing is, it’s not similar to turning their work into a play or a TV show. You aren’t replicating their story at all; they put words in a logical order, and you are using that to teach the AI what the next word could logically be.
As for humans taking much more time to properly mimic a style, of course that’s true (assuming they’re untrained). But an AI requires far more memory and data to do that. Given time, a human can replicate a style from examples of that style alone. An AI needs to scrape basically the entire internet (and label it, which takes quite some time) to be able to do so. They may need different things, but it’s ridiculous to say that they’re completely incomparable. Besides, you make it sound like AI is its own entity that wasn’t created, trained, and used by humans in the first place.
It’s not the same as turning it into a play, but it’s doing something with it beyond its intended purpose, specifically with the intention to produce derivatives of it at an enormous scale.
Whether a computer needs more or less of it than a human is not a factor, in my opinion. Actually, the fact that it requires more input than a human only makes it worse, since more of the creator’s work has to be used without their permission.
Again, the reason I think they’re incomparable is that when a human learns to do this, the damage is relatively limited. Even the best writer can only produce so many pages per day. But when a model learns to do it, the ability to apply it is effectively unlimited. The scale of the infraction is exponentially greater, so I don’t think it’s reasonable to compare them.
Lastly, if I made it sound like that, I apologise; that was not my intention. I don’t think it’s the model’s fault, but that of the people who decided (directly, or indirectly by not vetting their input data) to take somebody’s copyrighted work and train an LLM on it.