Apparently, stealing other people’s work to create a product for money is now “fair use,” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
I will repeat what I have proffered before:
If OpenAI states that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.
Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given that the purpose of copyright is to set the terms under which public-facing material can be used. Particularly when regurgitation of copyrighted material shows up in products inadequately developed to prevent such a simple and foreseeable situation.
Yes, I am aware of the US concept of fair use, but the test of that should be manifestly reciprocal. For example, would Meta allow what it did to MySpace (scraping and enabling easy user transfer), or Google what it did in scraping YouTube?
To me it seems Big Tech wants to have its cake and eat it too, where investor $$$ are used to corrupt open markets, undermine fundamental democratic institutions, manipulate legal processes, and erode basic consumer rights.
Agreed.
There is nothing “fair” about the way OpenAI steals other people’s work. ChatGPT is being monetized all over the world, and the large number of people whose work has not been compensated will never see a cent of that money.
At the same time, the LLM will be used to replace (at least some of) the people who created those works in the first place.
Tech bros are disgusting.
Tech bros are disgusting.
That’s not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.
Yup. I said it in another discussion before, but I think it’s relevant here.
Tech bros are more dangerous than Russian oligarchs. Oligarchs understand that people hate them, so they mostly stay low and enjoy their money.
Tech bros think they are the saviors of the world while destroying millions of people’s livelihoods, as well as destroying democracy with their right-wing libertarian politics.
With your logic all artists will have to pay copyright fees just to learn how to draw. All musicians will have to pay copyright fees just to learn their instrument.
I guess I should clarify by saying I’m a professional musician.
Do musicians not buy the music that they want to listen to? Should they be allowed to torrent any MP3 they want just because they say it’s for their instrument learning?
I mean I’d be all for it, but that’s not what these very same corporations (including Microsoft when it comes to software) wanted back during Napster times. Now they want a separate set of rules just for themselves. No! They get to follow the same laws they force down our throats.
Everything you said was completely irrelevant to what I mentioned and just plain ignorant.
Since when do you buy all the music you have ever listened to?
I suspect the US government will allow OpenAI to continue doing as it pleases to keep its competitive advantage in AI over China (which has no problem with using copyrighted materials to train its models). They already limit selling AI-related hardware to keep their competitive advantage, so why stop there? Might as well allow OpenAI to continue using copyrighted materials to keep the competitive advantage.
Yep, completely agree.
Case in point: Steam has recently clarified its policy on AI-generated material that draws on essentially billions of both copyrighted and non-copyrighted texts and images.
To publish a game on Steam that uses AI gen content, you now have to verify that you as a developer are legally authorized to use all training material for the AI model for commercial purposes.
This also applies to code and code snippets generated by AI tools that function similarly, such as Copilot.
So yeah, sorry, you either gotta use MIT-licensed open source code or write your own, and you gotta do your own art.
I imagine this would also prevent you from using AI-generated voice lines where the model was trained on basically anyone who did not explicitly consent to this, but voice-gen software that doesn’t use the ‘train the model on human speakers’ approach would probably be fine, assuming you have the relevant legal rights to use such software commercially.
Not 100% sure this is Steam’s policy on voice-gen stuff; they focused mainly on art, dialogue, and code in their latest policy update, but the logic seems to lead to this conclusion.
The problem is not the use of copyrighted material. The problem is doing so without permission and without paying for it.
What’s stopping AI companies from paying royalties to artists they ripped off?
Also, lol at accounts created within a few hours just to reply in this thread.
The moment their works are the ones that get stolen by big companies and they’re driven out of business, watch their tune change.
Edit: I remember when Reddit did that shitshow, and all of a sudden a lot of sock/bot accounts appeared. I wasn’t expecting it to happen here, but I guess the election cycle is near.
Money is not always the issue. FOSS software, for example: who wants their FOSS software gobbled up by a commercial AI regardless? So there are a variety of issues.
I don’t care if any of my FOSS software is gobbled up by a commercial AI. Someone reading my code isn’t a problem to me. If it were, I wouldn’t publish it openly.
I do, especially when someone’s profiting from it while my license is strictly non-commercial.
Same. I didn’t write it for them. I wrote it for folks who don’t necessarily have a lot of money but want something useful.
Well, for $20/mo I get a super-educated virtual assistant/tutor. It’s pretty awesome.
I’d say that’s some good value for people without much money. All of my open source libs are published under the MIT license if I recall correctly. I’ve made so much money using open source software, I don’t mind giving back, even to people who are going to make money with my code.
It makes me feel good to think my code could be involved in money changing hands. It’s evidence to me that I built something valuable.
$20/mo
good value for people without much money
The absolute majority of people cannot afford that. This is especially true for the creators of a huge part of the art that was used to train various models.
AI currently is a tool made by rich people for rich people, built on the work of poor people who themselves won’t be able to benefit from it.
And yet it is orders of magnitude less than it cost a year ago to hire someone to do research, write reports, and tutor me in any subject I want.
If an artist can’t afford $20/mo they need a job to support that hobby.
What’s stopping AI companies from paying royalties to artists they ripped off?
Profit. AI is not even a profitable business now. They exist because of the huge amount of investment being poured into it. If they had to pay their fair share they would not exist as a business.
What OpenAI says is actually true. The issue, IMHO, is the idea that we should give them a pass to do it.
Uber wasn’t making a profit either, despite all the VC money behind it.
I guess they have reasons not to pay drivers properly. Give Uber a free pass for it too.
When you think about it, all companies would make so much more money if they didn’t have to pay their staff, or pay for materials they use! This whole economy and capitalism business, which relies on money being exchanged for goods and services, is clearly holding back profits. Clearly the solution here is obvious: everybody should embrace OpenAI’s methods and simply grab whatever they want without paying for it. Profit for everyone!
Some relevant comments from Ars:
leighno5
The absolute hubris required for OpenAI here to come right out and say, ‘Yeah, we have no choice but to build our product off the exploitation of the work others have already performed’ is stunning. It’s about as perfect a representation of the tech bro mindset that there can ever be. They didn’t even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don’t think it’s hyperbolic to compare this to modern day colonization, or worker exploitation. ‘You’ve been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we’re building, so we’re just going to fucking take it and do what we need to do.’
The entitlement is just…it’s incredible.
4qu4rius
20 years ago, high school kids were sued for millions and threatened with years in jail for downloading a single Metallica album (if I remember correctly, the minimum damages in the US were something like $500k per song).
All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entirety of (digital) human knowledge? Funny (or not) how the law always benefits the rich.
- This is not REALLY about copyright. This is an attack on free and open AI models, which would be IMPOSSIBLE if copyright were extended to cover using works for training.
- It’s not stealing. There is literally no resemblance between the training works and the model. IP rights have been continuously strengthened due to lobbying over the last century and are already absurdly strong, I don’t understand why people on here want so much to strengthen them ever further.
I don’t understand why people on here want so much to strengthen them ever further.
It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.
IANAL, why isn’t it fair use?
The two big arguments are:
- Substantial reproduction of the original work: you can get back substantial portions of the original work from an AI model’s output.
- The AI model replaces the use of the original work. In short, a work that uses copyrighted material under fair use can’t be a replacement for the initial work.
you can get back substantial portions of the original work from an AI model’s output
Have you confirmed this yourself?
In its complaint, The New York Times alleges that because the AI tools have been trained on its content, they sometimes provide verbatim copies of sections of Times reports.
OpenAI said in its response Monday that so-called “regurgitation” is a “rare bug,” the occurrence of which it is working to reduce.
“We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” OpenAI said.
The tech company also accused The Times of “intentionally” manipulating ChatGPT or cherry-picking the copycat examples it detailed in its complaint.
https://www.cnn.com/2024/01/08/tech/openai-responds-new-york-times-copyright-lawsuit/index.html
The thing is, it doesn’t really matter if you have to “manipulate” ChatGPT into spitting out training material word-for-word, the fact that it’s possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it’s a lot weaker than the original argument, which was that nothing of the original material really remains after training, it’s all synthesized and blended with everything else to create something entirely new that doesn’t replicate the original.
So that’s a no? Confirming it yourself here means doing it yourself. Have you gotten it to regurgitate a copyrighted work?
Sorry, AIs are not humans. Also, executives like Altman are literally being paid millions to steal creators’ work.
I didn’t say anything about AIs being humans.
They’re also not vegetables 😡
Agreed on both counts… Except Microsoft sings a different tune when their software is being “stolen” in the exact same way. They want to have it both ways - calling us pirates when we copy their software, but it’s “without merit” when they do it. Fuck’em! Let them play by the same rules they want everyone else to play by.
That sounds bad. Do you have evidence for MS behaving this way?
Literally the first hit on Google (after the NYT links).
Having read through these comments, I wonder if we’ve reached the logical conclusion of copyright itself.
Copyright has become a tool of oppression. Individual authors’ copyright is constantly being violated, with few resources for them to fight back, while big tech abuses others’ work and big media wields its own to the point of censorship.
Perhaps a fair compromise would be doing away with copyright in its entirety, from the tiny artists trying to protect their artwork all the way up to Disney, no exceptions. Basically, either every creator has to be protected, or none of them should be.
IMO the right compromise is to return copyright to its original 14 year term. OpenAI can freely train on anything up to 2009 which is still a gigantic amount of material while artists continue to be protected and incentivized.
I’m increasingly convinced of that myself, yeah (although I’d favour 15 or 20 years personally, just because they’re neater numbers than 14). The original purpose of copyright was to promote innovation by ensuring a creator gets a good length of time in which to benefit from their creation, which a 14-20 year term achieves. Both extremes - a complete lack of copyright and the exceedingly long terms we have now - suppress innovation.
I’d favour 15 or 20 years personally, just because they’re neater numbers than 14
Another neat number is: 4.
That’s it, if you don’t make money on your creation in 4 years, then it’s likely trash anyway.
I’ve said it before and I’ll say it again! (My apologies if it happens to be to the same person, lol)
Early access developers in shambles!
That would mean governments prosecuting all offences, which is not going to happen. I doubt any country would have enough resources to do that.
deleted by creator
Apparently they’re going to just make only the little guy’s copyrights effectively meaningless, so yeah.
It’s crazy how everyone is suddenly in favour of IP law.
IP law used to stop corporations from profiting off of creators’ labor without compensation? Yeah, absolutely.
IP law used to stop individuals from consuming media where purchases wouldn’t even go to the creators, but some megacorp? Fuck that.
I’m against downloading movies by indie filmmakers without compensating them. I’m not against downloading films from Universal and Sony.
I’m against stealing food from someone’s garden. I’m not against stealing food from Safeway.
If you stop looking at corporations as being the same as individuals, it’s a very simple and consistent viewpoint.
IP law shouldn’t exist, but if it does it should only exist to protect individuals from corporations. When that’s how it’s being used, like here, I accept it as a necessary evil.
IP law used to compensate creators “until their death + 70 years”… you can spin it however you want, that’s just plain wrong.
If you stop looking at corporations as being the same as individuals
That’s a separate bonkers legislation. Two wrongs don’t make one right.
I never said I like IP law. I explicitly said it shouldn’t exist. I wish they’d strip out any post-humous ownership, absolutely. But I’m fine beating OpenAI over the head with that or any other law. Whether I advocate for or against copyright law will ultimately have no impact on its existence, so I may as well cheer it on when it’s used to hurt corporations, and condemn it when it’s used to protect corporations over individuals.
That’s a separate bonkers legislation
I’m not talking about the legislation, I’m talking about the mindset, which is very prevalent in the pro-AI tech spaces. Go to HackerNews and see just how hard the AI-bros there will fellate each other over “corporate rights”.
My whole point is that there is nothing logically inconsistent with being against IP law, but also understanding that since its existence is reality, leveraging it as best as possible (i.e. to hurt corporations).
Word.
I’m not so much in favor of IP law as I am in favor of informed consent in every sense of the word.
When posting photos, art, and text content years ago, I was not able to imagine it might be trained on by an AI. As such, I was not able to make a decision based on informed consent about whether I agreed to that or not.
Even though quotes such as “once you post it, it’s on the internet forever” were around, I was not aware of the extent to which this reached, and that had my art been vacuumed up by a generative AI model (it hasn’t, luckily), people could create art that pretends to be created by me. Thus I could not consent.
I think this goes for a lot of artists actually, especially those who exist far more publicly than I do, who are in those databases and who are a keyword to be used in prompts. There is no possible way they could have given informed consent to that at the time they posted art/at the time they started that social media profile/youtube channel etc.
To me, this is the real problem. I couldn’t care less about corporations.
I still think IP needs to eat shit and die. Always has, always will.
I recently found out we could have had 3D printing 20 years earlier, but patents stopped that. Cocks!
It’s almost like most people are idiots who don’t understand the thing they’re against and are just parroting what they hear/read.
I’m the detractor here, I couldn’t give less of a shit about anything to do with intellectual property and think all copyright is bad.
Any reasonable person can reach the conclusion that something is wrong here.
What I’m not seeing a lot of acknowledgement of is who really gets hurt by copyright infringement under the current U.S. scheme. (The quote is obviously directed toward the UK, but I’m reasonably certain a similar situation exists there.)
Hint: It’s rarely the creators, who usually get paid once while their work continues to make money for others.
Let’s say the New York Times wins its lawsuit. Do you really think the reporters who wrote the infringed-upon material will be getting royalty checks to be made whole?
This is not OpenAI vs creatives. OK, on a basic level it is, but expecting no one to scrape blogs and forum posts rather goes against the idea of the open internet in the first place. We’ve all learned by now that what goes on the internet stays there, with attribution totally optional unless you have a legal department. What’s novel here is the scale of scraping, but I see some merit to the “transformational” fair-use defense given that the ingested content is not being reposted verbatim.
This is corporations vs corporations. Framing it as millions of people missing out on what they’d have otherwise rightfully gotten is disingenuous.
This isn’t about scraping the internet. The internet is full of crap and the LLMs will add even more crap to it. It will shortly become exponentially harder to find the meaningful content on the internet.
No, this is about dipping into high quality, curated content. OpenAI wants to be able to use all existing human artwork without paying anything for it, and then flood the world with cheap knockoff copies. It’s that simple.
Shortly? It’s happening already. I notice it when using Google and Duckduckgo. There are always a few hits that are AI written blog spam word soup
Unfortunately you haven’t seen the full impact of LLMs yet. What you’re seeing now is stuff that’s already been going on for a decade. SEO content generators have been a thing for many years and used by everybody from small business owners to site chains pinching ad pennies.
When the LLM crap kicks in, you won’t see anything except their links. I wouldn’t be surprised if we’ll have to go back to 90s tech and use human-curated webrings and directories.
I wonder how many comments in this thread are ai generated. I wonder how many comments on Lemmy will be in 5 years time.
It’s especially amusing when you consider that it’s not even fully autonomous yet; we’re actively doing this to ourselves.
It’s so baffling to me that some people think this is a clear-cut problem of “you stole the work just the same as if you sold a copy without paying me!”
It ain’t the same, folks… that’s not how models work… the outcome is unfortunate, for sure, but to straight out argue that it’s the same is ludicrous… it’s a new problem, and ML isn’t going away, so we’re going to have to deal with it as a new problem.
Well, in that case maybe ChatGPT should just fuck off; it doesn’t seem to be doing anything particularly useful, and now its creator has admitted it doesn’t work without stealing things to feed it. Un-fucking-believable. Hacks gonna hack, I guess.
ChatGPT has been enormously useful to me over the last six months. No idea where you’re getting this notion it isn’t useful.
People pretending it’s not useful and/or not improving all the time are living in their own worlds. I think you can argue the legality and the ethics, but any anti-ai position based on low quality output (“it can’t even do hands!”) has a short shelf-life.
As with many things, the golden rule applies. They who have the gold, make the rules.
…so stop doing it!
This explains why Valve was until recently not so cavalier about AI: they didn’t want to be left holding the bag on copyright matters outside of their domain.
Then shutdown your goddamn company until you find a better way.
It’s also “impossible” to have multiple terabytes of media on my homeserver without copyright infringement, so piracy is ok, right!?
Oh no, wait, it actually is possible; it’s just more expensive and more work to do it legally (and leaves a lot of plastic trash in the form of Blu-rays and DVDs), just like with AI. But laws are just for poor people, I guess.
Even if it was impossible, would that make it okay?
I stand by my opinion that AI will be the worst thing humans ever created, and that means it ranks just a bit above religion.
This is very likely to be true.
I’d argue the issue is not the AI but capitalism.
AI is good, AI companies are evil.
If it is impossible, either shut down operations or find a way to pay for it.
My concern is that they and other tech companies absolutely can and would pay if they had no choice, paying fines for illegal practices if need be.
What absolutely won’t survive a strong law keeping copyrighted content out of AI is the open source community, which absolutely cannot pay for such a thing and would be left seriously behind if it’s excluded, strengthening the for-profit tech monopoly on AI. So basically this issue can have huge ramifications no matter what we end up doing.
My understanding of the open source community is that taking copyrighted content from people who haven’t willingly signed onto the project would kind of undermine the principles of the movement. I would never feel comfortable using open source software if I had knowledge that part or all of it came from people who hadn’t actively chosen to contribute to it.
I have seen a couple of things recently about AI models that were trained exclusively on public domain and creative commons content which apparently are producing viable content, though. The open source community could definitely use a model like that, and develop it further with more content that was ethically obtained. In the long run, there may be artists that willingly contribute to it, especially those who use open source software themselves (eg GIMP, Blender, etc). Paying it forward, kind of thing.
The problem right now is that artists have no reason to be generous with an open source alternative to AIs, when their rights have already been stomped on and certain people in the open source community are basically saying “if we can’t steal from artists too, then we can’t compete with the corporations.” So there’s literally a trust issue between the creative and tech industries that would need to be resolved before any artists would consider offering art to an open source AI.
It’s quite a mess, but I definitely agree that open source needs a good model trained on consented works.
I do fear, though, that the quality gap between copyright-trained and purist models will be huge in the first decades. And no matter the law, the tech is out there, and corporations and criminals will be using it in secret nonetheless.
If only things were as simple as siding with the chad digital artists. Digital art was part of my higher education, and if I hadn’t gotten a tech job I might have been one of them, so I feel torn between the divide in industries.
This may sound doomer, but since the technology exists, we are in a race to obtain beyond-human superintelligence, and we do not know what will happen after that.
OpenAI has stated multiple times that they don’t know if copyright will still mean anything in a future with AI.
We are also facing some huge global issues like global warming, where a superintelligence could be the answer to sustaining the planet, of course also risking an evil AI in the process… I repeat, such a mess.
I don’t fully trust Sam Altman, but I do believe what they say may be true. At some point it’s going to be here, and it will be too smart to ignore.
It’s optimistically possible that in 20 years we will all be leisurely artists laughing at the idea of needing to work to earn survival.
It’s of course just as likely that some statehead old bastard presses the death button next week and that’s the end of all of it, or that the climate has progressed beyond what our smartest future AI could possibly solve.
I definitely do not have the optimism that in 20 years time we’ll all be leisurely artists. That would require that the tech bros who create the AIs that displace humans are then sufficiently taxed to pay UBI for all the humans that no longer have jobs - and I don’t see that happening as long as they’re able to convince governments not to tax, regulate, or control them, because doing so will make it impossible for them to save the planet from climate change, even as their servers burn through more electricity (and thus resources) than entire countries. Tech bros aren’t going to save us, and the only reason they claim they will is so they never face any consequences of their behaviour. I don’t trust Sam Altman, or any of his ilk, any further than I can throw them.
That is why I am putting some of my eggs in open source, which is where the real innovation happens anyway. Free AI tools at home, running on consumer devices, can level people up to build a better future ourselves without having to rely on tech bros or government.
Of course, I should nuance my wording a bit. My actual opinions tend to be a contrasting mix of both optimistic and pessimistic lines of events. I don’t have much hope that the good future is the one we will end on, but it remains, in my speculative opinion, possible from where we are standing today; yet all can change in less than a week.