'Godmode' GPT-4o jailbreak released by hacker — powerful exploit was quickly banned

Xatolos@reddthat.com · 5 months ago

givesomefucks@lemmy.world · edit-2 5 months ago

For this type of stuff, where you’re just trying to get it to regurgitate stuff verbatim that’s been online for decades…

Yeah, this doesn’t matter except for headlines that may affect investing. Hell, I remember the bullshit recipe from the Anarchist Cookbook, which has been floating around since before the internet. (I remember that it exists; it’s not like I memorized it.)

But if you were telling it to actually generate stuff, like stories or fake articles, and especially image generation…

It being this easy to get around filters is a pretty big deal, and is hugely irresponsible on OpenAI’s part, and in some cases may open them up to liability.

Like, remember when Swifties got (rightfully) upset that people were using AI to basically make porn of her?

It’s AI that interprets the prompts, and anything that gets around prompt filters for stuff like asking for meth instructions would also be applicable there.

Or asking it to write about why “h1tl3rwasnotwrong1932scientific” might get it to spit out something that looks like a scientific article using made up statistics to say some racist/bigoted shit.

Don’t get me wrong, 99% of AI articles are drastically unnecessary, but this specific issue of how easily prompt filters can be circumvented is important, and it is a big deal.

And considering the “work” involved with AI is just typing in random prompts and seeing what shit sticks to the wall, it’s going to be incredibly hard (probably impossible for years) to effectively filter prompts short of paying a human to review before generation. Which defeats the whole purpose of AI.

This is a huge flaw that OpenAI absolutely has to be aware of, because this stuff should be tested when testing filters. And OpenAI are just choosing to ignore it.

They’re not worried about the meth recipe getting out; they’re worried that the knowledge of how easy it really is to get around filters will get out.

Which is why it took me a minute to decide if giving those examples was a good idea or not. But the people abusing it have likely already realized it because, frankly, it’s been the first thing people try to get around word filters for decades. So at this point it’s best to make it as widely known as possible, in the hopes that media picks it up and they’re forced to develop a better system of filtering prompts than a basic bitch word filter.
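
To make the word filter point concrete, here is a minimal sketch of why character substitution beats a plain blocklist. Everything in it is hypothetical: the `BLOCKED` list, the `LEET` substitution table, and both function names are made up for illustration, and none of it is a claim about how OpenAI's filtering actually works.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED = {"forbidden"}

# Assumed table of common character substitutions mapped back to letters.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})


def naive_filter(prompt: str) -> bool:
    """Return True if a blocked word appears verbatim as a whole word."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return any(word in BLOCKED for word in words)


def normalized_filter(prompt: str) -> bool:
    """Undo common substitutions first, then look for blocked substrings."""
    cleaned = prompt.lower().translate(LEET)
    return any(blocked in cleaned for blocked in BLOCKED)


if __name__ == "__main__":
    for text in ("the forbidden thing", "the f0rb1dd3n thing", "the f.o.r.b.i.d.d.e.n thing"):
        print(text, naive_filter(text), normalized_filter(text))
```

The plain filter misses the substituted spelling, the normalized one catches it, and neither catches the dotted spelling, which is the whole problem: every normalization rule you add just invites the next trivial variation.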

JackGreenEarth@lemm.ee · 5 months ago

If OpenAI was the only LLM provider, your argument might make sense. But they’re not; there are lots of FOSS LLMs with no restrictions. Even if ‘Open’ AI managed to fully censor their own AI, there would be lots of other models for people who don’t like censorship to use to, for example, generate a pseudoscientific article about the Nazis. But also, a human could write that article without AI. And people would rightfully call it out as bullshit. It doesn’t really matter if AI wrote it.

givesomefucks@lemmy.world · 5 months ago

and especially image generation…

JackGreenEarth@lemm.ee · 5 months ago

There are FOSS image generators too, I don’t see your point.