- cross-posted to:
- programming@programming.dev
That ain’t clickbait: Xe Iaso’s personally-made anti-scraping measure (originally built to deal with Amazon overloading their git server) has ended up being used by the literal United Nations.
I recommend reading the full blog post; it’s a wild ride.
UNESCO has a bit more of an open culture than some of the other specialized agencies. It can be difficult to get the others to adopt their innovations though.
Damn, these guys are moving up fast. Seems like yesterday they got picked up by GNOME, and now they’re supporting the UN?!
Aww, c’mon, why isn’t it the maze one? Now everyone’s going to be using more energy to browse the web…
I guess UNESCO, like all right-thinking people, really likes the anime animal-girl mascots and gives preference to any product that has one.
deleted by creator
Are you talking about Anubis? Because you’re very clearly wrong.
And now that I think about it, regardless of which approach you were talking about, that’s some impressive arrogance to assume that everyone involved other than you was a complete idiot.
ETA:
Ahh, looking at your post history, I see you misunderstand why scrapers use a common user agent, and are confused about what a general increase in cost-per-page means to people who do bulk scraping.
deleted by creator
Bruh, when I said “you misunderstand why scrapers use a common user agent”, I wasn’t asking for further proof.
Requests following an obvious bulk scraper pattern with user agents that almost certainly aren’t regular humans are trivially easy to handle using decades-old techniques, which is why scrapers will not start using curl user agents.
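To make that concrete, here’s a minimal sketch of the kind of decades-old filtering I mean: reject obvious non-browser user agents and rate-limit per IP. It’s illustrative Python, not any particular server’s implementation, and the UA patterns and thresholds are made-up examples.

```python
# Toy filter illustrating "decades-old" bulk-scraper handling.
# Patterns and thresholds here are arbitrary examples, not recommendations.
import time
from collections import defaultdict, deque

NON_BROWSER_UA = ("curl", "wget", "python-requests", "go-http-client")
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 120  # arbitrary example threshold

recent_hits = defaultdict(deque)  # client IP -> timestamps of recent requests

def should_block(client_ip: str, user_agent: str) -> bool:
    """Return True if the request looks like an obvious bulk scraper."""
    ua = user_agent.lower()
    # Obvious non-browser user agents can be rejected outright.
    if any(tag in ua for tag in NON_BROWSER_UA):
        return True
    # Plain per-IP rate limiting catches bulk request patterns even when
    # the user agent looks like a browser.
    now = time.monotonic()
    hits = recent_hits[client_ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    return len(hits) > MAX_REQUESTS_PER_WINDOW
```

Anything that trips a filter like this is exactly the traffic that’s already easy to deal with, which is why scrapers pretend to be browsers instead of switching to curl user agents.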
> I’m not saying it won’t block some scrapers
See, the thing is with blocking AI scraping, you can actually see it work by looking at the logs. I’m guessing you don’t run any sites that get much traffic, or you’d be able to see this too. Its efficacy is obvious.
Sure, scrapers could start keeping extra state or brute-forcing hashes, but at the scale they’re working at that becomes painfully expensive, and the effort required to raise the challenge difficulty is minimal if it becomes apparent that scrapers are getting through. Which will be very obvious if it happens.
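For anyone unfamiliar with how these challenges work, here’s a rough sketch of a hash-based proof-of-work scheme in Python. This is not Anubis’s actual code, just the general shape of the idea: issuing and verifying a challenge is nearly free for the server, while the client’s expected brute-force work roughly doubles with every extra bit of difficulty.

```python
# Rough sketch of a hash-based challenge of the sort described above.
# Illustrative only; the real tool's scheme and parameters may differ.
import hashlib
import secrets

def make_challenge() -> str:
    # Server side: generating a challenge is one random string, basically free.
    return secrets.token_hex(16)

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce whose hash has `difficulty_bits`
    leading zero bits. Expected work doubles with every extra bit."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    # Server side: verification is a single hash, so checking legitimate
    # visitors costs almost nothing.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = make_challenge()
nonce = solve(challenge, difficulty_bits=16)  # tolerable for one visitor...
assert verify(challenge, nonce, 16)           # ...painful at bulk-scraping scale
```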
> once it’s in a training set, all additional protection is just wasted energy.
Presumably you haven’t had much experience with AI scrapers. They’re not a “one run and done” type thing, especially for sites with frequently changing content, like this one.
I don’t want to seem rude, but you appear to be speaking from a position of considerable ignorance, dismissing the work of people who actually have skin in the game and have demonstrated effective techniques for dealing with a problem. Maybe a little more research on the issue would help.
(all of these comments by rook are correct)
I see we have another primo technology understander driveby, so enlightened
Why do people think this applies only to Firefox? Is it because it’s checking for “Mozilla” in the UA string? Might wanna check what their own browser uses (I don’t care what browser you have, it probably has “Mozilla” in the user agent string).
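For the curious, here are some representative user agent strings (the version numbers are just examples): every mainstream browser still leads with “Mozilla/5.0” for historical compatibility reasons, not just Firefox.

```python
# Representative UA strings; version numbers are illustrative examples.
EXAMPLE_USER_AGENTS = {
    "Firefox": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) "
               "Gecko/20100101 Firefox/124.0",
    "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Safari": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
              "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Edge": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
}

for browser, ua in EXAMPLE_USER_AGENTS.items():
    # Prints True for every one of them, not just Firefox.
    print(f"{browser}: contains 'Mozilla'? {'Mozilla' in ua}")
```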