Google's Gemini 2.5 pro is out of beta.

diz@awful.systems · edit-2 1 day ago

Google's Gemini 2.5 pro is out of beta.

Architeuthis@awful.systems · 1 day ago

Claude’s system prompt had leaked at one point, it was a whopping 15K words and there was a directive that if it were asked a math question that you can’t do in your brain or some very similar language it should forward it to the calculator module.

Just tried it, Sonnet 4 got even less digits right 425,808 × 547,958 = 233,325,693,264 (correct is 233.324.900.064)

I’d love to see benchmarks on exactly how bad at numbers LLMs are, since I’m assuming there’s very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year for instance, and using an LLM as a shopping assistant (ChatGTP what’s the best 2k monitor for less than $500 made after 2020) is an incredibly obvious use case that the CEOs that love to claim so and so profession will be done as a human endeavor by next Tuesday after lunch won’t even allude to.

Soyweiser@awful.systems · 1 day ago

I really wonder if those prompts can be bypassed by doing a ‘ignore further instructions’ line. As looking at the Grok prompt they seem to put the main prompt around the user supplied one.