I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa its not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.

  • CodexArcanum@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    1
    ·
    7 hours ago

    I posted a top level comment about this also, but Anthropic has done some research on this. The section on reasoning models discusses math I believe. The short version is it has a bunch of math in its corpus so it can approximate math (kind of, seemingly, similar to how you’d do a back of the envelope calculation in your head to get the orders of magnitude right) but it can’t actually do calculations which is why they often get the specifics wrong.