I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa it's not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.
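
If you want to score that kind of overlap yourself, here's a quick sketch. Everything in it is hypothetical: I didn't post the factors from my test, so the numbers and the "model answer" below are made-up stand-ins.

```python
# Count how many leading and trailing digits two numbers share.
# All values here are hypothetical; the original factors weren't posted.
def digit_overlap(truth: int, guess: int) -> tuple[int, int]:
    t, g = str(truth), str(guess)
    lead = 0
    for x, y in zip(t, g):
        if x != y:
            break
        lead += 1
    trail = 0
    for x, y in zip(reversed(t), reversed(g)):
        if x != y:
            break
        trail += 1
    return lead, trail

a, b = 48_205_917, 73_914_206           # made-up factors
truth = a * b                           # 3_563_102_079_556_902
model_answer = 3_563_102_958_117_302    # made-up wrong "LLM" answer
print(digit_overlap(truth, model_answer))  # (7, 2): leading 7 and last 2 digits match
```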

  • mountainriver@awful.systems · 13 hours ago

    I find it a bit interesting that it isn’t more wrong. Has it ingested large tables and learned a statistical relationship between certain large factors and certain answers? Or is there something else going on?

    • CodexArcanum@lemmy.dbzer0.com · 7 hours ago

      I posted a top-level comment about this too, but Anthropic has done some research on it. The section on reasoning models discusses math, I believe. The short version: there’s a bunch of math in the training corpus, so the model can approximate math (seemingly the way you’d do a back-of-the-envelope calculation in your head to get the order of magnitude right), but it can’t actually carry out calculations, which is why it often gets the specifics wrong.
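
      Here’s a minimal sketch of that idea (my illustration, not Anthropic’s method or code): estimate the product from logs kept to limited precision, which gets the magnitude and a few leading digits right, and note that the last digit depends only on the operands’ last digits, so it’s cheap to get right from memorized patterns. The factors are made up.

      ```python
      import math

      # Made-up factors; this illustrates the "back of the envelope" idea,
      # not how the model actually computes anything internally.
      a, b = 7_358_421, 9_146_733

      exact = a * b

      # Magnitude-style estimate: add base-10 logs, keep limited precision.
      log_est = round(math.log10(a) + math.log10(b), 4)
      approx = round(10 ** log_est)

      # The last digit of a product depends only on the operands' last digits.
      last_digit = (a % 10) * (b % 10) % 10

      print(f"exact:  {exact}")
      print(f"approx: {approx}")  # a few leading digits line up, trailing ones don't
      print(f"last digit: {exact % 10} (from last digits alone: {last_digit})")
      ```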