I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly, and the answer is incorrect.)

They go “waaa waaa, it’s not a calculator”, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.
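A quick way to check this kind of partial correctness is to compare the claimed product with the exact one digit by digit. This is only a sketch: the factors and the "claimed" answer below are made-up placeholders, not the numbers from the screenshot, and `check_product` / `leading_digits_matching` are hypothetical helpers. It also shows why the last digit is the easy part, since the last digit of a product depends only on the last digits of the factors.

```python
def leading_digits_matching(claimed: int, exact: int) -> int:
    """Count how many leading decimal digits two integers share."""
    count = 0
    for c, e in zip(str(claimed), str(exact)):
        if c != e:
            break
        count += 1
    return count


def check_product(x: int, y: int, claimed: int) -> dict:
    """Compare a claimed product of x and y against the exact value."""
    exact = x * y  # Python ints have arbitrary precision, so this is exact
    return {
        "exact": exact,
        "correct": claimed == exact,
        "leading_digits_match": leading_digits_matching(claimed, exact),
        "last_digit_match": claimed % 10 == exact % 10,
        # The last digit of a product depends only on the factors' last digits.
        "predicted_last_digit": (x % 10) * (y % 10) % 10,
    }


# Placeholder factors and a made-up wrong answer that is right at both ends.
report = check_product(12345678, 87654321, 1082159999999998)
print(report["correct"])               # False
print(report["leading_digits_match"])  # 6
print(report["last_digit_match"])      # True
```

Getting the leading digits right only needs a rough estimate of magnitude, and getting the last digit right only needs the factors' last digits; it is the middle that requires actually doing the multiplication.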

  • ShakingMyHead@awful.systems · 11 points · 23 hours ago

    > I don’t believe that tortured phrases like “code interpreter” and a “direct calculator” actually came from the internet.

    Code Interpreter was the name of the ChatGPT feature that ran Python code.

    So, yeah, still taken from the internet.

    • diz@awful.systems (OP) · 8 points · 14 hours ago

      Hmm, fair point, it could be training data contamination / model collapse.

      It’s curious that it is a lot better at converting free-form requests for accuracy into assurances that it used a tool than into actually using a tool.

      And when it does use a tool, the log contains a bunch of fixed-form tokens. So it’s a much harder language-processing task to assure me that it used a tool, conditioned on my free-form, indirect implication that the result needs to be accurate, than to assure me that it used a tool conditioned on actual tool use.

      The human equivalent of this is “pathological lying”, not “bullshitting”. I think a good term for it is “lying sack of shit”, with “sack of shit” making clear that “lying” carries no claim about internal motivations or the like.

      edit: also, testing it on Gemini 2.5 Flash is quite curious (https://g.co/gemini/share/ea3f8b67370d). I ran that sort of query several times and it follows the same pattern: it doesn’t use a calculator, yet assures me the result is accurate; if asked again, it uses a calculator; if asked whether the two numbers are equal, it says they are not; if asked which one is correct, it picks the last one and argues that the last one actually used a calculator. I never managed to get it to output a correct result first and then follow up with an incorrect one.

      edit: If I use the wording “use an external calculator”, it gives a correct result, and then I can’t get it to produce an incorrect one, so I can’t check whether it simply picks the last result as correct.

      I think this is lying without scare quotes, because it is the product of Google putting far more effort into exploiting the ELIZA effect to convince you that it is intelligent than into actually making a useful tool. It, of course, doesn’t have any intent, but Google and its employees do.

    • TonyTonyChopper@mander.xyz · 3 points · 15 hours ago

      Math is really easy to do in Python. If it did have access to a Python interpreter, it could write a single line, print(number*number), to calculate something, and the answer would be correct.
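      A minimal illustration, assuming a plain CPython interpreter: Python integers have arbitrary precision, so int * int is exact no matter how many digits it has. The value of `number` is an arbitrary placeholder. For contrast, the same product computed with 64-bit floats keeps only about 15 to 17 significant digits, so its low-order digits are lost.

```python
number = 987654321987  # placeholder; any integer works

exact = number * number                  # int * int: exact, arbitrary precision
approx = float(number) * float(number)   # 64-bit float: rounded to ~15-17 significant digits

print(exact)
print(int(approx))
print(exact == int(approx))  # False: the float result has lost the low-order digits
```

      The exact square here is odd, while a float this large can only represent even integers, so the two can never agree.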

      • zbyte64@awful.systems · 4 points · 14 hours ago

        That is actually harder than what it has to do ATM to get the answer, which is to write an RPC call in JSON. It only needs to do two things: decide to use the calculator tool, and paste the right tokens into the call.
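        A rough sketch of what that division of labour looks like. The JSON shape (a `name` plus an `arguments` object) and the `calculator` tool with `a`, `b` and `op` fields are hypothetical, loosely in the style of function-calling APIs rather than any vendor's actual schema. The model only has to emit the JSON string; the host program parses it and does the arithmetic.

```python
import json
import operator

# Hypothetical calculator tool: the host program, not the model, does the math.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_tool_call(raw: str) -> str:
    """Parse a hypothetical JSON tool call and return the tool's result as JSON."""
    call = json.loads(raw)
    if call.get("name") != "calculator":
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call["arguments"]
    # Operands travel as strings so long integers are not turned into floats
    # by JSON parsers that treat every number as a double.
    a, b = int(args["a"]), int(args["b"])
    result = OPS[args["op"]](a, b)
    return json.dumps({"result": str(result)})


# All the model has to produce: the decision to call the tool, plus the digits copied in.
model_output = '{"name": "calculator", "arguments": {"a": "12345678", "b": "87654321", "op": "mul"}}'
print(run_tool_call(model_output))  # {"result": "1082152022374638"}
```

        Nothing in the model's part requires arithmetic: choosing the tool and copying the operands verbatim is enough for the answer to come back exact.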