So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only two new things seem to be that the old variants are no longer novel to it, and that it is no longer limited to producing incorrect solutions: now it can also incorrectly claim that the puzzle is impossible.

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem (either online, or perhaps in a problem/solution pair generated to augment the training data).

But it outputs quasi-reasoning to pretend that it is actually solving the problem live.
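For what it's worth, the variant above is small enough to brute-force. Here's a minimal sketch (my own, in Python, not anything the model produced) that does a breadth-first search over the state space and finds the obvious 3-crossing solution: take the duck plus one vegetable over, bring the duck back, take the duck plus the other vegetable over.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"duck", "carrot", "potato"})

def solve():
    """BFS over states (person_side, items_on_start_bank); returns the state path."""
    start = (0, ITEMS)          # person and all items on bank 0
    goal = (1, frozenset())     # person and all items on bank 1
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # walk the parent links back to the start
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        side, near = state
        here = near if side == 0 else ITEMS - near
        for k in range(3):  # the boat carries you plus 0, 1, or 2 items
            for cargo in combinations(sorted(here), k):
                new_near = near - set(cargo) if side == 0 else near | set(cargo)
                new_side = 1 - side
                # the duck runs away if left on the bank you are not on
                unattended = new_near if new_side == 1 else ITEMS - new_near
                if "duck" in unattended:
                    continue
                new_state = (new_side, frozenset(new_near))
                if new_state not in parent:
                    parent[new_state] = state
                    queue.append(new_state)
    return None  # no solution exists

path = solve()  # 4 states = 3 crossings; the puzzle is clearly not impossible
```

Exhaustive search over a dozen-odd states; nothing a system marketed as "reasoning" should need help with.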

  • SGforce@lemmy.ca · 8 days ago
    It’s just overtrained on the puzzle such that it mostly ignores your prompt. Changing a few words out doesn’t change that it recognises the puzzle. Try writing it out in ASCII or uploading an image with it written or some other weird way that it hasn’t been specifically trained on and I bet it actually performs better.

    • diz@awful.systems (OP) · edited · 7 days ago

      Not really. Here’s the chain-of-word-vomit that led to the answers:

      https://pastebin.com/HQUExXkX

      Note that in its "it's impossible" answer it correctly echoes that you can take one other item with you, and it does not bring the duck back (whereas the old overfitted GPT-4 obsessively brought items back). Meanwhile, in the duck + 3 vegetables variant, the correct answer does appear in the word vomit, but, not being an AI enthusiast, it can't actually pick that answer out (a problem it shares with the monkeys on typewriters).

      I’d say it clearly isn’t ignoring the prompt or the differences from the original river crossings. It just can’t actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.

    • froztbyte@awful.systems · 7 days ago

      oh look it’s a loadbearing “just” in the wild. better hope you can shore that fucker up with some facts

      Try writing it out in ASCII

      my poster in christ, what in the fuck are you on about. stop prompting LLMs and go learn some things instead

      some other weird way that it hasn’t been specifically trained on and I bet it actually performs better

      “no no see, you just need to prompt it different. just prompt it different bro it’ll work bro I swear bro”

      god, every fucking time