So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only two new things seem to be that the old variants are no longer novel to it, and that it is no longer limited to producing incorrect solutions: now it can also incorrectly claim that the puzzle is impossible.

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem (either online, or perhaps in a problem/solution pair generated to augment the training data).

But it outputs quasi-reasoning to pretend that it is actually solving the problem live.
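For what it's worth, the variant above is small enough to brute-force. Here's a minimal sketch (my own, in Python, not anything the model produced) that does a breadth-first search over the state space and finds the obvious 3-crossing solution: take the duck plus one vegetable over, bring the duck back, take the duck plus the other vegetable over.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"duck", "carrot", "potato"})

def solve():
    """BFS over states (person_side, items_on_start_bank); returns the state path."""
    start = (0, ITEMS)          # person and all items on bank 0
    goal = (1, frozenset())     # person and all items on bank 1
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # walk the parent links back to the start
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        side, near = state
        here = near if side == 0 else ITEMS - near
        for k in range(3):  # the boat carries you plus 0, 1, or 2 items
            for cargo in combinations(sorted(here), k):
                new_near = near - set(cargo) if side == 0 else near | set(cargo)
                new_side = 1 - side
                # the duck runs away if left on the bank you are not on
                unattended = new_near if new_side == 1 else ITEMS - new_near
                if "duck" in unattended:
                    continue
                new_state = (new_side, frozenset(new_near))
                if new_state not in parent:
                    parent[new_state] = state
                    queue.append(new_state)
    return None  # no solution exists

path = solve()  # 4 states = 3 crossings; the puzzle is clearly not impossible
```

Exhaustive search over a dozen-odd states; nothing a system marketed as "reasoning" should need help with.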

  • SGforce@lemmy.ca · 8 days ago
    It’s just overtrained on the puzzle such that it mostly ignores your prompt. Changing a few words out doesn’t change that it recognises the puzzle. Try writing it out in ASCII or uploading an image with it written or some other weird way that it hasn’t been specifically trained on and I bet it actually performs better.

    • diz@awful.systems (OP) · edited · 7 days ago

      Not really. Here’s the chain-of-word-vomit that led to the answers:

      https://pastebin.com/HQUExXkX

      Note that in its "it's impossible" answer it correctly echoes that you can take one other item with you, and it does not bring the duck back (whereas the old overfitted GPT-4 obsessively brought items back). Meanwhile, in the duck + 3 vegetables variant, the correct answer does appear in the word vomit, but, not being an AI enthusiast, it can't actually pick that answer out (a problem it shares with the monkeys on typewriters).

      I’d say it clearly isn’t ignoring the prompt or the differences from the original river crossings. It just can’t actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.

    • froztbyte@awful.systems · 7 days ago

      oh look it’s a loadbearing “just” in the wild. better hope you can shore that fucker up with some facts

      Try writing it out in ASCII

      my poster in christ, what in the fuck are you on about. stop prompting LLMs and go learn some things instead

      some other weird way that it hasn’t been specifically trained on and I bet it actually performs better

      “no no see, you just need to prompt it different. just prompt it different bro it’ll work bro I swear bro”

      god, every fucking time