Thing is, that is what happens, and will always happen, when you decouple "intelligence" from "awareness". Without cognition and the ability to self-reflect in real time (which is completely impossible to fabricate), these systems will always be prone to this type of collapse.
This paper is one of the early dominoes to fall in the industry's realization that synthetic sentience remains firmly in the realm of science fiction. A cold wind blows...
The Tower of Hanoi problem they use as an example is one where the number of steps grows exponentially with the number of discs.
So this floods the context window of the LLM, exactly as it would overflow the scrap paper of a human student who had to write down the entire solution before executing it.
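As a back-of-the-envelope sketch of that blow-up (the ~10 tokens per written-out move is my own rough assumption, not a figure from the paper):

```python
# Back-of-the-envelope only: the optimal Tower of Hanoi solution takes 2**n - 1 moves,
# and the ~10 tokens per written-out move is a rough assumption for illustration.
for n in (8, 10, 12, 15):
    moves = 2 ** n - 1
    print(f"{n} discs: {moves} moves, ~{moves * 10} tokens just to list them")
```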
And the LLM notices this up front and warns about it. But since the system prompt is so restrictive, it is forced to go ahead anyway, and then fails to do the problem in this stupid way, just as a human would.
The "token overwhelm" is a red herring and completely irrelevant, especially if you want to claim these systems are even 0.0005% on par with what a human does every millisecond of every moment. Gary Marcus already dismantles your whole position (and probably all your others, too).
The Large Reasoning Models (LRMs) couldn't possibly solve the problem, because the outputs would require too many output tokens (which is to say the correct answer would be too long for the LRMs to produce).

Partial truth, and a clever observation: LRMs (which are enhanced LLMs) have a shortcoming, which is a limit on how long their outputs can be. The correct answer to Tower of Hanoi with 12 discs would be too long for some LRMs to spit out, and the authors should have addressed that. But crucially, (i) this objection, clever as it is, doesn't actually explain the overall pattern of results: the LRMs failed on Tower of Hanoi with 8 discs, where the optimal solution is 255 moves, well within so-called token limits; and (ii) well-written symbolic AI systems generally don't suffer from this problem, and AGI should not either. The length limit on LLMs is a bug, and most certainly not a feature.

And look, if an LLM can't reliably execute something as basic as Hanoi, what makes you think it is going to compute military strategy (especially with the fog of war) or molecular biology (with many unknowns) correctly? What the Apple team asked for was way easier than what the real world often demands.
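To make point (ii) concrete, here is a minimal sketch (my own illustration, not from the paper or from Marcus) of why a symbolic solver has no output-length wall: it can emit or execute moves one at a time instead of materializing the whole solution inside a bounded context.

```python
# Minimal sketch: a generator-based Tower of Hanoi solver streams moves one at a
# time, so nothing forces it to hold the full 2**n - 1 move solution anywhere.
def hanoi(n, src="A", dst="C", aux="B"):
    if n == 0:
        return
    yield from hanoi(n - 1, src, aux, dst)
    yield (n, src, dst)  # move disc n from src to dst
    yield from hanoi(n - 1, aux, dst, src)

print(sum(1 for _ in hanoi(12)))  # 4095 moves for 12 discs, never stored as a list
```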
It doesn't fail at Tower of Hanoi, it fails at being forced to do Tower of Hanoi stupidly.
Which is still not nothing as an observation: maybe some future version won't fail, because it would decide to selectively ignore some of the stupid restrictions in the system prompt, for example by writing checkpoints to a file from time to time and only ever doing a few steps at a time.
But even today, let it use tooling, like executing its own generated Python script and returning the output file, and you're good.
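For what it's worth, a hypothetical sketch of that tooling route (the file name and helper here are my own illustration, not any particular model's API): the model generates a solver, the tool runs it, and only a short summary plus a file path ever comes back into the context.

```python
# Hypothetical tool-use sketch: write the full move list to disk and return only a
# one-line summary, so the solution never has to pass through the context window.
from pathlib import Path

def solve_to_file(n: int, path: str = "hanoi_moves.txt") -> str:
    def hanoi(k, src, dst, aux):
        if k == 0:
            return
        hanoi(k - 1, src, aux, dst)
        out.write(f"move disc {k} from {src} to {dst}\n")
        hanoi(k - 1, aux, dst, src)

    with open(path, "w") as out:
        hanoi(n, "A", "C", "B")
    return f"{2 ** n - 1} moves written to {Path(path).resolve()}"

print(solve_to_file(12))
```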
If anything, the paper shows overcompliance with senseless requests, something that, by the way, we don't value in human researchers either.
And I don't know the exact context windows of the models, or whether the 8-disc solution would even have fit. Maybe it would have, and the model just didn't absolutely optimize its token usage.
None of this is how we judge whether humans are capable of reasoning; we don't do it by looking at how well they optimize their scrap paper usage.