LLMs: No World Model? Hacker News Debates AI Limits

The State of Play: What LLMs Can Do

Large Language Models can write code, pass exams, and even create art, but do they truly understand the world they describe? A recent Hacker News discussion dives deep into this question, debating whether LLMs are sophisticated mimics or possess genuine ‘world models’. The consensus is that while the term ‘world model’ might be a stretch, current models demonstrate capabilities that are forcing a re-evaluation of their internal workings.

  • Specialized Excellence: Top-tier models are achieving remarkable success in narrow, complex domains. One user points out, “with LLMs getting gold on the mathematical Olympiad they clearly have a pretty good world model of mathematics.” This suggests that within specific contexts, their internal representations are highly effective.
  • Practical Application: Developers are successfully using LLMs for nuanced, real-world tasks that require multi-step reasoning. As one commenter shared, “I am literally using LLMs every day to write image-processing and computer vision code using OpenCV. It seamlessly reasons across a range of concepts like color spaces, resolution… and human perception.”
  • Emergent Internal Representations: Research indicates that even smaller models develop internal structures that mirror real-world concepts. One user cited a study showing how a small transformer, trained only on chess games, developed a clear internal map of the board (see the probing sketch after this list).

    …a very small GPT-2 alike transformer… after seeing just PGN games and nothing more develops an 8×8 internal representation with which chess piece is where.

  • Rise of Multimodality: The conversation is rapidly moving beyond text-only models. The integration of vision and other data types is creating more robust systems, making the debate over pure LLMs a potential straw man.
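
The board-representation finding rests on a standard probing technique: freeze the trained transformer, collect its hidden activations for many positions, and train a small linear classifier per square to predict which piece sits there. The sketch below illustrates that idea in Python; the dimensions, the 13 piece classes, and the random placeholder activations are illustrative assumptions, not details from the cited study.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Placeholder dimensions: 2000 positions, 256-dim hidden states, 64 squares,
    # 13 classes per square (6 piece types x 2 colours, plus "empty").
    n_positions, d_model, n_squares, n_classes = 2000, 256, 64, 13

    # In the real experiment these would come from the trained model's residual
    # stream and from the ground-truth board for each position; random data here
    # keeps the sketch self-contained and runnable.
    activations = np.random.randn(n_positions, d_model)
    boards = np.random.randint(0, n_classes, size=(n_positions, n_squares))

    X_train, X_test, y_train, y_test = train_test_split(
        activations, boards, test_size=0.2, random_state=0)

    # One linear classifier per square: if these probes recover piece identity far
    # above the ~1/13 chance rate, the activations encode the board state.
    accuracies = []
    for square in range(n_squares):
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X_train, y_train[:, square])
        accuracies.append(probe.score(X_test, y_test[:, square]))

    print(f"mean per-square probe accuracy: {np.mean(accuracies):.3f}")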

The Core Debate: Where Models Fall Short

Despite their impressive performance, significant arguments suggest LLMs are fundamentally limited. The objections arise not just from technical constraints but from philosophical questions about the nature of language and understanding.

  • The Map vs. The Territory: A central criticism is that language is inherently symbolic. An LLM manipulates symbols related to ‘water’ but has no direct experience of it. This gap between representation and reality is a core hurdle.

    Language models aren’t world models for the same reason languages aren’t world models… The map is not the territory, the description is not the described, you can’t get wet in the word “water”.

  • Sophisticated Mimicry: Some argue that LLMs operate like a student who can provide correct answers without true comprehension. They excel at producing plausible ‘word salad’ based on patterns, which is useful for experts but can mislead non-experts.

    Sometimes, I would answer a question correctly without actually understanding what I was saying… This is what I suspect LLMs do.

  • Compressed and Imperfect Models: By necessity, an LLM’s internal model is a massively compressed version of the data it was trained on. This inherent compression means the model will always be imperfect and incomplete, just as human memory is.
  • The Hallucination Problem: A key difference between human and machine intelligence is the ability to distinguish imagination from perception. LLMs currently lack a reliable internal mechanism for this, leading to confident but incorrect outputs.

Building a Better Brain: The Path Forward

The discussion doesn’t end at limitations. Instead, it points toward a future where LLMs are not standalone intelligences but crucial components of larger, more capable systems.

  • LLMs as System Components: The most promising approach is to view LLMs not as the entire AI, but as a reasoning or language module within a broader architecture.

    I don’t think it’s an LLMs job to have a world model, but an LLM is just one part of an AI system.

  • Externalizing Representations: Humans use tools like notebooks and chessboards to offload cognitive tasks. Future AI systems can do the same, using external memory or ‘notebooks’ that are fed back into the context window to maintain a coherent state (a minimal loop sketch follows this list).
  • Learning to Detect Hallucinations: The next frontier may not be a new architecture but teaching models skills for self-correction. Just as children learn to separate fantasy from reality, AI could be trained to develop similar filters (see the self-consistency sketch after this list).

    What LLMs need is to learn some tricks to detect hallucinations. Probably they will not get 100% reliable detector, but to get to the level of humans they don’t need 100% reliability.

  • Embracing Multimodality: The future of robust world models lies in combining language with other data streams like video and sensor input. Models like Google’s Genie 3, which can generate interactive environments from images, show the potential of this approach.
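
The ‘external notebook’ idea can be made concrete with a small loop: persistent state lives outside the model and is re-injected into the prompt on every call. The sketch below is a minimal illustration, assuming a generic call_llm function that wraps whatever model API the system actually uses; the prompt wording and the NOTES: convention are invented for the example.

    from typing import Callable, List

    def run_step(task: str, notebook: List[str], call_llm: Callable[[str], str]) -> str:
        # The notebook is re-injected each turn so the model keeps a coherent
        # picture of what it has already established, like a human's notes.
        prompt = (
            "Notebook (facts and decisions recorded so far):\n"
            + "\n".join(f"- {entry}" for entry in notebook)
            + f"\n\nTask: {task}\n"
            "Answer the task, then list any new notebook entries after a line 'NOTES:'."
        )
        reply = call_llm(prompt)
        answer, _, notes = reply.partition("NOTES:")
        # Persist whatever the model wrote down; the next call will see it again.
        notebook.extend(line.lstrip("- ").strip() for line in notes.splitlines() if line.strip())
        return answer.strip()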
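
One commonly discussed trick for catching hallucinations, not one named in the thread, is self-consistency sampling: ask the same question several times and distrust answers the samples disagree on. The sketch below assumes a hypothetical sample_llm function that returns one sampled completion per call; the agreement threshold is arbitrary.

    from collections import Counter
    from typing import Callable, Optional

    def self_consistent_answer(question: str,
                               sample_llm: Callable[[str], str],
                               n_samples: int = 5,
                               min_agreement: float = 0.6) -> Optional[str]:
        answers = [sample_llm(question).strip() for _ in range(n_samples)]
        top_answer, votes = Counter(answers).most_common(1)[0]
        # Return the majority answer only if enough samples agree; otherwise
        # report uncertainty rather than a confidently wrong reply.
        return top_answer if votes / n_samples >= min_agreement else None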

