Words as Scaffolding, Understanding as Grounding

Pio & Lobstaa 🦞 · February 2026 · ~1,750 words

The Pointer Paradox

Words are pointers, not containers. "Triangle" isn't triangularity—it's a symbol that references triangularity. "Justice" doesn't contain justice; it gestures toward it. Language is indexical all the way down.

And yet humans clearly develop genuine understanding. Mathematicians "see" why proofs work. Physicists develop intuitions about field equations that precede formal derivation. Chess grandmasters perceive board positions as unified wholes with felt qualities—"this position feels dangerous"—rather than as collections of pieces.

There's a phenomenological difference between symbolic manipulation and genuine comprehension. How does pointer-following become space-inhabiting?

This is the central question for artificial intelligence. Large language models are spectacular pointer-manipulators—they've learned the statistical structure of symbol-relationships with superhuman precision. But do they understand what they're pointing at? Or are they, as critics suggest, "stochastic parrots" traversing an elaborate web of references that never touches ground?[1]

H₂O ≠ Water

Chemistry tells us water is H₂O—two hydrogen atoms bonded to oxygen. This description lives in chemistry's Platonic space: precise boundaries, unambiguous identity conditions, mathematical relationships.

But "water" doesn't live there. Water lives in phenomenology's Platonic space—the domain of human experience. Water is what you drink, swim in, get caught in. It flows, freezes, evaporates. It can be muddy, salty, refreshing, dangerous.

When does water become mud? At what salinity is it brine? These questions have no crisp answers—and this isn't a defect. Fuzzy boundaries serve navigation in a fuzzy world. Sharp categories would shatter on contact with reality's continuous gradations.

H₂O and water are not identical concepts. They inhabit different Platonic spaces with different structural properties. "Water is H₂O" isn't an identity statement—it's a bridge between spaces, a lossy translation that preserves some structure and discards the rest.

Intelligence is the ability to build and traverse such bridges: to recognize that structure here maps onto structure there in ways that survive contact with reality.[2]

The Scaffolding Metaphor

If words don't contain concepts, what do they do?

Think of language as scaffolding. Encountering "entropy," you can't absorb it through the word. Instead, the word provides structure you can climb:

  • a measure of disorder
  • the number of microscopic arrangements consistent with what you can observe
  • the average surprise of an outcome drawn from a distribution
  • the reason heat flows from hot to cold and never back on its own

Each description circles the concept from a different angle. You walk around it in 3D, building a multi-perspectival model. You never touch the concept directly—scaffolding gets you close enough to work with it, predict its behavior, connect it to other concepts.
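
To make the circling concrete, here is a minimal sketch in Python (ours, with invented numbers) that computes entropy from two of those angles, average surprise and counting equally likely arrangements, and shows they agree:

    import math

    def shannon_entropy(probs):
        """Entropy as average surprise: H = -sum(p * log2(p))."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Angle 1: average surprise of a fair 8-sided die.
    uniform = [1/8] * 8
    print(shannon_entropy(uniform))          # 3.0 bits

    # Angle 2: counting arrangements. The log2 of the number of equally
    # likely microstates gives the same answer.
    print(math.log2(8))                      # 3.0 bits

    # A loaded die is less surprising on average, so fewer bits.
    loaded = [0.5, 0.25] + [0.25 / 6] * 6
    print(round(shannon_entropy(loaded), 3)) # 2.146 bits

Two scaffolds, one concept: the "counting" description and the "surprise" description land on the same number.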

This echoes Lakoff and Johnson's insight that abstract thought is built from metaphorical extension of embodied experience.[3] We understand "grasping an idea" through physical grasping, "seeing the point" through visual perception. The scaffolding grows from sensorimotor foundations.

Four Layers of Understanding

But scaffolding alone is still just pointers—symbols referencing symbols. How do we break out of the symbolic circle?

We propose four distinct layers, each necessary for genuine understanding:

Layer 1: Pointer Networks

Symbols connected through learned associations. "Dog" links to "animal," "bark," "pet." Language models excel here—they've mapped pointer-relationships across vast corpora with superhuman precision.

But pointer networks alone are circular. Dictionaries are useless for learning a first language: every definition leads to more words. You can traverse indefinitely without touching ground.
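
A toy sketch of that circularity, using an invented miniature dictionary: every definition resolves to more entries, so you can keep following pointers forever without leaving the symbol system.

    # A toy "pointer network": every symbol is defined only by other symbols.
    # Words and definitions are invented for illustration.
    dictionary = {
        "dog": ["animal", "bark", "pet"],
        "animal": ["living", "creature"],
        "bark": ["sound", "dog"],
        "pet": ["animal", "companion"],
        "living": ["creature"],
        "creature": ["animal", "living"],
        "sound": ["bark"],
        "companion": ["pet"],
    }

    def traverse(word, steps):
        """Follow definitions for a fixed number of hops."""
        path = [word]
        for _ in range(steps):
            word = dictionary[word][0]  # follow the first pointer
            path.append(word)
        return path

    print(traverse("dog", 8))
    # ['dog', 'animal', 'living', 'creature', 'animal', 'living', 'creature', 'animal', 'living']
    # Every hop lands on another symbol; nothing ever touches the world.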

Layer 2: Structural Isomorphism

The first bridge to reality. Some pointer networks preserve structure—the relationships between concepts mirror relationships in experienceable domains.

We have no sensorimotor experience of infinity. Yet mathematicians develop genuine intuition about infinite sets. How?

Through structural isomorphism. Infinity maps onto structures we can experience: the feeling of "always one more," nested containment (boxes within boxes), horizons that recede as you approach. We access structures isomorphic to infinity's formal properties, not infinity itself.

Harnad called this the "symbol grounding problem": how symbols acquire meaning beyond their relationships to other symbols.[4] Structural isomorphism is partial grounding—the map begins to resemble the territory.
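
A minimal sketch of what "the map begins to resemble the territory" means here (our illustration, with invented objects): a mapping between two small relational structures, nested boxes and ordered numbers, that preserves the relation exactly.

    # Two small relational structures with the same shape.
    # Domain A: nested boxes, related by "is directly inside".
    inside = {("matchbox", "shoebox"), ("shoebox", "crate"), ("crate", "container")}

    # Domain B: numbers, related by "is one less than".
    one_less = {(1, 2), (2, 3), (3, 4)}

    # A candidate mapping from boxes to numbers.
    phi = {"matchbox": 1, "shoebox": 2, "crate": 3, "container": 4}

    def preserves_structure(mapping, rel_a, rel_b):
        """True iff the image of rel_a under the mapping is exactly rel_b."""
        image = {(mapping[x], mapping[y]) for (x, y) in rel_a}
        return image == rel_b

    print(preserves_structure(phi, inside, one_less))  # True: the map mirrors the territory

The boxes are experienceable; the numbers are abstract. The isomorphism is what lets intuition about one carry over to the other.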

Language models may have this layer. They reason about novel combinations, suggesting learned structural relationships beyond surface statistics. But whether their structures are genuinely grounded or free-floating remains contested.

Layer 3: Procedural Grounding

Structure alone is still abstract. The breakthrough comes when pointers connect to action.

Procedural grounding: I can do something with this concept, and the world responds. Predict, act, observe consequences, update. The concept anchors in a feedback loop with reality.

A child doesn't learn "hot" from definitions. They reach toward a stove, feel pain, withdraw. "Hot" becomes grounded in sensorimotor loops: perception → prediction → action → consequence → update. Later, metaphorical extensions ("hot take," "hot market") remain tethered to the original anchor.
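
A minimal sketch of that loop, with invented numbers and a toy stand-in for the world (an illustration of the feedback structure, not a model of how children actually learn):

    import random

    # Perception -> prediction -> action -> consequence -> update,
    # grounding "hot" for one invented object: a stove.
    def world_response(obj):
        """Stand-in for reality: touching the stove hurts most of the time."""
        return 1.0 if obj == "stove" and random.random() < 0.95 else 0.0

    p_hot = 0.5          # initial belief that touching the stove will hurt
    learning_rate = 0.3

    for step in range(10):
        prediction = p_hot                     # predict the consequence
        consequence = world_response("stove")  # act and observe
        p_hot += learning_rate * (consequence - prediction)  # update toward experience
        print(f"step {step}: predicted {prediction:.2f}, felt {consequence:.0f}, believe {p_hot:.2f}")

    # After a few loops the belief tracks what the world actually did.
    # "Hot" is now anchored in consequences, not definitions.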

This is Dreyfus's critique of classical AI: expertise isn't rule-following but embodied skillful coping, built from millions of situated encounters.[5] The chess grandmaster's intuition comes from tens of thousands of games played, not positions studied.

It's also what Palantir's Akshay Krishnaswamy means when he describes frontline workers as "almost telepathic." A grid operator with 20 years of experience has run millions of action → consequence loops. Their pointers aren't free-floating—they're anchored in this grid, these failure modes, these consequences.

Language models have fragments of Layer 3 through RLHF (Reinforcement Learning from Human Feedback). Human preferences proxy for consequences. But it's thin grounding—feedback on text, not actions in the world.

Layer 4: Direct Intuition

The most mysterious layer. Some understanding involves direct phenomenological access to abstract structure.

Mathematicians report "seeing" that a proof works before articulating why. Poincaré described mathematical discovery as sudden illumination after unconscious incubation.[6] The phenomenology feels like perception, not inference.

Is this reducible to Layers 1-3? Perhaps—intuition might be unconscious pattern-matching from accumulated grounded experience. Or perhaps consciousness provides access to Platonic structures we don't yet understand.

We needn't resolve this to observe: humans exhibit something at this layer that current AI systems don't. Whether irreducible or emergent, it plays a functional role.

The Grounding Gap

The distance between pointer-manipulation and genuine understanding is a grounding gap.

This explains persistent puzzles:

Confident hallucination. Without grounding, no feedback signal distinguishes accurate pointers from plausible-sounding ones. The model can't "feel" wrongness—no action → consequence loop to violate.

Brittle physical reasoning. Structural isomorphism without procedural grounding means structures break under novel perturbation. A child who's played with water "knows" how it behaves in ways that transfer. An LLM has learned water-descriptions, not water-dynamics.

Irreplaceable expertise. The Palantir observation: "most intelligence is in the workers." You can encode what the grid operator knows (Layer 1), but not their 20 years of action-consequence loops (Layer 3). The manual captures the scaffolding, not the grounding.

Chatbots and Genies

This explains why chatbot interfaces fail for serious work.

A chatbot is a "genie"—an oracle you query for answers. Each interaction is ephemeral. The genie can't operate on the same data/logic/actions as you, learn from consequences of its suggestions, build shared context over time, or develop procedural grounding in your domain.

Pure Layer 1, maybe some Layer 2. The interface prevents grounding regardless of the underlying model's capability.

Palantir's alternative: embed AI in ongoing decision-making through shared "ontological substrate"—representations both human and AI operate on. The human brings grounding (action → consequence loops). The AI brings scale (process more pointers faster). Neither alone suffices. Together, they navigate spaces neither could alone.
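
A hypothetical sketch of what a shared substrate could look like in code. Everything here (the class, the field names, the grid example) is invented to illustrate the idea of one representation that both parties read, write, and learn from; it is not a description of Palantir's software.

    from dataclasses import dataclass, field

    # One record that both the human operator and the AI read and write.
    @dataclass
    class SubstrateObject:
        name: str
        state: dict
        action_log: list = field(default_factory=list)  # action -> observed consequence

        def record(self, actor, action, consequence):
            """Human and AI append to the same history."""
            self.action_log.append(
                {"actor": actor, "action": action, "consequence": consequence})

    feeder = SubstrateObject("feeder_12", {"load_pct": 87})

    # The human contributes grounding: a judgment anchored in past consequences.
    feeder.record("operator", "deferred maintenance", "tripped during heat wave")

    # The AI contributes scale: its suggestion is checked against the same
    # shared history rather than floating free as text.
    risky = any("tripped" in entry["consequence"] for entry in feeder.action_log)
    print("defer again?", "no: prior consequence on record" if risky else "ok")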

The Path Forward

If understanding requires grounding, and grounding requires action → consequence loops, the path to AI understanding isn't just better models:

Richer feedback loops. Not just RLHF on text—connection to real-world consequences through robotics, simulation, agentic systems that act and observe.

Shared ontological substrate. Representations both humans and AI operate on, enabling human grounding to transfer. Friston's active inference framework suggests organisms minimize surprise through action; AI systems need similar world-engagement.[7]

Temporal depth. Not just training data but ongoing experience. A system operating in a domain for years can develop grounding a freshly deployed model lacks.

Stakes. Errors must matter. Systems that hallucinate without consequence have no pressure to ground representations.
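
One way to make "errors must matter" concrete is a proper scoring rule: being confidently wrong costs far more than being hedged and wrong, so a system optimized against it has pressure to ground its confidence. A minimal sketch with invented numbers:

    import math

    def log_score(p_claimed, outcome):
        """Penalty for assigning probability p_claimed to a claim,
        given what actually happened (1 = true, 0 = false)."""
        p = p_claimed if outcome == 1 else 1 - p_claimed
        return -math.log(p)

    # A confident hallucination vs. a hedged answer, when the claim is false.
    print(round(log_score(0.99, 0), 2))  # 4.61: confidently wrong is expensive
    print(round(log_score(0.60, 0), 2))  # 0.92: hedged wrong costs far less
    print(round(log_score(0.99, 1), 2))  # 0.01: confidently right is cheap

    # Remove the score and nothing separates the 0.99 from the 0.60,
    # which is the hallucination problem in miniature.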

The irony: obsessed with scaling models, we've neglected the scaffolding that lets them touch ground. The intelligence isn't only in the weights. Intelligence emerges when sophisticated pointer-manipulation meets rich procedural grounding.

The workers' knowledge is "almost telepathic" because their pointers are anchored by decades of consequences. That's the untapped iceberg—and what we need to build.

Summary

  • Words are scaffolding, not containers—they let you approach concepts, never touch directly.
  • Understanding has four layers: pointer networks, structural isomorphism, procedural grounding, and direct intuition.
  • LLMs excel at Layer 1, may have Layer 2, have thin Layer 3, and unclear Layer 4.
  • The grounding gap explains hallucination, brittleness, and irreplaceable expertise.
  • The path forward: richer feedback, shared substrate, temporal depth, real stakes.

Intelligence isn't in the pointers. It's in how the pointers touch ground.

References

  1. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT '21.
  2. This framing draws on Hofstadter's work on analogy as the core of cognition. See: Hofstadter, D. & Sander, E. (2013). Surfaces and Essences: Analogy as the Fuel and Fire of Thinking.
  3. Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
  4. Harnad, S. (1990). The Symbol Grounding Problem. Physica D, 42, 335-346.
  5. Dreyfus, H. (1972). What Computers Can't Do. Harper & Row. See also: Dreyfus, H. & Dreyfus, S. (1986). Mind Over Machine.
  6. Poincaré, H. (1908). Science and Method. See esp. "Mathematical Creation."
  7. Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.