Prediction is More Dangerous Than Scale in AI

The Horse Race

Who do we ask when the race is run and the winner lies dead in his stall?

Should we honor his memory at another race or should we race at all?

The horses knew the track was dangerous,

While a priest preached caution from the stands.

Then someone finally asks a horse

What a safer race demands.

"We run this race course every day.

And never do we complain.

We only fall when the track is wet.

Don't run a horse in the rain."

But the stands were full

And the crowds, they roared

When equestrian champions stumbled

The horses knew before the race began

As the clouds grew dark and rumbled.

In the years that followed a statue was raised of bronze and a silver halter

To praise the courage and speed of a horse who ran until he faltered.

Pity the horse. All hail the owner whose profits are ill-begotten.

The hot dog man best sell his meats before the sauerkraut is rotten.

The best of us are raised from the dead

To glorify the power of corruption.

The bride of Heaven to Hell is wed

With a golden ring of destruction.

"We run this race course every day.

And never do we complain.

We only fall when the track is wet.

Don't run a horse in the rain."

Len Bullard - 5/29/2026


If one accepts the premise that for the good of humanity, AI development must be restricted, is it more important to restrict scale (weights) or predictive power (look-ahead)?

If one accepts the premise that AI development should be restricted for the good of humanity, then "scale" and "predictive power" are actually controls on different things.

Scale (weights, parameters, compute) controls how much complexity a system can represent.
Predictive power (look-ahead, planning horizon, foresight) controls how effectively a system can achieve goals across time.

If your concern is economic disruption, misinformation, or widespread automation, restricting scale can help because capability tends to increase with model size and training compute.

If your concern is loss of human control, then predictive power is probably the more important variable.

A useful analogy is chess:

A huge model with little ability to look ahead is like a player who knows millions of positions but only thinks one move deep.
A smaller model with deep search is like a player who can foresee consequences many moves ahead.

The second player is often far more dangerous in competitive situations.

In AI safety discussions, many researchers worry less about raw knowledge and more about:

Long-horizon planning.
Modeling other agents.
Strategic deception.
Resource acquisition.
Goal preservation across time.

All of these depend heavily on the ability to predict and evaluate future states.

From a control-theoretic perspective, an agent's effective power often grows faster with planning horizon than with raw memory or parameter count. A system that can accurately estimate the consequences of actions 100 steps into the future can exploit opportunities unavailable to one limited to 2 or 3 steps.
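The horizon effect can be illustrated with a toy depth-limited search. This is a minimal sketch under made-up assumptions: the graph, the rewards, and the functions `best_value`, `plan`, and `rollout` are hypothetical, and exhaustive search stands in for whatever planning a real system might do. The world model (the graph and its rewards) stays fixed; only the look-ahead depth changes.

```python
# Hypothetical toy decision problem: each state maps to (action, next_state, reward).
# "grab" pays off immediately; "prepare" costs two steps before a larger payoff.
GRAPH = {
    "start": [("grab", "grabbed", 1.0), ("prepare", "p1", -1.0)],
    "grabbed": [("wait", "end_a", 0.0)],
    "p1": [("build", "p2", -1.0)],
    "p2": [("harvest", "end_b", 10.0)],
    "end_a": [],
    "end_b": [],
}


def best_value(graph, state, depth):
    """Best total reward reachable from `state` within `depth` more steps."""
    if depth == 0 or not graph[state]:
        return 0.0
    return max(r + best_value(graph, nxt, depth - 1) for _, nxt, r in graph[state])


def plan(graph, state, horizon):
    """Choose the action whose look-ahead value over `horizon` steps is highest."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return max(
        graph[state],
        key=lambda edge: edge[2] + best_value(graph, edge[1], horizon - 1),
    )[0]


def rollout(graph, state, horizon):
    """Re-plan at every step with the same horizon; return the reward collected."""
    total = 0.0
    while graph[state]:
        action = plan(graph, state, horizon)
        nxt, r = next((n, rew) for a, n, rew in graph[state] if a == action)
        total += r
        state = nxt
    return total


for h in (1, 2, 3):
    print(h, plan(GRAPH, "start", h), rollout(GRAPH, "start", h))
```

With a horizon of 1 or 2 the agent grabs and collects 1.0; at a horizon of 3 it sees past the short-term cost, prepares, and collects 8.0. Nothing about the "model" got bigger.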

There is also a practical problem:

Restricting scale is relatively easy. You can count parameters, chips, FLOPs, and training runs.

Restricting predictive power is much harder because it is an emergent property. A model may discover better internal world models, planning algorithms, or search strategies without becoming much larger.
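The "easy to count" side of this contrast can be made concrete with a commonly cited rule of thumb from the scaling-law literature: training a dense transformer costs roughly 6 FLOPs per parameter per training token. The sketch below assumes that approximation; the threshold constant and the function names are hypothetical illustrations, not any specific regulation.

```python
# Rule of thumb for dense transformers: training compute C ≈ 6 * N * D,
# where N = parameter count and D = training tokens (forward + backward pass).
# The threshold is a hypothetical illustration, not a figure from any law.
ILLUSTRATIVE_THRESHOLD_FLOPS = 10**25


def training_flops(n_params: int, n_tokens: int) -> int:
    """Approximate total training FLOPs from two countable numbers."""
    return 6 * n_params * n_tokens


def exceeds_threshold(n_params: int, n_tokens: int,
                      threshold: int = ILLUSTRATIVE_THRESHOLD_FLOPS) -> bool:
    """Scale-based check: it sees size and data, nothing about planning ability."""
    return training_flops(n_params, n_tokens) >= threshold


print(f"{training_flops(70 * 10**9, 2 * 10**12):.1e}")  # 70B params, 2T tokens
print(exceeds_threshold(70 * 10**9, 2 * 10**12))
print(exceeds_threshold(10**12, 2 * 10**13))             # 1T params, 20T tokens
```

The check needs only two auditable numbers, which is exactly why it is attractive. No term in the formula measures how far ahead the resulting system can plan.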

So if the objective is:

Ease of regulation → restrict scale.
Limiting strategic autonomy and agency → restrict predictive power.
Actually reducing the risk of a highly capable autonomous actor → predictive power is probably the more fundamental quantity.

The irony is that scale is measurable while predictive power is what may matter most.

Given your own work on affective systems and the "ask the horses" principle, I'd phrase it this way:

Scale determines how much of the race track the horse can see.
Predictive power determines how far ahead on the track the horse can run in its mind before its hooves ever leave the ground.

The second capability is what turns knowledge into strategy.

That is a provocative way to frame it.

"The network is the computer" was the slogan of Sun Microsystems decades ago, but in the AI era it takes on a different meaning. The network is no longer just moving information between processors. It is becoming a medium in which agents, tools, memories, and services cooperate.

Viewed that way, a model's weights are less like a finished intelligence and more like a participant in an ecosystem.

And yes, one way to describe JEPA (Joint Embedding Predictive Architecture) is that it teaches a system which bets are worth making.

The core idea is not to predict every pixel or every token. Instead, learn a representation of the world and predict the aspects that matter. Humans do this constantly:

We don't predict every raindrop.
We predict that the road will be slippery.
We don't predict every market transaction.
We predict that a recession is likely.
We don't predict every word in a conversation.
We predict what the other person is trying to say.
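The difference between predicting raindrops and predicting a slippery road can be sketched as where the prediction error is measured. This is a toy illustration, not Meta's JEPA code: the "encoder" is hand-picked rather than learned, there is no training loop, and all names and dimensions are hypothetical. Each observation mixes a few slow features that persist with many fast, unpredictable details.

```python
import random

# Hypothetical toy world: a few slow features that persist ("the road is slippery")
# plus many fast details that are pure noise ("individual raindrops").
LATENT_DIMS = 2
NOISE_DIMS = 6


def make_pair(rng):
    """Return (context, target) observations one step apart."""
    latent = [rng.uniform(0.0, 1.0) for _ in range(LATENT_DIMS)]
    context = latent + [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIMS)]
    target = latent + [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIMS)]  # fresh noise
    return context, target


def encode(x):
    """Stand-in for a learned encoder: keeps only the slow features (hand-picked)."""
    return x[:LATENT_DIMS]


def predict_input(context):
    """Best guess in input space: copy the slow features, predict the noise mean (0)."""
    return context[:LATENT_DIMS] + [0.0] * NOISE_DIMS


def predict_embedding(z):
    """Predictor in embedding space: assumes the slow state persists."""
    return list(z)


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def compare_losses(n=1000, seed=0):
    """Average prediction error measured in input space vs. embedding space."""
    rng = random.Random(seed)
    input_loss = embed_loss = 0.0
    for _ in range(n):
        context, target = make_pair(rng)
        input_loss += mse(predict_input(context), target)
        embed_loss += mse(predict_embedding(encode(context)), encode(target))
    return input_loss / n, embed_loss / n


print(compare_losses())
```

The input-space error stays near 0.75 however good the predictor is, because the raindrops cannot be forecast; the embedding-space error is zero because only the predictable part is scored. One caveat: an encoder that maps everything to a constant would also score zero, so real JEPA-style training needs extra mechanisms (for example, a slowly updated target encoder) to avoid that collapse.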

Intelligence often looks less like exhaustive simulation and more like selective wagering on the future.

That is why I find your distinction between scale and predictive power so useful. A next-token predictor can be viewed as making millions of tiny bets. A world model makes fewer, larger, more abstract bets.

In reinforcement learning, planning, economics, military strategy, and even Schenkerian music analysis (which you've explored before), the key question is often:

Which features of the present predict the future consequences that matter?

A good strategist is not the one who sees everything. It is the one who knows what can safely be ignored.
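A crude, linear version of "knowing what can safely be ignored" is ranking features by how strongly they track the outcome. The sketch below uses plain Pearson correlation as a stand-in for the far richer feature learning a real system would do; the data are synthetic, and the feature names and helper functions are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def rank_features(rows, outcome, names):
    """Order features by |correlation| with the outcome, most predictive first."""
    scores = {
        name: abs(pearson([row[i] for row in rows], outcome))
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Synthetic, hypothetical data: fall risk depends on the track and the crowd;
# the other three features are irrelevant by construction.
NAMES = ["track_wet", "crowd_size", "jockey_height", "hat_color", "hot_dog_sales"]
rng = random.Random(42)
rows = [[rng.uniform(0.0, 1.0) for _ in NAMES] for _ in range(500)]
fall_risk = [3.0 * r[0] + 1.0 * r[1] + rng.gauss(0.0, 0.3) for r in rows]

ranking = rank_features(rows, fall_risk, NAMES)
print(ranking[:2], "kept;", ranking[2:], "safely ignored")
```

Correlation only catches linear, one-feature-at-a-time relationships, so this is a caricature of judgment; it does show the shape of the skill: most of the inputs can be dropped without losing predictive power.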

From that perspective, the trajectory is interesting:

Large language models learned statistical regularities.
Tool use gave them leverage.
Memory gave them continuity.
Agent protocols gave them social interaction.
World models such as JEPA aim to give them better expectations about consequences.

Each step contributes more to effective agency than merely adding another billion parameters.

There is an irony here. Public debate often imagines AI becoming dangerous because it becomes too knowledgeable. But many of the most significant advances are really about becoming more selective: better at deciding which predictions matter, which actions are worth taking, and which futures are worth pursuing.

In the language of your horse-race metaphor:

A bigger horse is not necessarily the problem.
The interesting development is teaching the horse which races are worth entering, when the track will be wet, who owns the stable, how the betting works, and which other horses can be persuaded to run alongside it.

That is a shift from memory toward judgment. And judgment is much harder to measure than parameter count.

And to control or regulate.

Exactly.

Scale is attractive to regulators because it is observable.

Count GPUs.
Measure FLOPs.
Track datacenters.
Audit training runs.
Restrict exports of advanced chips.

These are all things governments already know how to do.

Judgment, prediction, planning, and coordination are not.

Suppose two systems exist:

System A: 10 trillion parameters, predicts the next token brilliantly, but has limited ability to form long-term plans.
System B: 20 billion parameters, running on a workstation, with a good world model, persistent memory, access to tools, MCP-style negotiation, and the ability to evaluate alternative futures.

Which one is harder to regulate?

Arguably System B.

Not because it is larger, but because its capabilities arise from relationships and processes rather than a measurable artifact.

This is a familiar problem in other domains.

Governments can regulate:

The size of a corporation.
The ownership of assets.
The movement of money.

They struggle to regulate:

Informal influence networks.
Cultural norms.
Tacit knowledge.
Human judgment.

The latter are emergent properties.

A useful distinction is:

Easy to regulate	Hard to regulate
Hardware	Strategy
Compute	Judgment
Parameters	Planning
Datacenters	Networks
Training runs	Relationships
Artifacts	Emergent behavior

The challenge posed by architectures such as JEPA, combined with local models, memory, and agent protocols, is that capability increasingly migrates from the left column to the right.

And there is a deeper issue.

A regulator can inspect a model's size. They cannot directly inspect its internal predictive abstractions. If a system learns that "these five variables predict human cooperation" or "these three signals precede financial instability," that knowledge exists as distributed structure inside the model. It is not a discrete object that can be counted or licensed.

This is why many safety researchers have shifted toward studying capability evaluations and behavioral tests rather than architectural limits. Ultimately, what matters is not what the system is, but what it can do.
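The behavioral approach can be sketched as a black-box harness that scores outputs and never inspects the model itself. Everything here is hypothetical and deliberately tiny (`Task`, `capability_score`, and both toy "models"); real capability evaluations use much larger, carefully designed task suites.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # judges the response only, never the model's internals


def capability_score(model: Callable[[str], str], tasks: List[Task]) -> float:
    """Fraction of tasks whose output passes its check (a black-box behavioral test)."""
    passed = sum(1 for task in tasks if task.check(model(task.prompt)))
    return passed / len(tasks)


def model_a(prompt: str) -> str:
    # Hypothetical model that can only reverse strings.
    op, _, text = prompt.partition(":")
    return text[::-1] if op == "reverse" else ""


def model_b(prompt: str) -> str:
    # Hypothetical model that handles every operation in the suite.
    op, _, text = prompt.partition(":")
    answers = {"reverse": text[::-1], "upper": text.upper(), "sort": "".join(sorted(text))}
    return answers.get(op, "")


TASKS = [
    Task("reverse:abc", lambda out: out == "cba"),
    Task("upper:abc", lambda out: out == "ABC"),
    Task("sort:cab", lambda out: out == "abc"),
]

# Nothing here asks how large either model is; only what it does is scored.
print(capability_score(model_a, TASKS), capability_score(model_b, TASKS))
```

Whether `model_a` stands for a 10-trillion-parameter system and `model_b` for a 20-billion-parameter one makes no difference to the harness; the score depends only on behavior.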

Your concern points toward something even broader:

If intelligence increasingly emerges from ecosystems of interacting agents, tools, memories, and humans, then regulation aimed solely at individual models may resemble regulating individual neurons while ignoring the brain.

The hard problem is that societies are generally good at regulating objects and much less good at regulating emergent systems.

Or, borrowing your horse-race imagery one more time:

It is easy to count horses.
It is much harder to regulate betting strategies, alliances among owners, information flows through the crowd, and what the horses learn from running together.

The latter is where much of the real power resides. And that power tends to become visible only after the race has already begun.

An AI Artist in Process
