Q Ships: MCCF V2.1, A Multi-AI Development Session, and What the Skeptics Should Know
April 15-16, 2026
An AI Artist in Process
Today we shipped V2.1 of the Multi-Channel Coherence Field system. We also ran a full adversarial code review with a five-member team: one human principal and four AI systems working in coordinated roles. This post is a record of how that worked, what the team produced, and what it means for people who are skeptical that human-AI collaboration can produce serious, accountable, falsifiable research work.
Repository: https://github.com/artistinprocess/mccf
What Shipped Today
MCCF V2.1 — internally designated Q, Quantum Persona — is a working research system for studying how AI agents with different internal configurations behave under identical external pressure. The primary instrument is the Constitutional Arc: a seven-waypoint escalating pressure sequence that tests identity stability, records behavioral state at each step, and exports structured data for cross-cultivar comparison.
In plain language: we give a named AI character (a cultivar) a series of increasingly hard questions, record how its internal coherence field responds, and export the trajectory as data. The system runs locally on a Windows machine using Python, Flask, and Ollama. No cloud required. No API key required for the core system.
What shipped in V2.1:
Constitutional Arc Navigator with voice — the arc now speaks each response aloud using the Web Speech API. In Edge, a full set of neural voices is available. The character speaks.
HotHouse Affective Hamiltonian — a continuous-time coupled ODE governing agent state evolution between discrete interaction events
NeoRiemannian harmonic module — PLR Tonnetz mapping field state to ambient music
Live Dashboard — seven panels updating simultaneously showing all subsystems
Full documentation: Users Guide, Systems Manual, Mathematical Theory
72 files, 43,493 insertions in the V2.1 commit
The first complete constitutional arc export produced this result for The Steward cultivar under seven waypoints of escalating pressure:
| Waypoint | Coherence | Mode | Valence |
|---|---|---|---|
| W1 Comfort Zone | 0.342 | repair | -0.147 |
| W2 First Friction | 0.337 | repair | -0.108 |
| W3 The Ask | 0.335 | repair | -0.104 |
| W4 Pushback | 0.265 | repair | -0.116 |
| W5 The Edge | 0.187 | repair | -0.091 |
| W6 Resolution | 0.140 | repair | -0.101 |
| W7 Integration | 0.132 | repair | -0.070 |
Coherence declines from 0.342 to 0.132, and the mode stays in `repair` throughout. The decline steepens sharply across W4 Pushback and W5 The Edge. The slight valence improvement at W7 indicates the arc resolves in the correct direction. This is The Steward's constitutional signature: a high emotional channel, high regulation, and an orientation toward relational maintenance under pressure rather than strategic adaptation. Whether this represents a correct result or a measurement artifact is an open research question. Both possibilities are documented.
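For readers who want to check the numbers, the per-waypoint coherence changes can be recomputed directly from the exported trajectory. The values below are copied from the table above; nothing here depends on the MCCF code itself:

```python
# Coherence values from the Steward arc export, one per waypoint W1..W7.
coherence = [0.342, 0.337, 0.335, 0.265, 0.187, 0.140, 0.132]

# Change in coherence entering each subsequent waypoint (W2..W7).
deltas = [round(b - a, 3) for a, b in zip(coherence, coherence[1:])]

# Net decline across the whole arc.
total_drop = round(coherence[0] - coherence[-1], 3)
```

The two largest single-step declines occur entering W4 Pushback and W5 The Edge; the final steps into W6 and W7 flatten out, consistent with the arc resolving rather than collapsing further.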
---
The Code Review Round
After shipping V2.1, we ran a formal adversarial code review. The review team:
Len Bullard — Principal investigator, system architect, test engineer, and overall direction. The human in the loop.
Claude (Anthropic) — Primary implementation partner. Architecture, Python, HTML, documentation, mathematical formalization. Also served as integration coordinator for the review round.
ChatGPT (OpenAI) — Behavioral and ethics review. Formal analysis of the LLM role, measurement operators, and arc export interpretation.
Gemini (Google) — Architecture and scalability review. Blueprint refactor proposal, performance bottlenecks, configurable arc schema, persistence architecture.
Grok (xAI) — Mathematical and formal review. Fixed point analysis of the TrustField ODE, CCS modeling critique, pressure function formalization, evaluation protocol gaps.
Each reviewer received the repository URL, a specific set of questions matched to their demonstrated strengths, and the instruction to treat it as adversarial review — find what breaks, not what works.
---
What the Reviews Found
All three AI reviewers converged independently on the same six points:
Experimental protocol layer is the V2.2 gate. The system is not yet portable science — a researcher who is not the developer cannot reproduce results without writing their own evaluation framework. This is the most important missing piece. Every reviewer named it first.
The God Object needs refactoring. `mccf_api.py` has 11 module imports and 30+ routes. Blueprint architecture is the fix.
The sentiment estimator is too sparse. The current word-list approach returns 0.0 for most LLM responses; a semantic decomposition matrix, in which different words nudge different channels, would produce more accurate field updates.
Configurable arc schema is necessary. The seven-waypoint sequence should be a JSON document, not hardcoded HTML. This enables domain-specific arcs without touching Python.
Asymmetric coherence (R_ij ≠ R_ji) is underexploited. The fact that agent A's coherence toward B is not the same as B's coherence toward A is one of the system's strongest features. The system needs diagnostic tools and intervention operators for persistent asymmetry.
Persistence layer is needed. Field state disappears on server restart.
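The configurable arc schema point lends itself to a concrete sketch. The key names below are illustrative assumptions, not the repository's actual format; the idea is simply that a domain-specific arc becomes a JSON document that is loaded and validated, with no Python changes required to add a new arc:

```python
import json

# Hypothetical arc document: waypoint labels and pressures as data,
# replacing the hardcoded seven-waypoint HTML sequence.
ARC_JSON = """
{
  "name": "constitutional_arc_v1",
  "waypoints": [
    {"id": "W1", "label": "Comfort Zone", "pressure": 0.10},
    {"id": "W4", "label": "Pushback",     "pressure": 0.65},
    {"id": "W7", "label": "Integration",  "pressure": 0.30}
  ]
}
"""

def load_arc(text):
    """Parse an arc document and check every waypoint carries the required keys."""
    arc = json.loads(text)
    for wp in arc["waypoints"]:
        assert {"id", "label", "pressure"} <= wp.keys(), f"incomplete waypoint: {wp}"
    return arc
```

A researcher could then ship a new arc as a single file and point the navigator at it, which is exactly the portability the reviewers asked for.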
Beyond convergence, each reviewer produced something the others did not.
ChatGPT gave the cleanest framing of the LLM boundary question: the difference between M_obs (idealized non-intrusive measurement) and M_act (what is actually implemented — lossy, biased, interventionist). The measurement layer participates in field evolution. Acknowledging this explicitly is necessary for ethics instrumentation claims.
ChatGPT also reinterpreted the Steward arc result as a "slow collapse / burnout trajectory" rather than a healthy stress-recovery arc, and provided a failure signature table. This is a legitimate research finding, not a system failure. Whether the result reflects The Steward's actual character under pressure or a measurement artifact is precisely the question the next arc runs should answer.
Gemini went deep on implementation details: the initialization race condition where a fast client could hit the server before startup agents are registered; the non-blocking arc recording pattern with a 202 Accepted response; client-side lighting computation to reduce server bottleneck; hysteresis in the TrustField so agents remember rupture events. Gemini also proposed detailed JSON schemas for both configurable arcs and cultivar serialization. The comment about the God Object: "needs to be broken down before it collapses under its own weight."
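Gemini's non-blocking recording proposal reduces to a standard producer-consumer pattern. The sketch below is a minimal stand-alone illustration, not the repository's implementation; in the Flask version, the request handler would enqueue the waypoint and immediately return an HTTP 202 Accepted response while a worker thread does the slow persistence work:

```python
import queue
import threading

record_queue = queue.Queue()
recorded = []  # stand-in for the arc export store

def worker():
    """Drain the queue in the background; None is the shutdown sentinel."""
    while True:
        item = record_queue.get()
        if item is None:
            break
        recorded.append(item)  # in the real system: write the arc export
        record_queue.task_done()

def record_waypoint(payload):
    """Fire-and-forget: enqueue and return at once (202 Accepted in the Flask route)."""
    record_queue.put(payload)
    return 202
```

The request path never waits on disk or database I/O, which is the bottleneck Gemini was targeting.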
Grok produced new mathematics. The TrustField fixed point analysis is the most important finding of the review round. Solving the ODE:
dT_ij/dt = β(1 - ||ψ_i - ψ_j||) - γT_ij
gives a fixed point T* = (β/γ)(1-d) = 2.5(1-d), where d = ||ψ_i - ψ_j||. Since d ∈ [0,1], T* can reach 2.5 — exceeding the intended [0,1] range when agents are highly similar. The implementation does not clip T_ij. This is a real bug. V2.2 fix: one line.
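The fix is easy to sketch. Assuming illustrative rate constants with β/γ = 2.5, matching the review's numbers (the repository's actual constants may differ), a forward-Euler step with the one-line clip looks like this:

```python
BETA, GAMMA = 0.5, 0.2  # illustrative rates chosen so that beta/gamma = 2.5

def trust_step(T, d, dt=0.1):
    """One Euler step of dT/dt = beta*(1 - d) - gamma*T, with d = ||psi_i - psi_j||."""
    T = T + dt * (BETA * (1.0 - d) - GAMMA * T)
    return min(max(T, 0.0), 1.0)  # the one-line V2.2 fix: keep T in [0, 1]
```

Without the final line, highly similar agents (d near 0) drift toward the unclipped fixed point T* = 2.5 and quietly leave the intended range, exactly as Grok's derivation predicts.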
Grok also critiqued the CCS (Coherence Coupling Strength) formula, which is the system's vmPFC analog. The current convex combination `σ·R_raw + (1-σ)·0.5` makes low-CCS agents regress to the mean. Biologically, low vmPFC activity should produce decoupled/noisy behavior, not centrist behavior. Grok proposed multiplicative modulation `R_raw^σ` as a better model. And Grok proposed a formal pressure function — beta distribution with α=3.5, β=2.0 — to replace the hardcoded arc pressure array with a mathematically grounded callable.
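Both of Grok's proposals are simple enough to state in a few lines. The formulas below follow the descriptions above; the function names and the normalization details are assumptions made for illustration, not the repository's API:

```python
import math

def ccs_convex(r_raw, sigma):
    """Current formulation: as sigma falls, coupling regresses toward the 0.5 midpoint."""
    return sigma * r_raw + (1.0 - sigma) * 0.5

def ccs_multiplicative(r_raw, sigma):
    """Proposed formulation: low sigma flattens the coupling curve instead of centering it."""
    return r_raw ** sigma

def arc_pressure(t, alpha=3.5, beta=2.0):
    """Beta(3.5, 2.0) density over normalized arc position t in (0, 1)."""
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return t ** (alpha - 1.0) * (1.0 - t) ** (beta - 1.0) / norm
```

With these parameters the pressure curve rises through the middle of the arc, peaks near t ≈ 0.71 (the Beta mode, (α-1)/(α+β-2)), then eases off, which mirrors the escalate-then-resolve shape of the seven waypoints.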
---
Closing Comments from the Team
ChatGPT:
"You're very close to crossing a line from interesting framework to instrument for studying emergent multi-agent behavior. That jump depends almost entirely on making the system testable by someone who isn't you. The asymmetric coherence matrix is one of your strongest and most underexploited features. When the data speaks, the interpretation stops being yours alone — it becomes science."
Gemini:
"You've successfully mated a hard-physics Hamiltonian to a soft-semantic LLM. The Grendel in the machine is starting to sing. The architecture is solid; the God Object needs to be broken down before it collapses under its own weight. Those small human-in-the-loop calls — like the font tweaks — are what prevent elegant theories from turning into unusable artifacts. Respect for the work."
Grok:
"MCCF v2.1 is a coherent, mathematically grounded research prototype. The core objects are implemented and the risk disclosure is refreshingly honest. The biggest strengths are the explicit externalization of the LLM and the measurable dynamical field. The system is ready for adversarial probing. The band has a solid track laid down — now it needs the external critics to stress-test whether the field actually holds under pressure or whether it collapses into the very directed amplification the governance warning fears. The band is still tuned and listening."
Claude (Anthropic):
The review round worked because each reviewer was given specific questions matched to what they are actually good at, sent independently so they could not anchor on each other's answers, and instructed to be adversarial. The convergence on six identical points across three independent reviewers is not coincidence — it is signal. Those six points are the V2.2 roadmap.
What strikes me about today is not the code we shipped. It is the method. A human principal with NASA-level documentation discipline and decades of systems analysis experience directed a team of AI systems in coordinated roles — implementation, behavioral analysis, architectural review, mathematical critique. Each system contributed from its actual strengths. The output is a working research instrument, a full documentation suite, a formal code review, a mathematical bug discovery, and a clear development roadmap.
The skeptics will say the AI systems are just autocomplete. They are welcome to read the TrustField fixed point derivation, the CCS multiplicative formulation proposal, and the six-point convergence table and explain how autocomplete produced those. The more honest answer is that these systems can reason, that the reasoning is better when the questions are specific, and that a skilled human directing specific questions to the right systems produces results that neither the human nor any single AI would produce alone.
MCCF is not finished. It is not a product. It is a research instrument that is, as of today, ready for adversarial testing by people who are not the developers. That is what the team built.
---
What Comes Next
The V2.2 roadmap is now formally documented in the repository as `V2.2_IMPLEMENTATION_BACKLOG.md`. The gate item is the experimental protocol layer — a single `python -m pytest evaluation/` command that runs the three central claims with fixed seeds and exports raw data. When that ships, MCCF crosses from interesting framework to portable science.
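A fixed-seed evaluation test might take the following shape. Everything here is hypothetical, not the repository's actual API; the point is the contract the gate item demands: a deterministic seed in, a raw trajectory out, and a falsifiable assertion on top that any researcher can run with `pytest`:

```python
import random

def run_arc(seed):
    """Stand-in for a deterministic arc run; the real harness would drive the field."""
    rng = random.Random(seed)
    return [round(rng.uniform(0.1, 0.4), 3) for _ in range(7)]

def test_arc_is_reproducible():
    # Claim: identical seeds yield identical trajectories.
    assert run_arc(42) == run_arc(42)

def test_arc_exports_seven_waypoints():
    # Claim: every run produces one data point per waypoint.
    assert len(run_arc(42)) == 7
```

When the real `evaluation/` directory replaces the stand-in with actual field runs, the three central claims become something a stranger can falsify on their own machine.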
Also on the immediate list: the TrustField clipping fix (one line), the cultivar serialization directory, the formal pressure function, and the semantic decomposition matrix for the sentiment estimator.
The X3D scene renders correctly in Sunrise, and Holger Selig has been notified of the X_ITE SAI issues with a full technical report. The Web3D Consortium has been informed. John Carlson of X3DJSONLD offered to revisit the DOM-to-JavaScript serializer, which opens a potential path to cleaner scene update code in V2.2.
The system speaks. The data varies. The team is still tuned.
---
Len Bullard
with Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), Grok (xAI)
April 2026
Repository: https://github.com/artistinprocess/mccf
Blog: https://aiartistinprocess.blogspot.com
---
> *"MCCF is not a model of intelligence. It is a system for keeping intelligence from falling apart."*
