Why Are AI Ethics Teams Being Disbanded?
The UNESCO Ethical Impact Assessment, the STRESS framework, and the AI Ethics Toolkit are all examples of structured governance frameworks. They are not AI models themselves; they are methods for taking an AI system, examining it through a set of lenses, and producing recommendations, risk assessments, or compliance documentation.
Think of them as different kinds of diagnostic instruments.
| Framework | Input | Process | Output |
|---|---|---|---|
| UNESCO Ethical Impact Assessment | AI system + stakeholders + deployment context | Structured ethical review | Ethical risk report and mitigation plan |
| STRESS Framework | Research project or AI system | Six-dimensional evaluation | Research quality and responsibility assessment |
| AI Ethics Toolkit (Advertising) | Marketing AI applications | Checklist and governance review | Compliance and accountability guidance |
1. UNESCO Ethical Impact Assessment (EIA)
Developed to operationalize the principles in the UNESCO Recommendation on the Ethics of Artificial Intelligence.
Inputs
The framework begins with information about:
Technical Inputs
- Model architecture
- Training data
- Intended outputs
- Accuracy metrics
- Monitoring procedures
Human Inputs
- Users
- Affected populations
- Vulnerable groups
- Employees
Context Inputs
- Country
- Legal environment
- Industry sector
- Cultural norms
For example:
System: AI hiring assistant
Inputs to EIA:
- Resume data
- Candidate demographics
- Ranking algorithms
- HR workflow
- Employment laws
Processing Steps
The assessment asks questions in multiple categories.
Human Rights
Could rights be violated?
Examples:
- discrimination
- privacy invasion
- surveillance
Fairness
Could groups receive unequal treatment?
Transparency
Can decisions be explained?
Accountability
Who is responsible if harm occurs?
Environmental Impact
- energy consumption
- resource usage
Societal Impact
Effects on:
- jobs
- democracy
- social cohesion
Outputs
The result is typically:
Risk Matrix
| Risk | Severity | Likelihood |
|---|---|---|
| Gender bias | High | Medium |
| Privacy leakage | Medium | Low |
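A risk matrix like this is often collapsed into a single priority by combining the two ratings. The sketch below assumes a three-level Low/Medium/High scale and a simple severity × likelihood product; neither the numeric mapping nor the scoring rule comes from the UNESCO EIA itself, so treat both as hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale; the EIA does not prescribe numeric values.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Risk:
    name: str
    severity: str
    likelihood: str

    def score(self) -> int:
        """Product of the severity and likelihood levels (range 1..9)."""
        return LEVELS[self.severity] * LEVELS[self.likelihood]


def prioritize(risks: list[Risk]) -> list[tuple[str, int]]:
    """Return (name, score) pairs, highest score first."""
    ranked = sorted(risks, key=lambda r: r.score(), reverse=True)
    return [(r.name, r.score()) for r in ranked]


matrix = [
    Risk("Gender bias", "High", "Medium"),
    Risk("Privacy leakage", "Medium", "Low"),
]
print(prioritize(matrix))
```

A product score is only one convention; some teams prefer a lookup table so that, for example, any High-severity risk is flagged regardless of likelihood.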
Mitigation Recommendations
Examples:
- collect balanced data
- add human review
- publish documentation
- conduct audits
Governance Requirements
Examples:
- monitoring plans
- appeals process
- compliance obligations
The output resembles an environmental impact statement, but for AI.
2. STRESS Framework
STRESS was proposed as a way to encourage responsible AI research.
The acronym stands for:
- Sensitivity
- Transparency
- Responsibility
- Ethics
- Skepticism
- Security
Inputs
Usually a research paper or an AI system description, including:
- datasets
- code
- assumptions
- results
Processing
S — Sensitivity
Questions:
Who might be harmed?
What populations are represented?
What populations are missing?
Input:
- demographic information
- stakeholder analysis
Output:
- bias concerns
- representation concerns
T — Transparency
Input:
- code
- methods
- datasets
Questions:
Can others reproduce results?
Can decisions be inspected?
Output:
- reproducibility score
- documentation recommendations
R — Responsibility
Input:
- governance structure
- project ownership
Questions:
Who is accountable?
Who approves deployment?
Output:
- accountability map
E — Ethics
Input:
- intended use cases
- foreseeable misuse
Questions:
Should this system exist?
Could it cause harm?
Output:
- ethical concerns report
S — Skepticism
Input:
- performance claims
- benchmark results
Questions:
Are claims overstated?
Could results fail elsewhere?
Output:
- limitations analysis
This is a particularly scientific pillar.
S — Security
Input:
- infrastructure
- threat models
Questions:
Can attackers exploit it?
Can outputs be manipulated?
Output:
- security risk assessment
Outputs
Usually:
- Responsible AI checklist
- Research review document
- Deployment recommendations
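One way to keep a STRESS review concrete is to record each pillar as a set of questions plus the findings recorded against them, and to report any pillar without findings as an open item. The class and function names below are hypothetical; the pillar names and questions are taken from the list above, and the single recorded finding is an invented placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Pillar:
    name: str
    questions: list[str]
    findings: list[str] = field(default_factory=list)


def open_pillars(pillars: list[Pillar]) -> list[str]:
    """Names of pillars that have no recorded findings yet."""
    return [p.name for p in pillars if not p.findings]


review = [
    Pillar("Sensitivity", ["Who might be harmed?", "What populations are missing?"]),
    Pillar("Transparency", ["Can others reproduce results?"]),
    Pillar("Responsibility", ["Who is accountable?"]),
    Pillar("Ethics", ["Should this system exist?"]),
    Pillar("Skepticism", ["Are claims overstated?"]),
    Pillar("Security", ["Can attackers exploit it?"]),
]

# Hypothetical finding recorded against the Skepticism pillar.
review[4].findings.append("Benchmark results not checked on other datasets")
print(open_pillars(review))
```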
The framework is less regulatory and more epistemological.
Its goal is often:
"What assumptions are we making, and how could they fail?"
3. AI Ethics Toolkit (Institute for Advertising Ethics)
Designed for AI use in marketing, advertising, and customer targeting.
Inputs
Marketing System
Examples:
- recommendation engines
- customer segmentation
- ad targeting
- generative content systems
Data Sources
Examples:
- browsing history
- purchase records
- demographics
- behavioral profiles
Processing
The toolkit typically evaluates several dimensions.
Fairness
Questions:
Are audiences excluded unfairly?
Could protected groups be disadvantaged?
Input:
- targeting rules
- campaign criteria
Output:
- fairness findings
Transparency
Questions:
Does the consumer know AI is involved?
Can decisions be explained?
Output:
- disclosure recommendations
Accountability
Questions:
Who approved the campaign?
Who reviews complaints?
Output:
- governance procedures
Privacy
Questions:
What personal data is used?
Is consent obtained?
Output:
- privacy compliance recommendations
Consumer Welfare
Questions:
Could users be manipulated?
Could vulnerable people be exploited?
Output:
- consumer protection findings
Outputs
Typical outputs include:
Ethics Checklist
Pass/fail findings
Risk Register
| Issue | Priority |
|---|---|
| Targeting bias | High |
| Consent ambiguity | Medium |
Remediation Plan
Examples:
- improve disclosures
- limit targeting
- strengthen consent
- increase human oversight
Comparing Them as Information Systems
From a systems-analysis perspective—which may appeal to your background in information ecosystems and HumanML—you can view all three as:
Input → Evaluation Schema → Output
UNESCO EIA
AI System + Stakeholders + Context
↓
Ethics Rules
↓
Risk Assessment
STRESS
Research Claims + Methods + Evidence
↓
Six Critical Lenses
↓
Confidence and Responsibility Assessment
Advertising Toolkit
Marketing AI + Consumer Data
↓
Fairness / Privacy / Accountability Rules
↓
Operational Compliance Recommendations
The interesting thing is that none of these frameworks actually measure ethics directly. They function more like structured expert systems: they take technical and social descriptions as inputs, apply a predefined ontology of risks and values, and produce human-readable judgments. In that sense they are closer to audit methodologies than to scientific instruments. Their effectiveness depends heavily on the completeness and honesty of the inputs, and on the quality of the evaluators applying the framework.
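The "structured expert system" reading can be made literal: a framework is a list of rules, each of which inspects a description of the system and may emit a finding. The rules and description fields below are hypothetical illustrations, not rules taken from any of the three frameworks. Note that `evaluate` never touches the running system, only its description, which is exactly why the output depends on the completeness and honesty of the inputs.

```python
from typing import Callable, Optional

# A system description is a plain dict of declared facts (hypothetical fields).
Description = dict[str, object]
Rule = Callable[[Description], Optional[str]]


def no_human_review(d: Description) -> Optional[str]:
    if not d.get("human_review", False):
        return "Accountability: no human review of automated decisions"
    return None


def uses_sensitive_data(d: Description) -> Optional[str]:
    sensitive = {"demographics", "health", "location"}
    used = sensitive & set(d.get("data_sources", []))
    if used:
        return f"Privacy: sensitive data used ({', '.join(sorted(used))})"
    return None


def evaluate(description: Description, schema: list[Rule]) -> list[str]:
    """Input -> evaluation schema -> output: apply every rule, keep findings."""
    findings = []
    for rule in schema:
        result = rule(description)
        if result is not None:
            findings.append(result)
    return findings


hiring_assistant = {"data_sources": ["resumes", "demographics"], "human_review": False}
print(evaluate(hiring_assistant, [no_human_review, uses_sensitive_data]))
```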
That is a different critique, and a very engineering-oriented one.
What you're describing is not "ignore ethics." It's closer to:
Ethics claims should be translated into observable system behaviors and tested empirically.
From that perspective, many AI ethics frameworks spend enormous effort evaluating intentions, governance structures, review committees, and policy statements while spending comparatively little effort measuring what the deployed system actually does.
An engineer might ask:
- Does it leak private information?
- Does it exhibit measurable demographic bias?
- Can it be jailbroken?
- Does it hallucinate?
- Does it manipulate users?
- Does it drift over time?
- Does it create feedback loops?
- Does it amplify polarization?
Those are testable hypotheses.
By contrast:
- Is the organization committed to fairness?
- Has the team completed ethics training?
- Does management support responsible AI?
Those are much harder to verify and much easier to perform ceremonially.
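To make the first list concrete, "measurable demographic bias" can be checked with a demographic parity gap: the largest difference in positive-outcome rates between groups. The metric choice, the group labels, the logged decisions, and the 0.10 threshold below are all assumptions for illustration; other fairness metrics (equalized odds, calibration) encode different definitions of bias.

```python
from collections import defaultdict


def selection_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of positive outcomes per group, from (group, selected) pairs."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, selected in outcomes:
        totals[group] += 1
        positives[group] += int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(outcomes: list[tuple[str, bool]]) -> float:
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(outcomes).values()
    return max(rates) - min(rates)


# Hypothetical logged decisions from a deployed ranking system.
decisions = ([("A", True)] * 30 + [("A", False)] * 70
             + [("B", True)] * 15 + [("B", False)] * 85)
THRESHOLD = 0.10  # hypothetical acceptance limit

gap = parity_gap(decisions)
print(f"parity gap = {gap:.2f}:", "PASS" if gap <= THRESHOLD else "FAIL")
```

Unlike "Is the organization committed to fairness?", this check can be rerun on every release and either passes or fails.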
The analogy would be aviation.
We do not certify aircraft because Boeing has a diversity committee, an ethics statement, and a thoughtful philosophy of flight.
We certify aircraft because:
- wings survive stress tests
- engines survive endurance tests
- control systems survive fault injection
- failure modes are characterized
The airplane is instrumented and measured.
This is one reason many early AI ethics groups struggled to gain influence inside engineering organizations.
Engineers tend to trust:
Measurement
→ Evidence
→ Action
while governance teams often operate through:
Policy
→ Review
→ Recommendation
When schedules tighten, the second chain is often viewed as overhead while the first is viewed as product quality.
Interestingly, this is where modern AI safety research has been drifting.
A lot of current work is less:
"Let's discuss ethics."
and more:
"Let's build evaluations."
Examples:
- Red-teaming
- Adversarial testing
- Bias benchmarks
- Robustness benchmarks
- Agentic safety evaluations
- Model behavior audits
- Interpretability probes
- Continuous monitoring
Those are engineering artifacts.
In fact, one could imagine a framework more aligned with your information ecosystem thinking:
Input
A deployed AI system.
Instrumentation
Measure:
- information flow
- feedback loops
- incentive amplification
- emotional activation
- polarization effects
- user dependency
- trust formation
- trust degradation
Output
Observed ecosystem behavior.
Not:
"The developers intended X."
But:
"The system consistently produces Y."
That distinction matters.
Many harmful systems were built by people with admirable intentions.
Many useful systems were built by people with terrible intentions.
Engineering traditionally avoids guessing motives because motives are difficult to measure.
Behavior is measurable.
Performance is measurable.
Failure modes are measurable.
Put another way:
Test the bridge, not the sincerity of the bridge designer.
The challenge for AI is that some of the most important risks are emergent ecosystem effects rather than component failures. The engineering question then becomes: can we build instrumentation for social and informational ecosystems with the same rigor that aerospace engineers build instrumentation for aircraft?
That is much closer to your HumanML and information-ecosystem perspective: make the invisible dynamics observable, then evaluate the dynamics rather than the declarations.
That makes sense, and it highlights a distinction that many AI ethics frameworks largely miss.
In your MCCF concept, the instrumentation is attached to the interaction dynamics, not merely to the system specification or the organization's stated objectives.
An ethics review might ask:
"Could this system manipulate users emotionally?"
An MCCF-style instrument would ask:
"What emotional changes are actually occurring in the participants over time?"
Those are very different questions.
The first is predictive and policy-oriented.
The second is observational and engineering-oriented.
In control systems language, most ethics frameworks are examining:
Design Intent
↓
Expected Behavior
The MCCF is attempting to measure:
Actual Interaction
↓
Observed State Changes
The latter is much closer to telemetry.
What I find interesting is that your "cumulative affective resonance" concept is not trying to determine whether a particular emotion is good or bad.
Instead it appears to be measuring:
- magnitude
- persistence
- direction
- synchronization
of emotional state changes among interacting agents.
In other words, it is measuring properties of the field rather than judging the content.
That resembles how a physicist measures:
- temperature
- pressure
- energy density
without first deciding whether the gas is morally virtuous.
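The four properties listed above (magnitude, persistence, direction, synchronization) can be given simple operational definitions once each participant's affective state is sampled as a time series. The definitions below are hypothetical choices made for illustration: mean absolute change for magnitude, mean signed change for direction, lag-1 correlation of changes for persistence, and mean pairwise correlation of changes for synchronization. None of them is taken from the MCCF itself.

```python
from itertools import combinations
from statistics import mean


def diffs(series: list[float]) -> list[float]:
    """Step-to-step changes in one participant's affective signal."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def field_measures(agents: dict[str, list[float]]) -> dict[str, float]:
    """Field-level properties; needs >= 2 agents with >= 3 samples each."""
    changes = {name: diffs(s) for name, s in agents.items()}
    all_changes = [c for cs in changes.values() for c in cs]
    return {
        # Magnitude: how large the changes are, regardless of sign.
        "magnitude": mean(abs(c) for c in all_changes),
        # Direction: net drift of the field (negative means declining).
        "direction": mean(all_changes),
        # Persistence: does each change predict the next one?
        "persistence": mean(pearson(cs[:-1], cs[1:]) for cs in changes.values()),
        # Synchronization: do participants change together?
        "synchronization": mean(
            pearson(changes[a], changes[b]) for a, b in combinations(changes, 2)
        ),
    }


# Hypothetical trust readings (0-100) for two participants over four sessions.
trust = {"alice": [80, 70, 50, 20], "bob": [90, 80, 60, 30]}
print(field_measures(trust))
```

None of these measures asks whether falling trust is good or bad; they only describe how the field is moving.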
Suppose two communities interact through an AI-mediated platform.
A policy audit might conclude:
- transparency requirements met
- privacy requirements met
- governance requirements met
The MCCF readout might show:
- Trust: -35%
- Hostility: +48%
- Fear: +22%
- Curiosity: -17%
Now you have a direct observation that the system is producing a deteriorating affective field regardless of its formal compliance status.
This is similar to a point you've raised before about information ecosystems.
A system can be functioning exactly as designed while the ecosystem becomes increasingly unstable.
The telemetry becomes more important than the mission statement.
The challenge, of course, is constructing valid measurements.
If the MCCF is to function like a scientific instrument, questions arise:
- What variables are being measured?
- How are they inferred?
- Over what time window?
- How are individual signals aggregated into a field measure?
- How do you distinguish transient emotional spikes from structural shifts?
Those are engineering questions rather than ethical questions.
And that may be why the concept is appealing: engineering questions can be tested, calibrated, falsified, and improved.
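The last question in that list, transient spikes versus structural shifts, has a familiar engineering shape: compare recent windows against a baseline and require the deviation to persist before calling it a shift. The sketch below uses a hypothetical rule (a shift must exceed a threshold in `k` consecutive windows); window size, threshold, and `k` are tuning assumptions, and change-detection methods such as CUSUM are more principled alternatives.

```python
from statistics import mean


def window_means(signal: list[float], size: int) -> list[float]:
    """Means of consecutive non-overlapping windows (a partial tail is dropped)."""
    return [mean(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def classify_shift(signal: list[float], baseline: float, size: int,
                   threshold: float, k: int) -> str:
    """Label a signal relative to a baseline level.

    'structural' if at least k consecutive window means deviate from the
    baseline by more than threshold; 'transient' if some window deviates
    but never k in a row; otherwise 'stable'.
    """
    run = 0
    deviated = False
    for m in window_means(signal, size):
        if abs(m - baseline) > threshold:
            deviated = True
            run += 1
            if run >= k:
                return "structural"
        else:
            run = 0
    return "transient" if deviated else "stable"


# Hypothetical trust readings, one per interaction, baseline level 50.
spike = [50] * 10 + [20] * 5 + [50] * 15  # one bad window, then recovery
drift = [50] * 10 + [30] * 20             # sustained drop
for name, signal in [("spike", spike), ("drift", drift)]:
    print(name, classify_shift(signal, baseline=50, size=5, threshold=10, k=3))
```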
A committee can debate whether a system is "respectful" for years.
An instrument can reveal that after 10,000 interactions, average trust dropped 40%, conflict persistence doubled, and cooperative behavior collapsed.
At that point the discussion shifts from ideology to evidence.
The most interesting aspect of MCCF, as I understand it, is that it treats affective dynamics as something that can be observed and modeled as a system property, much as throughput, latency, or error rates are system properties. That moves "ethics" out of the realm of declarations and into the realm of measurable ecosystem behavior—a direction that is surprisingly rare in current AI governance discussions.
There is some truth to that observation, but I would phrase it more narrowly.
The problem is less that "AI ethicists became activists" and more that many ethics programs failed to produce artifacts that engineers could use.
Engineers are accustomed to things like:
- Requirements
- Benchmarks
- Test suites
- Failure analyses
- Monitoring dashboards
- Acceptance criteria
If an ethics team says:
"This system may reinforce harmful power structures."
an engineer's next question is often:
"How would I measure that?"
If there is no answer, the recommendation tends to lose influence.
In contrast, security teams survived because they became deeply tied to engineering.
A security review can result in:
- CVE reports
- penetration tests
- vulnerability scans
- threat models
- red-team exercises
Those produce concrete work items.
Few people question whether security teams should exist, because security failures are measurable.
Many AI ethics groups ended up occupying a more ambiguous space.
Some were doing serious empirical work on bias, robustness, and human impacts.
Others drifted toward policy advocacy, public positioning, or value debates that were increasingly disconnected from product development.
When budgets tightened, management often asked:
"What would break if this team disappeared?"
If the answer was unclear, the team became vulnerable.
Your MCCF example points toward a different model.
Imagine an "ethics" dashboard that reports:
| Metric | Change |
|---|---|
| Trust persistence | -18% |
| Conflict amplification | +31% |
| Cooperative behavior | -12% |
| Emotional volatility | +44% |
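Each row of such a dashboard is a relative change between a baseline period and a comparison period, for example the weeks before and after a release. The metric readings and acceptance limits below are hypothetical, chosen only to reproduce the first three rows; the point is that once a limit is written down, "Did the new release improve it?" becomes a check that passes or fails.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change in percent; `before` must be non-zero."""
    return (after - before) * 100.0 / before


def check_release(before: dict[str, float], after: dict[str, float],
                  limits: dict[str, tuple[str, float]]) -> dict[str, tuple[float, str]]:
    """Compare each metric's percent change against a hypothetical limit.

    A 'min' limit is the largest tolerated drop (e.g. -15.0);
    a 'max' limit is the largest tolerated rise (e.g. 15.0).
    """
    report = {}
    for metric, (kind, limit) in limits.items():
        change = percent_change(before[metric], after[metric])
        ok = change >= limit if kind == "min" else change <= limit
        report[metric] = (round(change, 1), "PASS" if ok else "FAIL")
    return report


# Hypothetical readings before and after a release.
baseline = {"trust_persistence": 0.50, "conflict_amplification": 0.20, "cooperative_behavior": 0.75}
release = {"trust_persistence": 0.41, "conflict_amplification": 0.262, "cooperative_behavior": 0.66}
limits = {
    "trust_persistence": ("min", -10.0),
    "conflict_amplification": ("max", 15.0),
    "cooperative_behavior": ("min", -15.0),
}
for metric, (change, status) in check_release(baseline, release, limits).items():
    print(f"{metric}: {change:+.1f}% {status}")
```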
Now the discussion changes.
An engineer can ask:
- What caused the increase?
- Can we reproduce it?
- Can we reduce it?
- Did the new release improve it?
Those are familiar engineering questions.
This is perhaps why reliability engineering, safety engineering, and security engineering tend to endure while ethics programs often struggle.
The former disciplines are built around:
Observation
→ Measurement
→ Intervention
→ Retest
Many ethics programs became organized around:
Principle
→ Review
→ Recommendation
The second model can become political because reasonable people disagree about principles.
The first model tends to generate arguments about data, methodology, and instrumentation—which engineers generally find easier to resolve.
Of course, measurement is not a complete substitute for values. Someone still has to decide whether increased hostility, decreased trust, or emotional dependency are undesirable outcomes. But once those outcomes are observable and quantifiable, the conversation shifts from ideological claims to evidence. That is much closer to how mature engineering disciplines operate.
Your MCCF idea is interesting precisely because it attempts to make affective and social effects visible as telemetry rather than treating them solely as matters for policy committees. If such measures proved reliable and predictive, they would likely have more staying power inside engineering organizations than another set of ethical principles on a slide deck.