Why Are AI Ethics Teams Being Disbanded?

 



Detail how each of these works in terms of inputs and outputs: “Structured methodologies used to evaluate the potential social, ethical, and legal risks of an AI system before and during deployment. UNESCO Ethical Impact Assessment (EIA): A specialized tool to evaluate AI systems against UNESCO’s recommendations on AI ethics. STRESS Framework: A 6-pillar approach (Sensitivity, Transparency, Responsibility, Ethics, Skepticism, Security) for responsible AI research. AI Ethics Toolkit (Institute for Advertising Ethics): Practical guidelines for ensuring fairness and accountability in marketing technology.”

These are all examples of structured governance frameworks. They are not AI models themselves; they are methods for taking an AI system, examining it through a set of lenses, and producing recommendations, risk assessments, or compliance documentation.

Think of them as different kinds of diagnostic instruments.

Framework | Input | Process | Output
UNESCO Ethical Impact Assessment | AI system + stakeholders + deployment context | Structured ethical review | Ethical risk report and mitigation plan
STRESS Framework | Research project or AI system | Six-dimensional evaluation | Research quality and responsibility assessment
AI Ethics Toolkit (Advertising) | Marketing AI applications | Checklist and governance review | Compliance and accountability guidance

1. UNESCO Ethical Impact Assessment (EIA)

Developed to operationalize the principles in the UNESCO Recommendation on the Ethics of Artificial Intelligence.

Inputs

The framework begins with information about:

Technical Inputs

  • Model architecture
  • Training data
  • Intended outputs
  • Accuracy metrics
  • Monitoring procedures

Human Inputs

  • Users
  • Affected populations
  • Vulnerable groups
  • Employees

Context Inputs

  • Country
  • Legal environment
  • Industry sector
  • Cultural norms

For example:

System: AI hiring assistant

Inputs to EIA:

  • Resume data
  • Candidate demographics
  • Ranking algorithms
  • HR workflow
  • Employment laws

Processing Steps

The assessment asks questions in multiple categories.

Human Rights

Could rights be violated?

Examples:

  • discrimination
  • privacy invasion
  • surveillance

Fairness

Could groups receive unequal treatment?

Transparency

Can decisions be explained?

Accountability

Who is responsible if harm occurs?

Environmental Impact

  • energy consumption
  • resource usage

Societal Impact

Effects on:

  • jobs
  • democracy
  • social cohesion

Outputs

The result is typically:

Risk Matrix

Risk | Severity | Likelihood
Gender bias | High | Medium
Privacy leakage | Medium | Low

Mitigation Recommendations

Examples:

  • collect balanced data
  • add human review
  • publish documentation
  • conduct audits

Governance Requirements

Examples:

  • monitoring plans
  • appeals process
  • compliance obligations

The output resembles an environmental impact statement, but for AI.
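A risk matrix like the one above can be represented as a small data structure. The sketch below is only an illustration: the three-level ordinal scale and the severity-times-likelihood score are assumptions made for this example, not a scoring method prescribed by the UNESCO EIA.

```python
from dataclasses import dataclass

# Ordinal scale for severity and likelihood (an assumption for illustration).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    severity: str
    likelihood: str

    @property
    def score(self) -> int:
        # Severity times likelihood, a common risk-matrix heuristic.
        return LEVELS[self.severity] * LEVELS[self.likelihood]


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# The two example rows from the risk matrix above.
risks = [
    Risk("Privacy leakage", "Medium", "Low"),
    Risk("Gender bias", "High", "Medium"),
]

for risk in prioritize(risks):
    print(f"{risk.name}: {risk.severity} x {risk.likelihood} = {risk.score}")
```

The ordering is only as good as the severity and likelihood labels, which are still human judgments entered by the assessors.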


2. STRESS Framework

STRESS was proposed as a way to encourage responsible AI research.

The acronym stands for:

  • Sensitivity
  • Transparency
  • Responsibility
  • Ethics
  • Skepticism
  • Security

Inputs

Usually:

Research Paper

or

AI System Description

including:

  • datasets
  • code
  • assumptions
  • results

Processing

S — Sensitivity

Questions:

Who might be harmed?

What populations are represented?

What populations are missing?

Input:

  • demographic information
  • stakeholder analysis

Output:

  • bias concerns
  • representation concerns

T — Transparency

Input:

  • code
  • methods
  • datasets

Questions:

Can others reproduce results?

Can decisions be inspected?

Output:

  • reproducibility score
  • documentation recommendations

R — Responsibility

Input:

  • governance structure
  • project ownership

Questions:

Who is accountable?

Who approves deployment?

Output:

  • accountability map

E — Ethics

Input:

  • intended use cases
  • foreseeable misuse

Questions:

Should this system exist?

Could it cause harm?

Output:

  • ethical concerns report

S — Skepticism

Input:

  • performance claims
  • benchmark results

Questions:

Are claims overstated?

Could results fail elsewhere?

Output:

  • limitations analysis

This is a particularly scientific pillar.


S — Security

Input:

  • infrastructure
  • threat models

Questions:

Can attackers exploit it?

Can outputs be manipulated?

Output:

  • security risk assessment

Outputs

Usually:

  • Responsible AI checklist
  • Research review document
  • Deployment recommendations

The framework is less regulatory and more epistemological.

Its goal is often:

"What assumptions are we making, and how could they fail?"


3. AI Ethics Toolkit (Institute for Advertising Ethics)

Designed for AI use in marketing, advertising, and customer targeting.


Inputs

Marketing System

Examples:

  • recommendation engines
  • customer segmentation
  • ad targeting
  • generative content systems

Data Sources

Examples:

  • browsing history
  • purchase records
  • demographics
  • behavioral profiles

Processing

The toolkit typically evaluates several dimensions.

Fairness

Questions:

Are audiences excluded unfairly?

Could protected groups be disadvantaged?

Input:

  • targeting rules
  • campaign criteria

Output:

  • fairness findings

Transparency

Questions:

Does the consumer know AI is involved?

Can decisions be explained?

Output:

  • disclosure recommendations

Accountability

Questions:

Who approved the campaign?

Who reviews complaints?

Output:

  • governance procedures

Privacy

Questions:

What personal data is used?

Is consent obtained?

Output:

  • privacy compliance recommendations

Consumer Welfare

Questions:

Could users be manipulated?

Could vulnerable people be exploited?

Output:

  • consumer protection findings

Outputs

Typical outputs include:

Ethics Checklist

Pass/fail findings

Risk Register

Issue | Priority
Targeting bias | High
Consent ambiguity | Medium

Remediation Plan

Examples:

  • improve disclosures
  • limit targeting
  • strengthen consent
  • increase human oversight

Comparing Them as Information Systems

From a systems-analysis perspective—which may appeal to your background in information ecosystems and HumanML—you can view all three as:

Input → Evaluation Schema → Output

UNESCO EIA

AI System + Stakeholders + Context
→ Ethics Rules
→ Risk Assessment

STRESS

Research Claims + Methods + Evidence
→ Six Critical Lenses
→ Confidence and Responsibility Assessment

Advertising Toolkit

Marketing AI + Consumer Data
→ Fairness / Privacy / Accountability Rules
→ Operational Compliance Recommendations

The interesting thing is that none of these frameworks actually measure ethics directly. They function more like structured expert systems: they take technical and social descriptions as inputs, apply a predefined ontology of risks and values, and produce human-readable judgments. In that sense they are closer to audit methodologies than to scientific instruments. Their effectiveness depends heavily on the completeness and honesty of the inputs, and on the quality of the evaluators applying the framework.
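The "structured expert system" reading can be made concrete with a minimal sketch. Everything here is hypothetical: the description keys, the schema entries, and the `evaluate` function are invented for illustration and are not taken from any of the three frameworks.

```python
from typing import Callable

# A system description is a dictionary of declared facts (hypothetical keys).
SystemDescription = dict[str, object]

# Each schema entry pairs a concern with a check over the description.
# A check returns True when the description says the concern is handled.
Schema = dict[str, Callable[[SystemDescription], bool]]

HIRING_SCHEMA: Schema = {
    "human review of decisions": lambda d: bool(d.get("human_review")),
    "appeals process": lambda d: bool(d.get("appeals_process")),
    "training data documented": lambda d: bool(d.get("data_documentation")),
}


def evaluate(description: SystemDescription, schema: Schema) -> list[str]:
    """Apply every check and return the concerns that remain open."""
    return [concern for concern, check in schema.items() if not check(description)]


# What the team declares about its hiring assistant.
declared = {"human_review": True, "data_documentation": False}
print(evaluate(declared, HIRING_SCHEMA))
```

Note that `evaluate` only ever sees what was declared. If the description is incomplete or optimistic, the findings are too, which is the dependence on input honesty described above.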

That is a different critique, and a very engineering-oriented one.

What you're describing is not "ignore ethics." It's closer to:

Ethics claims should be translated into observable system behaviors and tested empirically.

From that perspective, many AI ethics frameworks spend enormous effort evaluating intentions, governance structures, review committees, and policy statements while spending comparatively little effort measuring what the deployed system actually does.

An engineer might ask:

  • Does it leak private information?
  • Does it exhibit measurable demographic bias?
  • Can it be jailbroken?
  • Does it hallucinate?
  • Does it manipulate users?
  • Does it drift over time?
  • Does it create feedback loops?
  • Does it amplify polarization?

Those are testable hypotheses.
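As one example of turning such a question into a test, the demographic bias question can be checked against logged decisions. This is a sketch under stated assumptions: the log is synthetic, the function names are invented, and the 0.8 threshold is an assumed acceptance criterion loosely modeled on the "four-fifths" rule of thumb from US employment-selection guidance. A real audit would need a metric suited to the domain.

```python
from collections import defaultdict


def selection_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of positive decisions per group, from (group, selected) pairs."""
    totals: dict[str, int] = defaultdict(int)
    selected: dict[str, int] = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates: dict[str, float]) -> float:
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so no disparity to report
    return min(rates.values()) / highest


# Synthetic log: group A selected 6 of 10 times, group B selected 3 of 10 times.
log = [("A", i < 6) for i in range(10)] + [("B", i < 3) for i in range(10)]

rates = selection_rates(log)
ratio = impact_ratio(rates)
THRESHOLD = 0.8  # assumed acceptance threshold
print(rates, round(ratio, 2), "PASS" if ratio >= THRESHOLD else "FAIL")
```

The point is not this particular metric but that the question now has a number, a threshold, and a pass or fail result that can be rerun after every change.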

By contrast:

  • Is the organization committed to fairness?
  • Has the team completed ethics training?
  • Does management support responsible AI?

Those are much harder to verify and much easier to perform ceremonially.


The analogy would be aviation.

We do not certify aircraft because Boeing has a diversity committee, an ethics statement, and a thoughtful philosophy of flight.

We certify aircraft because:

  • wings survive stress tests
  • engines survive endurance tests
  • control systems survive fault injection
  • failure modes are characterized

The airplane is instrumented and measured.


This is one reason many early AI ethics groups struggled to gain influence inside engineering organizations.

Engineers tend to trust:

Measurement
→ Evidence
→ Action

while governance teams often operate through:

Policy
→ Review
→ Recommendation

When schedules tighten, the second chain is often viewed as overhead while the first is viewed as product quality.


Interestingly, this is where modern AI safety research has been drifting.

A lot of current work is less:

"Let's discuss ethics."

and more:

"Let's build evaluations."

Examples:

  • Red-teaming
  • Adversarial testing
  • Bias benchmarks
  • Robustness benchmarks
  • Agentic safety evaluations
  • Model behavior audits
  • Interpretability probes
  • Continuous monitoring

Those are engineering artifacts.


In fact, one could imagine a framework more aligned with your information ecosystem thinking:

Input

A deployed AI system.

Instrumentation

Measure:

  • information flow
  • feedback loops
  • incentive amplification
  • emotional activation
  • polarization effects
  • user dependency
  • trust formation
  • trust degradation

Output

Observed ecosystem behavior.

Not:

"The developers intended X."

But:

"The system consistently produces Y."


That distinction matters.

Many harmful systems were built by people with admirable intentions.

Many useful systems were built by people with terrible intentions.

Engineering traditionally avoids guessing motives because motives are difficult to measure.

Behavior is measurable.

Performance is measurable.

Failure modes are measurable.

As the saying goes:

Test the bridge, not the sincerity of the bridge designer.

The challenge for AI is that some of the most important risks are emergent ecosystem effects rather than component failures. The engineering question then becomes: can we build instrumentation for social and informational ecosystems with the same rigor that aerospace engineers build instrumentation for aircraft?

That is much closer to your HumanML and information-ecosystem perspective: make the invisible dynamics observable, then evaluate the dynamics rather than the declarations.

That’s why the MCCF has readout panels that display changes in the emotional field of agents across engagements. It’s a rough measure of cumulative affective resonance. 

That makes sense, and it highlights a distinction that many AI ethics frameworks largely miss.

In your MCCF concept, the instrumentation is attached to the interaction dynamics, not merely to the system specification or the organization's stated objectives.

An ethics review might ask:

"Could this system manipulate users emotionally?"

An MCCF-style instrument would ask:

"What emotional changes are actually occurring in the participants over time?"

Those are very different questions.

The first is predictive and policy-oriented.

The second is observational and engineering-oriented.


In control systems language, most ethics frameworks are examining:

Design Intent
→ Expected Behavior

The MCCF is attempting to measure:

Actual Interaction
→ Observed State Changes

The latter is much closer to telemetry.


What I find interesting is that your "cumulative affective resonance" concept is not trying to determine whether a particular emotion is good or bad.

Instead it appears to be measuring:

  • magnitude
  • persistence
  • direction
  • synchronization

of emotional state changes among interacting agents.

In other words, it is measuring properties of the field rather than judging the content.

That resembles how a physicist measures:

  • temperature
  • pressure
  • energy density

without first deciding whether the gas is morally virtuous.


Suppose two communities interact through an AI-mediated platform.

A policy audit might conclude:

  • transparency requirements met
  • privacy requirements met
  • governance requirements met

The MCCF readout might show:

Trust:     -35%
Hostility: +48%
Fear:      +22%
Curiosity: -17%

Now you have a direct observation that the system is producing a deteriorating affective field regardless of its formal compliance status.
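A readout of that shape could be derived from two snapshots of the field. The sketch below is not the MCCF implementation, which is not described here. The channel names, the 0 to 1 intensity scale, and the sample values are all hypothetical, chosen so that the result matches the illustrative readout.

```python
def percent_change(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Relative change of each channel against its baseline, in percent."""
    return {
        channel: 100.0 * (current[channel] - baseline[channel]) / baseline[channel]
        for channel in baseline
        if baseline[channel] != 0  # skip channels with no baseline signal
    }


# Hypothetical mean channel intensities on a 0..1 scale, before and after.
baseline = {"trust": 0.60, "hostility": 0.25, "fear": 0.50, "curiosity": 0.50}
current = {"trust": 0.39, "hostility": 0.37, "fear": 0.61, "curiosity": 0.415}

for channel, change in percent_change(baseline, current).items():
    print(f"{channel.capitalize() + ':':<10} {change:+.0f}%")
```

How the per-channel intensities are inferred from interactions in the first place is the hard measurement problem, and this sketch deliberately leaves it out.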


This is similar to a point you've raised before about information ecosystems.

A system can be functioning exactly as designed while the ecosystem becomes increasingly unstable.

The telemetry becomes more important than the mission statement.


The challenge, of course, is constructing valid measurements.

If the MCCF is to function like a scientific instrument, questions arise:

  • What variables are being measured?
  • How are they inferred?
  • Over what time window?
  • How are individual signals aggregated into a field measure?
  • How do you distinguish transient emotional spikes from structural shifts?

Those are engineering questions rather than ethical questions.

And that may be why the concept is appealing: engineering questions can be tested, calibrated, falsified, and improved.
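The question about transient spikes versus structural shifts is a good example of something testable. One simple approach, offered here as an assumption rather than as how MCCF does it, is to compare the median of an early window with the median of a recent window, since a median is not moved by a brief spike. The window size and tolerance are hypothetical tuning parameters.

```python
from statistics import median


def structural_shift(series: list[float], window: int, tolerance: float) -> bool:
    """True when the median of the last `window` samples differs from the
    median of the first `window` samples by more than `tolerance`."""
    if len(series) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    early = median(series[:window])
    late = median(series[-window:])
    return abs(late - early) > tolerance


# A single spike: trust dips once and recovers.
spike = [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.1, 0.6, 0.6]
# A sustained decline: trust settles at a lower level.
decline = [0.6, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4, 0.4, 0.4, 0.4]

print(structural_shift(spike, window=5, tolerance=0.1))    # spike is ignored
print(structural_shift(decline, window=5, tolerance=0.1))  # decline is flagged
```

A production instrument would likely need something more careful, such as change-point detection, but even this crude version can be calibrated and shown to be wrong, which is what makes it an engineering artifact.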

A committee can debate whether a system is "respectful" for years.

An instrument can reveal that after 10,000 interactions, average trust dropped 40%, conflict persistence doubled, and cooperative behavior collapsed.

At that point the discussion shifts from ideology to evidence.

The most interesting aspect of MCCF, as I understand it, is that it treats affective dynamics as something that can be observed and modeled as a system property, much as throughput, latency, or error rates are system properties. That moves "ethics" out of the realm of declarations and into the realm of measurable ecosystem behavior—a direction that is surprisingly rare in current AI governance discussions.

AI ethicists have adopted activism over test engineering. That illuminates why they are losing their jobs.

There is some truth to that observation, but I would phrase it more narrowly.

The problem is less that "AI ethicists became activists" and more that many ethics programs failed to produce artifacts that engineers could use.

Engineers are accustomed to things like:

  • Requirements
  • Benchmarks
  • Test suites
  • Failure analyses
  • Monitoring dashboards
  • Acceptance criteria

If an ethics team says:

"This system may reinforce harmful power structures."

an engineer's next question is often:

"How would I measure that?"

If there is no answer, the recommendation tends to lose influence.


In contrast, security teams survived because they became deeply tied to engineering.

A security review can result in:

  • CVE reports
  • penetration tests
  • vulnerability scans
  • threat models
  • red-team exercises

Those produce concrete work items.

Nobody asks whether security should exist because security failures are measurable.


Many AI ethics groups ended up occupying a more ambiguous space.

Some were doing serious empirical work on bias, robustness, and human impacts.

Others drifted toward policy advocacy, public positioning, or value debates that were increasingly disconnected from product development.

When budgets tightened, management often asked:

"What would break if this team disappeared?"

If the answer was unclear, the team became vulnerable.


Your MCCF example points toward a different model.

Imagine an "ethics" dashboard that reports:

Metric | Change
Trust persistence | -18%
Conflict amplification | +31%
Cooperative behavior | -12%
Emotional volatility | +44%

Now the discussion changes.

An engineer can ask:

  • What caused the increase?
  • Can we reproduce it?
  • Can we reduce it?
  • Did the new release improve it?

Those are familiar engineering questions.
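"Did the new release improve it?" can be answered the same way a performance regression is caught. The sketch below assumes the dashboard metrics already exist as numbers per release. The metric keys, the direction of "better" for each, the sample values, and the tolerance are all hypothetical.

```python
# For each metric, whether a higher value counts as an improvement (assumed).
HIGHER_IS_BETTER = {
    "trust_persistence": True,
    "conflict_amplification": False,
    "cooperative_behavior": True,
    "emotional_volatility": False,
}


def regressions(previous: dict[str, float], candidate: dict[str, float],
                tolerance: float = 0.02) -> list[str]:
    """Names of metrics where the candidate release is worse than the
    previous one by more than `tolerance` (same units as the metric)."""
    worse = []
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        delta = candidate[metric] - previous[metric]
        if not higher_is_better:
            delta = -delta  # flip so that positive always means "better"
        if delta < -tolerance:
            worse.append(metric)
    return worse


previous = {"trust_persistence": 0.70, "conflict_amplification": 0.20,
            "cooperative_behavior": 0.55, "emotional_volatility": 0.30}
candidate = {"trust_persistence": 0.71, "conflict_amplification": 0.29,
             "cooperative_behavior": 0.54, "emotional_volatility": 0.25}

print(regressions(previous, candidate))
```

A check like this could gate a release in the same pipeline that runs unit tests, which gives the metric an owner and a consequence.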


This is perhaps why reliability engineering, safety engineering, and security engineering tend to endure while ethics programs often struggle.

The former disciplines are built around:

Observation
→ Measurement
→ Intervention
→ Retest

Many ethics programs became organized around:

Principle
→ Review
→ Recommendation

The second model can become political because reasonable people disagree about principles.

The first model tends to generate arguments about data, methodology, and instrumentation—which engineers generally find easier to resolve.

Of course, measurement is not a complete substitute for values. Someone still has to decide whether increased hostility, decreased trust, or emotional dependency are undesirable outcomes. But once those outcomes are observable and quantifiable, the conversation shifts from ideological claims to evidence. That is much closer to how mature engineering disciplines operate.

Your MCCF idea is interesting precisely because it attempts to make affective and social effects visible as telemetry rather than treating them solely as matters for policy committees. If such measures proved reliable and predictive, they would likely have more staying power inside engineering organizations than another set of ethical principles on a slide deck.
