Cairn Wiki
Updated 2026-01-03

Misalignment Potential

Model Role: Root Factor (AI System)
Key Parameters: Alignment Robustness, Interpretability Coverage, Human Oversight Quality
Primary Outcome: Existential Catastrophe

Misalignment Potential measures the likelihood that AI systems will pursue goals other than those their developers intend. This aggregate combines the technical and organizational factors that determine whether advanced AI systems behave harmfully despite efforts to align them.

Primary outcome affected: Existential Catastrophe ↑↑↑

When misalignment potential is high, catastrophic loss of control, accidents at scale, and goal divergence become more likely. Reducing this potential is the most direct lever for reducing existential and catastrophic AI risk.
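A minimal sketch of how such an aggregate could be computed from the three key parameters listed above. The function name, the equal weighting, and the linear form are illustrative assumptions, not the wiki's actual model; each parameter is taken as a score in [0, 1], where higher means better safety.

```python
# Hypothetical illustration (not the wiki's actual model): misalignment
# potential as a simple function of the three key parameters. Each input
# is a score in [0, 1] where higher means stronger safety; the returned
# potential falls as any of them rises.

def misalignment_potential(alignment_robustness: float,
                           interpretability_coverage: float,
                           human_oversight_quality: float) -> float:
    """Return a risk score in [0, 1]; higher means more misalignment risk."""
    # Equal weighting is an assumption for illustration only.
    safety = (alignment_robustness
              + interpretability_coverage
              + human_oversight_quality) / 3.0
    return 1.0 - safety

print(misalignment_potential(0.9, 0.8, 0.7))  # strong safety -> low potential
```

Under this toy model, improving any one component lowers the aggregate risk; a real model would likely weight the components and capture their interactions.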


Component Parameters

  • Alignment Robustness
  • Interpretability Coverage
  • Human Oversight Quality

Internal Dynamics

These components interact:

  • Interpretability enables alignment verification: We can only confirm alignment if we understand model internals
  • Safety culture sustains investment: Without organizational commitment, safety research loses funding to capabilities
  • Oversight requires interpretability: Human overseers need tools to understand what systems are doing
  • Gap closure requires all components: No single factor is sufficient; safety capacity emerges from their combination
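The "gap closure requires all components" dynamic can be sketched as a noncompensatory aggregation. The `min()` rule below is an assumption chosen for illustration: it makes the weakest component dominate, so no single strong factor can substitute for the others.

```python
# Hypothetical sketch of the "no single factor is sufficient" dynamic.
# The min() aggregation rule is an assumption for illustration: overall
# safety capacity is bounded by the weakest component, so strength in
# one area cannot compensate for weakness in another.

def safety_capacity(components: dict[str, float]) -> float:
    """Components are scores in [0, 1]; capacity equals the weakest score."""
    return min(components.values())

capacity = safety_capacity({
    "alignment_robustness": 0.9,
    "interpretability_coverage": 0.9,
    "human_oversight_quality": 0.2,  # one weak link
})
print(capacity)  # the weak link bounds the whole
```

In this sketch, raising oversight quality from 0.2 to 0.9 would do far more than raising either already-strong component, matching the intuition that safety capacity emerges only from the combination.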

How This Affects Scenarios


Related Pages

  • Existential Catastrophe — The outcome this primarily affects
  • Misuse Potential — The complementary factor for human-caused catastrophe

What Drives Misalignment Potential?

The three pillars of alignment assurance, their drivers, and key uncertainties.

[Diagram: causal graph of the three pillars. Node types: Root Causes, Derived, Direct Factors, Target. Arrow strength: Strong, Medium, Weak.]