Summary

Comprehensive tracking of 10 lab behavior metrics finds concerning trends: 53% average compliance with voluntary commitments, evaluation timelines compressed from months to days at OpenAI, 25+ senior safety departures in 2024, and open-source capability gap narrowing from 16 to 3 months. First ASL-3 activation (Claude Opus 4) represents the only publicly confirmed capability threshold crossing.

Lab Behavior

Quick Assessment

Dimension	Assessment	Evidence
Overall Compliance	Mixed (53% average)	August 2025 study of 16 companies found significant variation; OpenAI scored 83%, average was 53%
Evaluation Timeline Trend	Declining	OpenAI reduced testing from months to days for some models; FT reports "weeks" compressed to "days"
Safety Team Retention	Concerning	25+ senior departures from OpenAI in 2024; Superalignment team dissolved
Transparency	Inadequate	Google Gemini 2.5 Pro released without model card; OpenAI GPT-4.1 released without technical safety report
Open-Source Gap	Narrowing	Gap reduced from 16 months to approximately 3 months in 2025; DeepSeek R1 achieved near-parity
External Red Teaming	Standard but Limited	750+ researchers engaged via HackerOne; 15-30 day engagement windows may be insufficient
Whistleblower Protection	Underdeveloped	Only OpenAI has published full policy (after media pressure); California SB 53 protections start 2026

Overview

This page tracks measurable indicators of AI laboratory behavior, safety practices, and industry transparency. These metrics help assess whether leading AI companies are following responsible development practices and honoring their public commitments.

Understanding lab behavior is critical because corporate practices directly influence AI safety outcomes. Even the best technical safety research is insufficient if labs are racing to deploy systems without adequate testing, suppressing internal safety concerns, or failing to disclose dangerous capabilities.

[Diagram: Lab Behavior Dynamics]

1. Voluntary Commitment Compliance Rate

Current Status (2025): Mixed compliance, with significant variation across companies and commitment types.

Key Findings

A comprehensive study from August 2025 examining companies' adherence to their White House voluntary AI commitments found significant variation:

Cohort	Companies	Mean Compliance	Range
First (July 2023)	Amazon, Anthropic, Google, Inflection, Meta, Microsoft, OpenAI	69.0%	50-83%
Second (Sept 2023)	Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, Stability AI	44.6%	25-65%
Third (July 2024)	Apple	Not fully assessed	N/A
Overall Average	16 companies	53%	17-83%
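As a rough consistency check on the table above, the two fully assessed cohort means can be combined into a company-weighted average. This is only a sketch: it assumes each cohort mean is a simple average over the listed companies, and it leaves out the third cohort, which the table marks as not fully assessed, so the result is not expected to match the 53% overall figure exactly.

```python
# Cohort sizes and mean compliance, copied from the table above.
cohorts = {
    "first_july_2023": {"companies": 7, "mean_pct": 69.0},
    "second_sept_2023": {"companies": 8, "mean_pct": 44.6},
}


def weighted_mean(cohorts):
    """Company-weighted mean of cohort-level compliance percentages."""
    total_companies = sum(c["companies"] for c in cohorts.values())
    weighted_sum = sum(c["companies"] * c["mean_pct"] for c in cohorts.values())
    return weighted_sum / total_companies


two_cohort_mean = weighted_mean(cohorts)
print(f"Weighted mean across 15 assessed companies: {two_cohort_mean:.1f}%")
```

The result lands near 56%; the reported 53% overall average also includes the third cohort.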

Compliance by Commitment Type

Commitment Area	Average Compliance	Companies at 0%	Best Performer
Model weight security	17%	11 of 16 (69%)	Anthropic (75%)
Third-party reporting	34.4%	8 of 16 (50%)	OpenAI (100%)
Red teaming	62%	3 of 16 (19%)	OpenAI (100%)
Watermarking	48%	6 of 16 (38%)	Google (85%)
Safety research sharing	71%	2 of 16 (13%)	Multiple (100%)
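The "Companies at 0%" column pairs a count with a share of the 16 assessed companies. A minimal sketch of that arithmetic follows, assuming the shares are rounded half up to whole percentages (the rounding rule is an assumption, not something the study states here).

```python
import math

TOTAL_COMPANIES = 16

# Number of companies scoring 0% in each commitment area, from the table above.
zero_compliance_counts = {
    "Model weight security": 11,
    "Third-party reporting": 8,
    "Red teaming": 3,
    "Watermarking": 6,
    "Safety research sharing": 2,
}


def share_pct(count, total=TOTAL_COMPANIES):
    """Share of companies as a whole percentage, rounded half up."""
    return math.floor(100 * count / total + 0.5)


shares = {area: share_pct(n) for area, n in zero_compliance_counts.items()}
for area, pct in shares.items():
    print(f"{area}: {pct}% of companies at 0%")
```

Rounding half up is chosen deliberately: Python's built-in `round` uses banker's rounding, which would turn 12.5 into 12 rather than the 13% shown in the table.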

Related Commitments

In May 2024, 16 AI companies joined the Frontier AI Safety Commitments, pledging to develop Responsible Scaling Policies (RSPs) by February 2025. Many companies did publish RSPs, though their quality and specificity vary significantly.

Concerning Developments

In April 2025, OpenAI removed a provision from its Preparedness Framework without noting the change in the changelog, raising transparency concerns.

Data Quality: Based on detailed public rubric scoring of disclosed company behavior. Limitations include reliance on public disclosure and potential for selective transparency.

2. RSP Capability Threshold Crossings

Current Status (2025): First ASL-3 activation announced for Claude Opus 4. No other publicly confirmed threshold crossings, though evaluation methodologies remain contested.

Anthropic's RSP Framework

Anthropic pioneered the Responsible Scaling Policy approach in September 2023, with the policy now at Version 2.2 (effective May 14, 2025):

Version	Effective Date	Key Changes
1.0	September 19, 2023	Initial framework with ASL levels
2.0	October 15, 2024	Shifted to qualitative thresholds; safety case methodology
2.1	March 31, 2025	Clarified thresholds beyond ASL-3
2.2	May 14, 2025	Amended insider threat scope for ASL-3 Security Standard

ASL-3 Activation (2025)

Anthropic activated ASL-3 protections for Claude Opus 4, representing the first publicly confirmed capability threshold crossing:

ASL-3 Security Standard: Increased internal security measures to protect model weights
ASL-3 Deployment Standard: Targeted measures limiting CBRN weapons misuse risk
Precautionary basis: Anthropic has not definitively confirmed Claude Opus 4 crossed the capability threshold, but could not "clearly rule out ASL-3 risks"

Key Threshold Domains

Domain	ASL-2 Threshold	ASL-3 Threshold	Current Status
CBRN capabilities	Basic refusals	Sophisticated non-state attacker resistance	Claude Opus 4 at ASL-3
Autonomous AI R&D	No automation	1000x scaling acceleration	Not crossed
Cybersecurity	Basic vulnerability knowledge	Advanced exploitation assistance	Monitoring
Model weight security	Opportunistic theft defense	Sophisticated attacker defense	ASL-3 for Opus 4

Recent Changes and Concerns

Version 2 shift: Anthropic moved from quantitative benchmarks to qualitative descriptions of capability levels. Critics note this reduces verifiability.

Grade decline: According to SaferAI, Anthropic's grade dropped from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and DeepMind. The primary concern is the shift away from precisely defined thresholds.

Institute for AI Policy and Strategy recommendation: Companies should define verifiable risk thresholds informed by "societal risk" tolerances from other industries. Current thresholds may be too lenient.

Detection Challenges

Small improvements in elicitation methodology can dramatically increase scores on evaluation benchmarks. Naive elicitation strategies may significantly underreport risk profiles, potentially missing dangerous capabilities that sophisticated actors could unlock.

Data Quality: Limited public disclosure of evaluation results. Companies control both the design and disclosure of dangerous capability evaluations, creating incentives to underreport concerning findings.

3. Time Between Model Training and Safety Evaluation

Current Status (2025): Decreasing evaluation windows, with some tests compressed from months to days.

Shortened Timelines

The Financial Times reported that OpenAI has been slashing safety evaluation time, giving testers "just a few days for evaluations that had previously been allotted weeks or months to be completed."

Model	Reported Evaluation Time	Historical Comparison	Source
GPT-4 (2023)	6+ months	Baseline	OpenAI system card
o3 (2025)	Less than 1 week	95%+ reduction	FT sources
GPT-4.1 (2025)	No technical safety report	N/A	OpenAI statement

This compression creates severe constraints on thorough safety testing:

Complex evaluations require substantial time to design and execute
Emergent capabilities may only become apparent through extended testing
Red teams need adequate access to explore edge cases and failure modes
One evaluator told FT: "We had more thorough safety testing when [the technology] was less important"
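The "95%+ reduction" entry in the table above can be reproduced with simple arithmetic. The sketch below assumes a 30-day month, takes "6+ months" at its lower bound and "less than 1 week" at its upper bound, and so gives a conservative floor on the reduction rather than a measured value.

```python
DAYS_PER_MONTH = 30  # assumption: approximate month length


def pct_reduction(before_days, after_days):
    """Percentage reduction going from before_days to after_days."""
    return 100 * (1 - after_days / before_days)


gpt4_eval_days = 6 * DAYS_PER_MONTH  # "6+ months", lower bound
o3_eval_days = 7                     # "less than 1 week", upper bound

reduction = pct_reduction(gpt4_eval_days, o3_eval_days)
print(f"Reduction in evaluation time: at least {reduction:.1f}%")
```

Because the earlier window is taken at its minimum and the later one at its maximum, any actual figures consistent with the table would give a reduction at least this large.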

Pre-Deployment Testing Examples

OpenAI o1 evaluation (December 2024): US AISI and UK AISI conducted joint pre-deployment testing during a "limited period of pre-deployment access." Testing was conducted by expert engineers and scientists across three domains:

Cyber capabilities
Biological capabilities
Software and AI development

The evaluators noted that testing was "conducted in a limited time period with finite resources, which if extended could expand the scope of findings."

Evaluator Access Challenges

METR and other evaluation organizations report that comprehensive risk assessments require:

Substantial expertise and specialized knowledge
Direct access to models and training data
More time than companies typically provide
Information about technical methodologies that companies often withhold

METR argues that powerful AI systems "are not ordinary products" and risks should be addressed throughout the whole AI development lifecycle. They advocate for "earlier evaluations for dangerous capabilities, better forecasting of AI capabilities prior to training, and more emphasis on security and safety throughout development."

AI Safety Index Findings

The 2025 AI Safety Index from Future of Life Institute found:

"AI companies are unlikely to make high-assurance safety cases if timelines are short"
"AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results"
"Naive elicitation strategies cause significant underreporting of risk profiles"
The gap between capabilities acceleration and risk management practice is widening

Data Quality: Limited public data on specific evaluation timelines. Most information comes from investigative journalism, evaluator reports, and company transparency documents.

4. External Red-Team Engagement Rate

Current Status (2025): External red teaming is standard practice at major labs, but engagement scope and disclosure vary significantly.

Major Lab Practices

OpenAI: Conducts pre-deployment adversarial testing by vetted external experts. External red teamers identify alignment failures, injection vectors, tool misuse paths, and safety regressions. Findings inform mitigation strategies and deployment decisions.

Notable recent engagements: ControlPlane's Torin van den Bulk contributed to external red team testing on GPT-4o, Operator, o3-mini, and Deep Research, with live access to model checkpoints.

HackerOne Platform: Provides structured AI Red Teaming (AIRT) as 15 or 30-day engagements. Over 750 AI-focused researchers contribute to engagements for customers including Anthropic, Snap, and Adobe. HackerOne has tested 1,700+ AI assets across customer scopes.

Key Vulnerabilities Found

From HackerOne's aggregated testing data across 1,700+ AI assets:

Vulnerability Type	Frequency	Severity	Notes
Cross-tenant data leakage	Found in nearly all enterprise tests	Critical	Highest priority concern
Prompt injection	75%+ of tested models	High	Frequently bypasses safety filters
Jailbreak exploits	Variable	High	Success rates vary by methodology
Unsafe outputs	Common	Medium-High	Various categories of harmful responses

Anthropic Jailbreak Challenge Results (2025)

Anthropic partnered with HackerOne to test Constitutional Classifiers on Claude 3.5 Sonnet:

300,000+ chat interactions from 339 participants
$55,000 in bounties paid to four successful teams
One team found a universal jailbreak passing all levels
One team found a borderline-universal jailbreak
Two teams passed all eight levels using multiple individual jailbreaks

Testing Metrics

Metric	Description	Industry Benchmark
Jailbreak success rate (ASR)	Percentage of successful bypass attempts	Varies: 0% to 63% at 100 attempts
Mean time to detect (MTTD)	Time to discover vulnerabilities	10 min to 7+ hours
Mean time to remediate (MTTR)	Time to fix discovered issues	Not publicly disclosed
Attack success at 200 attempts	Multi-attempt bypass rate	Claude Opus 4.5: 0% (computer use) to 63% (coding)
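To make the metrics in the table concrete, here is a minimal sketch of how a multi-attempt attack success rate and a mean time to detect might be computed from red-team logs. The log format, scenario names, and the "breached at least once within k attempts" definition are illustrative assumptions; the cited sources may define these metrics differently.

```python
from statistics import mean


def attack_success_rate(trials, k):
    """Percentage of targets breached at least once within the first k attempts.

    `trials` maps a target name to a list of booleans, one per attempt, in order.
    """
    breached = sum(1 for outcomes in trials.values() if any(outcomes[:k]))
    return 100 * breached / len(trials)


def mean_time_to_detect(minutes_to_first_finding):
    """Average time (in minutes) until a vulnerability was first found."""
    return mean(minutes_to_first_finding)


# Hypothetical attempt logs: True marks a successful bypass.
trials = {
    "scenario_a": [False, False, True, False],
    "scenario_b": [False, False, False, False],
    "scenario_c": [True, False, False, False],
    "scenario_d": [False, False, False, True],
}

asr_at_1 = attack_success_rate(trials, 1)
asr_at_4 = attack_success_rate(trials, 4)
# The two durations mirror the 10-minute and 7-hour examples cited on this page.
mttd_minutes = mean_time_to_detect([10, 7 * 60])

print(f"ASR@1: {asr_at_1:.0f}%  ASR@4: {asr_at_4:.0f}%  MTTD: {mttd_minutes} min")
```

The gap between `asr_at_1` and `asr_at_4` illustrates why single-attempt figures can understate risk: success rates generally climb as an attacker is allowed more tries.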

Government Framework

CISA defines AI red teaming as a subset of AI Testing, Evaluation, Verification and Validation (TEVV). NIST has operationalized this through programs like Assessing Risks and Impacts of AI (ARIA) and the GenAI Challenge.

Limitations

While external red teaming is increasingly common, critical gaps remain:

Limited disclosure of red team findings and remediation actions
Selective engagement: Labs choose which red teamers to work with
Short engagement windows: 15-30 days may be insufficient for complex systems
Post-deployment gaps: Less emphasis on continuous adversarial testing after launch

Data Quality: Some public information from lab announcements and red team providers. Comprehensive engagement rates and detailed findings remain largely non-public.

5. Dangerous Capability Disclosure Delays

Current Status (2025): Significant and increasing delays, with some major model releases lacking safety documentation entirely.

Google Gemini 2.5 Pro (March 2025)

Google released Gemini 2.5 Pro without a model card, violating commitments made to the U.S. government and at international AI safety summits:

Timeline	Event	Notes
March 2025	Gemini 2.5 Pro released	No model card published
3 weeks later	Simplified 6-page model card	Called "meager" and "worrisome" by AI governance experts
Late June 2025	Detailed report published	Months after full release

Government Response: 60 U.K. politicians signed an open letter accusing Google DeepMind of "a troubling breach of trust with governments and the public" and a "failure to honour" international commitments.

Google's Defense: The company claimed Gemini 2.5 Pro was an "experimental" release, exempting it from normal documentation requirements.

OpenAI Documentation Gaps

Deep Research model: Released without a system card, which was published weeks later
GPT-4.1: OpenAI announced it would not publish a technical safety report, arguing the model is "not a frontier model"

Broader Industry Pattern

Meta Llama 4: Model card was similarly brief and limited in detail, drawing criticism from AI safety researchers.

Systemic Issues: The AI Safety Index found that "AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results or select lenient testing conditions that avoid costly deployment delays."

Transparency Requirements

While voluntary commitments emphasize transparency, actual disclosure practices show significant gaps:

Limited disclosure of evaluation methodologies
Weak evidence of systematic safety processes
Uneven adoption of robust evaluation practices

New Legal Requirements

California's Transparency in Frontier AI Act (effective 2026) establishes:

Transparency requirements for large AI developers
Mandatory reporting of critical safety incidents to state attorney general
Whistleblower protections for employees reporting risks

Data Quality: Based on public monitoring by AI governance organizations, investigative journalism, and government oversight. Actual capability evaluation results remain largely proprietary.

6. Pre-Deployment Safety Testing Duration

Current Status (2025): Highly variable and generally decreasing. No standardized minimum testing period exists.

Testing Approaches

Major frontier AI labs follow safety policies that include pre-deployment testing:

OpenAI's Preparedness Framework (Version 2, April 2025)
Google DeepMind's Frontier Safety Framework
Anthropic's Responsible Scaling Policy (Version 2.2, May 2025)

Third-party evaluators (UK AISI, US AISI, Apollo Research, METR) also conduct pre-deployment assessments, though their access and time are limited. METR's analysis of 12 companies with published frontier AI safety policies found variable commitment levels.

Known Testing Examples

OpenAI o1 (December 2024): Joint US AISI and UK AISI evaluation during a "limited period" before public release. Specific duration not disclosed publicly.

Safeguard Testing Benchmarks: Research examples show wide variation in time requirements:

First vulnerability test: 10 minutes of expert red teamer time
Second test (novel universal jailbreak): Over 7 hours of expert effort

Industry Trends

The 2025 AI Safety Index concluded that:

Pre-deployment testing is "likely necessary but insufficient" for responsible AI development
Testing is conducted with "limited time periods and finite resources"
"If timelines are short, AI companies are unlikely to make high-assurance safety cases"

Comparison to Other Industries

Unlike pharmaceuticals (multi-year clinical trials) or aerospace (extensive certification processes), AI systems lack:

Standardized testing protocols
Minimum duration requirements
Independent verification mandates
Clear pass/fail criteria for deployment

Data Quality: Very limited public data. Specific testing durations are rarely disclosed. Assessment based on general industry reports and occasional third-party evaluator statements.

7. Model Release Velocity

Current Status (2025): Unprecedented acceleration, with major labs releasing frontier models within weeks of each other.

Release Frequency Trends

2024 Baseline: Major labs typically released frontier models annually or semi-annually.

2025 Acceleration: "Companies that typically released major models annually or semi-annually were now shipping frontier models within weeks of each other." Each release incorporated learnings from the previous week's competitive announcements.

November-December 2025: "Tit-for-Tat Arms Race"

In just 25 days, four major AI companies launched their most powerful models:

Date	Company	Model	Benchmark Performance
November 17	xAI	Grok 4.1	Top on select reasoning tasks
November 18	Google	Gemini 3	Topped multiple leaderboards
November 24	Anthropic	Claude Opus 4.5	80%+ on SWE-Bench Verified
December 11	OpenAI	GPT-5.2	Competitive across benchmarks

This concentration represented "a compression of innovation never before seen in technology history." OpenAI's Sam Altman issued an internal "code red" memo after Gemini 3 topped leaderboards, with internal sources reporting that some employees asked for delays but "competitive pressure forced the accelerated timeline."
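The 25-day window can be checked directly from the release dates in the table. The sketch below assumes all four dates fall in 2025, as the section heading indicates, and counts the window inclusively (both the first and last release day).

```python
from datetime import date

# Release dates from the table above (year taken from the section heading).
releases = {
    "Grok 4.1": date(2025, 11, 17),
    "Gemini 3": date(2025, 11, 18),
    "Claude Opus 4.5": date(2025, 11, 24),
    "GPT-5.2": date(2025, 12, 11),
}

ordered = sorted(releases.values())
elapsed_days = (ordered[-1] - ordered[0]).days
window_days_inclusive = elapsed_days + 1

# Days between each consecutive pair of releases.
gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

print(f"Inclusive window: {window_days_inclusive} days; gaps: {gaps}")
```

The per-release gaps show that the clustering is uneven: the first three launches fall within a single week, with the fourth following a little over two weeks later.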

2025 Release Summary

OpenAI:

GPT-5 with improved coding and "thinking" mode
GPT-5.1 Codex Max (agentic coding model)
GPT-5.2
gpt-oss-120b and gpt-oss-20b (open-source models)
Dozens of feature launches (GPT-4o Image, standalone Sora app, group chats)

Anthropic:

Claude 4 family (Opus and Sonnet)
Claude Opus 4.5 (November 24)
Claude Haiku 4.5

Google DeepMind:

Gemini 2.5 (March)
Gemini 3 (November)
Gemini 2.5 Deep Think
Genie 3.0 (world model)

Meta & Others:

DeepSeek R1 (January 20, 2025) - major open model impact
Qwen 3 and various Chinese lab releases

Tracking Data

AI Flash Report: Tracked 43 model releases as of October 27, 2025
Our World in Data: Tracks large-scale AI systems (>10²³ FLOP training compute)

Safety Implications

Rapid release velocity creates pressure that:

Reduces time available for safety evaluation
Encourages "shipping within weeks" competitive dynamics
Creates feedback loops of rapid iteration
May prioritize "shiny products" over safety culture

Data Quality: Good tracking of major model releases through multiple sources. Precise internal development timelines remain proprietary.

8. Open-Source vs Closed Model Capability Gap

Current Status (2025): Gap narrowing significantly, from approximately 16 months in 2024 to approximately 3 months in late 2025.

Current Gap Estimate

Epoch AI research from October 2025 found:

Metric	2024 Estimate	2025 Estimate	Trend
Average lag time	16 months	3 months	Narrowing rapidly
ECI gap (capability index)	15-20 points	7 points	Narrowing
Benchmark parity domains	Limited	Most key benchmarks	Expanding
Enterprise use gap	Significant	15 percentage points on SWE-Bench Verified	Narrowing

Specific Example: Meta's Llama 3.1 405B (released July 2024) matched the capabilities of GPT-4's first version approximately 16 months after that version's March 2023 release.
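The 16-month lag in this example is a calendar-month difference between two release dates. A minimal sketch follows; the specific days used (March 14, 2023 for GPT-4 and July 23, 2024 for Llama 3.1) are supplied from public release information rather than from this page, and only the month and year affect the result.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`, ignoring day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


gpt4_release = date(2023, 3, 14)      # assumption: public GPT-4 release date
llama_31_release = date(2024, 7, 23)  # assumption: public Llama 3.1 release date

lag_months = months_between(gpt4_release, llama_31_release)
print(f"Open-model lag in this example: {lag_months} months")
```

This measures only when a comparable open model shipped, not how closely its capabilities matched, which is where benchmark-based estimates such as Epoch AI's come in.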

Gap Narrowing Evidence

2024: Ecosystem relied primarily on Llama 3, with Qwen2.5 and DeepSeek known to specialists.

2025: DeepSeek R1 (January 20, 2025) achieved performance parity with OpenAI's o1 while operating at roughly one-fifteenth the cost, after training that reportedly cost just $5.6 million. The open ecosystem "immensely accelerated in terms of capabilities, rivaling closed models on most key benchmarks."

Meta's Claims

Meta described Llama 3.1 as "the first frontier-level open source AI model," claiming it "outperforms GPT-4o and Anthropic's Claude 3.5 on several benchmarks" in internal evaluations.

Remaining Closed Model Advantages

Highest-end performance: GPT-4 and newer models remain more capable on complex tasks requiring deep reasoning.

Enterprise benchmarks: On SWE-Bench Verified (real GitHub issue fixes):

State-of-the-art closed models (GPT-5.2-Codex, Claude Opus 4.5): 80%+
Top open-source models: 65%
This gap is described as "critical for enterprise use"

Open Model Strengths

Niche verticals: Open-source models lead in biomedicine, law, and defense applications where institutional constraints (privacy, security, customization) matter more than raw performance.

Cost-effectiveness: Significantly cheaper for customization and fine-tuning.

Adoption trends: According to a16z research:

41% of surveyed enterprises will increase use of open-source models
Another 41% will switch from closed to open if performance matches

Open Source Definition Debate

Meta's Llama models don't meet the Open Source Initiative's definition, which requires sharing:

Model weights (Meta provides this)
Training data (Meta does not provide this)
Training code (Meta does not provide this)

Google Internal Assessment

A leaked 2023 Google memo warned: "we aren't positioned to win this arms race … I'm talking, of course, about open source. Plainly put, they are lapping us."

Data Quality: Based on benchmark comparisons, research organization analysis (Epoch AI), and industry reports. Benchmarks may not capture all relevant capability dimensions.

9. Lab Safety Team Turnover Rate

Current Status (2025): Specific turnover rates not publicly disclosed, but high-profile departures suggest significant retention challenges at leading labs.

OpenAI Safety Departures (2024-2025)

May 2024 - Superalignment Team Dissolution:

Name	Role	Departure Date	Public Statement
Ilya Sutskever	Co-founder, Chief Scientist, Superalignment co-lead	May 14, 2024	No public criticism
Jan Leike	Head of Alignment, Superalignment co-lead	May 2024	"Safety culture... took a backseat to shiny products"
Daniel Kokotajlo	Safety researcher	April 2024	"Resigned in protest after losing confidence"
Leopold Aschenbrenner	Safety researcher	2024	Reportedly fired for leaking information
William Saunders	Safety researcher	2024	No public statement

Jan Leike's criticism: "Over the past few months my team has been sailing against the wind. Sometimes we were struggling for [computing resources]." This came despite OpenAI's promise to allocate 20% of its compute to Superalignment. Leike subsequently joined Anthropic.

Result: The entire Superalignment team was disbanded, with members reassigned to other teams.

Leadership and Senior Staff Exits (from September 2024):

Mira Murati (CTO, 6 years at OpenAI)
Bob McGrew (Chief Research Officer)
Barret Zoph (VP of Research)
Hannah Wong (Chief Communications Officer)
Tom Cunningham (Economics Researcher)
Miles Brundage (Policy Research Head)

Total documented senior departures: 25+ as of December 2024

June 2024 Open Letter

Nine current and former OpenAI employees wrote an open letter criticizing the company for "recklessly racing" to build AGI. Daniel Kokotajlo spoke out even though OpenAI had initially conditioned his equity (worth ≈$1.7 million) on compliance with a non-disparagement agreement.

Anthropic

Anthropic has positioned itself as a safety-focused alternative and received several high-profile hires from OpenAI, including Jan Leike. Specific turnover data not publicly available.

Industry-Wide Context

General corporate AI employee retention challenges in 2025:

High demand for AI talent creates strong external offers
Burnout from rapid development pace
Philosophical disagreements over safety prioritization

Whistleblower Issues

The 2025 AI Safety Index noted: "Public whistleblowing policies are a common best practice in safety-critical industries. Yet, among the assessed companies, only OpenAI has published its full policy, and it did so only after media reports revealed the policy's highly restrictive non-disparagement clauses."

Data Quality: Very limited. Based on public departure announcements, investigative journalism, and open letters. Internal turnover rates for safety-specific teams are not disclosed. No denominator data (total safety team size over time) publicly available.

10. Whistleblower Reports from AI Labs

Current Status (2025): Small but growing number of public whistleblower reports, primarily from OpenAI. Structural barriers remain significant.

"The OpenAI Files" (June 2025)

A comprehensive report compiled by the Midas Project and the Tech Oversight Project, tracking issues with governance, leadership, and safety culture at OpenAI. It drew on:

Legal documents and complaints
Social media posts
Media reports
Open letters and insider accounts

Described as "the most comprehensive collection to date of documented concerns with governance practices, leadership integrity, and organizational culture at OpenAI."

Notable Whistleblower Cases

Daniel Kokotajlo (2024): Spoke publicly despite non-disparagement agreement that initially conditioned ≈$1.7 million in equity on his silence. Resigned "in protest after losing confidence in the company."

Jan Leike (2024): While not technically a whistleblower (he left for Anthropic), Leike publicly criticized OpenAI on X, stating that safety "took a backseat to shiny products" and that his team was under-resourced.

Nine-Person Open Letter (June 2024): Current and former OpenAI employees criticized the company for "recklessly racing" toward AGI.

Structural Barriers to Whistleblowing

Non-Disparagement Agreements: OpenAI initially used agreements that conditioned equity vesting on non-criticism. This practice was exposed and modified after public backlash.

Whistleblower Policy Gaps: The AI Safety Index found that only OpenAI published its full whistleblowing policy, and only after media scrutiny revealed restrictive clauses.

Industry Comparison: Most AI labs lack public whistleblower policies comparable to safety-critical industries like aviation or nuclear power.

Cross-Lab Safety Criticism

In 2025, AI safety researchers from OpenAI, Anthropic, and other organizations publicly criticized xAI's "reckless" and "completely irresponsible" safety culture following company scandals.

New Protections

California SB 53 (the Transparency in Frontier AI Act, supported by Anthropic): Provides whistleblower protections for employees reporting AI-related risks or safety concerns to authorities. Effective January 1, 2026.

California AI Safety Act: Establishes protections from retaliation for reporting AI-related risks.

Cross-Lab Evaluation Initiative (2025)

In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's public models using in-house misalignment evaluations and released findings in parallel. While not whistleblowing per se, this represents increased transparency.

Limitations

Actual number of whistleblower reports remains unknown because:

Many concerns may be raised internally without public disclosure
Non-disparagement agreements suppress some reports
Fear of career consequences deters whistleblowing
No centralized reporting or tracking mechanism exists

Data Quality: Very limited. Based on public letters, media investigations, and individual whistleblower accounts. Represents visible tip of potentially larger iceberg of internal concerns.

Summary of Data Availability

Metric	Data Quality	Public Availability
Voluntary commitment compliance	Good	Detailed 2025 study available
RSP threshold crossings	Poor	Companies control disclosure
Training-to-eval timeline	Poor	Mostly not disclosed
External red team engagement	Moderate	Some provider data, limited findings
Disclosure delays	Moderate	Tracked by watchdog organizations
Pre-deployment testing duration	Poor	Rarely disclosed
Model release velocity	Good	Well-tracked by multiple sources
Open vs closed capability gap	Good	Regular benchmark comparisons
Safety team turnover	Poor	Only high-profile departures visible
Whistleblower reports	Poor	Only public cases known
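The table above can be summarized by tallying how many of the ten metrics fall into each data-quality tier. The sketch below simply restates the table's ratings as a dictionary and counts them; the variable names are illustrative.

```python
from collections import Counter

# Data-quality rating per metric, copied from the table above.
data_quality = {
    "Voluntary commitment compliance": "Good",
    "RSP threshold crossings": "Poor",
    "Training-to-eval timeline": "Poor",
    "External red team engagement": "Moderate",
    "Disclosure delays": "Moderate",
    "Pre-deployment testing duration": "Poor",
    "Model release velocity": "Good",
    "Open vs closed capability gap": "Good",
    "Safety team turnover": "Poor",
    "Whistleblower reports": "Poor",
}

tally = Counter(data_quality.values())
poor_share_pct = 100 * tally["Poor"] / len(data_quality)

for rating in ("Good", "Moderate", "Poor"):
    print(f"{rating}: {tally[rating]} of {len(data_quality)} metrics")
print(f"Share rated Poor: {poor_share_pct:.0f}%")
```

Half of the tracked metrics are rated Poor, and those are the ones that depend on labs' internal disclosures rather than on externally observable events such as releases or benchmarks.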

Key Takeaways

Compliance varies widely: Even leading labs struggle with certain commitments (especially model weight security and third-party reporting)
Evaluation timelines are shortening: Despite increasing capabilities, safety testing windows are compressed, raising concerns about thoroughness
Transparency gaps persist: Major model releases sometimes lack promised safety documentation, violating voluntary commitments
Release velocity is accelerating: Competitive pressure has created unprecedented model release density, particularly in late 2025
Open-source catching up: The capability gap between open and closed models has narrowed from ~16 months to roughly 3 months, with near-parity in some domains
Safety team retention challenges: High-profile departures, particularly from OpenAI's Superalignment team, suggest cultural or resource allocation issues
Limited whistleblower infrastructure: Despite safety-critical nature of AI development, formal whistleblower protections and reporting mechanisms remain underdeveloped
Data quality challenges: Most metrics suffer from limited disclosure, creating information asymmetry between labs and external stakeholders
