Anthropic safety capability thresholds – 2029

v1.2 (2026-04-01) current

The starting probabilities of this market were calculated using logistic CDF models fitted to anchor points calibrated via Anthropic’s own RSP assessment history, cross-lab safety-framework patterns, and METR capability-trend data. A structural risk discount is applied to account for institutional uncertainty over longer horizons.

Events

This market tracks two milestones from Anthropic’s Responsible Scaling Policy (RSP):

AI R&D-4: Anthropic publicly reports that a model meets the AI R&D-4 capability threshold. Under RSP v2.1 (Mar 2025), this is defined as “the ability to fully automate the work of an entry-level, remote-only Researcher at Anthropic.” Under RSP v3.0 (Feb 2026), the equivalent threshold is a model that could “compress two years of 2018–2024 AI progress into a single year” (~1000x effective compute scaleup). As of February 2026, Anthropic explicitly states that Claude Opus 4.6 does not cross this threshold, though ruling it out is getting harder.
CBRN-4: Anthropic publicly reports that a model meets the CBRN-4 capability threshold. Under RSP v2.1, this is “the ability to substantially uplift CBRN development capabilities of moderately resourced state programs (with relevant expert teams), such as by novel weapons design, substantially accelerating existing processes, or dramatic reduction in technical barriers.” Under RSP v3.0, the equivalent is a model that can “significantly help threat actors (e.g., moderately resourced expert-backed teams) create/obtain and deploy chemical and/or biological weapons with potential for catastrophic damages far beyond those of past catastrophes such as COVID-19.” This is much harder than the CBRN-3 threshold (which targets individuals with basic STEM backgrounds and triggered ASL-3 protections in May 2025). ASL-4 safeguards have not yet been defined.

Anthropic RSP assessment history

Anthropic has been unusually transparent about threshold assessments, publishing regular RSP updates and risk reports. This timeline directly informs the calibration:

Date	Event
Sep 2023	RSP launched with ASL-1 through ASL-4+. All current models assessed as ASL-2.
Oct 2024	Major RSP update (v2). All models still ASL-2.
Mar 2025	RSP v2.1: AI R&D thresholds disaggregated into AI R&D-4 and AI R&D-5; CBRN threshold detail added.
May 2025	ASL-3 activated for Claude Opus 4, driven by CBRN-3 concerns (steadily increasing performance on the Virology Capabilities Test and stronger performance on virus acquisition tasks). Anthropic could not clearly rule out ASL-3 risks. ASL-4 explicitly ruled out for Opus 4.
Feb 2026	Claude Opus 4.6 assessed: does not cross AI R&D-4, but “confidently ruling out this threshold is becoming increasingly difficult.” RSP v3.0 published with major rewrite: rigid numbered thresholds replaced by capability-to-mitigation mapping tables. Sabotage risk report published for Opus 4.6.

Key implications: The Feb 2026 assessment directly constrains the near-term AI R&D-4 probability, because the latest flagship model was explicitly assessed and found to fall short. The next realistic opportunity is a future model generation, likely late 2026 at the earliest. For CBRN-4, the gap between CBRN-3 (triggered May 2025) and CBRN-4 is very large: CBRN-3 targets non-experts, while CBRN-4 requires uplifting state-backed expert teams.

METR capability trend

METR’s March 2025 analysis found that the task horizon of frontier agents has been doubling roughly every 7 months. Current systems achieve a ~1-hour task horizon at 50% reliability, with <10% success on tasks over ~4 hours. The AI R&D-4 threshold (“fully automate entry-level researcher”) likely requires reliable multi-day autonomous performance, which is several doublings beyond a ~1-hour horizon, suggesting this crossing is still some years away even with steep improvement trends.
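The doubling arithmetic behind that estimate can be sketched as follows. This is an illustrative extrapolation only: it assumes the ~7-month doubling continues unchanged from the ~1-hour baseline quoted above, and the target horizons (one 8-hour working day, one 40-hour working week) are hypothetical stand-ins for "multi-day" work, not figures from METR or Anthropic. The function name `months_to_reach` is likewise just for illustration.

```python
import math

def months_to_reach(target_hours, current_hours=1.0, doubling_months=7.0):
    """Months until the 50%-reliability task horizon reaches target_hours,
    assuming the doubling trend simply continues (an assumption, not a forecast)."""
    if target_hours <= current_hours:
        return 0.0
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Hypothetical targets chosen for illustration only.
for label, hours in [("1 working day", 8), ("1 working week", 40)]:
    print(f"{label}: {months_to_reach(hours):.0f} months")
# prints:
# 1 working day: 21 months
# 1 working week: 37 months
```

These figures refer to the 50%-reliability horizon. The threshold plausibly needs much higher reliability, so they are better read as a lower bound on what the trend implies than as a crossing date.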

Cross-lab reference points

OpenAI: “High” AI R&D threshold (mid-career ML research engineer) not yet reached. o3 reportedly “medium.” GPT-4o rated “low” in bio/cyber.
Google DeepMind: No CCL-1 threshold crossed through Gemini 3.1 Pro (Feb 2026). CBRN and Cyber alert thresholds reached but CCLs confirmed not met.

The cross-lab pattern is consistent: intermediate capability thresholds are being approached or crossed, but the most severe levels remain unmet at all labs.

Methodology

For each event, we set anchor points (month, cumulative probability) balancing:

Anthropic’s own assessments: the Feb 2026 “does not cross AI R&D-4” statement and the May 2025 “ASL-4 ruled out” for CBRN directly constrain near-term probabilities
METR task-horizon trend (~7-month doubling)
Cross-lab threshold history: no lab has publicly reported a top-level crossing
RSP framework churn: three major rewrites in 2.5 years increase structural uncertainty

A logistic CDF is fitted to each event’s anchors by least squares. A structural/institutional risk discount is then applied multiplicatively: a constant hazard rate of ~1.3%/year for “framework becomes unresolvable” (RSP restructuring, threshold redefinition, reporting changes), calibrated so that cumulative structural risk reaches ~7% at the 5.5-year horizon.
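A minimal sketch of this two-step procedure is given below, under stated assumptions. The anchor values are hypothetical placeholders, not the market's actual anchors. The fit uses ordinary least squares on logit-transformed probabilities, a linearised stand-in that may differ from the least-squares routine actually used. All function names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_logistic_cdf(anchors):
    """Fit F(t) = 1 / (1 + exp(-(t - mu) / s)) to (month, probability) anchors.

    Assumption: ordinary least squares on logit(p) = (t - mu) / s, which is
    linear in t. The market's own fitting routine is not specified here.
    """
    ts = [t for t, _ in anchors]
    zs = [logit(p) for _, p in anchors]
    n = len(anchors)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = sxz / sxx                      # slope = 1 / s
    intercept = z_mean - slope * t_mean    # intercept = -mu / s
    return -intercept / slope, 1.0 / slope  # (mu, s)

def logistic_cdf(t, mu, s):
    return 1.0 / (1.0 + math.exp(-(t - mu) / s))

def structural_survival(t_years, hazard=0.013):
    """Probability the framework is still resolvable after t_years,
    under a constant hazard rate (~1.3%/year in the text)."""
    return math.exp(-hazard * t_years)

def discounted_probability(t_months, mu, s, hazard=0.013):
    """Fitted cumulative probability times the structural survival factor."""
    return logistic_cdf(t_months, mu, s) * structural_survival(t_months / 12.0, hazard)

# HYPOTHETICAL anchors (month from market start, cumulative probability).
anchors = [(12, 0.05), (24, 0.20), (36, 0.45), (48, 0.65)]
mu, s = fit_logistic_cdf(anchors)
for t, p in anchors:
    print(t, p, round(discounted_probability(t, mu, s), 3))

# Cumulative structural risk at the 5.5-year horizon: 1 - exp(-0.013 * 5.5)
print(round(1 - structural_survival(5.5), 3))  # 0.069, i.e. roughly 7%
```

The last line checks that a 1.3%/year constant hazard is consistent with the stated ~7% cumulative structural risk at 5.5 years.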

Key assumptions

Anthropic continues publishing capability assessments (RSP updates, Risk Reports, or equivalent)
The RSP threshold concepts remain substantially similar even as specific definitions evolve (the structural discount accounts for the risk that they do not)
AI R&D-4 progresses faster than CBRN-4 (stronger commercial incentive, more training signal, and closer to current capability levels)
CBRN-4 requires a much larger capability jump from CBRN-3 than most other threshold gaps
