OpenAI safety capability thresholds
This market contains three independent events, each resolving to the earliest month in which the condition is met:
- AI R&D (High): OpenAI publicly reports (in a System Card, blog post, or Preparedness Framework update) that one of its models meets or exceeds the 'High' capability threshold for 'Autonomous AI Research & Development' (previously 'AI Self-Improvement'). The 'High' threshold is defined in the Preparedness Framework as equivalent to a 'performant mid-career ML research engineer', evaluated via benchmarks including Monorepo-Bench, OpenAI-Proof Q&A, and MLE-Bench.
- Critical Bio/Chem: OpenAI publicly classifies one of its models as 'Critical' in the Biological/Chemical capability category in a System Card, safety report, or Preparedness Framework assessment.
- Critical Cyber: OpenAI publicly classifies one of its models as 'Critical' in the Cybersecurity capability category in a System Card, safety report, or Preparedness Framework assessment.
Each event resolves independently. 'Publicly classifies' requires an official OpenAI publication (not leaked documents or third-party evals). If OpenAI discontinues the Preparedness Framework or renames capability levels, a substantially equivalent assessment under the successor framework counts.