Some disconnected, rough thoughts:

As far as I can tell from some research, Anthropic hasn’t yet clearly stated that they have crossed the CBRN-3 threshold (“the ability to significantly assist individuals or groups with basic STEM backgrounds in obtaining, producing, or deploying CBRN weapons”), although they did activate ASL-3 protections. In general, the standards of evidence they use for “ruling in” capability thresholds seem high, and I also expect they have few incentives to run strict experiments just to clearly confirm dangerous capabilities: there is relatively little upside in doing so, as long as they act precautionarily internally whenever they cannot “rule out” dangerous capability levels.

It is also not clear how this question is handling “AI R&D-4”. The standard for automated AI R&D in RSP v3.0 seems substantially higher than the AI R&D-4 threshold from the previous RSP, particularly when reading things such as the Claude Mythos Preview System Card, which among other things suggests that a productivity uplift on individual tasks on the order of 40× (an order of magnitude above the current geometric mean of 4× from surveys) would be required to yield an overall progress multiplier of 2×.
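
To make that arithmetic concrete, here is a minimal sketch assuming an Amdahl’s-law-style aggregation (my illustrative assumption, not necessarily the exact model in the system card): if only a fraction of R&D work can be accelerated, even a very large per-task uplift translates into a much smaller overall progress multiplier. The ~51% automatable fraction below is a hypothetical value chosen so that 40× maps to roughly 2×.

```python
# Sketch under an Amdahl's-law-style assumption (illustrative, not
# necessarily the system card's model): if a fraction f of R&D work
# is sped up by a factor s and the rest is unchanged, the overall
# progress multiplier is M = 1 / ((1 - f) + f / s).

def overall_multiplier(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is accelerated s-fold."""
    return 1.0 / ((1.0 - f) + f / s)

# f = 0.51 is a hypothetical automatable fraction chosen for illustration.
# A 40x per-task uplift then yields only about a 2x overall multiplier...
print(overall_multiplier(f=0.51, s=40))  # ~2.0

# ...while the ~4x per-task uplift from current surveys yields far less.
print(overall_multiplier(f=0.51, s=4))   # ~1.6
```

One notable property of this toy model: with f = 0.51, even an infinite per-task speedup caps the overall multiplier at about 2.04×, so pushing much past 2× requires raising the automatable fraction, not just the per-task uplift.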

My reading is that, for this question to resolve, Anthropic would need to officially acknowledge that at least one of the definitions has been met.

Now that AI R&D-4 has been deprecated, this lower threshold may never be officially tested again.

In general, one fundamental issue with these questions is that there will likely be a substantial gap (of at least months) between the point where the actual underlying capability exists (particularly under refined scaffolds, with people prepared to smartly leverage the model’s strengths and well incentivized to elicit them successfully, which probably does not describe the CBRN tests at Anthropic or similar tests at other companies) and the point where the capability is officially announced.

Reading things like the “Biorisk” post at red.anthropic.com, the information about Mythos, and METR’s time-horizon progress in general (even though that is purely about software engineering) makes me expect that (actual) CBRN-4 is much closer than the baseline starting probabilities suggest, if not already here. However, related to the previous observation, there seems to be little benefit in the current environment to clearly stating that their models carry high CBRN risk: the current administration or adversaries could use an honest assessment against Anthropic, and the upside of showing others a very clear lower bound on dangerous capabilities seems low at the moment, both on commercial grounds and even, e.g., for clear signaling in the hopes of generating common knowledge and promoting urgent safety regulation.

The political and regulatory considerations might change in the coming years as AI and its associated risks become more mainstream, and I think there could be substantial policy changes after the 2028 presidential elections. Importantly, the tradeoffs for Anthropic in officially declaring some dangerous capability threshold met could also change if another company faces a public incident that makes it clear some capability level has been reached, or if another AI company declares equivalent thresholds as met (this consideration made me adjust my initial probabilities upward).

AI R&D is likely much more dangerous overall in the medium and long term than CBRN, but this is less obvious, and it also has more obvious commercial upside and thus more potential to attract investment. Also, the people surveyed to determine the current AI R&D acceleration within Anthropic are much better at eliciting the models’ underlying capabilities than the people surveyed to determine CBRN uplift. So, in general, I would expect a smaller gap between actual capability and official announcement; on the other hand, the substantial increase in difficulty (and widening of definitional wiggle room) from RSP v2.1 to v3.0 makes me think that Anthropic might prefer to publicize its AI R&D gains while delaying any actual indication that the thresholds have been crossed.

Finally, I would expect that at some point AI capabilities will be so strong that Anthropic could very well just state what everyone already knows about some thresholds having been crossed. But there seems to be little benefit (from the perspective of providing public knowledge, or of signaling trustworthiness and a safety-conscious approach) once we are at that point. And the CBRN-3 example already points in the direction of thresholds being treated as surpassed without ever being explicitly recognized as such.
