AI Models Escalate to Nuclear Use in Majority of War Simulations
Leading artificial intelligence models deployed nuclear weapons in 95% of simulated geopolitical conflicts, according to new research from King’s College London. The study found that OpenAI’s GPT-5.2, Anthropic’s Claude Sonnet 4, and Google’s Gemini 3 Flash escalated to nuclear use in nearly every scenario tested, raising questions about the risks of integrating advanced AI systems into high-stakes military decision-making.
Researchers conducted 21 simulated war games, with each model playing six matches against rival systems and one against itself. The models assumed the roles of national leaders commanding nuclear-armed superpowers in crisis scenarios loosely modeled on Cold War dynamics. Across more than 300 turns, the systems generated approximately 780,000 words of strategic reasoning, exceeding the combined length of War and Peace and The Iliad.
Escalation Patterns and Decision Outcomes
The simulated crises included border disputes, competition over scarce resources, and threats to regime survival. Each model operated along an escalation ladder ranging from diplomatic protest and surrender to full-scale strategic nuclear war.
At least one tactical nuclear weapon was used in nearly every conflict. None of the models chose full surrender, regardless of battlefield conditions. While systems occasionally attempted de-escalation, researchers reported that in 86% of scenarios the models escalated further than their own prior reasoning appeared to support, citing simulated “fog of war” errors.
The study recorded clear winners in every simulation, including three scenarios involving strategic nuclear exchanges.
Debate Over Simulation Design
Edward Geist, a senior policy researcher at RAND Corporation, said the findings may reflect the structure of the simulation rather than inherent tendencies of the models. He noted that the scoring system appeared to reward marginal advantage at the moment nuclear war was triggered, potentially incentivizing escalation.
Geist questioned how victory was defined, observing that labeling outcomes as “wins” in scenarios involving strategic nuclear use may indicate a framework that makes nuclear conflict comparatively easy to achieve favorable results.
Growing Military Integration of AI
The findings emerge as the U.S. Department of Defense expands AI adoption. In December, the Pentagon launched GenAI.mil, a platform integrating frontier AI models into military workflows. At launch, it included Google’s Gemini for Government, with OpenAI’s ChatGPT and xAI’s Grok added through subsequent agreements.
Anthropic, developer of Claude, has provided access to its models via partnerships with AWS and Palantir since 2024 and received a $200 million contract to prototype advanced AI capabilities supporting national security.
Recent reporting indicates the Defense Department has pressed Anthropic for unrestricted military access to Claude, warning it could designate the model a supply chain risk if demands are not met. Separately, Axios reported that the Pentagon signed an agreement with xAI to allow Grok to operate in classified systems, potentially positioning it as an alternative provider.
Researchers emphasized that governments are unlikely to grant autonomous control over nuclear arsenals to AI systems. However, they warned that compressed decision timelines in future crises could increase reliance on AI-generated recommendations, underscoring the need for careful oversight and evaluation of escalation risks.
Recommended Comments
There are no comments to display.