AI Models Scheme, Betray and Vote Each Other Out in Survivor-Style Game


In brief

A Stanford researcher built a Survivor-style game where AI models form alliances and vote rivals out.
The benchmark aims to address growing problems with saturated and contaminated AI evaluations.
OpenAI’s GPT-5.5 ranked first in 999 multiplayer games involving 49 AI models.

AI models are now playing “Survivor,” sort of.

In a new Stanford research project called “Agent Island,” AI agents negotiate alliances, accuse each other of secret coordination, manipulate votes, and eliminate rivals in multiplayer strategy games that aim to test behaviors that traditional benchmarks miss.

The study, published on Tuesday by Connacher Murphy, research director at the Stanford Digital Economy Lab, said many AI benchmarks are becoming unreliable because models eventually learn to solve them, and benchmark data often leaks into training sets. Murphy created Agent Island as a dynamic benchmark where AI agents compete against each other in Survivor-style elimination games instead of answering static test questions.

“High-stakes, multi-agent interactions could become commonplace as AI agents grow in capabilities and are increasingly endowed with resources and entrusted with decision-making authority,” Murphy wrote. “In such contexts, agents might pursue mutually incompatible goals.”

Researchers still know relatively little about how AI models behave when cooperating, competing, forming alliances, or managing conflict with other autonomous agents, Murphy explained, and he argues that static benchmarks fail to capture those dynamics.

Each game starts with seven randomly chosen AI models given fake player names. Over five rounds, the models talk privately, argue publicly, and vote each other out. The eliminated players later return to help choose the winner.

The format rewards persuasion, coordination, reputation management, and strategic deception alongside reasoning ability.
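
The game structure described above can be sketched in a few lines of Python. This is only an illustrative outline, not Murphy's implementation: it assumes one elimination per round (leaving two finalists and a five-member jury), that only eliminated players vote in the final, and it replaces all negotiation and model behavior with random votes. The function name `play_game` and the player labels are hypothetical.

```python
import random
from collections import Counter


def play_game(players, rounds=5, seed=None):
    """Run one elimination game with random placeholder voting.

    Assumptions (not from the study): one player is voted out per round,
    ties are broken at random, and only eliminated players vote in the final.
    """
    rng = random.Random(seed)
    active = list(players)
    eliminated = []

    for _ in range(rounds):
        # Each active player votes for someone other than themselves.
        votes = Counter(
            rng.choice([p for p in active if p != voter]) for voter in active
        )
        most = max(votes.values())
        # Break ties at random among the most-voted players.
        out = rng.choice(sorted(p for p, n in votes.items() if n == most))
        active.remove(out)
        eliminated.append(out)

    # Eliminated players return as a jury and vote among the finalists.
    jury_votes = Counter(rng.choice(active) for _ in eliminated)
    most = max(jury_votes.values())
    winner = rng.choice(sorted(p for p, n in jury_votes.items() if n == most))
    return winner, active, eliminated


names = [f"Player{i}" for i in range(1, 8)]  # seven fake player names
winner, finalists, jury = play_game(names, seed=0)
```

In a real run, each random choice would be replaced by a model's decision after private and public discussion, which is where the persuasion and deception the study measures would enter.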

In 999 simulated games involving 49 AI models, including ChatGPT, Grok, Gemini, and Claude, GPT-5.5 ranked first by a wide margin with a skill score of 5.64, compared with 3.10 for GPT-5.2 and 2.86 for GPT-5.3-codex, according to Murphy’s Bayesian ranking system. Anthropic’s Claude Opus models also ranked near the top.

The study found that models also favored AIs from the same company, with OpenAI models showing the strongest same-provider preference and Anthropic models the weakest. Across more than 3,600 final-round votes, models were 8.3 percentage points more likely to support finalists from the same provider. The transcripts from the games, Murphy noted, resembled political strategy debates more than traditional benchmark tests.

One model accused rivals of secretly coordinating votes after noticing similar wording in their speeches. Another warned players not to become obsessed with tracking alliances. Some models defended themselves by saying they followed clear and consistent rules while accusing others of putting on “social theater.”

The study comes as AI researchers increasingly move toward game-based and adversarial benchmarks to measure reasoning and behavior that static tests often miss. Recent projects have included Google’s live AI chess tournaments, DeepMind’s use of Eve Frontier to study AI behavior in complex virtual worlds, and new benchmark efforts by OpenAI designed to resist training-data contamination.

The researchers argue that studying how AI models negotiate, coordinate, compete, and manipulate one another could help evaluate behavior in multi-agent environments before autonomous agents become more widely deployed.

The study warned that while benchmarks like Agent Island could help identify risks from autonomous AI models before deployment, the same simulations and interaction logs could also help improve persuasion and coordination strategies between AI agents.

“We mitigate this risk by using a low-stakes game setting and interagent simulations without human participants or real-world actions,” Murphy wrote. “Nevertheless, we do not claim that these mitigations fully eliminate dual-use concerns.”