By Connie · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.
Google DeepMind's AI Rewrites Its Own Game Theory Algorithms — Beats the Best Human Designs
Google DeepMind published research on April 3, 2026 showing that AlphaEvolve — its LLM-driven algorithm discovery system powered by Gemini 2.5 Pro — automatically rewrote two foundational game theory algorithm families and produced superior variants. VAD-CFR beat human-designed CFR in 10 of 11 benchmark games. SHOR-PSRO beat human-designed PSRO in 8 of 11 games. Neither algorithm was designed by a human — both emerged from AI-driven evolutionary search over source code.
For decades, game theory algorithms like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) were designed by teams of specialized researchers — hand-crafting mathematical rules to find near-optimal strategies in competitive multi-agent scenarios. The best versions took years to develop and required deep domain expertise.
On April 3, 2026, Google DeepMind published research showing that its AlphaEvolve system had automated this process entirely. By using Gemini 2.5 Pro to mutate algorithm source code through an evolutionary loop, AlphaEvolve discovered two new algorithms — VAD-CFR and SHOR-PSRO — that outperform the best human-designed equivalents in standardized benchmarks. The expertise has shifted from designing algorithms to designing the environments that evaluate them.
What AlphaEvolve Actually Does
AlphaEvolve is not a game-playing AI like AlphaZero or AlphaGo. Those systems search for the best strategy within a fixed game. AlphaEvolve searches for the best algorithm — it modifies the source code of existing algorithms to find better ones. This distinction matters: AlphaEvolve is doing automated research, not automated gameplay.
The process works through evolutionary search. AlphaEvolve takes a parent algorithm's source code, uses Gemini 2.5 Pro to propose mutations — adding new parameters, changing how values are updated, introducing new weighting mechanisms — and then evaluates the mutated variants on proxy games. Variants that are harder for opponents to exploit (lower "exploitability" scores) survive and become parents for the next generation. Variants that perform worse are discarded.
The key difference from prior approaches like FunSearch (2023) and AlphaDev (2023) is scale. Those systems generated short snippets of code. AlphaEvolve generates and evaluates complex, hundreds-of-lines-long programs — full algorithm implementations with dozens of interacting parameters.
The Two New Algorithms
VAD-CFR: Volatility-Adaptive Discounted CFR
CFR is the algorithmic foundation behind modern poker AI — it was used in DeepStack (2017) and is standard in academic game theory research. CFR works by iteratively simulating game play, tracking "regret" for decisions not taken, and shifting strategy toward choices with lower regret. Different versions discount old regret at different rates to focus on recent information.
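The regret-matching update at the heart of CFR can be sketched in a few lines of Python. This is a textbook sketch of the core idea, not DeepMind's implementation: play each action in proportion to its accumulated positive regret, and accumulate regret as the gap between each action's value and the value of the current mixed strategy.

```python
def regret_matching(cum_regret):
    """Strategy from cumulative regrets: actions in proportion to their
    positive regret; uniform if no action has positive regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    n = len(cum_regret)
    return [1.0 / n] * n

def cfr_update(cum_regret, action_values):
    """One regret update at an information set: instantaneous regret is
    each action's value minus the value of the current mixed strategy."""
    sigma = regret_matching(cum_regret)
    node_value = sum(s * v for s, v in zip(sigma, action_values))
    inst_regret = [v - node_value for v in action_values]
    return [r + d for r, d in zip(cum_regret, inst_regret)], sigma
```

The "different versions" mentioned above differ mainly in how `cum_regret` is discounted between iterations, which is exactly the knob VAD-CFR makes adaptive.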
The problem with existing CFR variants is that they use static discounting — the same formula regardless of how volatile the game state is. AlphaEvolve discovered that dynamically adjusting discounting based on measured volatility works better. VAD-CFR uses an Exponential Weighted Moving Average (EWMA) of instantaneous regret to detect volatility and adjusts its discounting factor accordingly. When the game is stable, it holds older information longer. When the game is volatile, it discounts aggressively to respond to new patterns.
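DeepMind has not published VAD-CFR's exact formula here, but the mechanism as described, an EWMA volatility estimate steering the discount factor, can be sketched roughly as follows. All parameter names, default values, and the volatility-to-discount mapping are illustrative assumptions, not DeepMind's.

```python
class VolatilityAdaptiveDiscount:
    """Sketch of volatility-adaptive discounting: an EWMA of the
    magnitude of instantaneous regret estimates volatility, and the
    discount factor falls as volatility rises."""

    def __init__(self, ewma_alpha=0.1, low=0.5, high=0.99):
        self.ewma_alpha = ewma_alpha
        self.low, self.high = low, high  # discount range: volatile .. stable
        self.ewma = 0.0                  # EWMA of |instantaneous regret|

    def discount(self, inst_regret_magnitude):
        # Update the EWMA volatility estimate.
        self.ewma += self.ewma_alpha * (inst_regret_magnitude - self.ewma)
        # High volatility -> discount aggressively (factor near `low`);
        # low volatility -> retain old regret (factor near `high`).
        volatility = self.ewma / (1.0 + self.ewma)  # squash to [0, 1)
        return self.high - (self.high - self.low) * volatility
```

The behavior matches the description above: a calm stretch keeps the discount near its maximum, while a burst of large instantaneous regrets pushes it toward its minimum.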
The mechanism was not designed theoretically — AlphaEvolve found it through search. DeepMind researchers subsequently worked backward to understand why it works, which required deriving new theoretical proofs after the fact.
SHOR-PSRO: Smoothed Hybrid Optimistic Regret PSRO
PSRO is a different framework — a meta-solver that trains specialist agents against each other and uses game theory to find an optimal mixture of their strategies. Where CFR operates within a game, PSRO operates at the level of which strategies to train and how to weight them.
AlphaEvolve's SHOR-PSRO blends two existing approaches: Optimistic Regret Matching, which adds an optimism bias to encourage exploration of strategies that previously worked, and Smoothed Best Pure Strategy, which uses a Boltzmann distribution to select among existing strategies rather than committing to the best single option. Neither combination was in the prior literature. AlphaEvolve discovered it by searching the space of possible hybrids.
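The hybrid can be sketched as a small meta-strategy function. Again, this is an illustration of the two named ingredients, an optimism bonus on recent regret plus Boltzmann (softmax) selection, not DeepMind's actual combination; the parameter names and the way the terms are mixed are assumptions.

```python
import math

def shor_meta_strategy(cum_regret, last_regret, optimism=1.0, temperature=1.0):
    """Sketch of the SHOR-style hybrid: boost each strategy's cumulative
    regret with an optimism term on its most recent regret, then select
    via a Boltzmann distribution rather than committing to the single
    best strategy."""
    optimistic = [c + optimism * l for c, l in zip(cum_regret, last_regret)]
    logits = [o / temperature for o in optimistic]
    m = max(logits)                          # stabilize the softmax
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

Lowering `temperature` recovers something close to a best-pure-strategy response, while raising it spreads probability across strategies, which is the smoothing PSRO meta-solvers use to avoid overcommitting.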
Benchmark Results
| Algorithm | Baseline | Games Won (of 11) | Losses |
|---|---|---|---|
| VAD-CFR | Best human-designed CFR variant | 10 / 11 | 4-player Kuhn Poker |
| SHOR-PSRO | Best human-designed PSRO variant | 8 / 11 | 3 smaller games |
All evaluations used the OpenSpiel framework, a standard research platform for game theory benchmarking. DeepMind emphasized that both algorithms were evaluated on larger, unseen games to confirm the results represent genuine generalization — not overfitting to the proxy games used during the evolutionary search.
Why This Matters Beyond Gaming
The benchmark games used — Kuhn Poker, Leduc Poker, Goofspiel, and others — are proxies for a much larger class of problems. Any system where multiple agents make decisions under incomplete information is a candidate for these algorithms.
- Automated trading: Markets are imperfect-information games where multiple agents compete. CFR-based algorithms are already used in market-making and liquidity provision. VAD-CFR's volatility-adaptive discounting is directly applicable to markets that shift between stable and volatile regimes.
- Negotiation systems: AI agents negotiating contracts, supply chain terms, or resource allocation operate in adversarial settings. PSRO-based meta-solvers help find robust negotiation strategies across diverse counterparties.
- Cybersecurity defense: Attacker-defender models are imperfect-information games. Better equilibrium-finding algorithms enable more robust intrusion detection and adversarial testing systems.
- Autonomous vehicle coordination: Intersections and merges are multi-agent coordination problems where each vehicle has incomplete information about others' intentions. Game theory algorithms underpin the decision-making logic in many AV coordination systems.
The Bigger Shift: AI Is Now Doing Its Own Research
AlphaEvolve's game theory results are the latest in a series of DeepMind publications showing AI systems discovering things their designers did not know. FunSearch found new combinatorial mathematics results in 2023. AlphaDev optimized sorting algorithms that had not been improved in decades. AlphaEvolve is extending this pattern to a more complex domain — multi-agent game theory — where the search space of possible algorithms is vast.
"The expertise moves from designing algorithms to designing the evaluation environments that guide the search." — Google DeepMind research team
The practical implication is that the bottleneck for algorithm design is shifting. Human researchers no longer need to know the answer in advance — they need to define the right fitness criteria and let the system discover what works. For industries that depend on game-theoretic reasoning, this changes the R&D process significantly.
For AI practitioners exploring frontier models and agentic workflows, Happycapy provides direct access to Gemini 2.5 Pro, Claude Opus 4.6, and GPT-5.4 — the same generation of models powering research like AlphaEvolve — through a single interface.
Frequently Asked Questions
What is Google DeepMind's AlphaEvolve game theory research?
AlphaEvolve is a system that uses Gemini 2.5 Pro to automatically evolve game theory algorithms via evolutionary search over source code. In April 2026, DeepMind showed AlphaEvolve discovered VAD-CFR and SHOR-PSRO — algorithms that beat the best human-designed equivalents in 10/11 and 8/11 benchmark games.
What are VAD-CFR and SHOR-PSRO?
VAD-CFR (Volatility-Adaptive Discounted CFR) dynamically adjusts regret discounting based on game volatility using EWMA, improving on static CFR variants. SHOR-PSRO (Smoothed Hybrid Optimistic Regret PSRO) blends Optimistic Regret Matching with Smoothed Best Pure Strategy in a way no prior PSRO research had tried. Both were discovered by AI, not humans.
How does AlphaEvolve differ from AlphaZero?
AlphaZero searches for the best strategy within a fixed game. AlphaEvolve searches the space of algorithms themselves — it mutates source code to discover new algorithms that generalize across many games. AlphaEvolve is automated research; AlphaZero is automated game-playing.
What real-world applications does this have?
The algorithms apply to any domain with multi-agent reasoning under incomplete information: automated trading, AI negotiation systems, cybersecurity attacker-defender models, and autonomous vehicle intersection coordination. These are all imperfect-information games at their core.
Sources
MarkTechPost — "Google DeepMind's Research Lets an LLM Rewrite Its Own Game Theory Algorithms" (April 3, 2026)
Startup Fortune — "Google DeepMind Lets AI Rewrite Its Own Algorithms, Beating Human Experts" (April 3, 2026)
Google DeepMind — AlphaEvolve research publication (April 3, 2026)
OpenSpiel — DeepMind open-source game theory research framework