Google Gemma 4: Apache 2.0 Open-Weight Models That Punch 20× Above Their Size
Google DeepMind released Gemma 4 on April 2, 2026 — four open-weight models under a fully permissive Apache 2.0 license. The 31B ranks #3 among all open models. Agentic tool use jumped from 6.6% to 86.4%. Here is everything developers and enterprise teams need to know.
TL;DR
Gemma 4 ships in 4 sizes (E2B, E4B, 26B-A4B MoE, 31B Dense). Apache 2.0 license — unlimited commercial use. The 31B scores 85.2% on MMLU Pro, 80.0% on LiveCodeBench, and 86.4% on agentic τ²-bench (up from 6.6% in Gemma 3). All variants are multimodal. E2B runs in under 1.5 GB of VRAM and E4B in about 8 GB — the first Gemma models built for mobile and edge. Available now on Hugging Face, Google AI Studio, Ollama, and Kaggle.
Why Gemma 4 Is a Bigger Deal Than Previous Generations
Gemma 1, 2, and 3 were all technically impressive but restricted by Google's custom license. Commercial deployments above a certain user threshold required negotiation. Redistribution of modified versions was limited. This kept Gemma out of most enterprise pipelines despite strong benchmarks.
Gemma 4 changes that entirely. Apache 2.0 means unlimited commercial use, no MAU restrictions, full redistribution rights, and no licensing fees at any scale. This puts Gemma 4 in direct competition with Meta's Llama 4 for the open-weight enterprise market — the first time Google has entered that fight directly.
The second major change is agentic capability. Gemma 3's agentic tool use score was 6.6% on τ²-bench — effectively unusable for autonomous agent workflows. Gemma 4's 31B score of 86.4% on the same benchmark is one of the highest among any open model. This is not incremental improvement. It is a categorical change in what Gemma can do.
The Four Gemma 4 Models: Sizes, Architecture, and Hardware
| Model | Architecture | Active Params | Context | Min VRAM | Best For |
|---|---|---|---|---|---|
| E2B | Dense + PLE | 2.3B | 128K tokens | <1.5 GB | Mobile, edge, rapid prototyping |
| E4B | Dense + PLE | 4.5B | 128K tokens | 8 GB | Laptop deployment, SMB workflows |
| 26B-A4B | MoE (26B total) | 3.8B/token | 256K tokens | 20–24 GB | Production API, cost-efficient inference |
| 31B Dense | Dense | 31B | 256K tokens | 40 GB (8-bit: 24 GB) | Best quality, research, enterprise |
The 26B-A4B is the standout efficiency story. Its Mixture-of-Experts architecture activates only 3.8 billion of its 26 billion parameters per token — achieving 97% of the 31B model's quality at roughly 15% of the compute cost. For production API deployments where inference cost matters, the 26B-A4B is the obvious choice.
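The compute gap follows directly from the parameter counts in the table above. A minimal back-of-envelope sketch (assuming per-token decode FLOPs scale with roughly 2× the active parameter count, a common rule of thumb):

```python
def flops_per_token(active_params: float) -> float:
    # Per-token decode FLOPs scale roughly with 2x the active parameter count.
    return 2 * active_params

moe_active = 3.8e9    # 26B-A4B activates ~3.8B parameters per token
dense_params = 31e9   # the 31B dense model activates everything

ratio = flops_per_token(moe_active) / flops_per_token(dense_params)
print(f"MoE per-token compute: {ratio:.1%} of the 31B dense model")
# MoE per-token compute: 12.3% of the 31B dense model
```

Raw parameter counts give ~12%; the article's "roughly 15%" figure is consistent with that once expert routing and shared-layer overhead are included.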
Benchmark Performance: How Gemma 4 Compares
| Benchmark | Gemma 4 31B | Gemma 4 26B-A4B | What It Tests |
|---|---|---|---|
| MMLU Pro | 85.2% | 83.1% | General knowledge and reasoning |
| AIME 2026 (no tools) | 89.2% | 88.3% | Advanced mathematics |
| LiveCodeBench v6 | 80.0% | 76.5% | Real-world coding tasks |
| Codeforces ELO | 2150 | 2080 | Competitive programming |
| MMMU Pro (vision) | 76.9% | 74.2% | Multimodal reasoning |
| MATH-Vision | 85.6% | 83.7% | Visual math problems |
| τ²-bench (agentic tool use) | 86.4% | 84.1% | Autonomous agent tasks |
| Arena AI Elo (open model rank) | 1452 (#3) | 1441 (#6) | Human preference benchmark |
The τ²-bench agentic tool use score deserves emphasis. Gemma 3 scored 6.6% — making it effectively non-viable for agent workflows. Gemma 4 31B's 86.4% puts it in the same tier as GPT-5.4 and Claude Opus 4.6 for tool-using agentic tasks, at a fraction of the inference cost. For open-source agent builders, this is the most important number in the release.
Multimodal Capabilities by Model Tier
All Gemma 4 models support text and image input natively. The smaller models add audio; the larger models add video:
- E2B and E4B: Text + images + audio input (speech recognition and translation across 140+ languages)
- 26B-A4B and 31B: Text + images + video (up to 60 seconds) — adds temporal reasoning and video comprehension
- All variants: Native function calling for agentic tool use, structured output, and 140+ language support
The audio support in E2B and E4B is significant for edge deployment — a 1.5 GB model that can transcribe and translate speech in 140 languages fills a large gap in on-device AI capability.
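Native function calling is what the τ²-bench score above actually measures: the application advertises a tool schema, the model replies with a structured call, and the application executes it. The sketch below uses the common JSON function-calling convention; the tool name, schema shape, and response format are illustrative assumptions, not Gemma 4's documented wire format — check the model card for the exact syntax.

```python
import json

# A tool schema in the widely used JSON function-calling convention.
# "get_weather" and its parameters are hypothetical examples.
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to a local function."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        return f"Sunny in {call['arguments']['city']}"  # stubbed result
    raise ValueError(f"unknown tool: {call['name']}")

# In this convention, a model's tool call would arrive as structured JSON:
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(model_output))  # Sunny in Zurich
```

In a real agent loop, the dispatch result is fed back to the model as a tool message so it can compose the final answer or issue further calls.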
How to Run Gemma 4 Locally
The fastest path to running Gemma 4 locally is through Ollama:
```shell
# Install and run via Ollama
ollama pull gemma4:31b
ollama run gemma4:31b

# Or the efficient MoE variant:
ollama pull gemma4:26b-moe
ollama run gemma4:26b-moe
```
```python
# Via Hugging Face transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-31b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights
    device_map="auto",           # place layers across available devices
)
```
Gemma 4 vs. Competitors: Open-Weight Model Landscape
| Model | Developer | License | Max Context | Agentic Score | Best Size Advantage |
|---|---|---|---|---|---|
| Gemma 4 31B | Google DeepMind | Apache 2.0 | 256K | 86.4% | Best reasoning per parameter |
| Llama 4 Maverick | Meta | Llama 4 License | 10M | ~78% | Largest context window |
| Mistral Large 3 | Mistral | Apache 2.0 | 128K | ~75% | European compliance |
| DeepSeek R2 | DeepSeek | MIT | 128K | ~80% | Cost-efficiency, Chinese |
| Qwen3.6-Plus | Alibaba | Apache 2.0 | 1M | ~82% | Long-context enterprise |
| Phi-4 Medium | Microsoft | MIT | 16K | ~68% | Smallest footprint |
What Gemma 4 Means for Developers and Enterprises
The combination of Apache 2.0 licensing and high agentic scores changes the economics of self-hosted AI in 2026. Teams that previously had to use proprietary APIs for agentic workflows — because no open model was capable enough — can now run Gemma 4 on their own infrastructure with full Apache 2.0 commercial rights.
For enterprise use cases with data privacy requirements, Gemma 4's on-premise deployment story is now compelling at three budget tiers:
- Edge / device: E2B in under 1.5 GB of VRAM (E4B in about 8 GB) — medical devices, industrial IoT, air-gapped systems
- Server tier: 26B-A4B on a single A100 — cost-efficient production API for internal tools
- High-performance: 31B on H100 — full capability for complex reasoning, agentic, and multimodal tasks
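Which tier a model fits is mostly a weights-footprint calculation: parameter count times bytes per parameter. A rough sketch (weights only — KV cache, activations, and the serving framework's overhead add more on top, which is why quoted minimum-VRAM figures vary):

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # Bytes for the weights alone: params x (bits / 8), expressed in GB.
    return params_billion * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"31B @ {bits}-bit: {weight_gb(31, bits):.1f} GB of weights")
# 31B @ 16-bit: 62.0 GB of weights
# 31B @ 8-bit: 31.0 GB of weights
# 31B @ 4-bit: 15.5 GB of weights
```

The same arithmetic explains why quantization is the lever that moves the 31B from multi-GPU territory down to a single 24 GB card.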
The Gemma 4 launch also signals that Google DeepMind is taking the open-source developer community seriously in a way previous Gemma releases did not. Apache 2.0 is not just a licensing change — it is a strategic commitment to the open AI ecosystem that makes Gemma 4 a legitimate foundation for commercial product development.
Build AI Workflows Without Managing Models
HappyCapy lets you build and automate business workflows powered by the best AI models — without managing infrastructure, licenses, or deployments yourself.
Try HappyCapy Free
Related Guides
- Replit vs Cursor vs Lovable: Best AI Coding Tools in 2026
- Amazon Bedrock AgentCore: Deploy Production AI Agents on AWS
- How to Use AI for Coding in 2026
- MCP: The Agentic AI Standard With 97 Million Installs
Sources
- Google DeepMind: Gemma 4 model announcement — blog.google/innovation-and-ai/technology/developers-tools/gemma-4
- Ars Technica: Google announces Gemma 4 open AI models, switches to Apache 2.0 license — arstechnica.com
- Hugging Face: Gemma 4 model card and benchmarks — huggingface.co/blog/gemma4
- The Register: Google battles Chinese open-weights models with Gemma 4 — theregister.com
- Lushbinary: Gemma 4 Developer Guide: Benchmarks and Local Setup — lushbinary.com