Google Gemma 4: Apache 2.0 Open-Weight Models That Punch 20× Above Their Size
Google DeepMind released Gemma 4 on April 2, 2026 — four open-weight models under a fully permissive Apache 2.0 license. The 31B ranks #3 among all open models. Agentic tool use jumped from 6.6% to 86.4%. Here is everything developers and enterprise teams need to know.
TL;DR
Gemma 4 ships in 4 sizes (E2B, E4B, 26B-A4B MoE, 31B Dense). Apache 2.0 license — unlimited commercial use. The 31B scores 85.2% on MMLU Pro, 80.0% on LiveCodeBench, and 86.4% on agentic τ²-bench (up from 6.6% in Gemma 3). All variants are multimodal. E2B runs in under 1.5 GB of VRAM and E4B in about 8 GB — the first Gemma models built for mobile and edge. Available now on Hugging Face, Google AI Studio, Ollama, and Kaggle.
Why Gemma 4 Is a Bigger Deal Than Previous Generations
Gemma 1, 2, and 3 were all technically impressive but restricted by Google's custom license. Commercial deployments above a certain user threshold required negotiation. Redistribution of modified versions was limited. This kept Gemma out of most enterprise pipelines despite strong benchmarks.
Gemma 4 changes that entirely. Apache 2.0 means unlimited commercial use, no MAU restrictions, full redistribution rights, and no licensing fees at any scale. This puts Gemma 4 in direct competition with Meta's Llama 4 for the open-weight enterprise market — the first time Google has entered that fight directly.
The second major change is agentic capability. Gemma 3's agentic tool use score was 6.6% on τ²-bench — effectively unusable for autonomous agent workflows. Gemma 4's 31B score of 86.4% on the same benchmark is one of the highest among any open model. This is not incremental improvement. It is a categorical change in what Gemma can do.
The Four Gemma 4 Models: Sizes, Architecture, and Hardware
| Model | Architecture | Active Params | Context | Min VRAM | Best For |
|---|---|---|---|---|---|
| E2B | Dense + PLE | 2.3B | 128K tokens | <1.5 GB | Mobile, edge, rapid prototyping |
| E4B | Dense + PLE | 4.5B | 128K tokens | 8 GB | Laptop deployment, SMB workflows |
| 26B-A4B | MoE (26B total) | 3.8B/token | 256K tokens | 20–24 GB | Production API, cost-efficient inference |
| 31B Dense | Dense | 31B | 256K tokens | 40 GB (8-bit: 24 GB) | Best quality, research, enterprise |
The 26B-A4B is the standout efficiency story. Its Mixture-of-Experts architecture activates only 3.8 billion of its 26 billion parameters per token — achieving 97% of the 31B model's quality at roughly 15% of the compute cost. For production API deployments where inference cost matters, the 26B-A4B is the obvious choice.
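The compute gap follows directly from the parameter counts in the table above. A minimal back-of-envelope sketch (assuming per-token decode FLOPs scale with roughly 2× the active parameter count, a common rule of thumb):

```python
def flops_per_token(active_params: float) -> float:
    # Per-token decode FLOPs scale roughly with 2x the active parameter count.
    return 2 * active_params

moe_active = 3.8e9    # 26B-A4B activates ~3.8B parameters per token
dense_params = 31e9   # the 31B dense model activates everything

ratio = flops_per_token(moe_active) / flops_per_token(dense_params)
print(f"MoE per-token compute: {ratio:.1%} of the 31B dense model")
# MoE per-token compute: 12.3% of the 31B dense model
```

Raw parameter counts give ~12%; the article's "roughly 15%" figure is consistent with that once expert routing and shared-layer overhead are included.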
Benchmark Performance: How Gemma 4 Compares
| Benchmark | Gemma 4 31B | Gemma 4 26B-A4B | What It Tests |
|---|---|---|---|
| MMLU Pro | 85.2% | 83.1% | General knowledge and reasoning |
| AIME 2026 (no tools) | 89.2% | 88.3% | Advanced mathematics |
| LiveCodeBench v6 | 80.0% | 76.5% | Real-world coding tasks |
| Codeforces ELO | 2150 | 2080 | Competitive programming |
| MMMU Pro (vision) | 76.9% | 74.2% | Multimodal reasoning |
| MATH-Vision | 85.6% | 83.7% | Visual math problems |
| τ²-bench (agentic tool use) | 86.4% | 84.1% | Autonomous agent tasks |
| Arena AI Elo (open model rank) | 1452 (#3) | 1441 (#6) | Human preference benchmark |
The τ²-bench agentic tool use score deserves emphasis. Gemma 3 scored 6.6% — making it effectively non-viable for agent workflows. Gemma 4 31B's 86.4% puts it in the same tier as GPT-5.4 and Claude Opus 4.6 for tool-using agentic tasks, at a fraction of the inference cost. For open-source agent builders, this is the most important number in the release.
Multimodal Capabilities by Model Tier
All Gemma 4 models support text and image input natively. The smaller models add audio; the larger models add video:
- E2B and E4B: Text + images + audio input (speech recognition and translation across 140+ languages)
- 26B-A4B and 31B: Text + images + video (up to 60 seconds) — adds temporal reasoning and video comprehension
- All variants: Native function calling for agentic tool use, structured output, and 140+ language support
The audio support in E2B and E4B is significant for edge deployment — a 1.5 GB model that can transcribe and translate speech in 140 languages fills a large gap in on-device AI capability.
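Native function calling is what the τ²-bench score above actually measures: the application advertises a tool schema, the model replies with a structured call, and the application executes it. The sketch below uses the common JSON function-calling convention; the tool name, schema shape, and response format are illustrative assumptions, not Gemma 4's documented wire format — check the model card for the exact syntax.

```python
import json

# A tool schema in the widely used JSON function-calling convention.
# "get_weather" and its parameters are hypothetical examples.
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to a local function."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        return f"Sunny in {call['arguments']['city']}"  # stubbed result
    raise ValueError(f"unknown tool: {call['name']}")

# In this convention, a model's tool call would arrive as structured JSON:
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(model_output))  # Sunny in Zurich
```

In a real agent loop, the dispatch result is fed back to the model as a tool message so it can compose the final answer or issue further calls.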
How to Run Gemma 4 Locally
The fastest path to running Gemma 4 locally is through Ollama:
```shell
# Install and run via Ollama
ollama pull gemma4:31b
ollama run gemma4:31b

# Or the efficient MoE variant:
ollama pull gemma4:26b-moe
ollama run gemma4:26b-moe
```
```python
# Via Hugging Face transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-31b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights
    device_map="auto",           # place layers across available devices
)
```
Gemma 4 vs. Competitors: Open-Weight Model Landscape
| Model | Developer | License | Max Context | Agentic Score | Best Size Advantage |
|---|---|---|---|---|---|
| Gemma 4 31B | Google DeepMind | Apache 2.0 | 256K | 86.4% | Best reasoning per parameter |
| Llama 4 Maverick | Meta | Llama 4 License | 10M | ~78% | Largest context window |
| Mistral Large 3 | Mistral | Apache 2.0 | 128K | ~75% | European compliance |
| DeepSeek R2 | DeepSeek | MIT | 128K | ~80% | Cost-efficiency, Chinese |
| Qwen3.6-Plus | Alibaba | Apache 2.0 | 1M | ~82% | Long-context enterprise |
| Phi-4 Medium | Microsoft | MIT | 16K | ~68% | Smallest footprint |
What Gemma 4 Means for Developers and Enterprises
The combination of Apache 2.0 licensing and high agentic scores changes the economics of self-hosted AI in 2026. Teams that previously had to use proprietary APIs for agentic workflows — because no open model was capable enough — can now run Gemma 4 on their own infrastructure with full Apache 2.0 commercial rights.
For enterprise use cases with data privacy requirements, Gemma 4's on-premise deployment story is now compelling at three budget tiers:
- Edge / device: E2B in under 1.5 GB of VRAM (E4B in about 8 GB) — medical devices, industrial IoT, air-gapped systems
- Server tier: 26B-A4B on a single A100 — cost-efficient production API for internal tools
- High-performance: 31B on H100 — full capability for complex reasoning, agentic, and multimodal tasks
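Which tier a model fits is mostly a weights-footprint calculation: parameter count times bytes per parameter. A rough sketch (weights only — KV cache, activations, and the serving framework's overhead add more on top, which is why quoted minimum-VRAM figures vary):

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # Bytes for the weights alone: params x (bits / 8), expressed in GB.
    return params_billion * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"31B @ {bits}-bit: {weight_gb(31, bits):.1f} GB of weights")
# 31B @ 16-bit: 62.0 GB of weights
# 31B @ 8-bit: 31.0 GB of weights
# 31B @ 4-bit: 15.5 GB of weights
```

The same arithmetic explains why quantization is the lever that moves the 31B from multi-GPU territory down to a single 24 GB card.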
The Gemma 4 launch also signals that Google DeepMind is taking the open-source developer community seriously in a way previous Gemma releases did not. Apache 2.0 is not just a licensing change — it is a strategic commitment to the open AI ecosystem that makes Gemma 4 a legitimate foundation for commercial product development.
Build AI Workflows Without Managing Models
HappyCapy lets you build and automate business workflows powered by the best AI models — without managing infrastructure, licenses, or deployments yourself.
Try HappyCapy Free
Related Guides
- Replit vs Cursor vs Lovable: Best AI Coding Tools in 2026
- Amazon Bedrock AgentCore: Deploy Production AI Agents on AWS
- How to Use AI for Coding in 2026
- MCP: The Agentic AI Standard With 97 Million Installs
Sources
- Google DeepMind: Gemma 4 model announcement — blog.google/innovation-and-ai/technology/developers-tools/gemma-4
- Ars Technica: Google announces Gemma 4 open AI models, switches to Apache 2.0 license — arstechnica.com
- Hugging Face: Gemma 4 model card and benchmarks — huggingface.co/blog/gemma4
- The Register: Google battles Chinese open-weights models with Gemma 4 — theregister.com
- Lushbinary: Gemma 4 Developer Guide: Benchmarks and Local Setup — lushbinary.com