HappycapyGuide

By Connie · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.

AI Hardware · April 4, 2026 · 8 min read

xAI Colossus 2: World's First 1.5 GW AI Supercluster Is Now Training Grok 5

Elon Musk's xAI has completed the expansion of Colossus 2 to 1.5 gigawatts — making it the largest AI training cluster ever built by a wide margin. Here's what's inside, what it's training, and what it means for the broader compute race.

TL;DR

  • Colossus 2 reached 1.5 GW in April 2026 — world's first at this scale
  • 850,000 NVIDIA GPUs across three buildings in Memphis, Tennessee
  • Primary purpose: training Grok 5 (est. 6 trillion parameters, MoE)
  • Power draw exceeds San Francisco's entire city peak load
  • OpenAI and Anthropic are targeting comparable scale in 2027+

From 1 GW to 1.5 GW: The Timeline

Colossus 2 went live in January 2026 as the world's first gigawatt-scale AI training cluster. At launch, xAI's facility housed around 600,000 NVIDIA GPUs and drew roughly three times as much power as any previous AI data center.

At the January launch event, Elon Musk announced the roadmap publicly: "Upgrading to 1.5 GW in April." That target has now been hit. The expansion added approximately 250,000 additional GPUs and a third warehouse building — internally codenamed "MACROHARDRR" — to the existing Memphis campus.

The long-term roadmap points to nearly 2 GW of total capacity once all infrastructure work is complete, though xAI has not yet given a specific date for that milestone.

Inside the 1.5 GW Cluster

Spec              Colossus 2 (April 2026)
Total GPU count   ~850,000 NVIDIA GPUs
Power capacity    1.5 GW (target); ~350 MW cooling (independent estimate)
Location          Memphis, Tennessee
Buildings         3 warehouses + adjacent land ("MACROHARDRR")
Power vs. city    Exceeds San Francisco peak load
Primary workload  Grok 5 training + inference
Long-term target  ~2 GW (no date set)
Launched          January 2026 (1 GW), April 2026 (1.5 GW)

Note: Independent satellite imagery analysis by Tom's Hardware suggested cooling infrastructure may currently support closer to 350 MW of actual workload, despite the official 1 GW+ claims. xAI maintains the cluster is fully operational at stated capacity.

What Colossus 2 Is Training: Grok 5

The primary purpose of the 1.5 GW expansion is training Grok 5 — xAI's next flagship model, speculated to use a Mixture-of-Experts architecture with approximately 6 trillion total parameters. If accurate, that would make it one of the largest models ever trained by raw parameter count, rivaling GPT-5.5 "Spud" and the still-unannounced Claude Mythos.

xAI has not confirmed Grok 5's parameter count publicly, but multiple sources cite the 6T figure based on internal hiring documentation and infrastructure planning documents that surfaced earlier in 2026.
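
To put the rumored figure in perspective, here is a back-of-envelope sketch of the training math using the standard ~6·N·D FLOPs approximation, where N is active parameters and D is training tokens. Every constant below (active-parameter fraction, token count, per-GPU throughput, utilization) is an illustrative assumption, not a confirmed xAI figure:

```python
# Back-of-envelope training math for a hypothetical ~6T-parameter MoE.
# Every constant here is an illustrative assumption, not a confirmed xAI figure.

total_params  = 6e12      # rumored total parameter count (unconfirmed)
active_frac   = 0.10      # assumed share of parameters active per token (MoE)
tokens        = 60e12     # assumed pretraining token count

active_params = total_params * active_frac
train_flops   = 6 * active_params * tokens      # standard ~6*N*D estimate

gpus       = 850_000
peak_flops = 2e15         # assumed per-GPU peak at low precision (~2 PFLOP/s)
mfu        = 0.35         # assumed model FLOPs utilization

effective_flops = gpus * peak_flops * mfu
days = train_flops / effective_flops / 86_400   # seconds per day

print(f"Training compute:     {train_flops:.1e} FLOPs")
print(f"Effective throughput: {effective_flops:.1e} FLOP/s")
print(f"Idealized wall-clock: {days:.1f} days")
```

On these assumptions the idealized math lands in days rather than months. Real frontier runs take far longer because of checkpoint restarts, hardware failures, ablations, and data pipeline work, but the raw capacity is what makes a 6T-parameter run plausible at all.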

Model           Developer        Est. Params        Status            Compute Source
Grok 5          xAI              ~6T (MoE)          Training          Colossus 2
GPT-5.5 "Spud"  OpenAI           1–10T (MoE)        Pretraining done  Stargate Abilene
Claude Mythos   Anthropic        ~10T (speculated)  Early access      Undisclosed
Gemini 4 Ultra  Google DeepMind  Undisclosed        Rumored H2 2026   TPU v7 pods

The Compute Race: How xAI Compares

Before Colossus 2, the largest AI training clusters operated at 100–200 MW. OpenAI's Stargate Abilene — the cluster training GPT-5.5 — is estimated at around 500 MW, making xAI's facility three times larger by power consumption.

Cluster                         Operator            Capacity               Timeline
Colossus 2                      xAI                 1.5 GW                 April 2026 (live)
Stargate Abilene                OpenAI / Oracle     ~500 MW                2026 (live)
Microsoft AI Campus             Microsoft / OpenAI  ~400 MW                2026 (partial)
Google TPU v7 Pods              Google DeepMind     ~600 MW (distributed)  2026
OpenAI / Anthropic GW clusters  Various             1 GW target            2027 or later

The scale advantage is significant: more compute translates directly into the ability to train larger models faster, run more ablation experiments, and keep inference costs lower through economies of scale. If xAI can fully utilize its 1.5 GW build-out, it will have a structural training advantage that competitors won't match for at least 12–18 months.
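
As a rough illustration of that gap, the capacity figures above can be converted into relative time-to-train for a fixed compute budget. This sketch assumes training throughput scales roughly linearly with power at comparable hardware generations, which glosses over real differences in chips, interconnect, and utilization:

```python
# Relative training throughput implied by power capacity alone.
# Assumes comparable hardware generations and roughly linear scaling with
# power; this ignores chip, interconnect, and utilization differences.

clusters_mw = {
    "Colossus 2 (xAI)":          1500,
    "Google TPU v7 pods":         600,
    "Stargate Abilene (OpenAI)":  500,
    "Microsoft AI Campus":        400,
}

reference = clusters_mw["Colossus 2 (xAI)"]
for name, mw in sorted(clusters_mw.items(), key=lambda kv: -kv[1]):
    # A fixed-FLOP run takes proportionally longer on a smaller power budget.
    slowdown = reference / mw
    print(f"{name:28} {mw:5} MW   {slowdown:.1f}x Colossus 2 wall-clock")
```

On these assumptions, a run that finishes in one month on Colossus 2 would take roughly three months at Stargate Abilene's estimated capacity.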

The Cooling Controversy

Not everyone accepts xAI's power claims at face value. In February 2026, Tom's Hardware published an analysis of satellite imagery showing that Colossus 2's visible cooling towers and infrastructure appeared consistent with approximately 350 MW of actual computational load — not the claimed 1 GW+.

xAI disputed the analysis, arguing that the thermal signature alone is not sufficient to determine compute density, particularly given the use of liquid cooling and high-density rack configurations that reduce the visible heat-rejection footprint per GPU.
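
The dispute largely comes down to arithmetic. If the visible cooling plant can reject roughly 350 MW of heat, the supportable GPU count follows from assumptions about facility overhead (PUE) and per-GPU draw; the sketch below uses illustrative values for both, since neither figure is public:

```python
# How many GPUs could ~350 MW of heat rejection actually support?
# PUE and per-GPU draw below are illustrative assumptions, not public figures.

cooling_mw = 350          # Tom's Hardware satellite-imagery estimate
pue        = 1.2          # assumed facility overhead (liquid-cooled plant)
kw_per_gpu = 1.2          # assumed draw per GPU incl. CPU/network share

it_load_mw   = cooling_mw / pue                    # power available to IT gear
implied_gpus = it_load_mw * 1000 / kw_per_gpu      # MW -> kW, then per GPU

full_fleet_mw = 850_000 * kw_per_gpu * pue / 1000  # cooling needed for all GPUs

print(f"Supportable IT load:   {it_load_mw:.0f} MW")
print(f"Implied active GPUs:   {implied_gpus:,.0f} (vs. ~850,000 installed)")
print(f"Full fleet would need: {full_fleet_mw:,.0f} MW of heat rejection")
```

Under these assumptions, the visible cooling would support roughly a quarter of the installed fleet at full load, while running all 850,000 GPUs flat out would require on the order of 1.2 GW of heat rejection. Both readings are internally consistent: the satellite estimate with a phased ramp-up, and xAI's figure with contracted rather than utilized capacity.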

The controversy highlights a recurring pattern in AI infrastructure: companies announce capacity in power-contracted terms while actual computational utilization ramps up over months. The distinction matters when evaluating competitive claims about training timelines for models like Grok 5.

What This Means for the AI Industry

Colossus 2's 1.5 GW milestone represents more than a flex — it reflects a fundamental shift in how AI competition is structured. The frontier of AI development is now gated by infrastructure, not algorithms. The labs that win in the next two to three years are the ones that can build and fill gigawatt-scale clusters first.

  • For enterprises: Grok 5, once released, will likely be the most capable real-time model available via API, with pricing competitive with Grok 4.20.
  • For NVIDIA: 850,000 GPUs represents one of the largest single-customer purchases in GPU history — a significant tailwind for H200 and next-gen Blackwell Ultra demand.
  • For energy: AI data centers are now a material factor in regional grid planning. Memphis's infrastructure has been significantly upgraded to accommodate Colossus 2's load.
  • For OpenAI and Anthropic: Both companies are behind on raw compute scale, which may explain the urgency around Stargate and AWS expansion deals announced in Q1 2026.

Frequently Asked Questions

What is xAI Colossus 2?

Colossus 2 is xAI's AI supercomputer in Memphis, Tennessee. It became the world's first gigawatt-scale cluster in January 2026 and expanded to 1.5 GW in April 2026, with approximately 850,000 NVIDIA GPUs.

What is Colossus 2 training?

Primarily Grok 5 — xAI's next flagship model, speculated to have ~6 trillion parameters in a MoE architecture. The cluster also handles inference and other xAI research workloads.

How does Colossus 2 compare to OpenAI's Stargate?

Colossus 2 at 1.5 GW is roughly three times larger by power than Stargate Abilene (~500 MW). Both OpenAI and Anthropic are targeting comparable scale but not until 2027 or later.

Is the 1.5 GW figure accurate?

xAI claims 1.5 GW contracted capacity. Independent satellite analysis suggests current cooling infrastructure supports ~350 MW of actual load, with the gap explained by phased ramp-up and liquid cooling configurations.
