By Connie · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.
Cohere Transcribe: Open-Source ASR Beats Whisper with 5.4% Word Error Rate
Cohere released Cohere Transcribe on March 26, 2026 — a 2B-parameter open-source ASR model that ranks #1 on the Hugging Face Open ASR Leaderboard with a 5.42% word error rate, beating OpenAI Whisper Large v3 (7.44%) by 27%. It supports 14 languages, runs on a consumer RTX 3060, and is free under Apache 2.0.
Cohere, best known as the enterprise large language model company, just made a surprising entry into audio AI. On March 26, 2026, it released Cohere Transcribe — a dedicated speech recognition model that immediately claimed the top spot on the Hugging Face Open ASR Leaderboard, surpassing OpenAI Whisper, ElevenLabs Scribe, Zoom Scribe, and IBM Granite Speech.
Unlike most speech models that fine-tune existing text LLMs, Cohere built Transcribe from scratch on 500,000 hours of audio-transcript pairs using a Fast-Conformer encoder-decoder architecture — prioritizing accuracy and throughput over model generality.
Benchmark Results: #1 on Open ASR Leaderboard
Cohere Transcribe's average word error rate of 5.42% puts it ahead of every model on the Hugging Face Open ASR Leaderboard as of its launch date.
| Model | Avg WER ↓ | License | Params |
|---|---|---|---|
| Cohere Transcribe | 5.42% | Apache 2.0 | 2B |
| Zoom Scribe v1 | 5.47% | Proprietary | — |
| IBM Granite 4.0 Speech 1B | 5.52% | Apache 2.0 | 1B |
| ElevenLabs Scribe v2 | 5.83% | Proprietary | — |
| OpenAI Whisper Large v3 | 7.44% | MIT | 1.5B |
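For readers new to the metric: word error rate is the word-level edit distance (substitutions + insertions + deletions) between the model's hypothesis and a reference transcript, divided by the reference word count. A minimal sketch (the sample sentences are illustrative, not from any benchmark):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-row dynamic programming over the word sequences.
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            row.append(min(
                prev_row[j] + 1,             # deletion
                row[j - 1] + 1,              # insertion
                prev_row[j - 1] + (r != h),  # substitution (0 if match)
            ))
        prev_row = row
    return prev_row[-1] / len(ref)

wer("the quick brown fox jumps", "the quick brown box jumps")  # → 0.2
```

A 5.42% WER on a benchmark means roughly 5 to 6 word-level mistakes per 100 reference words, averaged over the leaderboard's test sets.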
Human evaluation results reinforce the leaderboard rankings: in English pairwise comparisons, humans preferred Cohere Transcribe over Whisper Large v3 in 64% of test cases and over ElevenLabs Scribe v2 in 51% of cases. Japanese showed an even stronger preference, with Cohere winning 66–70% of comparisons.
Architecture: Built From Scratch for Speed
Cohere Transcribe uses a Conformer-based encoder-decoder architecture trained on 500,000 hours of audio. Over 90% of its 2 billion parameters are in the Fast-Conformer encoder, which handles acoustic representation. A lightweight Transformer decoder converts the encoded audio to text.
This design choice explains why Transcribe achieves 3x higher offline throughput than similarly-sized models. It is purpose-built for transcription, not general audio understanding — which makes it both faster and more accurate for the specific task of converting speech to text.
Hardware requirements are remarkably modest. Cohere confirmed that Transcribe runs on consumer-grade GPUs (RTX 3060 and above) rather than requiring enterprise A100 or H100 clusters. This opens the door for self-hosted deployment in organizations with data privacy requirements.
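If the released weights follow the standard Hugging Face integration, self-hosted inference could look something like the sketch below. The model id matches the published model card, but the pipeline task name and the `language` argument are assumptions on my part until the official usage docs confirm them:

```python
from functools import lru_cache

MODEL_ID = "CohereLabs/cohere-transcribe-03-2026"

@lru_cache(maxsize=1)
def load_pipeline():
    # Deferred import: the script fails fast with a clear error
    # if transformers/torch are not installed, and the multi-GB
    # weight download only happens on first use.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=MODEL_ID)

def transcribe(audio_path: str, language: str = "en") -> str:
    """Transcribe one audio file; language must be given (no auto-detect)."""
    asr = load_pipeline()
    result = asr(audio_path, generate_kwargs={"language": language})
    return result["text"]
```

On an RTX 3060-class card, passing `device=0` to `pipeline(...)` moves inference onto the GPU.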
Dataset Benchmarks
| Dataset | WER | Notes |
|---|---|---|
| LibriSpeech (clean) | 1.25% | Read speech, studio conditions |
| LibriSpeech (other) | 2.37% | Harder read speech |
| TED-LIUM | 2.49% | Conference talks |
| SPGISpeech | 3.08% | Financial audio |
| VoxPopuli | 5.87% | Diverse accents, EU Parliament |
| GigaSpeech | 9.34% | Web audio, podcasts |
| Earnings22 | 10.86% | Financial earnings calls |
| AMI (multi-speaker) | 8.13–8.15% | Meeting transcription |
14 Languages Supported
Cohere Transcribe supports the following languages: English, Chinese (Mandarin), Japanese, Arabic, French, German, Spanish, Portuguese, Dutch, Italian, Polish, Russian, Korean, and Hindi. Where Whisper covers 100+ languages, Transcribe trades breadth for quality: each supported language is thoroughly represented in the training data.
A notable limitation: language must be specified at inference time. Transcribe does not auto-detect the spoken language. For multilingual pipelines, you will need a language identification step upstream.
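A cheap guardrail is to validate the requested language against the supported set before calling the model. The ISO 639-1 codes below are my assumption; check the model card for the exact codes the model expects:

```python
# ISO 639-1 codes for the 14 supported languages (assumed; verify
# against the model card, e.g. Chinese could be "zh" vs "zh-CN").
SUPPORTED = {
    "en", "zh", "ja", "ar", "fr", "de", "es",
    "pt", "nl", "it", "pl", "ru", "ko", "hi",
}

def resolve_language(code: str) -> str:
    """Normalize and validate a language code before inference."""
    norm = code.strip().lower().split("-")[0]  # "pt-BR" -> "pt"
    if norm not in SUPPORTED:
        raise ValueError(
            f"{code!r} is not among the 14 supported languages; "
            "run language identification upstream and route accordingly."
        )
    return norm
```

For genuinely multilingual audio, a language-identification model upstream would produce the code that gets passed through this check.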
Enterprise Limitations to Know
Cohere Transcribe is production-ready for the right use cases, but three limitations affect enterprise deployments:
- No speaker diarization. The model transcribes speech but does not separate or label individual speakers. Third-party diarization tools (e.g., pyannote.audio) are needed for meeting transcription with named participants.
- No automatic language detection. Language code must be passed at runtime. This adds a pipeline step for multi-language audio.
- Hallucination from noise. Like all ASR models, Transcribe may generate text from music, ambient sound, or silence. Voice Activity Detection (VAD) preprocessing is recommended for noisy environments such as call centers or video recordings.
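Production deployments typically put a dedicated VAD model (e.g. Silero VAD) in front of the transcriber, but even a crude energy gate illustrates the idea: drop analysis windows whose RMS energy falls below a threshold, so silence and low-level noise never reach the model. A stdlib-only sketch over raw float samples (frame length and threshold are arbitrary illustrative values):

```python
import math

def energy_gate(samples, frame_len=400, threshold=0.01):
    """Return (start, end) sample ranges whose RMS energy clears the
    threshold; everything else is treated as silence and skipped."""
    segments, start = [], None
    for off in range(0, len(samples), frame_len):
        frame = samples[off:off + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = off          # speech segment opens here
        elif start is not None:
            segments.append((start, off))  # segment closed by silence
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

Only the returned segments would then be sent to the ASR model, which removes the silence and pure-noise spans that trigger hallucinated text.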
How to Access Cohere Transcribe
Three deployment options are available:
- Local deployment (free): Download model weights from Hugging Face under Apache 2.0. Requires an RTX 3060 or equivalent GPU. No usage fees.
- Free API (rate-limited): Cohere's API provides a free tier with rate limits for testing and light production use.
- Model Vault (paid): Cohere's managed deployment removes rate limits and adds SLAs for enterprise production workloads.
"We built Transcribe from the ground up, dedicating the vast majority of model capacity to the acoustic encoder — this is why we achieve state-of-the-art accuracy at a fraction of the compute cost." — Cohere research team
Why This Matters for AI Developers
Until now, Whisper Large v3 was the default open-source choice for speech-to-text — it was free, accurate enough, and widely supported. Cohere Transcribe changes that calculus for any team where transcription accuracy directly affects business outcomes (legal, medical, financial, customer support).
The 27% relative WER reduction over Whisper is not a marginal gain. At the sentence level, it means fewer corrections, fewer missed words, and lower post-processing costs. For a team transcribing 1,000 hours of earnings calls per quarter, the difference is measurable in analyst time saved.
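To put that gap in concrete terms, here is the back-of-envelope arithmetic, assuming roughly 150 spoken words per minute (a typical conversational rate; the 1,000-hour volume comes from the example above):

```python
WORDS_PER_MINUTE = 150   # assumed conversational speaking rate
HOURS = 1_000            # quarterly transcription volume from the example

total_words = HOURS * 60 * WORDS_PER_MINUTE   # 9,000,000 words
whisper_errors = total_words * 0.0744         # Whisper Large v3 WER
transcribe_errors = total_words * 0.0542      # Cohere Transcribe WER
fewer_errors = whisper_errors - transcribe_errors

print(f"{fewer_errors:,.0f} fewer word errors per quarter")
# → 181,800 fewer word errors per quarter
```

Even if real-world error rates run higher than leaderboard figures, the relative gap is what drives the reduction in human review time.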
The Apache 2.0 license also removes the friction of using proprietary APIs for sensitive data. Legal, healthcare, and financial organizations can self-host without routing audio through third-party servers.
For developers building transcription workflows, Happycapy provides access to state-of-the-art language models for downstream processing — summarization, extraction, analysis — once your audio is converted to text.
Frequently Asked Questions
What is Cohere Transcribe?
Cohere Transcribe is a 2-billion-parameter open-source ASR model released on March 26, 2026. It achieves #1 on the Hugging Face Open ASR Leaderboard with a 5.42% average word error rate. It is available under Apache 2.0 for local deployment or via Cohere's managed API.
How does Cohere Transcribe compare to OpenAI Whisper?
Cohere Transcribe (5.42% WER) outperforms Whisper Large v3 (7.44% WER), a 27% relative reduction in word error rate. Transcribe also delivers 3x higher throughput and runs on consumer GPUs. Whisper supports 100+ languages versus Transcribe's 14, so Whisper remains the better option for languages outside that set.
What languages does Cohere Transcribe support?
Cohere Transcribe supports 14 languages: English, Chinese, Japanese, Arabic, French, German, Spanish, Portuguese, Dutch, Italian, Polish, Russian, Korean, and Hindi. Language detection is not automatic — the language code must be specified at inference time.
Is Cohere Transcribe free to use?
Yes. Cohere Transcribe weights are available for free on Hugging Face under the Apache 2.0 license for local deployment. Cohere also offers a free rate-limited API for testing. A paid managed tier ("Model Vault") removes rate limits for production use.