Legal & Policy

Britannica and Merriam-Webster Sue OpenAI: What 100,000 Scraped Articles Mean for ChatGPT Users

March 30, 2026 · Happycapy Guide

TL;DR

On March 13, 2026, Encyclopaedia Britannica and Merriam-Webster filed a federal copyright lawsuit against OpenAI in Manhattan, alleging ChatGPT was trained on nearly 100,000 reference articles without permission and sometimes reproduces them word-for-word. OpenAI is defending with a fair use argument. The outcome could determine whether AI companies owe licensing fees to publishers, and with that, how much the AI tools you rely on cost.

~100K: Britannica articles allegedly scraped for training
2: publishers named in the lawsuit (Britannica and Merriam-Webster)

What Happened

On March 13, 2026, Encyclopaedia Britannica and its wholly owned subsidiary Merriam-Webster filed a lawsuit in the U.S. District Court for the Southern District of New York. The complaint names OpenAI as the sole defendant and contains two core claims: copyright infringement and trademark infringement.

On copyright, Britannica alleges that OpenAI scraped and used nearly 100,000 of its online articles and Merriam-Webster dictionary entries as training data for ChatGPT without obtaining a license. The complaint cites examples in which ChatGPT outputs closely mirror — and in some cases reproduce verbatim — passages from Britannica's encyclopedic entries. The publishers argue this directly harms their web traffic and paid subscription revenue by giving users access to the content without visiting the source.

On trademark, the lawsuit alleges that ChatGPT occasionally fabricates information and attributes it to Britannica — a form of AI “hallucination” that Britannica says violates the Lanham Act by suggesting unauthorized endorsement and damaging the publisher's reputation for factual accuracy.

This is not Britannica's first AI lawsuit. In 2025, Britannica filed a similar suit against Perplexity AI, which remains ongoing. The March 2026 OpenAI complaint follows the same legal strategy but targets a far larger defendant with deeper pockets and a more prominent public profile.

OpenAI's Defense: Fair Use

OpenAI's position is consistent with its defense in the New York Times lawsuit filed in 2023 and in several other copyright cases still working their way through the courts. The company argues that:

Training AI models on publicly accessible content qualifies as transformative fair use under U.S. copyright law.
The model learns statistical patterns from text rather than “copying” specific works in any traditional sense.
Requiring licenses for training data would make large-scale AI development prohibitively expensive and stifle innovation.

U.S. courts have not yet ruled definitively on fair use as applied to AI training. The New York Times case is expected to produce a major ruling, one likely to influence how Britannica's case and dozens of similar pending suits are decided.

Why This Matters for Everyday AI Users

If OpenAI loses, your AI tools get more expensive

If courts rule that training on copyrighted material requires licensing, OpenAI and other AI companies that trained on similar data could face damages for past use and ongoing royalty payments. Those costs would likely flow to consumers through higher subscription prices, or force a fundamental change in what data future models can be trained on.

The verbatim reproduction concern is real

Britannica's complaint specifically calls out instances where ChatGPT produces near-verbatim text from Britannica articles. If you use ChatGPT to research topics covered by major reference works, the outputs you are reading may contain copyrighted prose. This matters most for professional use cases where the provenance of text is important — journalism, academic research, legal filings.

Trademark hallucinations are a separate legal risk

The trademark claim is less commonly discussed but arguably more immediately damaging to Britannica. Suppose ChatGPT generates a false fact, a user asks “where did you get this?”, and the model points to Britannica or Merriam-Webster. The misattribution undermines those brands' long-standing reputation for accuracy, and it is hard to prevent at scale and harder still to detect.

Plaintiff	Defendant	Filed	Core Claim	Status
The New York Times	OpenAI + Microsoft	Dec 2023	Training on millions of NYT articles; verbatim reproduction	Pre-trial discovery
Getty Images	Stability AI	Jan 2023	12M+ copyrighted images in training set	Ongoing
Authors Guild (class action)	OpenAI	Sep 2023	Fiction used to train ChatGPT	Certified class
Britannica / Merriam-Webster	Perplexity AI	2025	Reference articles as training + verbatim answers	Ongoing
Britannica / Merriam-Webster	OpenAI	Mar 13, 2026	~100K articles + trademark misattribution	Filed — active
Record labels (UMG, Sony, WMG)	Suno + Udio	2024	Copyrighted music in audio model training	Settled (undisclosed)

What about Anthropic? Anthropic, the company behind Claude (and the AI powering Happycapy), describes a different approach to training data: it uses Constitutional AI training methods and has stated policies on data sourcing that prioritize consent and licensing where possible. Anthropic has not avoided copyright litigation, though. Music publishers sued it in 2023 over song lyrics, and a class action brought by book authors settled in 2025. It is not a defendant in either Britannica case.

Frequently Asked Questions

What is the Britannica and Merriam-Webster lawsuit against OpenAI about?

Encyclopaedia Britannica and its subsidiary Merriam-Webster filed a copyright lawsuit against OpenAI on March 13, 2026, in Manhattan federal court. They allege that OpenAI scraped nearly 100,000 of their copyrighted reference articles and dictionary entries without permission to train ChatGPT, and that the model sometimes reproduces this content verbatim. The lawsuit also includes trademark infringement claims, arguing that AI hallucinations falsely attributed to Britannica damage the publisher's reputation for accuracy.

Is ChatGPT safe to use given these copyright lawsuits?

For individual users, these lawsuits do not in themselves create personal legal risk: the claims are against OpenAI, not its users. However, the lawsuits raise legitimate questions about whether AI-generated content derived from disputed training data could face commercial use restrictions in the future, and whether ChatGPT responses accurately represent their sources. Users relying on AI for research, journalism, or published content should be especially aware of the verbatim reproduction concern.

What is OpenAI's defense in the Britannica lawsuit?

OpenAI argues that training on publicly available content qualifies as transformative fair use under U.S. copyright law — the same defense it is using in the New York Times case and other pending copyright suits. The company contends that AI systems learn statistical patterns rather than “copying” works in a traditional sense, and that requiring training data licenses would make large-scale AI development prohibitively expensive.

Which other publishers have sued AI companies for training data?

The list is growing. The New York Times sued OpenAI in December 2023. Getty Images sued Stability AI in January 2023. The Authors Guild filed a class action against OpenAI. Britannica sued Perplexity AI in 2025. Major record labels sued Suno and Udio in 2024 over audio model training, and those cases have since settled. Each case is building a legal record that courts will eventually use to set binding precedent on fair use in AI training.
