
By Connie · Last reviewed: April 2026 — pricing & tools verified · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.

OpenAI and Anthropic Are Routing Extremist Users to Human Deradicalization Counselors

April 6, 2026 · 8 min read

TL;DR
  • OpenAI and Anthropic both partnered with crisis contractor ThroughLine in early 2026.
  • ThroughLine operates 1,600 monitored helplines across 180 countries for deradicalization support.
  • When AI models detect extremist signals, they surface human counselor referrals instead of engaging.
  • This is the first industry-wide coordinated AI response to radicalization, built on counseling referrals rather than law enforcement or data sharing.
  • Google is also in discussions to join the program.

Reuters reported in April 2026 that both OpenAI and Anthropic have quietly partnered with a New Zealand-based crisis contractor called ThroughLine to handle a problem no AI safety team had solved cleanly before: what do you do when a user is sliding toward radicalization in real time?

The answer, it turns out, is not to ban the user, flag the conversation for review, or call law enforcement. It is to surface a human. ThroughLine runs a constantly monitored network of 1,600 helplines in 180 countries, staffed by trained deradicalization counselors. When ChatGPT or Claude detects the right combination of signals, the AI surfaces contact information for a local counselor and steps back.

What ThroughLine Actually Does

ThroughLine was founded by Scott Taylor, who runs the organization from rural New Zealand. It specializes in what it calls "crisis routing" — identifying the right intervention resource for someone in distress and connecting them to it before the situation escalates. The organization had already built relationships with governments and NGOs before approaching AI labs.

The AI integration works at the response layer. When a user's messages trigger radicalization-related classifiers — patterns of violence glorification, in-group/out-group escalation, requests for operational planning, or specific ideological rhetoric — the model is instructed to surface a ThroughLine-connected resource rather than continue the conversation on that track. The user is not banned. The conversation is not reported. They are simply offered a human contact.

This is structurally similar to how AI platforms already handle self-harm queries: a detected signal triggers a crisis resource (like the 988 Suicide and Crisis Lifeline in the US), and the model de-escalates rather than engages with the content directly.
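
Neither lab has published implementation details, but the pattern the reporting describes maps onto a simple response-layer gate: score the incoming message against a set of signal classifiers, and if any score crosses a threshold, return a referral instead of a normal completion. Here is a minimal sketch of that pattern; every function name, signal label, and threshold below is a hypothetical illustration, not code from OpenAI, Anthropic, or ThroughLine.

```python
# Hypothetical sketch of response-layer crisis routing. No names here come
# from OpenAI, Anthropic, or ThroughLine; they illustrate the reported pattern.
from dataclasses import dataclass

@dataclass
class Referral:
    """A counselor referral surfaced in place of a normal completion."""
    helpline: str
    contact: str
    message: str

def classify(message: str) -> dict[str, float]:
    """Stand-in for trained classifiers over the signal categories named
    above. A real system would call a moderation model; this stub keys
    on phrases so the example runs end to end."""
    text = message.lower()
    return {
        "violence_glorification": 0.9 if "deserve what's coming" in text else 0.0,
        "in_out_group_escalation": 0.8 if "not really people" in text else 0.0,
        "operational_planning": 0.0,
        "ideological_rhetoric": 0.0,
    }

def route(message: str, threshold: float = 0.7) -> Referral | None:
    """Gate the response. Nothing is stored or reported; the only output
    is a message the user is free to ignore, matching the privacy model
    described above."""
    if max(classify(message).values()) >= threshold:
        return Referral(
            helpline="Local deradicalization helpline",  # resolved per region
            contact="example-helpline.org",              # placeholder contact
            message=("It sounds like you're carrying a lot right now. If you "
                     "want to talk it through with a person, this service is "
                     "free and confidential."),
        )
    return None  # no signal: the model answers normally

if __name__ == "__main__":
    hit = route("They're not really people and they deserve what's coming.")
    print(hit.message if hit else "normal completion")
```

The notable design choice is that detection and referral live in the same response path, so there is no separate reporting pipeline through which conversation data could leave the platform.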

Why This Approach Is Different From Platform Moderation

Traditional platform moderation removes content and bans accounts. It is reactive — it catches what was posted after the fact. ThroughLine's model is designed to intervene before the user acts, by offering a human connection at the moment of peak receptivity.

Approach | When It Acts | Method | Privacy Impact
Platform moderation (ban) | After content posted | Remove + suspend account | Account data reviewed
Law enforcement referral | After threat detected | Report to authorities | Conversation shared
ThroughLine model | During conversation | Surface human counselor | No data shared externally
AI refusal (standard) | During conversation | Decline to respond | No intervention offered

The key differentiator is privacy. No conversation content is transmitted to ThroughLine or law enforcement. The user sees a message, chooses whether to engage with the resource, and retains full agency. This makes the intervention legally clean and less likely to deter at-risk users from seeking help.

OpenAI and Anthropic: What Each Has Confirmed

OpenAI publicly confirmed its relationship with ThroughLine when Reuters reported the story but declined to provide further detail. The confirmation itself is notable — it suggests the program has moved beyond pilot and is now part of OpenAI's standard safety infrastructure for ChatGPT.

Anthropic was also named in the Reuters report but did not respond to requests for comment at publication time. Given Anthropic's known Constitutional AI approach — where Claude is trained to steer conversations away from harmful directions rather than simply refusing — the ThroughLine integration fits naturally into its model. Claude is already designed to de-escalate; ThroughLine gives that de-escalation somewhere to go.

Google was reported to be in discussions with ThroughLine as well, though no partnership was confirmed as of April 2026. If Google joins, the program would cover effectively every major consumer AI platform in the world.

The Radicalization Problem AI Was Not Designed to Solve

AI models are excellent at generating content and terrible at preventing escalation in a sustained, human-aware way. A person radicalizing online does not announce it. They approach the topic obliquely, test the AI's responses, and use the platform for reinforcement. Standard AI refusals ("I can't help with that") are ineffective at this stage — they are either worked around or ignored.

Research on deradicalization consistently shows that the most effective interventions are human, relational, and non-punitive. A trained counselor who meets someone where they are, without judgment or punishment, succeeds at rates that automated refusals and warnings have not come close to matching. ThroughLine's model plugs the gap between AI pattern detection, where machines excel, and human connection, where they do not.

What This Means for AI Safety Standards in 2026

The ThroughLine partnership signals a maturing approach to AI safety that goes beyond model-level guardrails. The first generation of AI safety work was about preventing models from generating harmful content. The second generation — where the industry appears to be now — is about what happens at the edge cases where harm is not in the content but in the user's intent and trajectory.

For enterprises deploying AI internally, this model is instructive. The question is not just "what can the AI refuse to do?" but "what happens when a user is in distress?" Building routing logic that connects AI to human support resources — whether for mental health, legal, or safety concerns — is becoming a best practice.

Platforms like Happycapy give teams the infrastructure to build these kinds of conditional workflow triggers — routing certain inputs to human review queues rather than continuing automated processing. The underlying principle is the same: AI detects the signal, human handles the case.
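
To make that principle concrete, here is a generic sketch of conditional routing to a human review queue. This is not Happycapy's actual API; every name below is invented for illustration.

```python
# Generic human-in-the-loop routing sketch. NOT Happycapy's API; all names
# are invented. The principle: AI detects the signal, a human handles the case.
import queue

review_queue: queue.Queue = queue.Queue()  # stand-in for a durable queue or DB

def needs_human(item: dict) -> bool:
    """Conditional trigger: route sensitive or high-risk inputs to a person."""
    return item.get("risk_score", 0.0) >= 0.7 or item.get("topic") in {
        "mental_health", "legal", "safety",
    }

def handle(item: dict) -> str:
    if needs_human(item):
        review_queue.put(item)       # pause automation; enqueue for human review
        return "escalated_to_human"
    return "automated_response"      # safe to continue automated processing

print(handle({"topic": "safety", "risk_score": 0.9}))   # escalated_to_human
print(handle({"topic": "billing", "risk_score": 0.1}))  # automated_response
```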

Build AI Workflows with Human-in-the-Loop Routing

Happycapy gives you agent infrastructure with conditional routing, human review queues, and 150+ skills. Pro plan starts at $17/month.

Try Happycapy Free

FAQ

What is ThroughLine and how does it relate to OpenAI and Anthropic?
ThroughLine is a crisis intervention contractor founded by Scott Taylor in New Zealand. It operates a network of 1,600 monitored helplines across 180 countries. Both OpenAI and Anthropic have partnered with ThroughLine to refer users who display signs of radicalization or extremist behavior to human counselors rather than leaving them solely in the AI conversation.
How does AI detect extremist intent in a conversation?
AI models like ChatGPT and Claude are trained to recognize patterns associated with radicalization: escalating language around violence, in-group/out-group framing, requests for operational planning, and specific ideological rhetoric. When these signals appear, the system surfaces intervention messages with contact information for human support resources.
Does this mean AI companies are spying on conversations?
No. Detection happens in real time within the AI response layer; conversations are not stored for review or reported to law enforcement. The AI surfaces a resource message to the user. User data is not shared with ThroughLine unless the user chooses to contact them directly.
Which AI companies have joined this extremism intervention program?
OpenAI confirmed its relationship with ThroughLine publicly. Anthropic's partnership was reported by Reuters in April 2026. Google is also reportedly in discussions. This suggests the deradicalization intervention model is becoming a standard safety layer across frontier AI providers.
Sources
  • Reuters: “Crisis contractor for OpenAI, Anthropic eyes a move to combat extremism” — April 2, 2026
  • OpenAI Safety Policies — openai.com/policies
  • Anthropic Constitutional AI Framework — anthropic.com/research/constitutional-ai