Anthropic's fight over model distillation in China: What is confirmed, what is claimed, and what comes next
Anthropic says three Chinese AI labs used fraudulent access to Claude at scale to improve their own models. That is a serious allegation with technical, commercial, and policy consequences. Right now, the public evidence is mostly company-reported, so the key is to separate what is confirmed from what is still a claim.
This goes beyond routine platform abuse. If true, it is both a governance issue and a national security concern; even if parts are overstated, the allegations could still shape export-control policy.
This breakdown separates confirmed facts from unresolved claims, then closes with practical steps for platform and security teams.
TL;DR for busy readers
- Anthropic alleges three Chinese AI labs used fraudulent access to distill Claude capabilities.
- The most detailed evidence is company-reported, with Reuters providing external reporting context.
- The issue matters across security, competition, and export-control policy.
- Key unknowns remain, including independent forensic validation and legal outcomes.
- Teams should respond with practical abuse detection and precise, claims-safe communication.
What Anthropic says happened
In an official post, Anthropic says DeepSeek, Moonshot, and MiniMax ran industrial-scale extraction campaigns against Claude.
Key claims from Anthropic:
- More than 16 million Claude exchanges in total
- Roughly 24,000 fraudulent accounts
- Claimed attribution to specific labs with high confidence
- Focus on high-value capabilities: reasoning, tool use, coding
Anthropic describes this as illicit "distillation" at scale.
Important nuance: Anthropic also says distillation itself is a legitimate and common training technique when used on your own models. The dispute is about unauthorized extraction from a competitor and alleged terms violations.
Distillation volume at a glance (Anthropic claim data)
How to read this data:
- These numbers are from Anthropic's own reporting.
- Reuters reported the allegations, but did not independently audit technical telemetry.
- Treat these counts as credible claims under investigation, not adjudicated fact.
| Lab | Reported exchanges | Primary claimed targets |
|---|---|---|
| DeepSeek | >150,000 | Reasoning quality, reward-model style grading, policy-sensitive query rewrites |
| Moonshot | >3.4 million | Agentic reasoning, tool use, coding, data analysis, computer use, vision |
| MiniMax | >13 million | Agentic coding, orchestration, tool use |
Chart: Reported Claude exchanges by lab (Anthropic claims): DeepSeek 0.15M+, Moonshot 3.4M+, MiniMax 13M+. Source: Anthropic, "Detecting and preventing distillation attacks" (2026-02-23). Not independently audited.
How these campaigns allegedly worked
Anthropic's description points to a repeatable abuse pattern:
- Create or buy access through large batches of fraudulent accounts.
- Route requests through proxy infrastructure to reduce traceability.
- Send high-volume prompt sets focused on reasoning, coding, and tool use.
- Use outputs for post-training and reinforcement workflows.
- Pivot fast when a new model release exposes better capabilities.
Alleged campaign flow (based on Anthropic reporting): fraudulent account setup → proxy routing and masking → capability extraction prompts → distillation / RL post-training → rapid pivot to new model. Source basis: Anthropic, "Detecting and preventing distillation attacks". Diagram is an editorial simplification.
What is confirmed versus what is still a claim
Confirmed
- Anthropic published the allegations on its official site.
- Anthropic says it detected large-scale abusive patterns and tightened controls.
- Anthropic's restrictions update prohibits entities more than 50% owned, directly or indirectly, by companies headquartered in unsupported regions.
- Anthropic ties this to broader policy arguments around chip export controls.
- Reuters reported the allegations and said the named companies did not immediately respond.
Claimed by Anthropic, not independently verified in full public detail
- Exact scale numbers by lab
- Specific attribution methodology outcomes for each lab
- Degree of capability transfer achieved by each campaign
Unknown right now
- Any full third-party forensic validation
- Any legal findings or enforcement outcomes tied to these campaigns
- Any published technical rebuttal from the named labs
Why this matters beyond one company
There are three layers here.
- Competitive integrity: Frontier model outputs are expensive to produce. If a competitor can copy high-value behavior through unauthorized extraction, training timelines and cost structures change fast.
- Safety transfer risk: Anthropic argues that distilled models may not carry over safeguard behavior. That could matter if advanced capabilities spread with weaker abuse controls.
- Policy leverage: Anthropic uses this incident to support stricter export controls. That puts one incident class into a larger geopolitical debate about compute access and strategic advantage.
The policy and market tension
This is where the debate often splits.
Anthropic's policy case is straightforward: if large-scale illicit distillation is real, then export controls and access restrictions need stronger enforcement.
Critics raise a different concern: broad restrictions can become blunt tools. They may hurt legitimate research paths, reduce transparency, and further fragment global AI ecosystems.
Two realities can coexist here:
- Unauthorized extraction is a real risk.
- Overbroad policy responses can create new problems.
If you run a model or API platform: practical playbook
This incident is a strong template for stress-testing your own defenses.
Next 7 days checklist
- Define your top 3 abuse signatures for coordinated extraction.
- Set alert thresholds for high-volume, capability-focused prompt clusters.
- Run one red-team simulation for output-extraction behavior.
- Assign an incident owner and escalation path for suspected distillation campaigns.
- Draft one claims-safe incident communication template for legal and PR review.
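One checklist item above, alerting on high-volume, capability-focused prompt clusters, can be sketched as a simple per-account threshold rule. This is a minimal illustration, not a production detector: the category tags, volume floor, and share cutoff are all hypothetical values you would tune against your own traffic.

```python
from collections import Counter

# Illustrative tags for capability-extraction themes; your own prompt
# classifier would define these.
CAPABILITY_TAGS = {"reasoning", "coding", "tool_use"}

def flag_extraction_accounts(events, min_volume=1000, min_share=0.8):
    """Flag accounts whose traffic is both high-volume and concentrated
    on capability-extraction themes.

    events: iterable of (account_id, prompt_tag) pairs.
    Thresholds are illustrative assumptions.
    """
    totals = Counter()
    capability_hits = Counter()
    for account_id, tag in events:
        totals[account_id] += 1
        if tag in CAPABILITY_TAGS:
            capability_hits[account_id] += 1
    flagged = []
    for account_id, total in totals.items():
        share = capability_hits[account_id] / total
        if total >= min_volume and share >= min_share:
            flagged.append(account_id)
    return sorted(flagged)
```

A rule like this only catches concentration within a single account; the coordinated, multi-account case needs the fingerprint-style grouping discussed later in the playbook.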
Severity rubric (quick triage)
- Low: isolated suspicious prompts, low coordination signals.
- Medium: repeated patterns across multiple accounts, limited scale.
- High: clear multi-account coordination targeting model capabilities.
- Critical: industrial-scale abuse with rapid pivoting to new model releases.
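The rubric above maps naturally to a small triage function. The signal names and thresholds below are hypothetical placeholders, meant only to show how the four levels could be encoded; real cutoffs depend on your traffic baselines.

```python
def triage_severity(account_count, coordination_score, capability_focused, rapid_pivot):
    """Map abuse signals to the four rubric levels.

    account_count: suspicious accounts linked to the pattern.
    coordination_score: 0..1 estimate of multi-account coordination.
    capability_focused: prompts target model capabilities.
    rapid_pivot: pattern re-targeted a new model release quickly.
    All thresholds are illustrative assumptions.
    """
    if account_count > 100 and coordination_score >= 0.8 and rapid_pivot:
        return "critical"  # industrial scale with rapid pivoting
    if account_count > 10 and coordination_score >= 0.6 and capability_focused:
        return "high"      # clear multi-account capability targeting
    if account_count > 1 and coordination_score >= 0.3:
        return "medium"    # repeated patterns, limited scale
    return "low"           # isolated suspicious prompts
```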
KPIs to track
- Time-to-detect coordinated account behavior.
- False-positive rate for abuse classifiers.
- Suspicious traffic blocked pre-inference.
- Time-to-mitigate after first confirmed signal.
- Time-to-adapt controls after a major model release.
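Two of these KPIs reduce to simple timestamp arithmetic once you log the right events. A minimal sketch, assuming you record the first abusive request, the first alert, and the first confirmed mitigation (field names are hypothetical):

```python
from datetime import datetime, timedelta

def detection_kpis(first_abuse, first_alert, first_mitigation):
    """Return time-to-detect and time-to-mitigate as timedeltas."""
    return {
        "time_to_detect": first_alert - first_abuse,
        "time_to_mitigate": first_mitigation - first_alert,
    }

def false_positive_rate(false_positives, true_positives):
    """Share of abuse alerts that turned out to be benign."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0
```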
Baseline operating guidance
- Track coordinated behavioral fingerprints, not only account-level abuse.
- Watch for repeated capability-focused prompt clusters at scale.
- Harden account verification routes that attackers can industrialize.
- Share technical indicators with peers when abuse patterns cross providers.
- Be cautious with policy conclusions until independent validation is available.
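The first guidance item, tracking coordinated behavioral fingerprints rather than single accounts, can be sketched as pairwise template overlap: accounts that reuse many of the same normalized prompt templates get grouped even when each looks benign alone. The normalization and overlap threshold here are illustrative assumptions.

```python
from collections import defaultdict
import hashlib

def template_hash(prompt):
    """Hash a whitespace- and case-normalized prompt as a crude template id."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def coordinated_pairs(account_prompts, min_shared=3):
    """Return account pairs sharing at least min_shared prompt templates.

    account_prompts: dict of account_id -> list of prompt strings.
    """
    by_hash = defaultdict(set)
    for account, prompts in account_prompts.items():
        for p in prompts:
            by_hash[template_hash(p)].add(account)
    pair_counts = defaultdict(int)
    for accounts in by_hash.values():
        ordered = sorted(accounts)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pair_counts[(ordered[i], ordered[j])] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_shared}
```

In practice you would fold in infrastructure signals (shared proxies, payment methods, signup patterns) rather than prompt text alone, but the grouping idea is the same.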
If you are in policy or compliance: decision prompts
- What independent evidence threshold should trigger policy advocacy in your org?
- Which controls are reversible if applied too broadly?
- How will you separate fraud enforcement from broad geographic overreach?
- What transparency standard should accompany public attributions?
Open questions to watch
- Will independent third-party technical validation emerge?
- Will any legal or regulatory actions confirm or challenge key allegations?
- Will named labs issue detailed technical rebuttals?
- Will cross-provider intelligence sharing improve attribution confidence?
Confidence callout
- High confidence: Anthropic published these claims and policy framing in official posts.
- High confidence: Reuters reported the same allegations and noted that the named firms did not immediately respond at publication time.
- Medium confidence: Full operational scope and attribution precision, because the public evidence set is still mostly company-reported.
- Low confidence: Sweeping geopolitical conclusions from this event alone.
Bottom line
This story is bigger than one vendor conflict. It is a live test of how frontier labs defend model behavior, how attribution claims are evaluated, and how fast policy reacts before independent verification is complete.
The right takeaway is straightforward: take the allegations seriously, stay precise about evidence, and update your view as third-party validation emerges.
Sources
Primary and high-signal sources:
- Anthropic, "Detecting and preventing distillation attacks" (official): https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
- Anthropic, "Updating restrictions of sales to unsupported regions" (official): https://www.anthropic.com/news/updating-restrictions-of-sales-to-unsupported-regions
- Anthropic, "Supported countries and regions" (official): https://www.anthropic.com/supported-countries
- Anthropic, "A statement from Dario Amodei on Anthropic's commitment to American AI leadership" (official): https://www.anthropic.com/news/statement-dario-amodei-american-ai-leadership
- Anthropic, "AI Export Controls Framework Response" (official): https://www.anthropic.com/news/securing-america-s-compute-advantage-anthropic-s-position-on-the-diffusion-rule
- Reuters, Feb 23, 2026 coverage: https://www.reuters.com/world/china/chinese-companies-used-claude-improve-own-models-anthropic-says-2026-02-23/
- Reuters, Feb 12, 2026 OpenAI/DeepSeek memo context: https://www.reuters.com/world/china/openai-accuses-deepseek-distilling-us-models-gain-advantage-bloomberg-news-2026-02-12/