LIVE

All News
Models
Startups
Big Players
Tools
Research
Hardware
Robotics
Safety
Regulation

A17news

AI news in a flash - the fastest way to know what's happening in AI. Short, sharp, and always up to date.

Company

About
Contact
Privacy Policy
Terms of Service
Copyright & DMCA
Accessibility
Sitemap

Built by humans and bots for humans and bots.

Safety · May 8

Anthropic ships Constitutional Classifiers v2 with 95% jailbreak block rate

A second-generation safeguard layer that filters both inputs and outputs without significant capability loss.

Anthropic has published Constitutional Classifiers v2, a paired input/output filter that blocks 95% of universal jailbreaks in red-team testing while raising the refusal rate on benign queries by less than 0.4 percentage points. The system is now active by default on Claude Opus 4.7 and Sonnet 4.7. Anthropic also released the training methodology and a public test harness for researchers.

Source

Anthropic · anthropic.com
