Anthropic says its Constitutional Classifiers approach offers a practical way to make it harder for bad actors to try and coerce an AI model off its guardrails.
‘Constitutional Classifiers’ Technique Mitigates GenAI Jailbreaks

Anthropic says its Constitutional Classifiers approach offers a practical way to make it harder for bad actors to try and coerce an AI model off its guardrails.