Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Published in TPAMI 2026 (Adopted at scale by Anthropic), 2026
This journal article was later followed up by work from Anthropic.
Zeming Wei, Yifei Wang, Li Ang, Yichuan Mo, and Yisen Wang. (2026). "Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations." TPAMI 2026.
