Fight Back Against Jailbreaking via Prompt Adversarial Tuning
Published in NeurIPS 2024
This paper proposes prompt adversarial tuning as a defense against jailbreak attacks.
Yichuan Mo, Yuji Wang, Zeming Wei, and Yisen Wang. (2024). "Fight Back Against Jailbreaking via Prompt Adversarial Tuning." NeurIPS 2024.
