14 Feb 2024 • Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann
Current LLM alignment methods are readily broken through specifically crafted adversarial prompts.