Search Results for author: Soroush Pour

Found 1 paper, 0 papers with code

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

no code implementations • 6 Nov 2023 • Rusheb Shah, Quentin Feuillade--Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando

Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour.

Language Modelling

