Search Results for author: Robie Gonzales

Found 2 papers, 1 paper with code

Representation noising effectively prevents harmful fine-tuning on LLMs

no code implementations • 23 May 2024 • Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz

We provide empirical evidence that the effectiveness of our defence lies in its "depth": the degree to which information about harmful representations is removed across all layers of the LLM.

Long-form evaluation of model editing

1 code implementation • 14 Feb 2024 • Domenic Rosati, Robie Gonzales, Jinkun Chen, Xuemin Yu, Melis Erkan, Yahya Kayani, Satya Deepika Chavatapalli, Frank Rudzicz, Hassan Sajjad

Evaluations of model editing currently use only the "next few token" completions after a prompt.

Model Editing, Text Generation
