no code implementations • ICML 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.
no code implementations • 19 Apr 2024 • Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, Alex Beutel
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.
no code implementations • 8 Mar 2024 • Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine
Large language models (LLMs) have a tendency to generate plausible-sounding yet factually incorrect responses, especially when queried on unfamiliar concepts.
1 code implementation • 19 Feb 2024 • Alexander Wan, Eric Wallace, Dan Klein
Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as "is aspartame linked to cancer".
no code implementations • 28 Nov 2023 • Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.
no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.
1 code implementation • 8 Aug 2023 • Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e. g., containing copyrighted books or news) that is only queried during inference.
1 code implementation • 25 May 2023 • Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao liu, Pieter Abbeel, Sergey Levine, Dawn Song
This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
1 code implementation • 1 May 2023 • Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input.
1 code implementation • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.
1 code implementation • 15 Nov 2022 • Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel
The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models.
no code implementations • 30 Jun 2022 • Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang
In memorization, models overfit specific training examples and become susceptible to privacy attacks.
1 code implementation • ACL 2022 • Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein
We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.
3 code implementations • 12 Apr 2022 • Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis
Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.
Ranked #85 on Code Generation on MBPP
3 code implementations • 14 Feb 2022 • Nikhil Kandpal, Eric Wallace, Colin Raffel
Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set.
1 code implementation • Findings (ACL) 2022 • Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela
To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena.
2 code implementations • Findings (ACL) 2022 • Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel
Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning.
1 code implementation • NAACL 2021 • Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, Dan Klein
Language models (LMs) must be both safe and equitable to be responsibly deployed in practice.
5 code implementations • 19 Feb 2021 • Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art.
3 code implementations • 14 Dec 2020 • Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.
no code implementations • EMNLP 2020 • Eric Wallace, Matt Gardner, Sameer Singh
Although neural NLP models are highly expressive and empirically successful, they also systematically fail in counterintuitive ways and are opaque in their decision-making process.
3 code implementations • EMNLP 2020 • Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining.
no code implementations • NAACL 2021 • Eric Wallace, Tony Z. Zhao, Shi Feng, Sameer Singh
In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Junlin Wang, Jens Tuyls, Eric Wallace, Sameer Singh
Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their faithfulness.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 10 Aug 2020 • Rosario Cammarota, Matthias Schunter, Anand Rajan, Fabian Boemer, Ágnes Kiss, Amos Treiber, Christian Weinert, Thomas Schneider, Emmanuel Stapf, Ahmad-Reza Sadeghi, Daniel Demmler, Joshua Stock, Huili Chen, Siam Umar Hussain, Sadegh Riazi, Farinaz Koushanfar, Saransh Gupta, Tajan Simunic Rosing, Kamalika Chaudhuri, Hamid Nejatollahi, Nikil Dutt, Mohsen Imani, Kim Laine, Anuj Dubey, Aydin Aysu, Fateme Sadat Hosseini, Chengmo Yang, Eric Wallace, Pamela Norton
Additionally, such systems should also use Privacy-Enhancing Technologies (PETs) to protect customers' data at any time.
1 code implementation • EMNLP 2020 • Eric Wallace, Mitchell Stern, Dawn Song
To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
1 code implementation • ACL 2020 • Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song
Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions?
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
2 code implementations • 26 Feb 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.
1 code implementation • IJCNLP 2019 • Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh
Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior.
1 code implementation • IJCNLP 2019 • Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner
The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks.
1 code implementation • IJCNLP 2019 • Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset.
1 code implementation • ACL 2019 • Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer
Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs.
no code implementations • ACL 2019 • Shi Feng, Eric Wallace, Jordan Boyd-Graber
Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e. g., hypothesis-only models for SNLI or question-only models for VQA).
1 code implementation • 1 Feb 2019 • Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
Second, we compute the importance of group-features in deep learning interpretation by introducing a sparsity regularization term.
1 code implementation • WS 2018 • Eric Wallace, Shi Feng, Jordan Boyd-Graber
However, the confidence of neural networks is not a robust measure of model uncertainty.
1 code implementation • TACL 2019 • Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber
We propose human-in-the-loop adversarial generation, where human authors are guided to break models.
no code implementations • ACL 2018 • Eric Wallace, Jordan Boyd-Graber
Modern question answering systems have been touted as approaching human performance.
no code implementations • EMNLP 2018 • Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
In existing interpretation methods for NLP, a word's importance is determined by either input perturbation---measuring the decrease in model confidence when that word is removed---or by the gradient with respect to that word.