Search Results for author: Jonas Geiping

Found 59 papers, 39 papers with code

LMD3: Language Model Data Density Dependence

no code implementations • 10 May 2024 • John Kirchenbauer, Garrett Honke, Gowthami Somepalli, Jonas Geiping, Daphne Ippolito, Katherine Lee, Tom Goldstein, David Andre

We develop a methodology for analyzing language model task performance at the individual example level based on training data density estimation.

Measuring Style Similarity in Diffusion Models

1 code implementation • 1 Apr 2024 • Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas Geiping, Abhinav Shrivastava, Tom Goldstein

We also propose a method to extract style descriptors that can be used to attribute style of a generated image to the images used in the training dataset of a text-to-image model.

Attribute

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

no code implementations • 1 Apr 2024 • Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini

In this paper, we unveil a new vulnerability: the privacy backdoor attack.

Backdoor Attack

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

1 code implementation • 25 Mar 2024 • Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum

As a result, we may be able to craft more potent poisons by carefully choosing the base samples.

Backdoor Attack

What do we learn from inverting CLIP models?

1 code implementation • 5 Mar 2024 • Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein

We employ an inversion-based approach to examine CLIP models.

Coercing LLMs to do and reveal (almost) anything

1 code implementation • 21 Feb 2024 • Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein

It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements.

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

1 code implementation • 22 Jan 2024 • Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors.

159

Object Recognition as Next Token Prediction

1 code implementation • 4 Dec 2023 • Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim

We present an approach to pose object recognition as next token prediction.

Decoder • Language Modelling • +2

109

A Simple and Efficient Baseline for Data Attribution on Images

1 code implementation • 3 Nov 2023 • Vasu Singla, Pedro Sandoval-Segura, Micah Goldblum, Jonas Geiping, Tom Goldstein

Our approach serves as a simple and efficient baseline for data attribution on images.

Attribute • Self-Supervised Learning

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

no code implementations • 23 Oct 2023 • Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

But in parallel to the development of detection frameworks, researchers have also concentrated on designing strategies to elude detection, i.e., focusing on the impossibilities of AI-generated text detection.

Misinformation • Text Detection

NEFTune: Noisy Embeddings Improve Instruction Finetuning

3 code implementations • 9 Oct 2023 • Neel Jain, Ping-Yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation.

Language Modelling

5,977

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

1 code implementation • 1 Sep 2023 • Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-Yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein

We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.

Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion

no code implementations • 9 Jul 2023 • Jie S. Li, Yow-Ting Shiue, Yong-Siang Shih, Jonas Geiping

SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of the images matches the one paired with the text.

Descriptive • Word Sense Disambiguation

Seeing in Words: Learning to Classify through Language Bottlenecks

no code implementations • 29 Jun 2023 • Khalid Saifullah, Yuxin Wen, Jonas Geiping, Micah Goldblum, Tom Goldstein

Neural networks for computer vision extract uninterpretable features despite achieving high accuracy on benchmarks.

On the Exploitability of Instruction Tuning

1 code implementation • NeurIPS 2023 • Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model's behavior.

Data Poisoning • Instruction Following

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

1 code implementation • 23 Jun 2023 • Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative.

Chatbot • Language Modelling

107

On the Reliability of Watermarks for Large Language Models

1 code implementation • 7 Jun 2023 • John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

451

Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust

1 code implementation • 31 May 2023 • Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein

The watermark embeds a pattern into the initial noise vector used for sampling.

Image Generation

209

Understanding and Mitigating Copying in Diffusion Models

1 code implementation • NeurIPS 2023 • Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role.

Image Captioning • Memorization

What Can We Learn from Unlearnable Datasets?

1 code implementation • NeurIPS 2023 • Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein

First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization.

A Cookbook of Self-Supervised Learning

no code implementations • 24 Apr 2023 • Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann Lecun, Micah Goldblum

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning.

Navigate • Self-Supervised Learning

JPEG Compressed Images Can Bypass Protections Against AI Editing

no code implementations • 5 Apr 2023 • Pedro Sandoval-Segura, Jonas Geiping, Tom Goldstein

Recently developed text-to-image diffusion models make it easy to edit or create high-quality images.

Face Swapping

Universal Guidance for Diffusion Models

1 code implementation • 14 Feb 2023 • Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining.

Face Recognition • object-detection • +1

414

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

2 code implementations • NeurIPS 2023 • Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein

In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge on how to prompt the model.

570

A Watermark for Large Language Models

7 code implementations • 24 Jan 2023 • John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens.

Language Modelling

8,000

Cramming: Training a Language Model on a Single GPU in One Day

1 code implementation • 28 Dec 2022 • Jonas Geiping, Tom Goldstein

Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners.

Language Modelling • Masked Language Modeling

1,239

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

no code implementations • CVPR 2023 • Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes.

Image Retrieval • Retrieval

K-SAM: Sharpness-Aware Minimization at the Speed of SGD

no code implementations • 23 Oct 2022 • Renkun Ni, Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Sharpness-Aware Minimization (SAM) has recently emerged as a robust technique for improving the accuracy of deep neural networks.

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

1 code implementation • 19 Oct 2022 • Yuxin Wen, Arpit Bansal, Hamid Kazemi, Eitan Borgnia, Micah Goldblum, Jonas Geiping, Tom Goldstein

As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners.

Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

1 code implementation • 17 Oct 2022 • Yuxin Wen, Jonas Geiping, Liam Fowl, Hossein Souri, Rama Chellappa, Micah Goldblum, Tom Goldstein

Federated learning is particularly susceptible to model poisoning and backdoor attacks because individual users have direct control over the training data and model updates.

Federated Learning • Image Classification • +2

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

1 code implementation • 12 Oct 2022 • Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson

Despite the clear performance benefits of data augmentations, little is known about why they are so effective.

A Simple Strategy to Provable Invariance via Orbit Mapping

no code implementations • 24 Sep 2022 • Kanchana Vaishnavi Gandikota, Jonas Geiping, Zorah Lähner, Adam Czapliński, Michael Moeller

Many applications require robustness, or ideally invariance, of neural networks to certain transformations of input data.

3D Point Cloud Classification • Computational Efficiency • +2

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

2 code implementations • NeurIPS 2023 • Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S. Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, Tom Goldstein

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

Image Restoration • Variational Inference

7,106

Autoregressive Perturbations for Data Poisoning

2 code implementations • 8 Jun 2022 • Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein, David W. Jacobs

Unfortunately, existing methods require knowledge of both the target architecture and the complete dataset so that a surrogate network can be trained, the parameters of which are used to generate the attack.

Data Poisoning

Poisons that are learned faster are more effective

no code implementations • 19 Apr 2022 • Pedro Sandoval-Segura, Vasu Singla, Liam Fowl, Jonas Geiping, Micah Goldblum, David Jacobs, Tom Goldstein

We advocate for evaluating poisons in terms of peak test accuracy.

Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification

1 code implementation • 1 Feb 2022 • Yuxin Wen, Jonas Geiping, Liam Fowl, Micah Goldblum, Tom Goldstein

Federated learning (FL) has rapidly risen in popularity due to its promise of privacy and efficiency.

Federated Learning

245

Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models

1 code implementation • 29 Jan 2022 • Liam Fowl, Jonas Geiping, Steven Reich, Yuxin Wen, Wojtek Czaja, Micah Goldblum, Tom Goldstein

A central tenet of Federated learning (FL), which trains models without centralizing user data, is privacy.

Federated Learning

245

Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models

2 code implementations • ICLR 2022 • Liam Fowl, Jonas Geiping, Wojtek Czaja, Micah Goldblum, Tom Goldstein

Federated learning has quickly gained popularity with its promises of increased user privacy and efficiency.

Federated Learning

245

DARTS for Inverse Problems: a Study on Stability

no code implementations • NeurIPS Workshop Deep_Invers 2021 • Jonas Geiping, Jovita Lukasik, Margret Keuper, Michael Moeller

Differentiable architecture search (DARTS) is a widely researched tool for neural architecture search, due to its promising results for image classification.

Image Classification • Neural Architecture Search

Protecting Proprietary Data: Poisoning for Secure Dataset Release

no code implementations • 29 Sep 2021 • Liam H Fowl, Ping-Yeh Chiang, Micah Goldblum, Jonas Geiping, Arpit Amit Bansal, Wojciech Czaja, Tom Goldstein

These two behaviors can be in conflict as an organization wants to prevent competitors from using their own data to replicate the performance of their proprietary models.

Data Poisoning

Stochastic Training is Not Necessary for Generalization

1 code implementation • ICLR 2022 • Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein

It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks.

Data Augmentation

DP-InstaHide: Data Augmentations Provably Enhance Guarantees Against Dataset Manipulations

no code implementations • 29 Sep 2021 • Eitan Borgnia, Jonas Geiping, Valeriia Cherepanova, Liam H Fowl, Arjun Gupta, Amin Ghiasi, Furong Huang, Micah Goldblum, Tom Goldstein

Data poisoning and backdoor attacks manipulate training data to induce security breaches in a victim model.

Data Poisoning

Is Differentiable Architecture Search truly a One-Shot Method?

no code implementations • 12 Aug 2021 • Jonas Geiping, Jovita Lukasik, Margret Keuper, Michael Moeller

In this work, we investigate DAS in a systematic case study of inverse problems, which allows us to analyze these potential benefits in a controlled manner.

Hyperparameter Optimization • Image Classification • +2

Adversarial Examples Make Strong Poisons

2 code implementations • NeurIPS 2021 • Liam Fowl, Micah Goldblum, Ping-Yeh Chiang, Jonas Geiping, Wojtek Czaja, Tom Goldstein

The adversarial machine learning literature is largely partitioned into evasion attacks on testing data and poisoning attacks on training data.

Data Poisoning

Training or Architecture? How to Incorporate Invariance in Neural Networks

no code implementations • 18 Jun 2021 • Kanchana Vaishnavi Gandikota, Jonas Geiping, Zorah Lähner, Adam Czapliński, Michael Moeller

Many applications require the robustness, or ideally the invariance, of a neural network to certain transformations of input data.

3D Point Cloud Classification • Computational Efficiency • +1

DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with Differentially Private Data Augmentations

1 code implementation • 2 Mar 2021 • Eitan Borgnia, Jonas Geiping, Valeriia Cherepanova, Liam Fowl, Arjun Gupta, Amin Ghiasi, Furong Huang, Micah Goldblum, Tom Goldstein

The InstaHide method has recently been proposed as an alternative to DP training that leverages supposed privacy properties of the mixup augmentation, although without rigorous guarantees.

Data Poisoning

What Doesn't Kill You Makes You Robust(er): How to Adversarially Train against Data Poisoning

1 code implementation • 26 Feb 2021 • Jonas Geiping, Liam Fowl, Gowthami Somepalli, Micah Goldblum, Michael Moeller, Tom Goldstein

Data poisoning is a threat model in which a malicious actor tampers with training data to manipulate outcomes at inference time.

Data Poisoning

Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release

no code implementations • 16 Feb 2021 • Liam Fowl, Ping-Yeh Chiang, Micah Goldblum, Jonas Geiping, Arpit Bansal, Wojtek Czaja, Tom Goldstein

Large organizations such as social media companies continually release data, for example user images.

Data Poisoning

Inverting Gradients - How easy is it to break privacy in federated learning?

1 code implementation • NeurIPS 2020 • Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller

The idea of federated learning is to collaboratively train a neural network on a server.

Federated Learning

237

Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

1 code implementation • 18 Nov 2020 • Eitan Borgnia, Valeriia Cherepanova, Liam Fowl, Amin Ghiasi, Jonas Geiping, Micah Goldblum, Tom Goldstein, Arjun Gupta

Data poisoning and backdoor attacks manipulate victim models by maliciously modifying training data.

Data Augmentation • Data Poisoning

Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

2 code implementations • ICLR 2021 • Jonas Geiping, Liam Fowl, W. Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, Tom Goldstein

We consider a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data.

Data Poisoning

Fast Convex Relaxations using Graph Discretizations

no code implementations • 23 Apr 2020 • Jonas Geiping, Fjedor Gaede, Hartmut Bauermeister, Michael Moeller

We discuss this methodology in detail and show examples in multi-label segmentation by minimal partitions and stereo estimation, where we demonstrate that the proposed graph discretization can reduce runtime as well as memory consumption of convex relaxations of matching problems by up to a factor of 10.

Optical Flow Estimation • Segmentation

MetaPoison: Practical General-purpose Clean-label Data Poisoning

2 code implementations • NeurIPS 2020 • W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein

Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models.

AutoML • Bilevel Optimization • +2

Inverting Gradients -- How easy is it to break privacy in federated learning?

6 code implementations • 31 Mar 2020 • Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller

The idea of federated learning is to collaboratively train a neural network on a server.

Federated Learning

330

WITCHcraft: Efficient PGD attacks with random step size

no code implementations • 18 Nov 2019 • Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Tom Goldstein, Renkun Ni, Steven Reich, Ali Shafahi

State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points.

Computational Efficiency

Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

1 code implementation • ICLR 2020 • Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein

We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike.

Learning Theory

Parametric Majorization for Data-Driven Energy Minimization Methods

1 code implementation • ICCV 2019 • Jonas Geiping, Michael Moeller

Energy minimization methods are a classical tool in a multitude of computer vision applications.

Composite Optimization by Nonconvex Majorization-Minimization

no code implementations • 20 Feb 2018 • Jonas Geiping, Michael Moeller

A popular class of algorithms for solving such problems is majorization-minimization techniques, which iteratively approximate the composite nonconvex function by a majorizing function that is easy to minimize.

Super-Resolution

Multiframe Motion Coupling for Video Super Resolution

1 code implementation • 23 Nov 2016 • Jonas Geiping, Hendrik Dirks, Daniel Cremers, Michael Moeller

The idea of video super resolution is to use different view points of a single scene to enhance the overall resolution and quality.

Motion Estimation • Video Super-Resolution
