1 code implementation • 13 Mar 2024 • Xiaojun Xu, Yuanshun Yao, Yang Liu
While prior work focuses on token-level watermarks that embed signals into model outputs, we design a model-level watermark that embeds signals into the LLM weights; such signals can be detected by a paired detector.
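The weight-space idea can be illustrated with a toy sketch: perturb the weights along a secret key direction, and let the paired detector correlate weights with that key. All names and the embedding scheme here are illustrative assumptions, not the paper's actual method.

```python
# Toy weight-space watermark with a paired detector (illustrative only;
# the actual embedding scheme in the paper is not reproduced here).
import random

random.seed(0)
DIM = 1024
key = [random.choice([-1.0, 1.0]) for _ in range(DIM)]   # secret detector key
weights = [random.gauss(0.0, 1.0) for _ in range(DIM)]   # stand-in model weights

EPS = 0.5  # watermark strength (assumed small enough to preserve utility)
wm_weights = [w + EPS * k for w, k in zip(weights, key)]  # embed the signal

def detect(w, key):
    """Correlate weights with the secret key; a large value flags a watermark."""
    return sum(wi * ki for wi, ki in zip(w, key)) / len(w)
```

On watermarked weights the statistic concentrates near `EPS`, while on clean weights it stays near zero, which is what makes the detector pairing work in this sketch.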
no code implementations • 12 Mar 2024 • Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.
no code implementations • 20 Feb 2024 • Jinlong Pang, Jialu Wang, Zhaowei Zhu, Yuanshun Yao, Chen Qian, Yang Liu
A fair classifier should ensure benefits for people from different groups; however, group information is often sensitive and unsuitable for use in model training.
no code implementations • 16 Feb 2024 • Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu
In this work, we propose Factualness Evaluations via Weighting LLMs (FEWL), the first hallucination metric that is specifically designed for the scenario when gold-standard answers are absent.
no code implementations • 13 Feb 2024 • Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning.
no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu
The key idea is to first retrieve high-quality samples related to the target domain and then use them as in-context learning examples to generate more samples.
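A minimal sketch of this retrieve-then-generate loop, with a lexical-overlap score standing in for a real retriever and the generator call stubbed out — every helper name here is a hypothetical placeholder, not the paper's implementation:

```python
# Hedged sketch: retrieve in-domain samples, then format them as
# in-context examples for a generator LLM (stubbed).
import re
from collections import Counter

def tokens(s):
    return Counter(re.findall(r"\w+", s.lower()))

def similarity(a, b):
    """Toy lexical-overlap score standing in for a dense retriever."""
    return sum((tokens(a) & tokens(b)).values())

def retrieve(query, corpus, k=2):
    """Pick the k corpus samples most related to the target domain."""
    return sorted(corpus, key=lambda s: similarity(query, s), reverse=True)[:k]

def build_icl_prompt(examples, instruction):
    """Format retrieved samples as few-shot examples for the generator LLM."""
    shots = "\n".join(f"Example: {e}" for e in examples)
    return f"{shots}\n{instruction}"

corpus = [
    "Patient reports mild headache after medication.",
    "Quarterly earnings beat analyst expectations.",
    "Dosage was adjusted after the patient reported dizziness.",
]
examples = retrieve("patient medication side effects", corpus)
prompt = build_icl_prompt(examples, "Generate one more in-domain sample:")
```

The prompt would then be sent to the generator LLM to synthesize additional in-domain samples.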
1 code implementation • 14 Oct 2023 • Yuanshun Yao, Xiaojun Xu, Yang Liu
To the best of our knowledge, our work is among the first to explore LLM unlearning.
no code implementations • 9 Oct 2023 • Tongxin Yin, Jean-François Ton, Ruocheng Guo, Yuanshun Yao, Mingyan Liu, Yang Liu
To generalize the abstaining decisions to test samples, we then train a surrogate model to learn the abstaining decisions based on the IP solutions in an end-to-end manner.
1 code implementation • 10 Aug 2023 • Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li
However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.
no code implementations • 30 Jun 2023 • Yuanshun Yao, Yang Liu
Identifying the causes of a model's unfairness is an important yet relatively unexplored task.
1 code implementation • 18 Jan 2023 • Shangyu Xie, Xin Yang, Yuanshun Yao, Tianyi Liu, Taiqing Wang, Jiankai Sun
In this work, we go a step further and study leakage in the regression setting, where the private labels are continuous values (rather than the discrete labels of classification).
no code implementations • 17 Nov 2022 • Yuanshun Yao, Chong Wang, Hang Li
The key idea is to train a surrogate model to learn the effect of removing a subset of user history on the recommendation.
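One way to read this idea: generate training pairs by re-scoring the recommender on counterfactual histories, labeling each removed subset with the resulting score change; a surrogate model would then be fit to these pairs. The scoring function below is a toy stand-in, not the paper's recommender.

```python
# Toy generation of (removed-subset, score-change) training pairs for a
# removal-effect surrogate (hypothetical scoring function).
from itertools import combinations

def rec_score(history, candidate):
    """Stand-in recommender: mean affinity between history and candidate."""
    if not history:
        return 0.0
    return sum(h * candidate for h in history) / len(history)

def removal_effects(history, candidate):
    """Label each proper subset with the score change its removal causes."""
    base = rec_score(history, candidate)
    data = []
    for r in range(1, len(history)):
        for idx in combinations(range(len(history)), r):
            kept = [h for i, h in enumerate(history) if i not in idx]
            data.append((idx, base - rec_score(kept, candidate)))
    return data

pairs = removal_effects([0.9, 0.1, 0.8], candidate=1.0)
```

Once a surrogate is trained on such pairs, the effect of any deletion can be predicted without re-running the recommender each time.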
1 code implementation • 6 Oct 2022 • Zhaowei Zhu, Yuanshun Yao, Jiankai Sun, Hang Li, Yang Liu
Our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness.
1 code implementation • 25 Aug 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang
Federated learning (FL) has gained significant attention recently as a privacy-enhancing tool to jointly train a machine learning model by multiple participants.
no code implementations • 16 Jun 2022 • Ruihan Wu, Xin Yang, Yuanshun Yao, Jiankai Sun, Tianyi Liu, Kilian Q. Weinberger, Chong Wang
Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects.
no code implementations • 24 May 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang
In this work, we propose two evaluation algorithms that can more accurately compute the widely used AUC (area under curve) metric when using label DP in vFL.
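The motivating problem can be illustrated with a toy experiment (not the paper's estimators): when labels are privatized by randomized response, AUC computed naively on the noisy labels is biased toward 0.5.

```python
# Toy illustration: naive AUC under label DP (randomized response) is biased
# toward 0.5; the paper proposes estimators that compute AUC more accurately.
import random

def auc(scores, labels):
    """Mann-Whitney AUC: P(score_pos > score_neg), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
labels = [random.randint(0, 1) for _ in range(800)]
scores = [y + random.gauss(0.0, 1.0) for y in labels]  # informative scores

FLIP = 0.2  # randomized-response flip probability (label DP)
noisy = [1 - y if random.random() < FLIP else y for y in labels]

clean_auc = auc(scores, labels)
naive_auc = auc(scores, noisy)  # shrinks toward 0.5 under label noise
```

The gap between `clean_auc` and `naive_auc` is the bias that a label-DP-aware AUC estimator needs to correct.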
no code implementations • 4 Mar 2022 • Xin Yang, Jiankai Sun, Yuanshun Yao, Junyuan Xie, Chong Wang
Split learning is a distributed training framework that allows multiple parties to jointly train a machine learning model over vertically partitioned data (partitioned by attributes).
no code implementations • 2 Mar 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Chong Wang
As the raw labels often contain highly sensitive information, recent work has proposed methods to effectively prevent label leakage from the backpropagated gradients in vFL.
no code implementations • 2 Mar 2022 • Yuanshun Yao, Chong Wang, Hang Li
Modern recommender systems face an increasing need to explain their recommendations.
no code implementations • 21 Jul 2021 • Jiankai Sun, Yuanshun Yao, Weihao Gao, Junyuan Xie, Chong Wang
Recently, researchers have studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from shared gradients.
no code implementations • 10 Jun 2021 • Jiankai Sun, Xin Yang, Yuanshun Yao, Aonan Zhang, Weihao Gao, Junyuan Xie, Chong Wang
In this paper, we propose a vFL framework based on Private Set Union (PSU) that allows each party to keep sensitive membership information to itself.
no code implementations • CVPR 2021 • Emily Wenger, Josephine Passananti, Arjun Bhagoji, Yuanshun Yao, Haitao Zheng, Ben Y. Zhao
A critical question remains unanswered: can backdoor attacks succeed using physical objects as triggers, thus making them a credible threat against deep learning systems in the real world?
no code implementations • 24 May 2019 • Yuanshun Yao, Huiying Li, Hai-Tao Zheng, Ben Y. Zhao
Recent work has proposed the concept of backdoor attacks on deep neural networks (DNNs), where misbehaviors are hidden inside "normal" models, only to be triggered by very specific inputs.
no code implementations • 27 Aug 2017 • Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Hai-Tao Zheng, Ben Y. Zhao
Malicious crowdsourcing forums are gaining traction as a means of spreading misinformation online, but are limited by the costs of hiring and managing human workers.
Cryptography and Security • Social and Information Networks