1 code implementation • 9 May 2024 • Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang
Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications.
no code implementations • 11 Apr 2024 • Dan Qiao, Yu-Xiang Wang
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
no code implementations • 2 Feb 2024 • Dan Qiao, Yu-Xiang Wang
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized.
1 code implementation • 19 Sep 2023 • Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang
This report provides the main details to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspire our model architecture design, training objectives of different stages, and other enhancement techniques.
1 code implementation • 19 Aug 2023 • Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan
In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.
no code implementations • 24 Jun 2023 • Sunil Madhow, Dan Qiao, Ming Yin, Yu-Xiang Wang
Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable.
no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha
The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.
no code implementations • 24 Feb 2023 • Dan Qiao, Ming Yin, Yu-Xiang Wang
In many real-life reinforcement learning (RL) problems, deploying new policies is costly.
no code implementations • 9 Dec 2022 • Dan Qiao, Yu-Xiang Wang
We close this gap for the JDP case by designing an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the information-theoretic lower bound of non-private learning for all choices of $\epsilon > S^{1.5}A^{0.5}H^2/\sqrt{T}$.
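The threshold on $\epsilon$ is the point at which the privacy term of the stated bound no longer dominates. A short check using only the two terms quoted above, ignoring constants and logarithmic factors:
$$
\frac{S^2AH^3}{\epsilon} \le \sqrt{SAH^2T}
\iff
\epsilon \ge \frac{S^2AH^3}{S^{0.5}A^{0.5}H\,T^{0.5}} = \frac{S^{1.5}A^{0.5}H^2}{\sqrt{T}}.
$$
For $\epsilon$ in this range the bound reduces to $\widetilde{O}(\sqrt{SAH^2T})$.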
1 code implementation • COLING 2022 • Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang
The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires some labeled data for downstream tasks.
no code implementations • 3 Oct 2022 • Dan Qiao, Yu-Xiang Wang
We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting.
no code implementations • 23 Sep 2022 • Jianyu Xu, Dan Qiao, Yu-Xiang Wang
We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups.
no code implementations • 13 Feb 2022 • Dan Qiao, Ming Yin, Ming Min, Yu-Xiang Wang
In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$.
no code implementations • 24 Jan 2022 • Dan Qiao, Zhaoxia Peng, Guoguang Wen, TingWen Huang
This paper develops a novel saturated Nussbaum function to relax such limitations and proposes a Nussbaum function based control scheme for the consensus problem of multi-agent systems with arbitrary non-identical unknown control directions and safe control progress.