Search Results for author: Dan Qiao

Found 14 papers, 4 papers with code

OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

1 code implementation • 9 May 2024 • Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications.

Common Sense Reasoning named-entity-recognition +2

Differentially Private Reinforcement Learning with Self-Play

no code implementations • 11 Apr 2024 • Dan Qiao, Yu-Xiang Wang

We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.

Multi-agent Reinforcement Learning reinforcement-learning

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

no code implementations • 2 Feb 2024 • Dan Qiao, Yu-Xiang Wang

We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized.

Multi-agent Reinforcement Learning reinforcement-learning

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

1 code implementation • 19 Sep 2023 • Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang

This report provides the main details to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspire our model architecture design, training objectives of different stages, and other enhancement techniques.

GameEval: Evaluating LLMs on Conversational Games

1 code implementation • 19 Aug 2023 • Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.

Question Answering

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

no code implementations • 24 Jun 2023 • Sunil Madhow, Dan Qiao, Ming Yin, Yu-Xiang Wang

Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable.

Offline RL reinforcement-learning

Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning

no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha

The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.

Decision Making Multi-agent Reinforcement Learning +2

Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs

no code implementations • 24 Feb 2023 • Dan Qiao, Ming Yin, Yu-Xiang Wang

In many real-life reinforcement learning (RL) problems, deploying new policies is costly.

reinforcement-learning Reinforcement Learning (RL)

Near-Optimal Differentially Private Reinforcement Learning

no code implementations • 9 Dec 2022 • Dan Qiao, Yu-Xiang Wang

We close this gap for the JDP case by designing an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the information-theoretic lower bound of non-private learning for all choices of $\epsilon > S^{1.5}A^{0.5}H^2/\sqrt{T}$.

reinforcement-learning Reinforcement Learning (RL)

SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

1 code implementation • COLING 2022 • Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires a few labeled data for downstream tasks.

text-classification Text Classification

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations • 3 Oct 2022 • Dan Qiao, Yu-Xiang Wang

We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting.

reinforcement-learning Reinforcement Learning (RL)

Doubly Fair Dynamic Pricing

no code implementations • 23 Sep 2022 • Jianyu Xu, Dan Qiao, Yu-Xiang Wang

We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups.

Fairness

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

no code implementations • 13 Feb 2022 • Dan Qiao, Ming Yin, Ming Min, Yu-Xiang Wang

In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$.

reinforcement-learning Reinforcement Learning (RL)

Novel Nussbaum-Type Function based Safe Adaptive Distributed Consensus Control with Arbitrary Unknown Control Direction

no code implementations • 24 Jan 2022 • Dan Qiao, Zhaoxia Peng, Guoguang Wen, TingWen Huang

This paper develops a novel saturated Nussbaum function to relax such limitations and proposes a Nussbaum function based control scheme for the consensus problem of multi-agent systems with arbitrary non-identical unknown control directions and safe control progress.
