1 code implementation • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
1 code implementation • 17 Dec 2023 • Ziniu Li, Tian Xu, Yang Yu
These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization to unlock the generalization ability of the reward model.
1 code implementation • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
Based on these properties, we develop ReMax, a new algorithm tailored for RLHF.
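At its core, ReMax is a REINFORCE-style gradient estimator that subtracts the reward of the greedy (argmax) response as a baseline to reduce variance. A minimal sketch on a toy softmax policy — the bandit setup, `reward_fn`, and all names here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def remax_grad(logits, reward_fn, rng):
    """One-sample ReMax-style gradient estimate for a softmax policy.

    REINFORCE with the reward of the greedy action used as a baseline,
    so the estimate is unbiased but lower-variance (toy setup).
    """
    probs = softmax(logits)
    a = int(rng.choice(len(logits), p=probs))     # sampled action
    baseline = reward_fn(int(np.argmax(logits)))  # greedy-action reward
    # grad of log pi(a) w.r.t. logits for a softmax: one_hot(a) - probs
    grad_logp = -probs
    grad_logp[a] += 1.0
    return (reward_fn(a) - baseline) * grad_logp

rewards = np.array([1.0, 0.2, 0.0])
g = remax_grad(np.zeros(3), lambda a: rewards[a], rng)
```

Because `grad_logp` always sums to zero, subtracting the baseline shifts no bias into the estimate; it only rescales the per-sample magnitude.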
1 code implementation • 11 Jun 2023 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed.
no code implementations • 13 Mar 2023 • Ziniu Li, Ke Xu, Liu Liu, Lanqing Li, Deheng Ye, Peilin Zhao
To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase.
1 code implementation • 27 Jan 2023 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies.
no code implementations • 3 Aug 2022 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Imitation learning learns a policy from expert trajectories.
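The simplest instance of learning a policy from expert trajectories is behavioral cloning: supervised learning on the expert's (state, action) pairs. A self-contained sketch with synthetic data (the expert rule and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert data: states in R^2, discrete actions {0, 1};
# the expert picks action 1 iff the first state coordinate is positive.
states = rng.normal(size=(200, 2))
actions = (states[:, 0] > 0).astype(float)

# Behavioral cloning = supervised learning: fit a logistic policy
# pi(a=1|s) by minimizing cross-entropy on the expert pairs.
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-states @ w))
    w -= 0.5 * states.T @ (p - actions) / len(states)

accuracy = np.mean(((states @ w) > 0) == (actions > 0))
```

This ignores the sequential structure of the MDP, which is exactly why BC can suffer compounding errors at states the expert never visited.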
no code implementations • 22 Mar 2022 • Ziniu Li, Tian Xu, Yang Yu
In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\widetilde{\mathcal O}(|\mathcal S|^2|\mathcal A|^2 (1-\gamma)^{-5}\varepsilon^{-2})$.
no code implementations • 5 Feb 2022 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
First, we show that ValueDice could reduce to BC under the offline setting.
1 code implementation • ICLR 2022 • Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo
However, it is limited to the case where 1) a good feature is known in advance and 2) this feature is fixed during training; otherwise, RLSVI incurs a prohibitive computational cost to obtain posterior samples of the parameters of the $Q$-value function.
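With a fixed linear feature map, the posterior sampling step in RLSVI is just a draw from the Gaussian posterior of a ridge regression on value targets. A toy sketch of that single step — the data, names, and hyperparameters are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def rlsvi_sample(Phi, targets, noise_std=1.0, prior_var=1.0, rng=rng):
    """Draw one posterior sample of linear Q-value weights, RLSVI-style.

    With a fixed feature matrix Phi (n x d), the weights have a Gaussian
    posterior under a ridge-regression model; RLSVI acts greedily w.r.t.
    a sampled Q-function to drive exploration. (Toy sketch.)
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / noise_std**2 + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ targets / noise_std**2
    return rng.multivariate_normal(mean, cov)

Phi = rng.normal(size=(50, 3))
targets = Phi @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w_sample = rlsvi_sample(Phi, targets)
```

If the feature map changes during training (as it does when features are learned), this closed-form posterior is no longer available, which is the computational burden the entry refers to.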
no code implementations • 19 Jun 2021 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
For some MDPs, we show that vanilla AIL has a worse sample complexity than BC.
no code implementations • NeurIPS 2020 • Tian Xu, Ziniu Li, Yang Yu
In this paper, we first analyze the value gap between the expert policy and imitated policies under two imitation methods, behavioral cloning and generative adversarial imitation.
no code implementations • 16 Nov 2019 • Tian Xu, Ziniu Li, Yang Yu
We also show that the framework yields a value discrepancy for GAIL on the order of $\mathcal{O}((1-\gamma)^{-1})$.