no code implementations • 16 Apr 2024 • Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation for infinite runs without relying on prior distribution assumptions.
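As a rough illustration of the least-squares value iteration template this line of work builds on (the certified estimator and instance-dependent analysis of Cert-LSVI-UCB are not reproduced here), the following sketch assumes an illustrative feature map `phi`, a transition dataset, and a bonus coefficient `beta`:

```python
import numpy as np

def lsvi_ucb_q(phi, data, beta, lam=1.0):
    """One least-squares value iteration step with a UCB bonus.

    phi  : feature map, phi(s, a) -> np.ndarray of dimension d
    data : list of (s, a, r, v_next) transitions, where v_next is the
           estimated value of the next state from the previous iteration
    beta : confidence-width multiplier for the exploration bonus
    """
    d = phi(data[0][0], data[0][1]).shape[0]
    Lambda = lam * np.eye(d)                      # regularized Gram matrix
    target = np.zeros(d)
    for s, a, r, v_next in data:
        x = phi(s, a)
        Lambda += np.outer(x, x)
        target += x * (r + v_next)                # regression target r + V(s')
    theta = np.linalg.solve(Lambda, target)       # ridge solution
    Lambda_inv = np.linalg.inv(Lambda)

    def q_value(s, a):
        x = phi(s, a)
        bonus = beta * np.sqrt(x @ Lambda_inv @ x)   # elliptical UCB bonus
        return x @ theta + bonus                     # optimistic Q estimate

    return q_value
```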
no code implementations • 16 Apr 2024 • Qiwei Di, Jiafan He, Quanquan Gu
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
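A minimal sketch of one standard way to learn a reward model from pairwise human preferences, the Bradley-Terry model fit by logistic regression; the linear reward parameterization and optimization settings here are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def fit_bradley_terry(features_a, features_b, prefs, lr=0.1, steps=500):
    """Fit a linear reward model r(x) = <theta, x> from pairwise preferences.

    prefs[i] = 1 means response A was preferred to response B in pair i.
    Under the Bradley-Terry model, P(A > B) = sigmoid(r(A) - r(B)).
    """
    d = features_a.shape[1]
    theta = np.zeros(d)
    for _ in range(steps):
        diff = (features_a - features_b) @ theta
        p = 1.0 / (1.0 + np.exp(-diff))           # predicted P(A preferred)
        grad = (features_a - features_b).T @ (p - prefs) / len(prefs)
        theta -= lr * grad                        # gradient step on the logistic loss
    return theta
```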
no code implementations • 14 Feb 2024 • Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
We also prove a lower bound to show that the additive dependence on $C$ is optimal.
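One common mechanism behind additive-in-$C$ guarantees is uncertainty-weighted regression, which caps the influence any single (possibly corrupted) sample can have on the estimator; the sketch below is a generic illustration of that idea, with the weighting rule and the parameter `alpha` as assumptions:

```python
import numpy as np

def weighted_ridge(X, y, alpha, lam=1.0):
    """Uncertainty-weighted ridge regression.

    Each sample is down-weighted when its elliptical norm under the current
    Gram matrix is large, bounding how much a corrupted sample can shift
    the final estimate.
    """
    n, d = X.shape
    Lambda = lam * np.eye(d)
    b = np.zeros(d)
    for i in range(n):
        x = X[i]
        u = np.sqrt(x @ np.linalg.solve(Lambda, x))   # uncertainty of sample i
        w = min(1.0, alpha / u)                       # weight <= 1, small if uncertain
        Lambda += w * np.outer(x, x)
        b += w * x * y[i]
    return np.linalg.solve(Lambda, b)
```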
no code implementations • 14 Feb 2024 • Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu
Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.
no code implementations • 14 Feb 2024 • Kaixuan Ji, Jiafan He, Quanquan Gu
Aligning large language models (LLMs) with human preferences plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF).
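A label-efficient RLHF system typically asks a human to compare two responses only when the current reward model is uncertain about their ordering. The following sketch illustrates such an uncertainty-based query rule under an assumed linear reward model; it is not the paper's exact criterion:

```python
import numpy as np

def should_query(x_a, x_b, Lambda_inv, threshold):
    """Query the human only when the preference between two responses
    is statistically uncertain under the current linear reward model.

    Uncertainty is the elliptical norm of the feature difference; below
    the threshold, the model's own ranking is trusted instead.
    """
    diff = x_a - x_b
    uncertainty = np.sqrt(diff @ Lambda_inv @ diff)
    return uncertainty > threshold
```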
no code implementations • 26 Nov 2023 • Heyang Zhao, Jiafan He, Quanquan Gu
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes.
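A toy illustration of the optimism principle with a general model class: among all candidate models still consistent with the data, act according to the one promising the highest value. The confidence test here is a stand-in assumption:

```python
def optimistic_choice(models, value_of, fits_data):
    """Pick the most optimistic model consistent with the data.

    models    : iterable of candidate models (the 'model class')
    value_of  : model -> optimal value the model promises
    fits_data : model -> True if the model passes a statistical fit test
    """
    plausible = [m for m in models if fits_data(m)]   # confidence set
    return max(plausible, key=value_of)               # optimism: best promised value
```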
no code implementations • 2 Oct 2023 • Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
However, only a limited number of works on offline RL with non-linear function approximation provide instance-dependent regret guarantees.
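Instance-dependent offline RL analyses typically rest on pessimism: penalizing state-action pairs that are poorly covered by the dataset. A minimal linear-function-approximation sketch of a lower-confidence-bound estimate, with `phi`, `theta`, and `beta` as assumed inputs:

```python
import numpy as np

def pessimistic_q(phi, theta, Lambda_inv, beta):
    """Lower-confidence-bound Q estimate used in pessimistic offline RL.

    Subtracting the elliptical bonus penalizes state-action pairs that
    are poorly covered by the offline dataset.
    """
    def q_value(s, a):
        x = phi(s, a)
        penalty = beta * np.sqrt(x @ Lambda_inv @ x)  # data-coverage penalty
        return x @ theta - penalty                    # pessimistic estimate
    return q_value
```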
no code implementations • 15 May 2023 • Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu
Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$.
no code implementations • 15 May 2023 • Yue Wu, Jiafan He, Quanquan Gu
Recently, there has been remarkable progress in reinforcement learning (RL) with general function approximation.
no code implementations • 10 May 2023 • Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server.
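With linear function approximation, cooperation through a central server can be implemented by sharing sufficient statistics rather than raw trajectories, which keeps communication compact. The sketch below illustrates this pattern; the class and method names are hypothetical:

```python
import numpy as np

class CentralServer:
    """Aggregates sufficient statistics from cooperating agents."""

    def __init__(self, d, lam=1.0):
        self.Lambda = lam * np.eye(d)   # global Gram matrix
        self.b = np.zeros(d)            # global moment vector

    def upload(self, local_Lambda, local_b):
        """Receive one agent's local statistics."""
        self.Lambda += local_Lambda
        self.b += local_b

    def download(self):
        """Broadcast the shared estimate and its covariance."""
        theta = np.linalg.solve(self.Lambda, self.b)
        return theta, np.linalg.inv(self.Lambda)
```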
no code implementations • 16 Mar 2023 • Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu
We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.
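Variance adaptivity is commonly obtained by weighting each regression sample with the inverse of its (clipped) estimated conditional variance, so that near-deterministic transitions contribute almost noiselessly; a generic sketch under that assumption:

```python
import numpy as np

def variance_weighted_ridge(X, y, sigma2, sigma2_min, lam=1.0):
    """Variance-adaptive ridge regression.

    Samples with low estimated variance get large weights, so a
    deterministic MDP yields nearly noiseless regression targets.
    """
    d = X.shape[1]
    w = 1.0 / np.maximum(sigma2, sigma2_min)      # clipped inverse-variance weights
    Lambda = lam * np.eye(d) + (X * w[:, None]).T @ X
    b = X.T @ (w * y)
    return np.linalg.solve(Lambda, b)
```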
no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study reinforcement learning (RL) with linear function approximation.
no code implementations • 7 Jul 2022 • Jiafan He, Tianhao Wang, Yifei Min, Quanquan Gu
To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.
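Asynchronous schemes of this kind often let each agent decide locally when to contact the server, for example when its Gram matrix has accumulated enough new information since the last synchronization. The log-determinant trigger below is an illustrative instance of such a rule, not necessarily the paper's exact criterion:

```python
import numpy as np

def should_sync(local_Lambda, last_synced_Lambda, threshold):
    """Determinant-based trigger for asynchronous communication.

    The agent contacts the server only when the log-determinant of its
    local Gram matrix has grown enough; no global clock is required.
    """
    _, logdet_now = np.linalg.slogdet(local_Lambda)
    _, logdet_old = np.linalg.slogdet(last_synced_Lambda)
    return logdet_now - logdet_old > np.log(threshold)
```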
no code implementations • 13 May 2022 • Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We show that, for both the known-$C$ and unknown-$C$ cases, our algorithm with a proper choice of hyperparameters achieves a regret that nearly matches the lower bounds.
no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu
We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.
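A minimal sketch of an online estimator for a generalized linear model: one gradient step on the squared prediction error per arriving sample, with the link function supplied by the caller. The learning-rate scheme and logistic link in the usage note are assumptions:

```python
import numpy as np

def online_glm_step(theta, x, y, link, link_deriv, lr):
    """One online gradient step for generalized linear regression.

    The label is modeled as y = link(<theta, x>) + noise; the update
    follows the gradient of the squared error at the current sample.
    """
    z = x @ theta
    residual = link(z) - y
    grad = residual * link_deriv(z) * x
    return theta - lr * grad

# Example with a logistic link (illustrative choice):
# sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# theta = online_glm_step(theta, x, y, sigmoid,
#                         lambda z: sigmoid(z) * (1 - sigmoid(z)), lr=0.1)
```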
no code implementations • 25 Oct 2021 • Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu
To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP.
no code implementations • 19 Oct 2021 • Chonghua Liao, Jiafan He, Quanquan Gu
To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
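Privacy-preserving bandit and RL algorithms typically perturb released sufficient statistics with calibrated noise before they leave the user's side; the Gaussian mechanism below is a generic illustration, with the noise scale `sigma` standing in for a privacy-budget calibration:

```python
import numpy as np

def privatize_statistics(Lambda, b, sigma, rng=None):
    """Release ridge-regression statistics with additive Gaussian noise,
    the standard mechanism behind (local) differential privacy.
    """
    if rng is None:
        rng = np.random.default_rng()
    d = Lambda.shape[0]
    noise = rng.normal(0.0, sigma, size=(d, d))
    Lambda_priv = Lambda + (noise + noise.T) / 2      # symmetrized matrix noise
    b_priv = b + rng.normal(0.0, sigma, size=d)
    return Lambda_priv, b_priv
```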
no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
The uniform-PAC guarantee is the strongest guarantee for reinforcement learning in the literature: it directly implies both PAC and high-probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.
no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class covers the state-action space, and that it achieves a gap-dependent sample complexity.
no code implementations • 17 Feb 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.
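Full-information adversarial-reward settings are commonly handled with a multiplicative-weights (exponentiated-gradient) policy update at each state; a minimal sketch of that update, not necessarily the paper's exact scheme:

```python
import numpy as np

def exp_weights_update(policy, q_values, eta):
    """Multiplicative-weights policy update for adversarial rewards.

    policy   : current action distribution at a state, shape (A,)
    q_values : estimated action values under the latest reward, shape (A,)
    eta      : learning rate
    """
    new_policy = policy * np.exp(eta * q_values)
    return new_policy / new_policy.sum()     # renormalize to a distribution
```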
no code implementations • 23 Nov 2020 • Jiafan He, Dongruo Zhou, Quanquan Gu
Reinforcement learning (RL) with linear function approximation has received increasing attention recently.
no code implementations • NeurIPS 2021 • Jiafan He, Dongruo Zhou, Quanquan Gu
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting.
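A common template in this setting is model-based value iteration with a count-based exploration bonus; the sketch below is a generic illustration with an unspecified bonus constant `c`, not the paper's exact algorithm:

```python
import numpy as np

def optimistic_value_iteration(P_hat, R, N, gamma, c, iters=200):
    """Model-based value iteration with a count-based bonus for a
    discounted tabular MDP.

    P_hat : empirical transition probabilities, shape (S, A, S)
    R     : empirical mean rewards, shape (S, A)
    N     : visit counts, shape (S, A)
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    bonus = c / np.sqrt(np.maximum(N, 1))        # larger bonus for rare pairs
    for _ in range(iters):
        Q = R + bonus + gamma * P_hat @ V        # optimistic Bellman backup
        V = np.minimum(Q.max(axis=1), 1.0 / (1 - gamma))  # clip to the value range
    return Q
```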
no code implementations • 23 Jun 2020 • Dongruo Zhou, Jiafan He, Quanquan Gu
We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.