1 code implementation • 22 Apr 2024 • Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
Our main finding is that, in general, approaches that use on-policy sampling or attempt to push down the likelihood of certain responses (i.e., employ a "negative gradient") outperform offline and maximum-likelihood objectives.
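The "negative gradient" mechanism is easy to make concrete with a DPO-style contrastive loss, whose gradient raises the likelihood of preferred responses and lowers it on dispreferred ones. A minimal sketch, assuming sequence-level log-probabilities are computed elsewhere; this is one representative member of the objective family studied, not the paper's exact training recipe:

```python
import torch.nn.functional as F

def contrastive_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1):
    # Log-ratios of the trained policy against a frozen reference model.
    chosen = logp_chosen - ref_logp_chosen
    rejected = logp_rejected - ref_logp_rejected
    # Minimizing this loss widens the margin between chosen and rejected:
    # the gradient with respect to `logp_rejected` is positive, i.e.,
    # gradient descent pushes that likelihood DOWN -- the "negative
    # gradient" referred to above.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```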
no code implementations • 4 Apr 2024 • Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie
In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences.
no code implementations • 20 Mar 2024 • Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford
We study two types of settings: one where there is iid noise in the observation, and a more challenging setting that also contains exogenous noise, i.e., non-iid noise that is temporally correlated, such as the motion of people or cars in the background.
1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of possible unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.
no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.
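Schematically, relative pessimism with respect to a reference (baseline) policy can be written as a max-min game over a version space of models consistent with the data; the notation below is a reconstruction for intuition, not ARMOR's exact formulation:

```latex
\max_{\pi} \; \min_{M \in \mathcal{M}_{\mathrm{data}}} \; \big[\, J_M(\pi) - J_M(\pi_{\mathrm{ref}}) \,\big]
```

Here $J_M(\pi)$ is the return of $\pi$ in model $M$ and $\mathcal{M}_{\mathrm{data}}$ contains the models that fit the offline data. Since choosing $\pi = \pi_{\mathrm{ref}}$ attains value $0$, the optimizer can never do worse than the reference policy under the worst plausible model, which is why no coverage assumption is needed for the improvement guarantee.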
no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.
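A canonical way to formalize coverage is the concentrability coefficient; the definition below is the textbook single-policy version, included for orientation rather than as the paper's specific condition:

```latex
C^{\pi} \;=\; \sup_{s,a}\, \frac{d^{\pi}(s,a)}{\mu(s,a)}
```

where $\mu$ is the data-logging distribution and $d^{\pi}$ is the state-action occupancy of policy $\pi$; a finite $C^{\pi}$ means the dataset covers everything $\pi$ visits, and its magnitude typically enters sample-complexity bounds multiplicatively.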
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
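The relative-pessimism principle can be phrased as a two-player game between the actor and an adversarially trained critic; the form below is schematic (the Bellman-consistency term $\mathcal{E}_{\mu}$ and the weight $\beta$ are stated loosely):

```latex
\max_{\pi} \; \min_{f \in \mathcal{F}} \; \mathbb{E}_{(s,a)\sim\mu}\!\big[ f(s,\pi(s)) - f(s,a) \big] \;+\; \beta\, \mathcal{E}_{\mu}(f,\pi)
```

The expectation term is the actor's advantage over the behavior policy as judged by the critic $f$, and the regularizer keeps $f$ approximately Bellman-consistent on the data; the pessimism is "relative" because the actor is compared against the data-collecting policy rather than against an absolute value estimate.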
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism when reasoning about datasets that lack exhaustive exploration has recently gained prominence in offline reinforcement learning.
no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.
no code implementations • NeurIPS 2021 • Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai
This offline result is the first to match the sample-complexity lower bound in this setting, and it resolves a recent open question in offline RL.
no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.
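For orientation, the two estimated quantities typically enter a doubly robust identity of the following standard form (generic notation; $w = d^{\pi}/d^{\mu}$ is the marginal importance weight and $\gamma$ the discount):

```latex
\hat{J}(\pi) \;=\; \mathbb{E}_{s_0 \sim d_0,\, a \sim \pi}\big[ q(s_0,a) \big] \;+\; \frac{1}{1-\gamma}\, \mathbb{E}_{(s,a,r,s') \sim d^{\mu}}\Big[ w(s,a)\big( r + \gamma\, \mathbb{E}_{a' \sim \pi}[q(s',a')] - q(s,a) \big) \Big]
```

The estimate is consistent when either $w$ or $q$ is correct, which is why the error analysis must track how estimation error in each minimax-estimated component propagates into the final value estimate.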
no code implementations • 2 Nov 2020 • Philip Amortila, Nan Jiang, Tengyang Xie
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon case.
1 code implementation • 11 Aug 2020 • Tengyang Xie, Nan Jiang
We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class.
no code implementations • 9 Mar 2020 • Tengyang Xie, Nan Jiang
We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.
no code implementations • NeurIPS 2019 • Tengyang Xie, Yifei Ma, Yu-Xiang Wang
To solve this problem, we consider a marginalized importance sampling (MIS) estimator that recursively estimates the state marginal distribution for the target policy at every step.
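In the tabular, finite-horizon case the recursion is short to write down. A minimal sketch, assuming the target policy's state-to-state transitions `P_pi_hat` and per-state rewards `r_pi_hat` have already been estimated from importance-weighted counts (that estimation step is where the paper's analysis lives):

```python
import numpy as np

def mis_value_estimate(d0, P_pi_hat, r_pi_hat):
    """Marginalized importance sampling OPE in a tabular finite-horizon MDP.

    d0        -- initial state distribution, shape (S,)
    P_pi_hat  -- estimated transitions under the target policy,
                 shape (H, S, S); P_pi_hat[t, s_next, s]
    r_pi_hat  -- estimated expected rewards under the target policy,
                 shape (H, S)
    """
    d = d0.copy()          # estimated state marginal at step t
    value = 0.0
    for t in range(len(r_pi_hat)):
        value += d @ r_pi_hat[t]   # reward collected at step t
        d = P_pi_hat[t] @ d        # propagate the marginal to step t + 1
    return value
```

Propagating the state marginal forward, instead of multiplying per-step importance ratios as ordinary sequential importance sampling does, is what keeps the variance from growing exponentially in the horizon.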
no code implementations • NeurIPS 2019 • Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.
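Limited adaptivity is usually quantified by the number of policy switches; a standard definition, included for orientation:

```latex
N_{\mathrm{switch}} \;=\; \sum_{k=1}^{K-1} \mathbb{1}\{\pi_{k+1} \neq \pi_{k}\}
```

where $\pi_k$ is the policy deployed in episode $k$ out of $K$; the goal is near-optimal regret while keeping $N_{\mathrm{switch}}$ small, e.g., polylogarithmic in $K$.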
no code implementations • 1 Feb 2019 • Tengyang Xie, Philip S. Thomas, Gerome Miklau
Many reinforcement learning applications involve sensitive data, such as patients' medical records or financial information.
no code implementations • NeurIPS 2018 • Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.