1 code implementation • 20 Jun 2023 • Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun
In particular, we extend RL algorithms to allow them to interact with a dynamic black-box guide LLM and propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM fine-tuning.
2 code implementations • ICLR 2020 • Kiante Brantley, Wen Sun, Mikael Henaff
We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning.