1 code implementation • NeurIPS 2023 • Geunwoo Kim, Pierre Baldi, Stephen McAleer
We compare multiple LLMs and find that RCI (Recursively Criticizes and Improves) prompting with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function.