1 code implementation • NeurIPS 2023 • Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar
Our insight is that learning a policy that effectively $\textit{explores}$ the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches.