1 code implementation • 2 Nov 2021 • Giuseppe Paolo, Miranda Coninx, Alban Laflaquière, Stephane Doncieux
Learning optimal policies in sparse-reward settings is difficult because the learning agent receives little to no feedback on the quality of its actions.