3 Oct 2022 • Tom Bewley, Jonathan Lawry, Arthur Richards, Rachel Craddock, Ian Henderson
Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment.