23 Jun 2023 • Jinxin Liu, Lipeng Zu, Li He, Donglin Wang
As a remedy for labor-intensive labeling, we propose to endow offline RL tasks with a small amount of expert data and to use this limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards.