In Support of Over-Parametrization in Deep Reinforcement Learning: an Empirical Study
There is significant recent evidence in supervised learning that, in the over-parametrized setting, wider networks achieve better test error. In other words, the bias-variance tradeoff is not directly observable when increasing network width arbitrarily. We investigate whether a corresponding phenomenon is present in reinforcement learning. We experiment on four OpenAI Gym environments, increasing the width of the value and policy networks beyond their prescribed values. Our empirical results lend support to this hypothesis. However, tuning the hyperparameters of each network width separately remains as important future work in environments/algorithms where the optimal hyperparameters vary noticably across widths, confounding the results when the same hyperparameters are used for all widths.
PDF Abstract