The Curse of Depth in Kernel Regime

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Empirical results in Lee et al. (2019) demonstrated the high performance of a linearized version of training in this so-called NTK regime. In this paper, we show that the large-depth limit of this regime is unexpectedly trivial, and we fully characterize the rate of convergence to this trivial regime.
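To make the kernel-regime picture concrete, below is a minimal NumPy sketch (not the paper's code) of the standard infinite-width NTK recursion for a fully connected ReLU network at He-style initialization (sigma_w^2 = 2, sigma_b^2 = 0); the toy inputs, depths, and hyperparameters are illustrative assumptions. It prints the NTK entry for two distinct inputs normalized by the diagonal entry, which drifts toward 1 as depth grows, illustrating the degeneration toward a trivial (constant) kernel that the paper characterizes.

```python
import numpy as np

def relu_ntk(x1, x2, depth, sw2=2.0, sb2=0.0):
    """Infinite-width NTK of a depth-`depth` fully connected ReLU network,
    computed via the standard layer-wise recursion (Jacot et al., 2018)."""
    d = len(x1)
    # First-layer (pre-activation) covariances.
    k11 = sw2 * (x1 @ x1) / d + sb2
    k22 = sw2 * (x2 @ x2) / d + sb2
    k12 = sw2 * (x1 @ x2) / d + sb2
    ntk = k12
    for _ in range(depth - 1):
        rho = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
        theta = np.arccos(rho)
        # Arc-cosine expectations for ReLU and its derivative (Cho & Saul, 2009).
        e_phi = np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        e_dphi = (np.pi - theta) / (2 * np.pi)
        # Propagate the NNGP covariance and the NTK one layer deeper.
        k12_new = sw2 * e_phi + sb2
        k11 = sw2 * k11 / 2 + sb2
        k22 = sw2 * k22 / 2 + sb2
        ntk = k12_new + ntk * sw2 * e_dphi
        k12 = k12_new
    return ntk

# Two toy inputs (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=16)
x2 = rng.normal(size=16)

for depth in (2, 5, 20, 100):
    off_diag = relu_ntk(x1, x2, depth)
    diag = relu_ntk(x1, x1, depth)
    # As depth grows, the normalized NTK entry tends to 1: the kernel
    # matrix degenerates toward a constant (trivial) kernel.
    print(f"depth={depth:4d}  normalized NTK = {off_diag / diag:.4f}")
```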
