Learning Rate Schedules

Inverse Square Root Schedule

Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This sets a constant learning rate of $1 / \sqrt{k}$ for the first $k$ steps, then decays the learning rate in proportion to $1 / \sqrt{n}$ (a polynomial decay, not an exponential one) until pre-training is over.
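
As a minimal sketch, the formula above could be written in Python as follows. The function name `inverse_sqrt_lr` and the `scale` multiplier are assumptions made for illustration and are not part of the definition above; with `scale=1.0` the function returns exactly $1 / \sqrt{\max(n, k)}$.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int, scale: float = 1.0) -> float:
    """Return scale / sqrt(max(step, warmup_steps)).

    `step` is the current training iteration (n) and `warmup_steps` is the
    number of warm-up steps (k). `scale` is a hypothetical multiplier added
    for convenience; it is not part of the schedule's definition.
    """
    if warmup_steps < 1:
        raise ValueError("warmup_steps must be at least 1")
    # For step <= warmup_steps the max() clamps to k, so the rate is constant.
    # Beyond that, the rate shrinks in proportion to 1 / sqrt(step).
    return scale / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    k = 100
    for n in (1, 50, 100, 400, 10_000):
        print(n, inverse_sqrt_lr(n, k))
```

With $k = 100$, the rate holds at $1 / \sqrt{100}$ through step 100; quadrupling the step count to 400 halves it, which is the behaviour expected from a $1 / \sqrt{n}$ decay.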




Task Papers Share
Language Modelling 97 9.62%
Question Answering 65 6.45%
Text Generation 48 4.76%
Sentence 44 4.37%
Translation 32 3.17%
Retrieval 31 3.08%
Machine Translation 26 2.58%
Natural Language Understanding 22 2.18%
Semantic Parsing 19 1.88%


