Search Results for author: Shaoxiong Duan

Found 1 paper, 1 paper with code

From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers

1 code implementation • 18 Oct 2023 • Shaoxiong Duan, Yining Shi, Wei Xu

We then introduce Attention Bias Calibration (ABC), a calibration stage that enables the model to automatically learn the proper attention biases, which we show to be connected to mechanisms in relative position encoding.
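For readers unfamiliar with the idea of an attention bias tied to relative position, the following is a minimal sketch, not the paper's ABC procedure. It assumes a bias that depends only on the offset between key and query positions and is added to the raw attention scores before the softmax. The names `relative_bias`, `biased_attention`, and `bias_by_offset`, as well as the chosen offsets, are hypothetical and used only for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def relative_bias(n_queries, n_keys, bias_by_offset, default=-1e9):
    # Entry (i, j) depends only on the relative offset j - i.
    # Offsets missing from the table get a large negative value (assumption),
    # which effectively masks those positions after the softmax.
    return [
        [bias_by_offset.get(j - i, default) for j in range(n_keys)]
        for i in range(n_queries)
    ]

def biased_attention(scores, bias):
    # Add the bias to the raw scores, then normalize each row.
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]

# Hypothetical bias: attend only to the same position and the one before it.
bias_by_offset = {0: 0.0, -1: 0.0}

scores = [[0.0] * 4 for _ in range(3)]  # uniform raw scores, 3 queries x 4 keys
weights = biased_attention(scores, relative_bias(3, 4, bias_by_offset))

# The same offset table can be reused unchanged for a longer sequence.
longer = biased_attention(
    [[0.0] * 8 for _ in range(8)], relative_bias(8, 8, bias_by_offset)
)
```

Because the table is indexed by offset rather than by absolute position, the same bias can be laid out for any sequence length. This is only an illustration of why such biases are related to relative position encoding.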

