1 code implementation • 14 May 2024 • Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
At the core of DCMHA is a $\it{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way.
Language Modelling