Search Results for author: Shengping Li

Found 1 paper, 1 paper with code

Improving Transformers with Dynamically Composable Multi-Head Attention

1 code implementation · 14 May 2024 · Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan

At the core of DCMHA is a $\mathit{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way.

Language Modelling
