no code implementations • 29 Sep 2021 • JingJie Wang, Xiang Wei, Xiaoyu Liu
By appropriately compressing the dimensionality of the self-attention relation variables, the Transformer network becomes more efficient and can even perform better.
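The paper does not spell out the compression mechanism here, but one common way to compress the attention relation along the sequence dimension is a low-rank projection of the keys and values (Linformer-style), which shrinks the n x n attention map to n x k. The sketch below is an illustrative assumption, not the authors' method; the projection matrix `E` would be learned in practice but is random here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(Q, K, V, k, seed=0):
    """Self-attention with keys/values compressed along the sequence axis.

    Projects the (n, d) key/value matrices down to (k, d) with k << n,
    so the attention map is (n, k) instead of (n, n).
    """
    n, d = K.shape
    rng = np.random.default_rng(seed)
    # E plays the role of the learned compression; random here for illustration.
    E = rng.normal(0.0, 1.0 / np.sqrt(n), size=(k, n))
    K_c, V_c = E @ K, E @ V                  # compressed keys/values: (k, d)
    scores = Q @ K_c.T / np.sqrt(d)          # (n, k) attention logits
    return softmax(scores) @ V_c             # output: (n, d)

n, d, k = 128, 32, 16
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = compressed_attention(Q, K, V, k)
print(out.shape)  # (128, 32)
```

With k fixed, both the memory for the attention map and the matmul cost scale linearly in sequence length n rather than quadratically.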
Tasks: Image Classification, Relation