11 Jan 2024 • Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Ming Yan, Shikun Zhang, Songhang Huang, Fei Huang
Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models.