1 code implementation • 16 May 2024 • Yifan Xu, Xiaoshan Yang, Yaguang Song, Changsheng Xu
Specifically, we incorporate a routed visual expert, together with a cross-modal bridge module, into a pretrained LLM. This design routes the vision and language flows during attention computation, enabling different attention patterns for intra-modal modeling and cross-modal interaction.
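The routing idea can be illustrated with a minimal, hypothetical sketch (this is not the paper's implementation): each token carries a modality tag, each modality is routed through its own projection weights, and cross-modal token pairs receive a separate additive attention bias, so intra-modal and cross-modal interactions follow different patterns.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def routed_attention(tokens, modalities, w_q, w_k, w_v, cross_bias=0.0):
    """Toy modality-routed attention over scalar features.

    tokens: list of scalar token features.
    modalities: 'v' (vision) or 'l' (language) tag per token.
    w_q / w_k / w_v: per-modality projection weights (hypothetical names),
    so each modality's flow is routed through its own parameters.
    cross_bias: additive bias applied only to cross-modal pairs, giving
    cross-modal interaction a different attention pattern than intra-modal.
    """
    q = [t * w_q[m] for t, m in zip(tokens, modalities)]
    k = [t * w_k[m] for t, m in zip(tokens, modalities)]
    v = [t * w_v[m] for t, m in zip(tokens, modalities)]
    out = []
    for i in range(len(tokens)):
        scores = []
        for j in range(len(tokens)):
            s = q[i] * k[j]
            if modalities[i] != modalities[j]:
                s += cross_bias  # separate pattern for cross-modal pairs
            scores.append(s)
        probs = softmax(scores)
        out.append(sum(p * vj for p, vj in zip(probs, v)))
    return out

# Example: two vision tokens and one language token, with cross-modal
# attention down-weighted relative to intra-modal attention.
w = {"v": 0.5, "l": 1.0}
y = routed_attention([1.0, 2.0, 3.0], ["v", "v", "l"], w, w, w, cross_bias=-1.0)
print(y)
```

Setting `cross_bias` to a large negative value suppresses cross-modal attention entirely, while `0.0` treats all pairs uniformly; the actual model learns these routed parameters rather than fixing them by hand.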