1 code implementation • 8 May 2024 • Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang
Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, markedly improving performance on mathematical reasoning in visual contexts.
no code implementations • 21 Feb 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang
Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely used visual-language projection approaches (e.g., Q-Former or MLP) focus on aligning images with their textual descriptions yet ignore visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.
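For context, the image-text projection this abstract contrasts with knowledge-dimension alignment is typically a small learned module that maps vision-encoder features into the language model's embedding space. Below is a minimal sketch of an MLP-style projector; the class name, dimensions, and two-layer design are illustrative assumptions in the spirit of common LMM pipelines, not the paper's actual implementation.

# Illustrative sketch of an MLP visual-language projector.
# All dimensions and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Usage: project ViT-style patch features so they can be prepended
# to the text token embeddings of the language model.
feats = torch.randn(2, 576, 1024)      # e.g., 24x24 patches from a ViT
visual_tokens = MLPProjector()(feats)  # (2, 576, 4096)

A projector like this is trained only on image-caption alignment, which is precisely the limitation the abstract highlights: it connects pixels to descriptions, not to the broader knowledge relevant to the image.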
no code implementations • 15 May 2023 • Xuanchen Li, Yan Niu, Bo Zhao, Haoyuan Shi, Zitong An
In both applications, our model substantially alleviates artifacts such as Moiré patterns and over-smoothing at a computational cost similar to or lower than that of currently top-performing models, as validated by diverse evaluations.