1 code implementation • 8 May 2024 • Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang
Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, markedly improving performance on mathematical reasoning in visual contexts.
no code implementations • 21 Feb 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang
Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely used visual-language projection approaches (e.g., Q-Former or MLP) focus on aligning images with their textual descriptions yet ignore visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.
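For context, the image-text projection this abstract contrasts with knowledge-dimension alignment is typically a small learned module that maps vision-encoder features into the language model's embedding space. Below is a minimal sketch of an MLP-style projector; the class name, dimensions, and two-layer design are illustrative assumptions in the spirit of common LMM pipelines, not the paper's actual implementation.

# Illustrative sketch of an MLP visual-language projector.
# All dimensions and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Usage: project ViT-style patch features so they can be prepended
# to the text token embeddings of the language model.
feats = torch.randn(2, 576, 1024)      # e.g., 24x24 patches from a ViT
visual_tokens = MLPProjector()(feats)  # (2, 576, 4096)

A projector like this is trained only on image-caption alignment, which is precisely the limitation the abstract highlights: it connects pixels to descriptions, not to the broader knowledge relevant to the image.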
no code implementations • 15 May 2023 • Xuanchen Li, Yan Niu, Bo Zhao, Haoyuan Shi, Zitong An
In both applications, our model substantially alleviates artifacts such as Moiré patterns and over-smoothing at a computational cost similar to or lower than that of currently top-performing models, as validated by diverse evaluations.