Search Results for author: Zhiyin Ma

Found 3 papers, 3 papers with code

Exploring the Capabilities of Large Multimodal Models on Dense Text

1 code implementation • 9 May 2024 • Shuo Zhang, Biao Yang, Zhang Li, Zhiyin Ma, Yuliang Liu, Xiang Bai

To further explore the capabilities of LMMs in complex text tasks, we propose the DT-VQA dataset, with 170k question-answer pairs.

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning • Question Answering • +2
