Search Results for author: Zhiyin Ma

Found 3 papers, 3 papers with code

Exploring the Capabilities of Large Multimodal Models on Dense Text

1 code implementation • 9 May 2024 • Shuo Zhang, Biao Yang, Zhang Li, Zhiyin Ma, Yuliang Liu, Xiang Bai

To further explore the capabilities of LMM in complex text tasks, we propose the DT-VQA dataset, with 170k question-answer pairs.

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

1 code implementation • 7 Mar 2024 • Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.

Document Understanding, Key Information Extraction (+4 more)

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning, Question Answering (+2 more)

