Multimodal Reasoning
37 papers with code • 3 benchmarks • 4 datasets
Reasoning jointly over inputs from multiple modalities, such as images and text.
Most implemented papers
e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations
The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning.
Dual Attention Networks for Multimodal Reasoning and Matching
We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language.
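The abstract only gestures at the mechanism, so below is a minimal, hypothetical sketch of a single dual-attention step in PyTorch: separate attention over visual regions and text tokens, conditioned on a shared memory vector and fused into an updated memory. The layer sizes, scoring function, and fusion are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: one dual-attention step over visual regions and
# text tokens, conditioned on a shared memory vector. Shapes and layers are
# hypothetical, not the DAN paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionStep(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.vis_score = nn.Linear(dim, 1)   # scores visual regions
        self.txt_score = nn.Linear(dim, 1)   # scores text tokens
        self.fuse = nn.Linear(2 * dim, dim)  # merges the two attended vectors

    def forward(self, visual, textual, memory):
        # visual: (B, R, D) region features; textual: (B, T, D) token
        # features; memory: (B, D) joint context from the previous step.
        v_att = F.softmax(self.vis_score(visual * memory.unsqueeze(1)), dim=1)
        t_att = F.softmax(self.txt_score(textual * memory.unsqueeze(1)), dim=1)
        v_ctx = (v_att * visual).sum(dim=1)   # attended visual summary
        t_ctx = (t_att * textual).sum(dim=1)  # attended textual summary
        return torch.tanh(self.fuse(torch.cat([v_ctx, t_ctx], dim=-1)))

# Toy usage with random features; iterating this step refines the memory.
step = DualAttentionStep(dim=512)
memory = torch.zeros(2, 512)
memory = step(torch.randn(2, 36, 512), torch.randn(2, 12, 512), memory)
print(memory.shape)  # torch.Size([2, 512])
```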
WebQA: Multihop and Multimodal QA
Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation.
Multimodal Analogical Reasoning over Knowledge Graphs
Analogical reasoning is fundamental to human cognition and holds an important place in various fields.
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models
We propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
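As a toy illustration of the chain-versus-graph distinction (not the paper's GoT model), the sketch below encodes thought steps as graph nodes whose edges may merge several earlier thoughts, then evaluates them in topological order; the `derive` function is a hypothetical stand-in for a model call.

```python
# Minimal sketch of reasoning as a graph rather than a chain: nodes are
# thought steps, edges let later steps combine several earlier ones.
# Hypothetical illustration of the general idea, not the GoT paper's model.
from graphlib import TopologicalSorter

# Each thought depends on zero or more earlier thoughts (its parents).
thought_parents = {
    "premise_a": [],
    "premise_b": [],
    "combine": ["premise_a", "premise_b"],  # merges two branches: impossible in a chain
    "conclude": ["combine"],
}

def derive(thought, parent_results):
    # Stand-in for a model call that produces the next thought.
    return f"{thought}({', '.join(parent_results)})" if parent_results else thought

results = {}
for node in TopologicalSorter(thought_parents).static_order():
    results[node] = derive(node, [results[p] for p in thought_parents[node]])

print(results["conclude"])  # conclude(combine(premise_a, premise_b))
```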
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
We present AlgoPuzzleVQA, a new dataset designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning.
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns.
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image.
A Multimodal Framework for the Detection of Hateful Memes
Online hate speech is increasingly multimodal in nature, often taking the form of memes.
UniT: Multimodal Multitask Learning with a Unified Transformer
We propose UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to natural language understanding and multimodal reasoning.
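For intuition, here is a hedged sketch of the shared-backbone, task-specific-heads pattern that the UniT abstract alludes to; the layer sizes, task names, and heads are hypothetical placeholders rather than the paper's architecture.

```python
# Hedged sketch of a shared transformer encoder with per-task output heads.
# Dimensions, tasks, and pooling are assumptions for illustration only.
import torch
import torch.nn as nn

class SharedBackboneMultitask(nn.Module):
    def __init__(self, dim=256, num_classes=None):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared across tasks
        num_classes = num_classes or {"vqa": 10, "nlu": 3}  # hypothetical tasks
        self.heads = nn.ModuleDict(
            {task: nn.Linear(dim, n) for task, n in num_classes.items()}
        )

    def forward(self, tokens, task):
        # tokens: (B, T, D) already-embedded inputs from any modality.
        pooled = self.encoder(tokens).mean(dim=1)  # simple mean pooling
        return self.heads[task](pooled)            # task-specific output head

model = SharedBackboneMultitask()
logits = model(torch.randn(2, 8, 256), task="vqa")
print(logits.shape)  # torch.Size([2, 10])
```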