Multimodal Intent Recognition

10 papers with code • 3 benchmarks • 3 datasets

Recognizing speaker intent from multimodal content such as text, video, and audio.

Image source: MIntRec: A New Dataset for Multimodal Intent Recognition

Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

google-research/bert NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
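
Since BERT typically serves as the text encoder in systems for this task, a minimal sketch of fine-tuning it for intent classification may be useful. It assumes the Hugging Face transformers API rather than the original google-research/bert scripts, and the label count and example utterance are illustrative placeholders.

```python
# Minimal sketch: fine-tuning BERT for intent classification with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=20  # illustrative number of intent classes
)

batch = tokenizer(
    ["why don't you come and play the game with us?"],  # placeholder utterance
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([3])  # placeholder intent label

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one fine-tuning step; pair with an optimizer in practice
```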

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

huggingface/transformers arXiv 2019

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
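
To illustrate the text-to-text framing, the sketch below casts intent recognition as string generation with a T5 checkpoint through Hugging Face transformers. The "recognize intent:" task prefix is an invented example, not one of T5's original pre-training or fine-tuning prefixes.

```python
# Minimal sketch: intent recognition posed as text-to-text generation with T5.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "recognize intent: could you hand me that box over there?",  # hypothetical task prefix
    return_tensors="pt",
)
generated = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```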

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

google-research/ALBERT ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
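
To make ALBERT's size trade-off concrete, the small sketch below loads BERT-base and ALBERT-base through Hugging Face transformers and compares parameter counts; the printed numbers depend on the checkpoints and are illustrative only.

```python
# Minimal sketch: comparing parameter counts of BERT-base and ALBERT-base.
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
albert = AutoModel.from_pretrained("albert-base-v2")

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"BERT-base:   {count(bert) / 1e6:.0f}M parameters")
print(f"ALBERT-base: {count(albert) / 1e6:.0f}M parameters")  # far fewer, via embedding factorization and cross-layer sharing
```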

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

dandelin/vilt 5 Feb 2021

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.
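
A minimal sketch of running a pre-trained ViLT checkpoint on an image-text pair with Hugging Face transformers is shown below; the checkpoint name and example image follow common transformers documentation usage and are assumptions, not part of this page.

```python
# Minimal sketch: encoding an image-text pair with a pre-trained ViLT checkpoint.
import requests
from PIL import Image
from transformers import ViltProcessor, ViltModel

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(image, "two cats lying on a couch", return_tensors="pt")

outputs = model(**inputs)
print(outputs.pooler_output.shape)  # joint vision-language embedding, e.g. (1, 768)
```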

MIntRec: A New Dataset for Multimodal Intent Recognition

thuiar/mintrec 9 Sep 2022

This paper introduces MIntRec, a novel dataset for multimodal intent recognition, to address the lack of publicly available benchmark datasets for intent recognition in real-world multimodal scenes.

MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

victorsungo/mmdialog 10 Nov 2022

First, it is the largest multi-modal conversation dataset, containing 88x more dialogues than prior datasets.

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

alibabaresearch/damo-convai 19 May 2023

In this paper, we propose Speech-text dialog Pre-training for spoken dialog understanding with ExpliCiT cRoss-Modal Alignment (SPECTRA), which is the first-ever speech-text dialog pre-training model.

PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

AlibabaResearch/DAMO-ConvAI 24 May 2023

It utilizes a combination of several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue and extensive non-dialogue multi-modal data.

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

thuiar/TCL-MAP 22 Dec 2023

To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from the text, video and audio modalities with similarity-based modality alignment and a cross-modality attention mechanism.
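
The released thuiar/TCL-MAP code is the authoritative reference; purely as an illustration of the cross-modality attention part of this idea, here is a hedged PyTorch sketch in which token-level text features attend over video and audio features. All module names and dimensions are invented for the example and do not reproduce the paper's exact MAP design.

```python
# Illustrative sketch: text features attend over video and audio features
# via cross-modal attention, yielding a fused, text-centric representation.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, video, audio):
        # text: (B, Lt, D) queries; video/audio: (B, Lv, D) / (B, La, D) keys and values
        context = torch.cat([video, audio], dim=1)
        fused, _ = self.attn(query=text, key=context, value=context)
        return self.norm(text + fused)  # residual keeps the text stream primary

fusion = CrossModalFusion()
text = torch.randn(2, 30, 256)   # e.g. token-level text features
video = torch.randn(2, 50, 256)  # e.g. frame-level video features
audio = torch.randn(2, 80, 256)  # e.g. segment-level audio features
print(fusion(text, video, audio).shape)  # torch.Size([2, 30, 256])
```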

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

thuiar/mintrec2.0 16 Mar 2024

We believe that MIntRec2.0 will serve as a valuable resource, providing a pioneering foundation for research in human-machine conversational interactions, and significantly facilitating related applications.