no code implementations • 4 Oct 2023 • Shihao Zou, Xianying Huang, Xudong Shen
MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing prompt information to participate in encoding textual features and to be fused with multi-level textual information, yielding better multimodal fusion features.
Ranked #2 on Emotion Recognition in Conversation on IEMOCAP
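A minimal sketch of the layer-wise prompt idea described in the abstract: at every attention layer, prompt vectors (standing in for multimodal fusion information) are prepended to the text tokens so they attend jointly with them, and fresh prompts are injected at each layer. This is an illustrative toy, not the authors' MPT implementation; all shapes, the single-head unprojected attention, and the per-layer prompt lists are simplifying assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(x, prompts):
    """One self-attention pass where prompt vectors are prepended to
    the text tokens, participate in encoding them, and are stripped
    afterwards so the sequence length is unchanged. (Toy sketch:
    single head, no learned Q/K/V projections.)"""
    n_p = prompts.shape[0]
    h = np.concatenate([prompts, x], axis=0)        # (n_p + seq, d)
    scores = h @ h.T / np.sqrt(h.shape[1])          # joint attention scores
    out = softmax(scores) @ h
    return out[n_p:]                                # keep text positions only

rng = np.random.default_rng(0)
seq_len, d_model, n_layers, n_prompts = 6, 8, 3, 2  # illustrative sizes
x = rng.normal(size=(seq_len, d_model))             # stand-in text features
# a distinct prompt block per layer, mimicking layer-wise injection
layer_prompts = [rng.normal(size=(n_prompts, d_model)) for _ in range(n_layers)]
for p in layer_prompts:
    x = attention_layer(x, p)
print(x.shape)
```

Because the prompts are removed after each layer, the text representation keeps its original shape while still having been conditioned on prompt information at every depth.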