CAMoE

Introduced by Cheng et al. in Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE is a multi-stream Corpus Alignment network with a single-gate Mixture-of-Experts (MoE) for video-text retrieval. It uses the MoE to extract multi-perspective video representations (action, entity, scene, and so on) and then aligns each of them with the corresponding part of the text. A Dual Softmax Loss (DSL) is used to avoid the one-way optimum match that occurs in previous contrastive methods. By introducing the intrinsic prior of each pair in a batch, DSL acts as a reviser that corrects the similarity matrix and achieves a dual optimal match.

Source: Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
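
The description of DSL as a reviser of the similarity matrix can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the prior for each pair is a softmax taken along the opposite retrieval direction, that the revised matrix is scored with an ordinary row-wise cross-entropy, and that the two directions are summed. The function names and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def revise(sim, temperature=1.0):
    """Scale sim[i][j] by a prior computed along column j.

    Assumption: the prior is the softmax over the opposite direction,
    so a pair keeps a high score only if it is strong in both directions.
    """
    priors = [softmax(column, temperature) for column in transpose(sim)]
    rows, cols = len(sim), len(sim[0])
    return [[sim[i][j] * priors[j][i] for j in range(cols)] for i in range(rows)]


def one_way_loss(sim, temperature=1.0):
    """Row-wise cross-entropy on the revised matrix.

    Matching pairs are assumed to sit on the diagonal of sim.
    """
    revised = revise(sim, temperature)
    n = len(sim)
    return -sum(math.log(softmax(revised[i])[i]) for i in range(n)) / n


def dual_softmax_loss(sim, temperature=1.0):
    """Sum of the two retrieval directions (rows, then columns)."""
    return one_way_loss(sim, temperature) + one_way_loss(transpose(sim), temperature)


# Toy batch of two video-text pairs: rows are videos, columns are texts.
aligned = [[5.0, 1.0], [1.0, 5.0]]
misaligned = [[1.0, 5.0], [5.0, 1.0]]
```

Under these assumptions, `dual_softmax_loss(aligned)` is much smaller than `dual_softmax_loss(misaligned)`, because an off-diagonal score is suppressed unless it is also the best match in the other direction.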

Tasks

Task	Papers	Share
Retrieval	1	33.33%
Video Retrieval	1	33.33%
Video-Text Retrieval	1	33.33%

Components

Component	Type
BERT	Language Models
Dual Softmax Loss	Loss Functions
Vision Transformer	Image Models

Categories

Video-Text Retrieval Models