Video-Text Retrieval Models

Composed Video Retrieval

Introduced by Ventura et al. in CoVR: Learning Composed Video Retrieval from Web Video Captions

Composed video retrieval (CoVR) is a new task in which the goal is to find a video that matches both a query image and a query text. The query image represents a visual concept that the user is interested in, and the query text specifies how that concept should be modified or refined. For example, given an image of a fountain and the text "during show at night", the goal is to retrieve a video that shows the fountain during a show at night.

Source: CoVR: Learning Composed Video Retrieval from Web Video Captions
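
The retrieval setup described above can be illustrated with a minimal sketch. It is not the method from the CoVR paper: the fusion step here (summing normalized image and text embeddings) is a simple stand-in chosen for illustration, and all function names, embedding values, and video identifiers are hypothetical toy data. It only shows the shape of the task: an (image, text) pair is turned into one query, and candidate videos are ranked against it.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_query(image_emb, text_emb):
    """Fuse image and modification-text embeddings into one query.

    Assumption: a plain sum of unit vectors. A real system would
    typically use a learned multimodal encoder instead.
    """
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return normalize([i + t for i, t in zip(img, txt)])


def retrieve(image_emb, text_emb, video_embs, top_k=1):
    """Rank candidate videos by similarity to the composed query."""
    query = compose_query(image_emb, text_emb)
    ranked = sorted(
        video_embs.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings, invented for illustration only.
image_emb = [1.0, 0.0, 0.0]  # stands for the fountain image
text_emb = [0.0, 1.0, 0.0]   # stands for "during show at night"
videos = {
    "fountain_day": [1.0, 0.0, 0.1],
    "fountain_night_show": [0.9, 0.9, 0.0],
    "city_night": [0.0, 1.0, 0.2],
}

ranking = retrieve(image_emb, text_emb, videos, top_k=3)
print(ranking)
```

In this toy setup, the video whose embedding reflects both the visual concept and the text modification ranks first, ahead of videos that match only the image or only the text.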
