Zero-shot Classification (unified classes)
3 papers with code • 1 benchmark • 2 datasets
Most implemented papers
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, which pairs Video, Infrared, Depth, and Audio with their corresponding Language.
Uni3D: Exploring Unified 3D Representation at Scale
Scaling up representations for images or text has been extensively investigated in the past few years and has led to revolutions in learning vision and language.
ImageBind: One Embedding Space To Bind Them All
We show that not all combinations of paired data are necessary to train such a joint embedding: image-paired data alone is sufficient to bind the modalities together.
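The papers listed here share the idea of a joint embedding space, which is what makes zero-shot classification possible: a query from any modality is assigned to the class whose text embedding lies closest to it. The following is a minimal sketch of that nearest-class step only, not any paper's actual implementation. The function names (`cosine`, `zero_shot_classify`) and the three-dimensional toy vectors are assumptions made for illustration; in practice the embeddings would come from a trained encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(query_embedding, class_embeddings):
    """Return the label whose embedding is most similar to the query.

    class_embeddings maps a class name to the (hypothetical) text
    embedding of that name in the shared space.
    """
    scores = {
        label: cosine(query_embedding, embedding)
        for label, embedding in class_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for text embeddings of class names (assumed values).
class_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 0.2, 0.9],
    "piano": [0.1, 0.9, 0.1],
}

# Toy stand-in for an audio clip embedded into the same space.
audio_embedding = [0.2, 0.8, 0.1]

label, scores = zero_shot_classify(audio_embedding, class_embeddings)
print(label)
```

No classifier is trained on the target classes: adding a class only requires embedding its name, which is why the approach is called zero-shot.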