Video Recognition Models

Audiovisual SlowFast Network

Introduced by Xiao et al. in Audiovisual SlowFast Networks for Video Recognition

Audiovisual SlowFast Network, or AVSlowFast, is an architecture for integrated audiovisual perception. AVSlowFast has Slow and Fast visual pathways that are integrated with a Faster Audio pathway to model vision and sound in a unified representation. Audio and visual features are fused at multiple layers, enabling audio to contribute to the formation of hierarchical audiovisual concepts. Because the audio and visual modalities have different learning dynamics, training them jointly is difficult; to overcome this, the authors use DropPathway, a regularization technique that randomly drops the Audio pathway during training. Inspired by prior studies in neuroscience, the network also performs hierarchical audiovisual synchronization to learn joint audiovisual features.
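The DropPathway idea described above can be illustrated with a minimal sketch. The function name `fuse_with_drop_pathway`, the `drop_prob` parameter, and the element-wise sum used as the fusion step are all assumptions made for illustration; the description here does not specify how features are fused or what drop probability is used, so this is not the authors' implementation.

```python
import random


def fuse_with_drop_pathway(visual, audio, drop_prob, training, rng=random):
    """Fuse audio features into visual features at one fusion point.

    Hypothetical sketch: during training, the Audio pathway is skipped
    with probability `drop_prob`, so the visual features pass through
    unchanged. At inference time the audio is always fused.
    """
    if training and rng.random() < drop_prob:
        # Audio pathway dropped for this training iteration.
        return list(visual)
    # Assumed fusion: element-wise sum of same-length feature vectors.
    return [v + a for v, a in zip(visual, audio)]


# Example call with toy feature vectors (values are made up).
fused = fuse_with_drop_pathway([1.0, 2.0], [0.5, 0.5], drop_prob=0.5, training=True)
```

Dropping the pathway only when `training` is true mirrors how dropout-style regularizers usually behave: the randomness shapes learning but is disabled at evaluation time.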

Source: Audiovisual SlowFast Networks for Video Recognition

Tasks

Task                   Papers  Share
Action Classification  1       50.00%
Video Recognition      1       50.00%

Components

Component    Type
DropPathway  Regularization