no code implementations • 5 Apr 2024 • Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho
We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.
Ranked #4 on Action Recognition on Diving-48
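The abstract above describes attention scores built from correlation *patterns* in key-query interactions rather than from single correlation values. The following is a minimal 1-D sketch of that idea, assuming a toy setting: raw query-key correlation maps are filtered with a small local kernel before the softmax, so each score reflects the local structure of the correlation map. This is an illustrative simplification, not the paper's exact StructSA formulation.

```python
import numpy as np

def structural_self_attention(x, w_q, w_k, w_v, kernel):
    """Toy 1-D 'structural' attention sketch (hypothetical, not the
    paper's implementation): filter each row of the query-key
    correlation map with a local kernel before softmax, so attention
    weights respond to local correlation patterns."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # standard scaled key-query correlation map, shape (T, T)
    corr = q @ k.T / np.sqrt(q.shape[-1])
    # convolve each row with a small structural kernel (same-length output)
    pad = len(kernel) // 2
    padded = np.pad(corr, ((0, 0), (pad, pad)))
    struct = np.stack([np.convolve(row, kernel, mode="valid") for row in padded])
    # softmax over the filtered scores, then the usual weighted sum of values
    attn = np.exp(struct - struct.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

With an identity-like kernel `[0, 1, 0]` this reduces to ordinary self-attention; a smoothing kernel such as `[0.25, 0.5, 0.25]` makes each score depend on neighboring correlations, which is the intuition behind exploiting structural patterns.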
no code implementations • CVPR 2022 • Dayoung Gong, Joonseok Lee, Manjin Kim, Seong Jong Ha, Minsu Cho
The task of predicting future actions from a video is crucial for real-world agents interacting with others.

1 code implementation • NeurIPS 2021 • Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho
Convolution has arguably been the most important feature transform for modern neural networks, driving the advance of deep learning.
Ranked #11 on Action Recognition on Diving-48
1 code implementation • ICCV 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
With a sufficiently large neighborhood in space and time, it effectively captures long-term interactions and fast motion in video, leading to robust action recognition.
Ranked #18 on Action Recognition on Something-Something V1 (using extra training data)
1 code implementation • 1 Jan 2021 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it.
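The two entries above both build on a spatio-temporal self-similarity (STSS) volume: for each position in a video feature map, the similarity of its feature to every feature in a local space-time neighborhood. A minimal sketch of computing such a volume, assuming cosine similarity and a cubic neighborhood (an illustrative simplification, not the exact published implementation):

```python
import numpy as np

def stss_volume(feats, window=1):
    """Sketch of a spatio-temporal self-similarity (STSS) volume:
    for each (t, h, w) position, cosine similarity between its feature
    and every feature in a (2*window+1)^3 space-time neighborhood.
    Hypothetical illustration, not the paper's exact code."""
    T, H, W, C = feats.shape
    # L2-normalize features so dot products are cosine similarities
    norm = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    k = 2 * window + 1
    out = np.zeros((T, H, W, k, k, k))
    # zero-pad the three space-time axes, leave channels untouched
    padded = np.pad(norm, ((window, window),) * 3 + ((0, 0),))
    for dt in range(k):
        for dh in range(k):
            for dw in range(k):
                shifted = padded[dt:dt + T, dh:dh + H, dw:dw + W]
                out[..., dt, dh, dw] = (norm * shifted).sum(-1)
    return out
```

The resulting `(T, H, W, k, k, k)` tensor is the "whole volume of STSS" a model can consume: the center offset is always self-similarity 1, and the off-center entries encode how appearance moves across neighboring frames, which is what makes the volume a useful motion representation.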
2 code implementations • ECCV 2020 • Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho
Because frame-by-frame optical flow requires heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.
Ranked #1 on Video Classification on Something-Something V2