no code implementations • 6 May 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan
Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks.
1 code implementation • 4 Jan 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari
While effective, most of these works require labeled data, which is often impractical, and they struggle to generalize to new datasets due to over-fitting on the source data.
no code implementations • NeurIPS 2023 • Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption, via prompt learning, for numerous downstream tasks.
2 code implementations • ICCV 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
Video transformer designs are based on self-attention that can model global context at a high computational cost.
Ranked #1 on Action Recognition on Diving-48
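The entry above notes that video transformers rely on self-attention, which models global context at high computational cost. A minimal NumPy sketch (not the paper's actual architecture; all shapes and weights here are toy placeholders) shows where that cost comes from: scaled dot-product attention builds an n × n score matrix, so compute and memory grow quadratically with the number of tokens, which is especially severe for video, where tokens span space and time.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a token sequence.

    x: (n, d) token embeddings. The (n, n) score matrix below is what
    makes global context quadratic in sequence length n.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n): every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d) context-mixed outputs

rng = np.random.default_rng(0)
n, d = 8, 16                                         # toy sizes: 8 tokens, 16-dim embeddings
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)
```

For a video clip, n is (frames × patches per frame), so doubling either the spatial or temporal resolution quadruples the attention cost, which is what motivates the more efficient designs this line of work explores.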
2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.
Ranked #2 on Prompt Engineering on ImageNet V2
1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
2 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.
Ranked #2 on Prompt Engineering on ImageNet-A
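Several of the entries above build on CLIP's zero-shot classification, where an image is assigned to the class whose text-prompt embedding is most similar to the image embedding. A minimal sketch of that mechanism, using random vectors as stand-ins for the real image and text encoders (the encoders, dimensions, and embeddings here are illustrative assumptions, not CLIP's actual components):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """CLIP-style zero-shot classification by cosine similarity.

    image_emb: (d,) embedding of one image.
    text_embs: (k, d), one row per class prompt such as
    "a photo of a {class}". Returns the index of the best class.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                     # cosine similarity with each class prompt
    return int(np.argmax(sims)), sims

rng = np.random.default_rng(0)
text_embs = rng.standard_normal((3, 32))                   # 3 classes, 32-dim toy embeddings
image_emb = text_embs[1] + 0.1 * rng.standard_normal(32)   # image close to class 1
pred, sims = zero_shot_classify(image_emb, text_embs)
print(pred)  # 1
```

Prompt learning, the subject of these papers, replaces the hand-written prompt text with learnable context vectors that are optimized on downstream data while the CLIP encoders stay frozen.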
1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan
Two popular forms of weak supervision used in open-vocabulary detection (OVD) are pretrained CLIP models and image-level supervision.
Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4