Search Results for author: Hammad Ayyubi

Found 3 papers, 1 paper with code

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

no code implementations · 18 May 2024 · Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context.

Visual Question Answering (VQA)

RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos

no code implementations · 27 Mar 2024 · Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-Fu Chang

(3) Annotation cost: Annotating instructional videos with step-level labels (i.e., timestamps) or sequence-level labels (i.e., action categories) is demanding and labor-intensive, which limits generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined.

Relation Retrieval +1

Weakly-Supervised Temporal Article Grounding

1 code implementation · 22 Oct 2022 · Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang

Specifically, given an article and a relevant video, WSAG aims to localize all "groundable" sentences to the video, and these sentences may appear at different semantic scales.

Natural Language Queries Sentence +1
