1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
The proposed method is trained end to end and optimized with a cross-entropy VQA loss together with a Hungarian matching loss for situation graph prediction (a sketch of this combined objective is given after this entry).
Ranked #6 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)
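Below is a minimal sketch, not the authors' released code, of how a cross-entropy VQA loss can be combined with a Hungarian matching loss over predicted situation-graph elements. The tensor shapes, cost definition, and loss weighting are illustrative assumptions.

```python
# Hedged sketch: cross-entropy VQA loss + Hungarian matching loss for graph predictions.
# Shapes, cost matrix, and weighting are assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def vqa_loss(answer_logits, answer_targets):
    # Standard cross-entropy over the answer vocabulary.
    return F.cross_entropy(answer_logits, answer_targets)

def hungarian_graph_loss(pred_logits, target_labels):
    # pred_logits: (num_queries, num_classes) class scores for predicted graph nodes.
    # target_labels: (num_targets,) ground-truth class indices.
    probs = pred_logits.softmax(-1)                      # (Q, C)
    cost = -probs[:, target_labels]                      # (Q, T): negative class probability as matching cost
    row, col = linear_sum_assignment(cost.detach().cpu().numpy())
    matched_logits = pred_logits[row]                    # predictions selected by the optimal assignment
    matched_targets = target_labels[col]                 # their assigned ground-truth labels
    return F.cross_entropy(matched_logits, matched_targets)

def total_loss(answer_logits, answer_targets, graph_logits, graph_labels, graph_weight=1.0):
    # graph_weight is an assumed hyperparameter balancing the two terms.
    return vqa_loss(answer_logits, answer_targets) + graph_weight * hungarian_graph_loss(graph_logits, graph_labels)
```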
1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah
Transformers for visual-language representation learning have attracted considerable interest and shown strong performance on visual question answering (VQA) and grounding.
1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.
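The following is a hedged sketch of the general idea of processing each modality individually and then fusing the streams with a transformer; the module names, dimensions, and aggregation scheme are assumptions rather than the released MMFT-BERT implementation.

```python
# Illustrative sketch of individual + fused modality processing (not the official MMFT-BERT code).
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    def __init__(self, dim=768, num_answers=100, num_modalities=2):
        super().__init__()
        # One classifier per individually processed modality stream.
        self.individual_heads = nn.ModuleList(
            [nn.Linear(dim, num_answers) for _ in range(num_modalities)]
        )
        # Transformer encoder that fuses the per-modality representations.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.joint_head = nn.Linear(dim, num_answers)

    def forward(self, modality_feats):
        # modality_feats: list of (batch, dim) pooled features, e.g. from BERT-style encoders.
        individual_logits = [head(f) for head, f in zip(self.individual_heads, modality_feats)]
        fused = self.fusion(torch.stack(modality_feats, dim=1))  # (batch, num_modalities, dim)
        joint_logits = self.joint_head(fused.mean(dim=1))        # mean-pool fused tokens, then classify
        return individual_logits, joint_logits
```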
1 code implementation • CVPR 2018 • Aisha Urooj Khan, Ali Borji
In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, both off-the-shelf and fine-tuned, on existing datasets.
no code implementations • 26 Dec 2017 • Cecilia La Place, Aisha Urooj Khan, Ali Borji
As a result of our efforts, we have seen an improvement of 10-15% in average MCR compared to prior methods on the SkyFinder dataset.
no code implementations • 9 Oct 2016 • Jessica Finocchiaro, Aisha Urooj Khan, Ali Borji
We used both traditional computer vision approaches and deep learning to determine the visual cues that result in the best height estimation.