no code implementations • 30 May 2024 • Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal
Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems.
no code implementations • 8 May 2024 • Iqraa Ehsan, Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal
Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images.
no code implementations • 6 May 2024 • Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal
We provide a comprehensive document image classification analysis in Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings to address this gap.
no code implementations • 30 Apr 2024 • Tahira Shehzadi, Shalini Sarode, Didier Stricker, Muhammad Zeshan Afzal
However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms.
no code implementations • 27 Apr 2024 • Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal
This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings.
no code implementations • 2 Apr 2024 • Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal
In this paper, we address the limitations of the DETR-based semi-supervised object detection (SSOD) framework, particularly focusing on the challenges posed by the quality of object queries.
Ranked #1 on Semi-Supervised Object Detection on COCO
no code implementations • 11 Mar 2024 • Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc van Gool, Didier Stricker, Muhammad Zeshan Afzal
We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks.
Ranked #1 on Age Classification on EMOTIC
1 code implementation • ICCV 2023 • Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Didier Stricker, Federico Tombari, Muhammad Zeshan Afzal
While the model faces a disjoint set of classes in each task in this setting, we argue that these classes can be encoded to the same embedding space of a pre-trained language encoder.
no code implementations • 23 Jun 2023 • Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal
Upon integrating query modifications in the DETR, we outperform prior works and achieve new state-of-the-art results with the mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet, and PubTables, respectively.
Ranked #3 on Document Layout Analysis on PubLayNet val
2 code implementations • 7 Jun 2023 • Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal
The astounding performance of transformers in natural language processing (NLP) has motivated researchers to explore their applications in computer vision tasks.
no code implementations • 4 May 2023 • Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal
Table detection is the task of classifying and localizing table objects within document images.
no code implementations • 27 Apr 2023 • Hamam Mokayed, Palaiahnakote Shivakumara, Lama Alkhaled, Rajkumar Saini, Muhammad Zeshan Afzal, Yan Chai Hum, Marcus Liwicki
Vehicle detection in real-time scenarios is challenging because of the time constraints and the presence of multiple types of vehicles with different speeds, shapes, structures, etc.
no code implementations • CVPR 2023 • Muhammad Ferjad Naeem, Muhammad Gul Zain Ali Khan, Yongqin Xian, Muhammad Zeshan Afzal, Didier Stricker, Luc van Gool, Federico Tombari
Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification with these class views.
no code implementations • 20 Oct 2022 • Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Alain Pagani, Didier Stricker, Muhammad Zeshan Afzal
CAPE learns to identify this structure and propagates knowledge between them to learn class embedding for all seen and unseen compositions.
1 code implementation • 28 Apr 2022 • Danish Nazir, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Depth completion involves recovering a dense depth map from a sparse map and an RGB image.
Ranked #1 on Depth Completion on KITTI Depth Completion
no code implementations • 29 Apr 2021 • Khurram Azeem Hashmi, Marcus Liwicki, Didier Stricker, Muhammad Adnan Afzal, Muhammad Ahtsham Afzal, Muhammad Zeshan Afzal
Table understanding has substantially benefited from the recent breakthroughs in deep neural networks.
no code implementations • 21 Apr 2021 • Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Noman Afzal, Muhammad Zeshan Afzal
Subsequently, these anchors are exploited to locate the rows and columns in tabular images.
no code implementations • 1 Apr 2018 • Andreas Kölsch, Ashutosh Mishra, Saurabh Varshneya, Muhammad Zeshan Afzal, Marcus Liwicki
This paper introduces a very challenging dataset of historic German documents and evaluates Fully Convolutional Neural Network (FCNN) based methods to locate handwritten annotations of any kind in these documents.
no code implementations • 3 Nov 2017 • Andreas Kölsch, Muhammad Zeshan Afzal, Markus Ebbecke, Marcus Liwicki
This paper presents an approach for real-time training and testing for document image classification.
5 code implementations • 11 Apr 2017 • Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki
We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half.
Ranked #27 on Document Image Classification on RVL-CDIP
4 code implementations • 19 Mar 2017 • Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, Muhammad Zeshan Afzal
In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions.
no code implementations • 19 Mar 2017 • Andreas Kölsch, Muhammad Zeshan Afzal, Marcus Liwicki
In this work, we propose the combined usage of low- and high-level blocks of convolutional neural networks (CNNs) for improving object recognition.
no code implementations • 4 May 2016 • Sheraz Ahmed, Muhammad Imran Malik, Muhammad Zeshan Afzal, Koichi Kise, Masakazu Iwamura, Andreas Dengel, Marcus Liwicki
The method is generic, language independent, and can be used to generate labeled document datasets (both scanned and camera-captured) in any cursive or non-cursive language, e.g., English, Russian, Arabic, Urdu, etc.
3 code implementations • 17 Sep 2015 • Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel, Marcus Liwicki
The proposed architecture achieves 99.6% for CKP and 98.63% for MMI, therefore performing better than the state of the art using CNNs.
Ranked #1 on Facial Expression Recognition (FER) on MMI