no code implementations • 12 Mar 2024 • Yuexi Chen, Vlad I. Morariu, Anh Truong, Zhicheng Liu
Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach procedural skills, offer a more browsable alternative to timeline-based videos.
no code implementations • IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 • Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu
Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.
no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.
no code implementations • 22 Apr 2022 • Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun
Document intelligence automates the extraction of information from documents and supports many business applications.
Ranked #7 on Document Layout Analysis on PubLayNet val
no code implementations • CVPR 2021 • Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu
For downstream usage, we propose a novel modality-adaptive attention mechanism for multimodal feature fusion by adaptively emphasizing language and vision signals.
no code implementations • 18 Apr 2021 • Kai Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Yun Fu
Contrasted with prior work, this paper provides a complementary solution to align domains by learning the same auxiliary tasks in both domains simultaneously.
2 code implementations • CVPR 2021 • Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, Kate Saenko
We propose D-RISE, a method for generating visual explanations for the predictions of object detectors.
1 code implementation • CVPR 2020 • Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.
no code implementations • WS 2019 • Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad I. Morariu, Larry S. Davis
This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos.
2 code implementations • CVPR 2018 • Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis
Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned.
no code implementations • 2 May 2018 • Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis
The classification system further classifies the generated candidates based on opinions of multiple deep verification networks and a fusion network which utilizes a novel soft-rejection fusion method to adjust the confidence in the detection results.
no code implementations • ICCV 2019 • Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis
We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance.
no code implementations • 29 Mar 2018 • Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis
We propose a two-stream network for face tampering detection.
no code implementations • CVPR 2018 • Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis
In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power.
no code implementations • CVPR 2018 • Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects of varied sizes appear in high-resolution images.
no code implementations • ECCV 2018 • Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL).
no code implementations • ICCV 2017 • Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis
Understanding visual relationships involves identifying the subject, the object, and a predicate relating them.
1 code implementation • CVPR 2017 • Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
We present a Deep Convolutional Neural Network architecture which serves as a generic image-to-image regressor that can be trained end-to-end without any further machinery.
1 code implementation • CVPR 2018 • Yaming Wang, Vlad I. Morariu, Larry S. Davis
Compared to earlier multistage frameworks using CNN features, recent end-to-end deep approaches for fine-grained recognition essentially enhance the mid-level learning capability of CNNs.
Ranked #20 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • CVPR 2017 • Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description.
no code implementations • 9 Sep 2016 • Ruichi Yu, Xi Chen, Vlad I. Morariu, Larry S. Davis
We investigate the reasons why context in object detection has limited utility by isolating and evaluating the predictive power of different context cues under ideal conditions in which context is provided by an oracle.
1 code implementation • 1 Aug 2016 • Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis
Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region.
no code implementations • CVPR 2016 • Yaming Wang, Jonghyun Choi, Vlad I. Morariu, Larry S. Davis
Fine-grained classification involves distinguishing between similar sub-categories based on subtle differences in highly localized regions; therefore, accurate localization of discriminative regions remains a major challenge.
no code implementations • 10 Dec 2015 • Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis
VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known.
no code implementations • 24 Nov 2015 • Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis
However, we can use structure in the scene to search for objects without processing the entire image.
no code implementations • ICCV 2015 • Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis
Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos.
no code implementations • ECCV 2014 • Ang Li, Vlad I. Morariu, Larry S. Davis
Image-based geolocation aims to answer the question: where was this ground photograph taken?
no code implementations • NeurIPS 2008 • Vlad I. Morariu, Balaji V. Srinivasan, Vikas C. Raykar, Ramani Duraiswami, Larry S. Davis
To solve the second problem, we present an online tuning approach that results in a black box method that automatically chooses the evaluation method and its parameters to yield the best performance for the input data, desired accuracy, and bandwidth.