no code implementations • 10 Jul 2023 • Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos.
no code implementations • 28 Jun 2023 • Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman
The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?").
no code implementations • 18 Jan 2023 • Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, Michael L. Littman, Sandeep Madireddy, Jorge A. Mendez, Eric Q. Nguyen, Christine D. Piatko, Praveen K. Pilly, Aswin Raghavan, Abrar Rahman, Santhosh Kumar Ramakrishnan, Neale Ratzlaff, Andrea Soltoggio, Peter Stone, Indranil Sur, Zhipeng Tang, Saket Tiwari, Kyle Vedder, Felix Wang, Zifan Xu, Angel Yanguas-Gil, Harel Yedidsion, Shangqun Yu, Gautam K. Vallabha
Despite recent advances in machine learning techniques, state-of-the-art systems lack robustness to "real world" events: the input distributions and tasks encountered by deployed systems will not be limited to the original training context, so systems must instead adapt to novel distributions and tasks while deployed.
1 code implementation • CVPR 2023 • Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand.
no code implementations • 8 Jun 2022 • Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics.
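The role of an RIR can be illustrated with a toy convolution (a minimal sketch, not the paper's learned model): the sound reaching a listener is the dry source signal convolved with the RIR for that particular source/listener placement in the room.

```python
import numpy as np

# Toy dry source signal and a toy room impulse response (RIR).
# In practice the RIR is measured or simulated for a specific
# source/listener placement in a specific room; these values
# are made up for illustration.
dry = np.array([1.0, 0.0, 0.0, 0.0])   # a unit impulse, like a clap
rir = np.array([1.0, 0.0, 0.5, 0.25])  # direct path plus two decaying echoes

# The sound heard at the listener is the source convolved with the RIR.
wet = np.convolve(dry, rir)
print(wet)  # an impulse source simply reproduces the RIR
```

Because the source here is a unit impulse, the output is the RIR itself, which is exactly why impulse responses characterize how a room transforms any sound played in it.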
no code implementations • CVPR 2022 • Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman
In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments.
no code implementations • CVPR 2022 • Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman
We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?'
no code implementations • ICLR 2022 • Santhosh Kumar Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
no code implementations • ICCV 2021 • Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.
no code implementations • 3 Feb 2021 • Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
no code implementations • CVPR 2021 • Changan Chen, Ziad Al-Halah, Kristen Grauman
We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target.
no code implementations • 17 Nov 2020 • Ziad Al-Halah, Kristen Grauman
The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images.
1 code implementation • ECCV 2020 • Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent.
Ranked #3 on Robot Navigation on Habitat 2020 Point Nav test-std
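An occupancy map of the kind these navigation methods build can be sketched minimally (a toy 2-D grid updated from a single simulated depth ray; the paper's contribution is predicting occupancy *beyond* what is directly observed, which this sketch does not do):

```python
import numpy as np

# Minimal occupancy grid: 0 = unknown, 1 = free, 2 = occupied.
grid = np.zeros((10, 10), dtype=np.int8)

def mark_ray(grid, x0, y0, x1, y1):
    """Mark cells from (x0, y0) up to (x1, y1) as free and the
    endpoint as occupied, like a single depth-sensor ray hitting
    an obstacle at (x1, y1)."""
    n = max(abs(x1 - x0), abs(y1 - y0))
    for t in range(n):
        x = x0 + (x1 - x0) * t // n
        y = y0 + (y1 - y0) * t // n
        grid[y, x] = 1       # cells the ray passed through are free
    grid[y1, x1] = 2         # the ray terminated on an obstacle

mark_ray(grid, 0, 0, 5, 0)
print(grid[0, :6])  # [1 1 1 1 1 2]
```

Cells never touched by any ray stay unknown, which is precisely the limitation the abstract points at: a purely observed map says nothing about unvisited space.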
1 code implementation • ICLR 2021 • Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).
no code implementations • ECCV 2020 • Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.
1 code implementation • CVPR 2020 • Ziad Al-Halah, Kristen Grauman
The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively.
2 code implementations • ECCV 2020 • Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment.
no code implementations • 14 Jul 2019 • Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero
Additionally, we introduce a novel emoji representation based on their visual emotional response, which supports a deeper understanding of emojis and their usage on social media.
3 code implementations • CVPR 2021 • Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris
We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.
no code implementations • 1 Dec 2018 • Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal
While prior approaches in the literature fall roughly into two main groups, category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex.
no code implementations • 30 Oct 2018 • Alina Roitberg, Ziad Al-Halah, Rainer Stiefelhagen
While it is common in activity recognition to assume a closed-set setting, i.e., that test samples always belong to training categories, this assumption is impractical in a real-world scenario.
no code implementations • ICCV 2017 • Ziad Al-Halah, Rainer Stiefelhagen, Kristen Grauman
What is the future of fashion?
no code implementations • CVPR 2017 • Ziad Al-Halah, Rainer Stiefelhagen
Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three data sets: ImageNet, Animals with Attributes and aPascal/aYahoo.
no code implementations • 22 Nov 2016 • Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
The Earth Mover's Distance (EMD) computes the optimal cost of transforming one distribution into another, given a known transport metric between them.
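As a concrete illustration (a minimal sketch, not the paper's algorithm): in one dimension with ground metric |x_i − x_j|, the EMD between two discrete distributions on the same support reduces to the area between their cumulative distributions, since any mass imbalance accumulated up to a point must be transported across the gap to the next point.

```python
def emd_1d(xs, p, q):
    """Earth Mover's Distance between two discrete 1-D distributions
    p and q on the same sorted support points xs, with transport
    metric |x_i - x_j|. Assumes p and q each sum to the same total mass."""
    cum_p = cum_q = 0.0
    total = 0.0
    for i in range(len(xs) - 1):
        cum_p += p[i]
        cum_q += q[i]
        # mass imbalance accumulated so far must cross this gap
        total += abs(cum_p - cum_q) * (xs[i + 1] - xs[i])
    return total

# Moving one unit of mass from x=0 to x=2 costs distance 2.
print(emd_1d([0, 1, 2], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0
```

In higher dimensions, or with an arbitrary transport metric, EMD requires solving a full optimal-transport linear program; the 1-D closed form above is the special case that makes it cheap to compute.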
no code implementations • CVPR 2016 • Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
In this work, we aim to carry out attribute-based zero-shot classification in an unsupervised manner.
no code implementations • 1 Apr 2016 • Ziad Al-Halah, Rainer Stiefelhagen
We propose to capture these variations in a hierarchical model that expands the knowledge source with additional abstraction levels of attributes.