no code implementations • 10 Jul 2023 • Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos.
no code implementations • 28 Jun 2023 • Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman
The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?").
no code implementations • 18 Jan 2023 • Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, Michael L. Littman, Sandeep Madireddy, Jorge A. Mendez, Eric Q. Nguyen, Christine D. Piatko, Praveen K. Pilly, Aswin Raghavan, Abrar Rahman, Santhosh Kumar Ramakrishnan, Neale Ratzlaff, Andrea Soltoggio, Peter Stone, Indranil Sur, Zhipeng Tang, Saket Tiwari, Kyle Vedder, Felix Wang, Zifan Xu, Angel Yanguas-Gil, Harel Yedidsion, Shangqun Yu, Gautam K. Vallabha
Despite recent advances in machine learning techniques, state-of-the-art systems lack robustness to "real world" events: the input distributions and tasks encountered by deployed systems will not be limited to the original training context, so systems must instead adapt to novel distributions and tasks while deployed.
1 code implementation • CVPR 2023 • Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand.
no code implementations • 8 Jun 2022 • Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics.
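The role of an RIR can be illustrated with a toy convolution (a minimal sketch, not the paper's learned model): the sound reaching a listener is the dry source signal convolved with the RIR for that particular source/listener placement in the room.

```python
import numpy as np

# Toy dry source signal and a toy room impulse response (RIR).
# In practice the RIR is measured or simulated for a specific
# source/listener placement in a specific room; these values
# are made up for illustration.
dry = np.array([1.0, 0.0, 0.0, 0.0])   # a unit impulse, like a clap
rir = np.array([1.0, 0.0, 0.5, 0.25])  # direct path plus two decaying echoes

# The sound heard at the listener is the source convolved with the RIR.
wet = np.convolve(dry, rir)
print(wet)  # an impulse source simply reproduces the RIR
```

Because the source here is a unit impulse, the output is the RIR itself, which is exactly why impulse responses characterize how a room transforms any sound played in it.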
no code implementations • CVPR 2022 • Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman
In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments.
no code implementations • CVPR 2022 • Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman
We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?'
no code implementations • ICLR 2022 • Santhosh Kumar Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
no code implementations • ICCV 2021 • Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.
no code implementations • 3 Feb 2021 • Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
no code implementations • CVPR 2021 • Changan Chen, Ziad Al-Halah, Kristen Grauman
We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target.
no code implementations • 17 Nov 2020 • Ziad Al-Halah, Kristen Grauman
The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images.
1 code implementation • ECCV 2020 • Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman
State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent.
Ranked #3 on Robot Navigation on Habitat 2020 Point Nav test-std
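An occupancy map of the kind these navigation methods build can be sketched minimally (a toy 2-D grid updated from a single simulated depth ray; the paper's contribution is predicting occupancy *beyond* what is directly observed, which this sketch does not do):

```python
import numpy as np

# Minimal occupancy grid: 0 = unknown, 1 = free, 2 = occupied.
grid = np.zeros((10, 10), dtype=np.int8)

def mark_ray(grid, x0, y0, x1, y1):
    """Mark cells from (x0, y0) up to (x1, y1) as free and the
    endpoint as occupied, like a single depth-sensor ray hitting
    an obstacle at (x1, y1)."""
    n = max(abs(x1 - x0), abs(y1 - y0))
    for t in range(n):
        x = x0 + (x1 - x0) * t // n
        y = y0 + (y1 - y0) * t // n
        grid[y, x] = 1       # cells the ray passed through are free
    grid[y1, x1] = 2         # the ray terminated on an obstacle

mark_ray(grid, 0, 0, 5, 0)
print(grid[0, :6])  # [1 1 1 1 1 2]
```

Cells never touched by any ray stay unknown, which is precisely the limitation the abstract points at: a purely observed map says nothing about unvisited space.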
1 code implementation • ICLR 2021 • Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).
no code implementations • ECCV 2020 • Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.
1 code implementation • CVPR 2020 • Ziad Al-Halah, Kristen Grauman
The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively.
2 code implementations • ECCV 2020 • Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment.
no code implementations • 14 Jul 2019 • Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero
Additionally, we introduce a novel emoji representation based on their visual emotional response, which supports a deeper understanding of emojis and their usage on social media.
3 code implementations • CVPR 2021 • Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris
We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.
no code implementations • 1 Dec 2018 • Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal
While prior approaches in the literature fall roughly into two main groups, category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex.
no code implementations • 30 Oct 2018 • Alina Roitberg, Ziad Al-Halah, Rainer Stiefelhagen
While it is common in activity recognition to assume a closed-set setting, i.e., that test samples always belong to training categories, this assumption is impractical in a real-world scenario.
no code implementations • ICCV 2017 • Ziad Al-Halah, Rainer Stiefelhagen, Kristen Grauman
What is the future of fashion?
no code implementations • CVPR 2017 • Ziad Al-Halah, Rainer Stiefelhagen
Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three data sets: ImageNet, Animals with Attributes and aPascal/aYahoo.
no code implementations • 22 Nov 2016 • Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
The Earth Mover's Distance (EMD) computes the optimal cost of transforming one distribution into another, given a known transport metric between them.
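As a concrete illustration (a minimal sketch, not the paper's algorithm): in one dimension with ground metric |x_i − x_j|, the EMD between two discrete distributions on the same support reduces to the area between their cumulative distributions, since any mass imbalance accumulated up to a point must be transported across the gap to the next point.

```python
def emd_1d(xs, p, q):
    """Earth Mover's Distance between two discrete 1-D distributions
    p and q on the same sorted support points xs, with transport
    metric |x_i - x_j|. Assumes p and q each sum to the same total mass."""
    cum_p = cum_q = 0.0
    total = 0.0
    for i in range(len(xs) - 1):
        cum_p += p[i]
        cum_q += q[i]
        # mass imbalance accumulated so far must cross this gap
        total += abs(cum_p - cum_q) * (xs[i + 1] - xs[i])
    return total

# Moving one unit of mass from x=0 to x=2 costs distance 2.
print(emd_1d([0, 1, 2], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0
```

In higher dimensions, or with an arbitrary transport metric, EMD requires solving a full optimal-transport linear program; the 1-D closed form above is the special case that makes it cheap to compute.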
no code implementations • CVPR 2016 • Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
In this work, we aim to carry out attribute-based zero-shot classification in an unsupervised manner.
no code implementations • 1 Apr 2016 • Ziad Al-Halah, Rainer Stiefelhagen
We propose to capture these variations in a hierarchical model that expands the knowledge source with additional abstraction levels of attributes.