no code implementations • 8 Nov 2023 • Yuliang Li, Nitin Kamra, Ruta Desai, Alon Halevy
The vision of creating AI-powered personal assistants also involves creating structured outputs, such as a plan for one's day, or for an overseas trip.
3 code implementations • 19 Oct 2023 • Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments.
no code implementations • 31 May 2023 • Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, Akshara Rai
We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment.
1 code implementation • ICCV 2023 • Dhruvesh Patel, Hamid Eghbalzadeh, Nitin Kamra, Michael Louis Iuzzolino, Unnat Jain, Ruta Desai
Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc.
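The VPA output described above is a sequence of next actions toward a stated goal. As a minimal illustration of that input/output contract (the procedure table and `plan_next_actions` helper below are hypothetical stand-ins; the actual model predicts steps from video rather than looking them up):

```python
# Hypothetical procedure library keyed by natural-language goal.
# The real VPA model infers progress from video; here, completed
# steps are simply given as a list of action strings.
PROCEDURES = {
    "make a shelf": ["cut wood", "sand shelf", "paint shelf", "attach brackets"],
}

def plan_next_actions(goal, completed, horizon=3):
    """Return up to `horizon` actions for `goal` not yet completed."""
    done = set(completed)
    remaining = [a for a in PROCEDURES[goal] if a not in done]
    return remaining[:horizon]

print(plan_next_actions("make a shelf", ["cut wood"]))
# ['sand shelf', 'paint shelf', 'attach brackets']
```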
1 code implementation • ICCV 2023 • Rishi Hazra, Brian Chen, Akshara Rai, Nitin Kamra, Ruta Desai
The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language description of these tasks.
no code implementations • 24 Jan 2023 • Engin Tekin, Elaheh Barati, Nitin Kamra, Ruta Desai
This requires efficient trade-offs between exploration of the environment and planning for rearrangement, which is challenging because of the long-horizon nature of the problem.
no code implementations • 11 Jan 2023 • Weichao Mao, Ruta Desai, Michael Louis Iuzzolino, Nitin Kamra
Given video demonstrations and paired narrations of an at-home procedural task such as changing a tire, we present an approach to extract the underlying task structure -- relevant actions and their temporal dependencies -- via action-centric task graphs.
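An action-centric task graph as described above pairs the task's relevant actions with their temporal dependencies, i.e., a directed acyclic graph. A minimal sketch, with a hypothetical dependency set for the "changing a tire" example (the actual graphs are extracted from demonstrations, not hand-written):

```python
from graphlib import TopologicalSorter  # Python 3.9+ stdlib

# Hypothetical task graph for "changing a tire": each key is an action,
# each value is the set of actions that must temporally precede it.
task_graph = {
    "loosen lug nuts": set(),
    "jack up car": {"loosen lug nuts"},
    "remove flat tire": {"jack up car"},
    "mount spare tire": {"remove flat tire"},
    "tighten lug nuts": {"mount spare tire"},
    "lower car": {"tighten lug nuts"},
}

# Any topological order of the graph is a valid execution of the task.
order = list(TopologicalSorter(task_graph).static_order())
print(order)
```

A graph representation like this also exposes which action orderings are interchangeable: any two actions with no directed path between them can be performed in either order.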
no code implementations • 14 Dec 2022 • Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g., human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g., a robotic manipulator in a simulated kitchen.
no code implementations • 7 Oct 2022 • Meera Hahn, Kevin Carlberg, Ruta Desai, James Hillis
We introduce a novel interface for large-scale collection of human memory and assistance.
no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh
Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.
no code implementations • 4 Oct 2021 • Satoshi Tsutsui, Ruta Desai, Karl Ridgeway
We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices.
no code implementations • 14 Oct 2020 • Benjamin Newman, Kevin Carlberg, Ruta Desai
We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display.