no code implementations • 8 Nov 2023 • Yuliang Li, Nitin Kamra, Ruta Desai, Alon Halevy
The vision of creating AI-powered personal assistants also involves creating structured outputs, such as a plan for one's day, or for an overseas trip.
3 code implementations • 19 Oct 2023 • Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments.
no code implementations • 31 May 2023 • Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, Akshara Rai
We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment.
1 code implementation • ICCV 2023 • Dhruvesh Patel, Hamid Eghbalzadeh, Nitin Kamra, Michael Louis Iuzzolino, Unnat Jain, Ruta Desai
Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc.
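The VPA output described above is a sequence of next actions toward a stated goal. As a minimal illustration of that input/output contract (the procedure table and `plan_next_actions` helper below are hypothetical stand-ins; the actual model predicts steps from video rather than looking them up):

```python
# Hypothetical procedure library keyed by natural-language goal.
# The real VPA model infers progress from video; here, completed
# steps are simply given as a list of action strings.
PROCEDURES = {
    "make a shelf": ["cut wood", "sand shelf", "paint shelf", "attach brackets"],
}

def plan_next_actions(goal, completed, horizon=3):
    """Return up to `horizon` actions for `goal` not yet completed."""
    done = set(completed)
    remaining = [a for a in PROCEDURES[goal] if a not in done]
    return remaining[:horizon]

print(plan_next_actions("make a shelf", ["cut wood"]))
# ['sand shelf', 'paint shelf', 'attach brackets']
```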
1 code implementation • ICCV 2023 • Rishi Hazra, Brian Chen, Akshara Rai, Nitin Kamra, Ruta Desai
The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language description of these tasks.
no code implementations • 24 Jan 2023 • Engin Tekin, Elaheh Barati, Nitin Kamra, Ruta Desai
This requires efficient trade-offs between exploration of the environment and planning for rearrangement, which is challenging because of the long-horizon nature of the problem.
no code implementations • 11 Jan 2023 • Weichao Mao, Ruta Desai, Michael Louis Iuzzolino, Nitin Kamra
Given video demonstrations and paired narrations of an at-home procedural task such as changing a tire, we present an approach to extract the underlying task structure -- relevant actions and their temporal dependencies -- via action-centric task graphs.
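An action-centric task graph as described above pairs the task's relevant actions with their temporal dependencies, i.e., a directed acyclic graph. A minimal sketch, with a hypothetical dependency set for the "changing a tire" example (the actual graphs are extracted from demonstrations, not hand-written):

```python
from graphlib import TopologicalSorter  # Python 3.9+ stdlib

# Hypothetical task graph for "changing a tire": each key is an action,
# each value is the set of actions that must temporally precede it.
task_graph = {
    "loosen lug nuts": set(),
    "jack up car": {"loosen lug nuts"},
    "remove flat tire": {"jack up car"},
    "mount spare tire": {"remove flat tire"},
    "tighten lug nuts": {"mount spare tire"},
    "lower car": {"tighten lug nuts"},
}

# Any topological order of the graph is a valid execution of the task.
order = list(TopologicalSorter(task_graph).static_order())
print(order)
```

A graph representation like this also exposes which action orderings are interchangeable: any two actions with no directed path between them can be performed in either order.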
no code implementations • 14 Dec 2022 • Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g., human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g., a robotic manipulator in a simulated kitchen.
no code implementations • 7 Oct 2022 • Meera Hahn, Kevin Carlberg, Ruta Desai, James Hillis
We introduce a novel interface for large-scale collection of human memory and assistance.
no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh
Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.
no code implementations • 4 Oct 2021 • Satoshi Tsutsui, Ruta Desai, Karl Ridgeway
We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices.
no code implementations • 14 Oct 2020 • Benjamin Newman, Kevin Carlberg, Ruta Desai
We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display.