Search Results for author: Zihui Xue

Found 14 papers, 5 papers with code

Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

no code implementations • 11 Mar 2024 • Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman

We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.

Hallucination Translation

Detours for Navigating Instructional Videos

no code implementations • 3 Jan 2024 • Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman

We introduce the video detours problem for navigating instructional videos.

16k Question Answering +2

Learning Object State Changes in Videos: An Open-World Perspective

no code implementations • 19 Dec 2023 • Zihui Xue, Kumar Ashutosh, Kristen Grauman

Object State Changes (OSCs) are pivotal for video understanding.

Video Understanding

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

1 code implementation • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Egocentric Video Task Translation @ Ego4D Challenge 2022

no code implementations • 3 Feb 2023 • Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

With no modification to the baseline architectures, our proposed approach achieves competitive performance on two Ego4D challenges, ranking the 1st in the talking to me challenge and the 3rd in the PNR keyframe localization challenge.

Translation

Egocentric Video Task Translation

no code implementations • CVPR 2023 • Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another).

Multi-Task Learning Translation +1

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

2 code implementations • 13 Jun 2022 • Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications.

Knowledge Distillation Transfer Learning

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

no code implementations • 5 Apr 2022 • Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Multimodal fusion emerges as an appealing technique to improve model performances on many tasks.

Dynamic Multimodal Fusion

1 code implementation • 31 Mar 2022 • Zihui Xue, Radu Marculescu

In this work, we propose dynamic multimodal fusion (DynMM), a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.

Ranked #43 on Semantic Segmentation on NYU Depth v2

Computational Efficiency Semantic Segmentation +1

SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

no code implementations • 31 Jan 2022 • Zihui Xue, Yuedong Yang, Mengtian Yang, Radu Marculescu

Graph Neural Networks (GNNs) have demonstrated a great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition.

Drug Discovery Edge-computing +3

Co-advise: Cross Inductive Bias Distillation

no code implementations • CVPR 2022 • Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao

Transformers recently are adapted from the community of natural language processing as a promising substitute of convolution-based neural networks for visual learning tasks.

Inductive Bias

What Makes Multi-modal Learning Better than Single (Provably)

no code implementations • NeurIPS 2021 • Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

The world provides us with data of multiple modalities.

On Feature Decorrelation in Self-Supervised Learning

1 code implementation • ICCV 2021 • Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Representation Learning Self-Supervised Learning

Multimodal Knowledge Expansion

1 code implementation • ICCV 2021 • Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.

Ranked #63 on Semantic Segmentation on NYU Depth v2

Denoising Knowledge Distillation +1

