Search Results for author: Wenke Xia

Found 7 papers, 6 papers with code

Learning Manipulation by Predicting Interaction

1 code implementation · 1 Jun 2024 · Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) to enhance the visual representation. Given a pair of keyframes representing the initial and final states, along with a language instruction, the algorithm predicts the transition frame and detects the interaction object.

Representation Learning
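
The two-part objective described in this entry (predicting the transition frame and detecting the interaction object from initial/final keyframes plus an instruction) could be sketched roughly as below. This is not the authors' code: the tiny convolutional encoder, the 56x56 reconstruction target, the box parameterization, and all dimensions are illustrative assumptions.

```python
# Hedged sketch of a two-headed MPI-style pre-training objective:
# a shared encoder consumes the initial and final keyframes plus a language
# embedding; one head reconstructs the transition frame, another regresses
# a bounding box for the interaction object. All sizes are made up.
import torch
import torch.nn as nn


class MPIPretrainSketch(nn.Module):
    def __init__(self, lang_dim=512, hidden=256):
        super().__init__()
        # Tiny conv encoder shared by both keyframes (stand-in for a real backbone).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.Sequential(nn.Linear(2 * 64 + lang_dim, hidden), nn.ReLU())
        # Head 1: predict a downsampled (3 x 56 x 56) transition frame.
        self.transition_head = nn.Linear(hidden, 3 * 56 * 56)
        # Head 2: detect the interaction object as a normalized box (x, y, w, h).
        self.detection_head = nn.Sequential(nn.Linear(hidden, 4), nn.Sigmoid())

    def forward(self, initial_frame, final_frame, lang_emb):
        h = self.fuse(torch.cat(
            [self.frame_encoder(initial_frame),
             self.frame_encoder(final_frame),
             lang_emb], dim=-1))
        pred_frame = self.transition_head(h).view(-1, 3, 56, 56)
        pred_box = self.detection_head(h)
        return pred_frame, pred_box


# Toy usage: random tensors stand in for keyframes and instruction embeddings.
model = MPIPretrainSketch()
init_f, final_f = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
lang = torch.randn(2, 512)
pred_frame, pred_box = model(init_f, final_f, lang)
loss = nn.functional.mse_loss(pred_frame, torch.randn(2, 3, 56, 56)) \
     + nn.functional.l1_loss(pred_box, torch.rand(2, 4))
loss.backward()
```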

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

no code implementations · 30 May 2024 · Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.

Instruction Following · Robot Manipulation · +1
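
A rough sketch of the two ingredients named in this entry: a frozen vision backbone for scene understanding and a policy head trained by sequence imitation, i.e. predicting a chunk of future actions rather than a single step. A torchvision ResNet stands in for the actual promptable foundation model, and the action dimension, horizon, and loss are assumptions, not SAM-E's implementation.

```python
# Hedged sketch: frozen vision features + action-sequence behavior cloning.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SequenceImitationPolicy(nn.Module):
    def __init__(self, action_dim=7, horizon=8, feat_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d pooled features
        for p in backbone.parameters():      # keep the "foundation" backbone frozen
            p.requires_grad = False
        self.backbone = backbone
        self.horizon = horizon
        self.action_dim = action_dim
        # Predict a whole action chunk (horizon x action_dim) in one shot.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * action_dim),
        )

    def forward(self, obs):
        feat = self.backbone(obs)
        return self.head(feat).view(-1, self.horizon, self.action_dim)


# Toy usage: imitate expert action sequences with an L2 behavior-cloning loss.
policy = SequenceImitationPolicy()
obs = torch.randn(4, 3, 224, 224)
expert_actions = torch.randn(4, 8, 7)
loss = nn.functional.mse_loss(policy(obs), expert_actions)
loss.backward()
```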

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

1 code implementation · 16 Apr 2023 · Wenke Xia, Xingjian Li, Andong Deng, Haoyi Xiong, Dejing Dou, Di Hu

However, such semantic consistency from synchronization is hard to guarantee in unconstrained videos, due to irrelevant modality noise and differentiated semantic correlation.

Action Recognition · Audio Tagging · +3

Balanced Audiovisual Dataset for Imbalance Analysis

1 code implementation · 14 Feb 2023 · Wenke Xia, Xu Zhao, Xincheng Pang, Changqing Zhang, Di Hu

Surprisingly, we find that multimodal models with existing imbalance algorithms consistently perform worse than unimodal ones on specific subsets, in accordance with the modality bias.
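
The per-subset comparison behind this finding can be illustrated with a small evaluation helper: score a multimodal model and a unimodal baseline separately on labeled subsets of the test set (e.g., audio-dominant vs. visual-dominant samples) rather than only reporting overall accuracy. The data-loader interface and subset tags below are hypothetical.

```python
# Hedged sketch of per-subset accuracy reporting; batch layout is assumed.
from collections import defaultdict
import torch


@torch.no_grad()
def accuracy_by_subset(model, loader):
    """Return {subset_name: accuracy}; each sample carries a 'subset' tag."""
    correct, total = defaultdict(int), defaultdict(int)
    for inputs, labels, subsets in loader:
        preds = model(inputs).argmax(dim=-1)
        for p, y, s in zip(preds, labels, subsets):
            correct[s] += int(p == y)
            total[s] += 1
    return {s: correct[s] / total[s] for s in total}


# Usage idea: compare accuracy_by_subset(multimodal_model, test_loader) against
# accuracy_by_subset(audio_only_model, test_loader) to surface the subsets where
# the fused model falls behind its unimodal counterpart.
```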

Revisiting Pre-training in Audio-Visual Learning

1 code implementation · 7 Feb 2023 · Ruoxuan Feng, Wenke Xia, Di Hu

Specifically, we explore the effects of pre-trained models on two audio-visual learning scenarios: cross-modal initialization and multi-modal joint learning.

Audio-Visual Learning
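
As an illustration of the cross-modal initialization scenario mentioned in this entry, a common recipe is to initialize an audio (spectrogram) encoder from image-pretrained weights by treating the spectrogram as a one-channel image. The sketch below follows that recipe with a torchvision ResNet; it is an assumption-laden example, not necessarily the paper's exact setup.

```python
# Hedged sketch of cross-modal initialization: reuse ImageNet-pretrained weights
# for an audio encoder that operates on log-mel spectrograms.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights


def build_audio_encoder_from_image_pretraining(num_classes=10):
    # Start from an ImageNet-pretrained backbone (vision pre-training).
    net = resnet18(weights=ResNet18_Weights.DEFAULT)
    # Adapt the stem to 1-channel spectrograms by averaging the RGB filters,
    # roughly preserving the pretrained first-layer statistics.
    old_conv = net.conv1
    new_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        new_conv.weight.copy_(old_conv.weight.mean(dim=1, keepdim=True))
    net.conv1 = new_conv
    # Replace the classifier head for the downstream audio task.
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net


# Toy usage: a batch of 1-channel log-mel spectrograms (128 mel bins x 256 frames).
audio_encoder = build_audio_encoder_from_image_pretraining()
spec = torch.randn(4, 1, 128, 256)
logits = audio_encoder(spec)  # shape: (4, 10)
```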

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

1 code implementation · 14 Jan 2023 · Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu

Experimental results indicate that models incorporating large language models (LLMs) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs best overall.

Knowledge Graphs
