Search Results for author: Chenxu Hu

Found 6 papers, 3 papers with code

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

no code implementations • 19 Feb 2024 • Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities.

Autonomous Driving • Scene Understanding

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

1 code implementation • NeurIPS 2023 • Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao

The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production.

Audio Synthesis

115

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory

no code implementations • 6 Jun 2023 • Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao

Large language models (LLMs) with memory are computationally universal.

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

1 code implementation • CVPR 2023 • Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao

In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene.

Autonomous Driving • Trajectory Prediction

117

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

32 code implementations • ICLR 2021 • Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from a teacher model, and 2) introducing more variation information of speech (e.g., pitch, energy, and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation • Speech Synthesis • +1

29,788
