1 code implementation • 6 Jun 2024 • Zilu Guo, Liuyang Bian, Xuan Huang, Hu Wei, Jingyu Li, Huasheng Ni
Following these guidelines, we propose DSNet, a Dual-Branch CNN architecture that incorporates atrous convolutions in the shallow layers of the model and pretrains nearly the entire encoder on ImageNet to achieve better performance.
1 code implementation • 27 May 2024 • Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee
In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
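The variance-preserving idea can be illustrated with the standard VP diffusion parameterization, in which the signal and noise coefficients satisfy ᾱ + (1 − ᾱ) = 1, so the marginal variance of a zero-mean, unit-variance input stays constant at every step. The sketch below shows this property under that standard parameterization; it is an illustrative assumption, not necessarily the paper's exact interpolation scheme.

```python
import numpy as np

def vp_diffuse(x0, alpha_bar, rng):
    """Variance-preserving forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.

    For zero-mean, unit-variance x0 and eps, Var[x_t] = a + (1 - a) = 1,
    so the marginal variance is preserved regardless of alpha_bar.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)          # stand-in for a normalized clean waveform
xt = vp_diffuse(x0, alpha_bar=0.5, rng=rng)
print(float(xt.var()))                     # ≈ 1.0, same as the input variance
```

Because the variance does not grow along the trajectory, the network's input statistics stay stable across diffusion steps, which is one motivation for VP schedules in SE front-ends for ASR.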
no code implementations • 24 May 2024 • Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang
To overcome these challenges, we introduce a novel quality-aware masked diffusion transformer (QA-MDT) approach that enables generative models to discern the quality of input music waveform during training.
Ranked #1 on Text-to-Music Generation on MusicCaps
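Quality-aware conditioning of the kind described above can be sketched by quantizing a per-example quality score into a discrete level and prepending that level's embedding to the model input, so the transformer can condition on waveform quality during training. The bin count, score source, and table below are hypothetical illustrations, not the paper's actual QA-MDT design.

```python
import numpy as np

NUM_QUALITY_BINS = 4  # hypothetical number of quality levels

def prepend_quality_token(tokens, quality_scores, quality_table):
    """Quantize quality scores in [0, 1] into bins and prepend each bin's
    embedding vector to its sequence (a hypothetical conditioning scheme).

    tokens:         (batch, seq, dim) input embeddings.
    quality_scores: (batch,) scores, e.g. from a pseudo-MOS estimator.
    quality_table:  (NUM_QUALITY_BINS, dim) learned embedding table.
    """
    bins = np.rint(np.clip(quality_scores, 0, 1) * (NUM_QUALITY_BINS - 1)).astype(int)
    q = quality_table[bins][:, None, :]          # (batch, 1, dim) quality tokens
    return np.concatenate([q, tokens], axis=1)   # (batch, seq + 1, dim)

table = np.random.default_rng(0).standard_normal((NUM_QUALITY_BINS, 16))
x = np.zeros((2, 10, 16))
out = prepend_quality_token(x, np.array([0.1, 0.9]), table)
print(out.shape)  # (2, 11, 16)
```

At inference time, such a scheme lets the model be steered toward the highest quality bin by fixing the prepended token.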
no code implementations • 17 Sep 2023 • Zilu Guo, Jun Du, Chin-Hui Lee
The starting state is noisy speech and the ending state is clean speech.
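The noisy-to-clean state path described above can be sketched as an interpolation whose endpoints are the two signals; the deterministic linear schedule below is an illustrative assumption, since the paper's actual trajectory may add noise and use a different schedule.

```python
import numpy as np

def state_at(t, T, noisy, clean):
    """Interpolated diffusion state: equals the noisy speech at t = 0 and
    the clean speech at t = T (a minimal sketch of the endpoint constraint)."""
    lam = t / T
    return (1.0 - lam) * noisy + lam * clean

noisy = np.ones(5)    # stand-in for a noisy waveform
clean = np.zeros(5)   # stand-in for the corresponding clean waveform
print(state_at(5, 10, noisy, clean))  # midpoint: halfway between the signals
```

The reverse (enhancement) process then amounts to walking this path from the noisy endpoint toward the clean one.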
1 code implementation • 14 Jun 2023 • Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang
The goal of this study is to implement diffusion models for speech enhancement (SE).
no code implementations • 4 Dec 2021 • Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Zilu Guo, Yafeng Li, Guangnan Zhang
Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.