no code implementations • 5 Mar 2024 • Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao
Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.
no code implementations • 22 Jan 2024 • Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li
To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.
no code implementations • 8 May 2023 • Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li
Both objective and subjective evaluation results show that the accented TTS front-end fine-tuned with a small accented phonetic lexicon (5k words) effectively handles the phonetic variation of accents, while the accented TTS acoustic model fine-tuned with a limited amount of accented speech data (approximately 3 minutes) effectively improves the prosodic rendering including pitch and duration.
no code implementations • 20 Dec 2022 • Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li
Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data.
1 code implementation • 1 Dec 2022 • Ruibin Yuan, Hanzhi Yin, Yi Wang, Yifan He, Yushi Ye, Lei Zhang, Zhizheng Wu
We apply this technique to the automatic speaker verification (ASV) task as a proof of concept.
no code implementations • 12 Apr 2019 • Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu
When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.
no code implementations • 22 Feb 2016 • Zhizheng Wu, Simon King
We propose two novel techniques --- stacking bottleneck features and minimum generation error training criterion --- to improve the performance of deep neural network (DNN)-based speech synthesis.
no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li
To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.
no code implementations • 11 Jan 2016 • Zhizheng Wu, Simon King
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS).