no code implementations • 30 Apr 2024 • Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu
We call the attention maps of those heads Alignment-Emerged Attention Maps (AEAMs).
no code implementations • 23 Apr 2024 • Sen Liu, Yiwei Guo, Xie Chen, Kai Yu
While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the expressiveness inherent in the text itself has received insufficient attention, especially for ETTS of artistic works.
no code implementations • 9 Apr 2024 • Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu
Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS).
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Jan 2024 • Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu
Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt.
no code implementations • 14 Dec 2023 • Junjie Li, Yiwei Guo, Xie Chen, Kai Yu
Zero-shot voice conversion (VC) aims to convert the source speaker's timbre to that of an arbitrary unseen target speaker, while keeping the linguistic content unchanged.
no code implementations • 2 Nov 2023 • Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu
The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions.
no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
In this paper, we explored how to boost speech emotion recognition (SER) with a state-of-the-art pre-trained speech model (PTM), data2vec; a text generation technique, GPT-4; and a speech synthesis technique, Azure TTS.
no code implementations • 10 Sep 2023 • Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu
Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.
no code implementations • 25 Jun 2023 • Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory as it is difficult to accurately retain the speaker timbres (i.e., speaker similarity) and eliminate the accents from their first language (i.e., nativeness).
no code implementations • 23 Apr 2023 • Zhijun Liu, Yiwei Guo, Kai Yu
In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion.
no code implementations • 17 Nov 2022 • Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the value of the specified emotion and Neutral is set to $\alpha$ and $1-\alpha$ respectively.
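The soft-label construction described above is simple to state concretely. The sketch below is illustrative only: the emotion inventory and function name are assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative emotion inventory; the actual set and ordering used by
# EmoDiff may differ.
EMOTIONS = ["Neutral", "Happy", "Sad", "Angry"]

def soft_emotion_label(emotion: str, alpha: float) -> np.ndarray:
    """Soft guidance label: weight `alpha` on the specified emotion
    and `1 - alpha` on Neutral, instead of a one-hot vector."""
    label = np.zeros(len(EMOTIONS))
    label[EMOTIONS.index(emotion)] = alpha
    # Use += so that alpha = 0 degenerates cleanly to pure Neutral.
    label[EMOTIONS.index("Neutral")] += 1.0 - alpha
    return label

# e.g. soft_emotion_label("Happy", 0.8) splits weight 0.8 / 0.2
# between Happy and Neutral.
```

Varying $\alpha$ between 0 and 1 then gives a continuous control knob over emotion intensity, which is the point of the soft label.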
no code implementations • 2 Apr 2022 • Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates the waveform from the given acoustic features.
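The cascade interface described above can be sketched as two stages composed in sequence. This is a minimal structural sketch; the class and method names are illustrative placeholders, not any real library's API.

```python
from typing import Protocol
import numpy as np

class AcousticModel(Protocol):
    # Maps a transcript to acoustic features, e.g. a mel spectrogram
    # of shape (frames, feature_dim).
    def predict(self, transcript: str) -> np.ndarray: ...

class Vocoder(Protocol):
    # Maps acoustic features to a waveform (1-D sample array).
    def generate(self, features: np.ndarray) -> np.ndarray: ...

def synthesize(am: AcousticModel, vocoder: Vocoder, transcript: str) -> np.ndarray:
    """Cascade TTS: transcript -> acoustic features -> waveform."""
    features = am.predict(transcript)
    return vocoder.generate(features)

# Dummy stand-ins just to show the data flow (shapes are made up):
class DummyAM:
    def predict(self, transcript):
        return np.zeros((len(transcript), 80))  # one 80-dim frame per char

class DummyVocoder:
    def generate(self, features):
        return np.zeros(features.shape[0] * 256)  # 256 samples per frame

wav = synthesize(DummyAM(), DummyVocoder(), "hello")
```

The VQTTS work this entry describes replaces the mel-spectrogram interface between the two stages; the sketch only shows the conventional cascade it departs from.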
no code implementations • 15 Feb 2022 • Yiwei Guo, Chenpeng Du, Kai Yu
Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference.
no code implementations • 22 Jan 2022 • Zhengrong Xue, Ziao Guo, Yiwei Guo
Popular node embedding methods such as DeepWalk follow the paradigm of performing random walks on the graph, and then requiring each node's embedding to be close to those of the nodes appearing alongside it in the walks.
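The random-walk step of that paradigm can be sketched in a few lines; the Skip-gram training that consumes the walks is omitted, and the graph and function below are illustrative assumptions, not code from any of these papers.

```python
import random

def random_walk(adj: dict, start, length: int, seed: int = 0) -> list:
    """Uniform random walk of `length` nodes on a graph given as an
    adjacency dict {node: [neighbors]}, starting from `start`."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy undirected graph as an adjacency dict.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(graph, start=0, length=5)
# Nodes co-occurring within a window of `walk` are then treated as
# context pairs for Skip-gram-style embedding training.
```

DeepWalk generates many such walks per node and feeds the resulting node "sentences" to word2vec, which is what enforces the proximity requirement stated above.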