no code implementations • 8 Jul 2022 • Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke
Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances.
no code implementations • 7 Feb 2022 • Jie Pu, Yixiong Meng, Oguz Elibol
The diversity of speaker profiles in multi-speaker TTS systems is a crucial aspect of their performance, as it measures how many different speaker profiles a TTS system can synthesize.
no code implementations • 14 Jun 2021 • Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo
Our observations show that SynthASR holds great promise for training state-of-the-art large-scale E2E ASR models for new applications while reducing costs and dependency on production data.
Automatic Speech Recognition (ASR)