Neutral TTS Female Voice Corpus in Brazilian Portuguese
This paper introduces a new dataset designed to address the scarcity of high-quality, diverse, and representative data for training text-to-speech (TTS) models, specifically for female voices in Brazilian Portuguese. The dataset features a female voice recorded with neutral emotion in a professional, controlled environment and comprises more than 20 hours of recordings. The goal is to facilitate transfer learning and enable the development of more natural-sounding, high-quality, and gender-balanced TTS systems. Alongside the dataset, gender-aware voice transfer experiments are performed to understand the impact of using gender-specific pretrained models for speech synthesis. The results show that same-gender voice transfer yields better speech similarity and intelligibility than cross-gender transfer, underscoring the importance of gender-aware training procedures and the need for gender-balanced data.
Dataset introduced in the paper: GneutralSpeech Female