Axial Residual Networks for CycleGAN-based Voice Conversion

16 Feb 2021 · Jaeseong You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae

We propose a novel architecture and improved training objectives for non-parallel voice conversion. Our proposed CycleGAN-based model performs a shape-preserving transformation directly on a high frequency-resolution magnitude spectrogram, converting its style (i.e., speaker identity) while preserving the speech content. Throughout the entire conversion process, the model does not resort to compressed intermediate representations of any sort (e.g., mel spectrogram, low-resolution spectrogram, decomposed network feature). We propose an efficient axial residual block architecture to support this expensive procedure, along with various modifications to the CycleGAN losses to stabilize the training process. We demonstrate via experiments that our proposed model outperforms Scyclone and performs comparably to or better than CycleGAN-VC2, even without employing a neural vocoder.
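For orientation, the standard CycleGAN objective that the abstract builds on can be sketched as below. This is a minimal illustration of the baseline only: a least-squares adversarial term plus an L1 cycle-consistency term. It does not reproduce the loss modifications proposed in the paper, which the abstract does not specify. The function names, the toy generators, and the weight `lambda_cyc=10.0` are assumptions chosen for illustration.

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares GAN loss for the discriminator.

    Pushes scores on real inputs toward 1 and scores on converted inputs toward 0.
    """
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake):
    """Least-squares GAN loss for the generator: converted inputs should score 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)


def cycle_consistency_loss(x, x_cycled):
    """L1 distance between a spectrogram and its round-trip reconstruction."""
    return np.mean(np.abs(x - x_cycled))


def generator_objective(d_fake, x, x_cycled, lambda_cyc=10.0):
    """Adversarial term plus weighted cycle term (lambda_cyc is an assumed value)."""
    return lsgan_generator_loss(d_fake) + lambda_cyc * cycle_consistency_loss(x, x_cycled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical magnitude spectrogram: (frequency bins, frames).
    x = np.abs(rng.normal(size=(513, 128)))

    # Toy stand-ins for the two generators. Real generators are neural networks;
    # these only demonstrate that the mapping preserves the input shape.
    g_xy = lambda s: s * 1.1
    g_yx = lambda s: s / 1.1

    y_fake = g_xy(x)
    x_cycled = g_yx(y_fake)
    assert y_fake.shape == x.shape  # shape-preserving transformation

    # Stand-in discriminator scores for the converted spectrogram.
    d_fake = rng.uniform(size=(16, 16))
    print("generator objective:", generator_objective(d_fake, x, x_cycled))
```

The cycle term is what makes non-parallel training possible: no paired utterances are needed, because each spectrogram serves as its own reconstruction target after a round trip through both generators.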

Tasks

Voice Conversion

Methods

Axial Attention • Batch Normalization • Convolution • Cycle Consistency Loss • CycleGAN • GAN Least Squares Loss • Instance Normalization • Leaky ReLU • PatchGAN • ReLU • Residual Block • Residual Connection • Sigmoid Activation • Tanh Activation
