Non-Parallel Training Approach for Emotional Voice Conversion Using CycleGAN
This research proposes a non-parallel emotional voice conversion method for Egyptian Arabic speech. The method changes the emotion-related features of a speech signal without changing its lexical content or speaker identity. We rely on the assumption that any speech signal can be decomposed into a content code and a style code, so that conversion between emotion domains is performed by combining the style code of the target domain with the content code of the input speech signal. We evaluated the model on an Egyptian Arabic dataset covering two emotion domains, and the conversions were judged successful in a survey of randomly selected listeners. Our aim is to provide a state-of-the-art pre-trained model, which to the best of our knowledge would be the first of its kind for Egyptian Arabic.
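The content/style assumption can be illustrated with a minimal sketch. This is not the trained model: the learned encoders and decoder are replaced here by simple per-channel statistics (an assumption made purely for illustration), the adversarial and cycle-consistency training is not shown, and the names `encode_content`, `encode_style`, `decode`, `convert`, `domain_a`, and `domain_b` are hypothetical.

```python
import numpy as np

def encode_content(features, eps=1e-8):
    """Toy content code: normalize each channel over time, removing
    the per-channel mean and scale (a stand-in for a learned encoder)."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / (std + eps)

def encode_style(features):
    """Toy style code: the per-channel mean and standard deviation."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return mean, std

def decode(content, style):
    """Recombine a content code with a style code."""
    mean, std = style
    return content * std + mean

def convert(source, target_reference):
    """Keep the content of `source`, take the style of `target_reference`."""
    return decode(encode_content(source), encode_style(target_reference))

# Synthetic feature matrices (channels x frames); not real speech data.
rng = np.random.default_rng(0)
domain_a = rng.normal(0.0, 1.0, size=(4, 50))   # utterance in the source emotion domain
domain_b = rng.normal(2.0, 3.0, size=(4, 60))   # reference from the target emotion domain

converted = convert(domain_a, domain_b)
```

In this toy setting the converted features keep the frame count and normalized shape of the input while taking on the channel statistics of the target reference. Because the two signals need not be time-aligned or share lexical content, the sketch also reflects why no parallel data is required.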