Conditioning Sequence-to-sequence Networks with Learned Activations

Conditional neural networks play an important role in a number of sequence-to-sequence modeling tasks, including personalized sound enhancement (PSE), speaker-dependent automatic speech recognition (ASR), and generative modeling such as text-to-speech synthesis. In conditional neural networks, the output of a model is influenced by a conditioning vector in addition to the input. Common approaches to conditioning include concatenating the input with the conditioning vector, or modulating the input by it, both of which come at the cost of increased model size. In this work, we introduce a novel approach to neural network conditioning by learning intermediate layer activations based on the conditioning vector. We systematically explore this approach and show that learned activations can produce conditional models of comparable or better quality at significantly smaller sizes, making them ideal candidates for resource-efficient on-device deployment. As exemplary target use cases we consider (i) PSE as a pre-processing technique for improving telephony or pre-trained ASR performance under babble or ambient noise, and (ii) personalized ASR in single-speaker scenarios. We find that conditioning via activation learning is an effective modeling strategy, suggesting broad applicability of the proposed technique across a number of application domains.
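The abstract does not specify how the activations are parameterized, but the core idea can be illustrated with a minimal sketch: a small network maps the conditioning vector (e.g., a speaker embedding) to the parameters of a per-channel activation function, so the conditioning signal changes the layer's nonlinearity rather than being concatenated into its input. The `ConditionallyActivatedLayer` class below, the piecewise-linear activation form, and all dimensions are hypothetical choices for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ConditionallyActivatedLayer(nn.Module):
    """Linear layer whose activation is learned from a conditioning vector.

    Hypothetical sketch: a small head maps the conditioning vector to
    per-channel slopes of a piecewise-linear activation.
    """

    def __init__(self, in_features: int, out_features: int, cond_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Predicts two slopes (positive/negative side) per output channel.
        self.act_params = nn.Linear(cond_dim, 2 * out_features)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)                                 # (batch, time, out)
        alpha, beta = self.act_params(cond).chunk(2, dim=-1)
        alpha = alpha.unsqueeze(1)                         # broadcast over time
        beta = beta.unsqueeze(1)
        # Activation shape depends on `cond`: the conditioning signal
        # reshapes the nonlinearity instead of widening the input.
        return torch.where(h >= 0, alpha * h, beta * h)

# Usage: condition an utterance-level layer on a 64-dim speaker embedding.
layer = ConditionallyActivatedLayer(in_features=80, out_features=128, cond_dim=64)
x = torch.randn(4, 100, 80)   # (batch, time, features)
cond = torch.randn(4, 64)     # one conditioning vector per utterance
y = layer(x, cond)            # (4, 100, 128)
```

Under these assumptions, the size advantage claimed in the abstract is visible directly: the activation head adds only `cond_dim * 2 * out_features` parameters to this layer, whereas input concatenation would enlarge the main weight matrix of every conditioned layer by `cond_dim * out_features` on top of requiring the conditioning vector to be carried through the network.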
