Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

29 Oct 2022 · Yashish M. Siriwardena, Carol Espy-Wilson, Shihab Shamma

Most organisms, including humans, survive and accomplish desired tasks by coordinating and integrating sensory signals with motor actions. Learning these complex sensorimotor mappings proceeds simultaneously and often in an unsupervised or semi-supervised fashion. This work explores an autoencoder architecture (MirrorNet), inspired by this sensorimotor learning paradigm, to control an articulatory synthesizer with minimal exposure to ground-truth articulatory data. The articulatory synthesizer takes as input a set of six vocal tract variables (TVs) and source features (voicing indicators and pitch) and can synthesize continuous speech for unseen speakers. We show that the MirrorNet, once initialized (with ~30 mins of articulatory data) and further trained in an unsupervised fashion (the 'learning phase'), can learn meaningful articulatory representations with accuracy comparable to articulatory speech-inversion systems trained in a completely supervised fashion.
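The sketch below illustrates, under assumptions, the kind of training setup the abstract describes: an encoder maps acoustic frames to articulatory trajectories (TVs plus source features), a fixed synthesizer maps them back to acoustics, and the unsupervised 'learning phase' optimizes only the encoder against an acoustic reconstruction loss, so no ground-truth articulatory data is required. The feature dimensions, the `SynthesizerStub` network, and the loss choice are all hypothetical placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Assumed dimensions: 6 vocal tract variables plus source features (voicing, pitch).
N_TVS = 6
N_SOURCE = 2      # assumption: one voicing indicator + pitch
N_MELS = 80       # assumed acoustic feature dimension (e.g. log-mel frames)

class Encoder(nn.Module):
    """Maps acoustic frames to articulatory trajectories (TVs + source features)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, 128, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(256, N_TVS + N_SOURCE)

    def forward(self, mel):                 # mel: (batch, frames, N_MELS)
        h, _ = self.rnn(mel)
        return self.proj(h)                 # (batch, frames, N_TVS + N_SOURCE)

class SynthesizerStub(nn.Module):
    """Placeholder for the pretrained articulatory synthesizer (kept frozen)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_TVS + N_SOURCE, 128), nn.Tanh(),
                                 nn.Linear(128, N_MELS))

    def forward(self, artic):
        return self.net(artic)              # predicted acoustic frames

encoder, synth = Encoder(), SynthesizerStub()
for p in synth.parameters():                # synthesizer parameters stay fixed
    p.requires_grad_(False)

opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
mel = torch.randn(4, 200, N_MELS)           # dummy batch of acoustic features

# Unsupervised 'learning phase': the only supervision is acoustic reconstruction.
artic = encoder(mel)
loss = nn.functional.mse_loss(synth(artic), mel)
loss.backward()
opt.step()
```

In this reading, the brief supervised initialization mentioned in the abstract (~30 mins of articulatory data) would amount to first fitting the encoder output directly to measured TVs before switching to the purely acoustic loss above.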
