no code implementations • RaPID (LREC) 2022 • Birger Moell, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris, Joakim Gustafson, Jonas Beskow
As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia.
no code implementations • LREC 2022 • Siyang Wang, Joakim Gustafson, Éva Székely
Perceptual results show little difference between the compared filler insertion models, including ground truth, which may be due both to the ambiguity of what constitutes good filler insertion and to a strong neural spontaneous TTS that produces natural speech irrespective of input.
no code implementations • 11 Jul 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech.
no code implementations • 29 May 2023 • Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze
Turn-taking is a fundamental aspect of human communication in which speakers convey their intention to either hold or yield their turn through prosodic cues.
no code implementations • 5 Mar 2023 • Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec2.0 as the representation medium in standard two-stage TTS, in place of conventionally used mel-spectrograms.
no code implementations • 24 Nov 2022 • Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS.
1 code implementation • 25 Aug 2021 • Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline.
no code implementations • LREC 2020 • Éva Székely, Jens Edlund, Joakim Gustafson
Spontaneous speech is emergent and transient, whereas text read out loud is pre-planned.
no code implementations • LREC 2020 • Dimosthenis Kontogiorgos, Elena Sibirtseva, Joakim Gustafson
In this paper, we introduce a multimodal dataset in which subjects are instructing each other how to assemble IKEA furniture.
no code implementations • 5 Sep 2017 • Patrik Jonell, Joseph Mendelson, Thomas Storskog, Göran Hagman, Per Östberg, Iolanda Leite, Taras Kucherenko, Olga Mikheeva, Ulrika Akenine, Vesna Jelic, Alina Solomon, Jonas Beskow, Joakim Gustafson, Miia Kivipelto, Hedvig Kjellström
This paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim to develop an embodied system, capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease.
no code implementations • LREC 2016 • Jens Edlund, Joakim Gustafson
In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how to best create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP).