no code implementations • NoDaLiDa 2021 • Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo
The proposed method improves speech intelligibility to enhance children's speech recognition with an acoustic model trained on adult speech.
no code implementations • 25 Sep 2023 • Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku
Automatic detection and severity level classification of dysarthria directly from acoustic speech signals can be used as a tool in medical diagnosis.
no code implementations • 25 Sep 2023 • Sudarsana Reddy Kadiri, Paavo Alku
From the detection experiments, it was observed that the performance achieved with the studied glottal source features is comparable to or better than that of conventional MFCCs and perceptual linear prediction (PLP) features.
1 code implementation • 31 Aug 2023 • Dhananjaya Gowda, Sudarsana Reddy Kadiri, Brad Story, Paavo Alku
Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner).
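Conventional trackers of the kind compared here (e.g. Wavesurfer and Praat) build on LPC analysis of short frames: the complex roots of the linear-prediction polynomial give candidate formant frequencies, which a dynamic-programming stage then smooths. Below is a minimal numpy sketch of that classical LPC baseline step only, not of the proposed TVQCP method; the LPC order, frame length, and synthetic test signal are illustrative choices.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants(frame, fs, order=10):
    """Candidate formants = angles of the complex LPC poles.

    Real trackers window and pre-emphasize each frame and prune candidates
    by bandwidth; those steps are omitted here for brevity.
    """
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[(freqs > 90) & (freqs < fs / 2 - 90)])

fs = 8000
t = np.arange(400) / fs
# synthetic frame with two damped resonances at 500 Hz and 1500 Hz
sig = (np.exp(-60 * t) * np.sin(2 * np.pi * 500 * t)
       + 0.8 * np.exp(-80 * t) * np.sin(2 * np.pi * 1500 * t))
print(formants(sig, fs, order=4))  # two estimates, near 500 and 1500 Hz
```

With an order-4 model the two conjugate pole pairs land close to the two synthetic resonances; real speech needs a higher order and the pruning steps noted above.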
no code implementations • 17 Aug 2023 • Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku
Developing objective methods for assessing the severity of Parkinson's disease (PD) is crucial for improving diagnosis and treatment.
no code implementations • 17 Aug 2023 • Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda
The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis.
no code implementations • 6 Aug 2023 • Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku
Among the features, the pre-trained model-based features showed better classification accuracies than the conventional features for both speech and NSA inputs.
no code implementations • 5 Jan 2022 • Dhananjaya Gowda, Bajibabu Bollepalli, Sudarsana Reddy Kadiri, Paavo Alku
Formant tracking is investigated in this study by using trackers based on dynamic programming (DP) and deep neural nets (DNNs).
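The dynamic-programming side of such trackers can be sketched briefly: given several candidate frequencies per frame (e.g. LPC root frequencies), DP selects the track that balances closeness to a nominal formant value against frame-to-frame jumps. The cost weights and candidate values below are illustrative assumptions, not the configuration studied in the paper.

```python
import numpy as np

def dp_track(candidates, nominal, jump_weight=1.0):
    """Viterbi-style DP over per-frame formant candidates.

    candidates: list of 1-D arrays, one array of candidate freqs per frame.
    Cost = |candidate - nominal| per frame + jump_weight * |jump| per transition.
    """
    acc = np.abs(candidates[0] - nominal)      # accumulated cost, frame 0
    back = []                                  # backpointers per frame
    for t in range(1, len(candidates)):
        local = np.abs(candidates[t] - nominal)
        jump = jump_weight * np.abs(candidates[t][:, None] - candidates[t - 1][None, :])
        scores = acc[None, :] + jump           # shape: (current, previous)
        back.append(np.argmin(scores, axis=1))
        acc = local + np.min(scores, axis=1)
    path = [int(np.argmin(acc))]               # best end state, then backtrack
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    path.reverse()
    return np.array([candidates[t][i] for t, i in enumerate(path)])

# three frames of candidates; the outliers (1490, 200, 3000 Hz) are skipped
frames = [np.array([500.0, 1490.0]), np.array([200.0, 510.0]), np.array([505.0, 3000.0])]
print(dp_track(frames, nominal=500.0))  # [500. 510. 505.]
```

A DNN-based tracker replaces this hand-designed cost with a learned mapping from the frame to the formant values directly.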
no code implementations • 29 Dec 2019 • Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana
The great majority of current voice technology applications relies on acoustic features characterizing the vocal tract response, such as the widely used MFCC or LPC parameters.
no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.
1 code implementation • 8 Apr 2019 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech.
no code implementations • 14 Mar 2019 • Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
The results show that the newly proposed GANs achieve synthesis quality comparable to that of widely-used DNNs, without using an additive noise component.
no code implementations • 30 Oct 2018 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
The state-of-the-art in text-to-speech synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet.
no code implementations • 29 Oct 2018 • Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
Moreover, we experiment with a WaveNet vocoder in the synthesis of Lombard speech.
no code implementations • 25 Apr 2018 • Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features.
1 code implementation • 3 Apr 2018 • Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as ASR but are generally considered unusable for speech synthesis.
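For context, here is a minimal numpy sketch of the standard MFCC front-end (mel filterbank applied to the power spectrum, log compression, then a DCT) that such a synthesis method has to invert. The synthesis model itself is not shown, and the frame size, filter count, and coefficient count are illustrative defaults rather than the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced uniformly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def mfcc(frame, fs, n_filters=26, n_coeffs=13):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    log_e = np.log(mel_filterbank(n_filters, len(frame), fs) @ spec + 1e-10)
    # DCT-II decorrelates the log mel energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2.0 * n_filters)))
    return dct @ log_e

coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(512) / 16000), 16000)
print(coeffs.shape)  # (13,)
```

The difficulty the paper addresses is visible here: the filterbank and the truncated DCT both discard information, and phase is lost entirely, so the waveform cannot be recovered by simple inversion.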