no code implementations • IWSLT 2016 • Wilfried Michel, Zoltán Tüske, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
In this paper, the RWTH large vocabulary continuous speech recognition (LVCSR) systems developed for the IWSLT-2016 evaluation campaign are described.
no code implementations • 12 Oct 2023 • Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter
Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Oct 2023 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers.
1 code implementation • 4 Oct 2023 • Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney
We investigate recognition results and additionally Viterbi alignments of our models.
no code implementations • 25 Sep 2023 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Empirically, we show that ILM subtraction and sequence discriminative training achieve similar effects across a wide range of experiments on Librispeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context.
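For reference, the ILM subtraction mentioned above is usually realized as a log-linear score combination during decoding; a schematic form (the scales λ are tuning parameters and not values from the paper):

$$\hat{W} = \operatorname*{arg\,max}_{W}\Big[\log p_{\text{transducer}}(W\mid X) + \lambda_{1}\log p_{\text{ELM}}(W) - \lambda_{2}\log p_{\text{ILM}}(W)\Big]$$

where ELM denotes the external language model and ILM the estimated internal language model of the transducer.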
no code implementations • 15 Sep 2023 • Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
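A minimal sketch of the chunking idea described above: a block-diagonal attention mask restricting attention to fixed-size chunks. The chunk size and the helper name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def chunk_attention_mask(num_frames: int, chunk_size: int) -> np.ndarray:
    """Boolean mask (num_frames x num_frames): True where attention is allowed.

    Each frame may only attend to frames inside the same fixed-size chunk,
    which keeps the per-chunk attention cost constant and enables streaming.
    """
    chunk_ids = np.arange(num_frames) // chunk_size
    return chunk_ids[:, None] == chunk_ids[None, :]

# Example: 10 frames, chunks of 4 -> frames 0-3, 4-7, 8-9 attend within their chunk only.
mask = chunk_attention_mask(10, 4)
print(mask.astype(int))
```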
no code implementations • 15 Sep 2023 • Peter Vieting, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach
This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Aug 2023 • Peter Vieting, Ralf Schlüter, Hermann Ney
In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE.
Automatic Speech Recognition (ASR) +1
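As an illustration of what a neural front-end (FE) replacing standard feature extraction can look like, here is a minimal sketch of a convolutional feature extractor operating directly on the raw waveform; the layer sizes are illustrative assumptions and do not reproduce the models compared in the paper.

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Toy learnable front-end: strided 1D convolutions over the raw waveform."""

    def __init__(self, out_dim: int = 80):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(128, out_dim, kernel_size=4, stride=2), nn.ReLU(),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> features: (batch, frames, out_dim)
        return self.layers(wav.unsqueeze(1)).transpose(1, 2)

feats = ConvFeatureExtractor()(torch.randn(2, 16000))  # one second of 16 kHz audio
print(feats.shape)
```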
no code implementations • 21 Jun 2023 • Simon Berger, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach
Modular approaches separate speakers and recognize each of them with a single-speaker ASR system.
Automatic Speech Recognition (ASR) +2
no code implementations • 28 May 2023 • Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple support for decoding in open-vocabulary scenarios.
no code implementations • 3 Mar 2023 • Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Jan 2023 • Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
By further adding neural speaker embeddings, we gain an additional ~3% relative WER improvement on Hub5'00.
no code implementations • 7 Dec 2022 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Compared to the N-best-list based minimum Bayes risk objectives, the lattice-free methods achieve a 40%-70% relative training-time speedup with a small degradation in performance.
Automatic Speech Recognition (ASR) +1
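For context, the minimum Bayes risk (MBR) objective mentioned above minimizes the expected error over competing hypotheses; a schematic form (not the exact notation of the paper):

$$\mathcal{F}_{\text{MBR}} = \sum_{r}\sum_{W} p_{\theta}(W\mid X_{r})\,\mathcal{L}(W, W_{r})$$

where $W_r$ is the reference transcription and $\mathcal{L}$ an error count (e.g. word errors); the N-best-list and lattice-free variants differ in how the hypothesis space of the inner sum is approximated.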
no code implementations • 11 Nov 2022 • Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney
Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training.
1 code implementation • 26 Oct 2022 • Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
We restrict the decoder attention to segments to avoid the quadratic runtime of global attention, to generalize better to long sequences, and eventually to enable streaming.
Automatic Speech Recognition (ASR) +1
no code implementations • 26 Oct 2022 • Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney
Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data.
Automatic Speech Recognition (ASR) +3
no code implementations • 24 Oct 2022 • Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney
Language barriers present a great challenge in our increasingly connected and global world.
no code implementations • 26 Jun 2022 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney
In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset.
Automatic Speech Recognition (ASR) +1
no code implementations • 22 Apr 2022 • Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources within a reasonably short time.
no code implementations • 13 Nov 2021 • Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei Zhou, Elma Kerz, Ralf Schlüter
Among the key communicative competencies are the ability to maintain fluency in monologic speech and the ability to produce sophisticated language that argues a position convincingly.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Nov 2021 • Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney
Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
Automatic Speech Recognition (ASR) +2
no code implementations • 5 Nov 2021 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets.
Automatic Speech Recognition (ASR) +1
no code implementations • 18 Oct 2021 • Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
We show on the LibriSpeech (LBS) and Switchboard (SWB) corpora that the model scales for a combination of an attention-based encoder-decoder acoustic model and a language model can be learned as effectively as with manual tuning.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Oct 2021 • Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney
Lastly, it is shown that this technique can be used to effectively perform sequence discriminative training for attention-based encoder-decoder acoustic models on the LibriSpeech task.
Automatic Speech Recognition (ASR) +2
no code implementations • 13 Oct 2021 • Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
1 code implementation • 31 May 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
The peaky behavior of CTC models is well known experimentally.
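As background for the peaky behavior discussed above, CTC marginalizes over all label alignments (including the blank symbol) that collapse to the target sequence; schematically:

$$p_{\text{CTC}}(W\mid X) = \sum_{a\,\in\,\mathcal{B}^{-1}(W)}\ \prod_{t=1}^{T} p(a_{t}\mid X)$$

where $\mathcal{B}$ removes blanks and label repetitions. Peakiness refers to the frame posteriors $p(a_{t}\mid X)$ concentrating on the blank label almost everywhere, with labels emitted only at a few spike positions.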
no code implementations • 21 Apr 2021 • Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney
As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Apr 2021 • Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Apr 2021 • Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlüter
In recent years, automated approaches to assessing linguistic complexity in second language (L2) writing have made significant progress in gauging learner performance, predicting human ratings of the quality of learner productions, and benchmarking L2 development.
no code implementations • 13 Apr 2021 • Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Apr 2021 • Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
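One widely used way to make this implicit ILM explicit for correction, not necessarily the exact variant evaluated in this paper, is to run the label decoder with the encoder/attention context removed, e.g. replaced by a zero (or averaged) context vector:

$$p_{\text{ILM}}(w_{n}\mid w_{1}^{n-1}) \approx p_{\text{dec}}(w_{n}\mid w_{1}^{n-1},\, c=\mathbf{0})$$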
no code implementations • 12 Apr 2021 • Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
We achieve a final word-error-rate of 3.3%/10.0% with a hybrid system on the clean/noisy test-sets, surpassing any previous state-of-the-art systems on Librispeech-100h that do not include unlabeled audio data.
Automatic Speech Recognition (ASR) +3
no code implementations • 9 Apr 2021 • Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney
With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform.
Automatic Speech Recognition (ASR) +2
2 code implementations • 7 Apr 2021 • Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney
We present our transducer model on Librispeech.
Ranked #26 on Speech Recognition on LibriSpeech test-clean (using extra training data)
no code implementations • 30 Mar 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.
no code implementations • 24 Nov 2020 • Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney
Direct speech translation is an alternative method to avoid error propagation; however, its performance is often behind the cascade system.
no code implementations • 30 Oct 2020 • Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling.
no code implementations • 20 May 2020 • Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney
After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language models rescoring experiments in automatic speech recognition.
Automatic Speech Recognition (ASR) +3
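A minimal sketch of the neural-LM rescoring step mentioned in the entry above: hypotheses from a first pass are re-ranked with a log-linear combination of the first-pass score and a scaled LM score. The scale, data, and helper names are illustrative assumptions.

```python
def rescore_nbest(nbest, lm_score_fn, lm_scale=0.3):
    """nbest: list of (hypothesis, first_pass_log_score).
    Returns the best hypothesis after adding a scaled LM log-probability."""
    rescored = [
        (hyp, score + lm_scale * lm_score_fn(hyp)) for hyp, score in nbest
    ]
    return max(rescored, key=lambda item: item[1])

# Toy usage with a dummy LM that simply prefers shorter hypotheses.
nbest = [("the cat sat", -12.3), ("the cats at", -12.1)]
best = rescore_nbest(nbest, lm_score_fn=lambda h: -0.5 * len(h.split()))
print(best)
```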
no code implementations • 20 May 2020 • Wilfried Michel, Ralf Schlüter, Hermann Ney
This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training.
Automatic Speech Recognition (ASR) +2
1 code implementation • 19 May 2020 • Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
We compare the original training criterion, with full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves, and speeds up our training.
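The two training criteria compared above can be written schematically as the full sum over alignments versus the Viterbi (maximum) approximation:

$$\mathcal{L}_{\text{full}} = -\log\sum_{a\in\mathcal{A}(W)} p_{\theta}(a\mid X) \qquad \text{vs.} \qquad \mathcal{L}_{\max} = -\log\max_{a\in\mathcal{A}(W)} p_{\theta}(a\mid X)$$

where $\mathcal{A}(W)$ denotes the set of alignments compatible with the transcription $W$.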
1 code implementation • 19 May 2020 • Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on, e.g., byte-pair encoding (BPE).
Automatic Speech Recognition (ASR) +2
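For illustration of the BPE subword units mentioned in the entry above, a toy sketch of the merge procedure on a small word-frequency dictionary; the data and helper names are illustrative, and real systems use dedicated tools rather than this naive string replacement.

```python
from collections import Counter

def most_frequent_pair(vocab):
    """vocab maps a space-separated symbol sequence (a word) to its corpus count."""
    pairs = Counter()
    for word, count in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += count
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): count for word, count in vocab.items()}

vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):  # three BPE merge steps
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(pair, vocab)
    print("merged", pair, "->", vocab)
```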
no code implementations • 15 May 2020 • Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
In this work, we address direct phonetic context modeling for the hybrid deep neural network (DNN)/HMM that does not build on any phone clustering algorithm for the determination of the HMM state inventory.
Automatic Speech Recognition (ASR) +2
no code implementations • 2 Apr 2020 • Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus.
1 code implementation • 19 Dec 2019 • Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
We achieve improvements of up to 33% relative in word-error-rate (WER) over a strong baseline with data-augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.
Automatic Speech Recognition (ASR) +3
no code implementations • 20 Nov 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition.
Automatic Speech Recognition (ASR) +1
no code implementations • EMNLP (IWSLT) 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.
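A minimal sketch of the two masking operations SpecAugment applies to the log-mel spectrogram (time warping omitted; mask counts and widths are illustrative assumptions, not the settings of the paper).

```python
import numpy as np

def spec_augment(spectrogram, num_freq_masks=2, freq_width=8,
                 num_time_masks=2, time_width=20, rng=None):
    """spectrogram: (time, freq) array; returns a masked copy."""
    rng = rng or np.random.default_rng()
    spec = spectrogram.copy()
    num_frames, num_bins = spec.shape
    for _ in range(num_freq_masks):          # mask random frequency bands
        width = rng.integers(0, freq_width + 1)
        start = rng.integers(0, max(num_bins - width, 1))
        spec[:, start:start + width] = 0.0
    for _ in range(num_time_masks):          # mask random time spans
        width = rng.integers(0, time_width + 1)
        start = rng.integers(0, max(num_frames - width, 1))
        spec[start:start + width, :] = 0.0
    return spec

masked = spec_augment(np.random.randn(300, 80))
```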
no code implementations • 1 Jul 2019 • Wilfried Michel, Ralf Schlüter, Hermann Ney
This allows for a direct comparison of lattice-based and lattice-free sequence discriminative training criteria such as MMI and sMBR, both using the same language model during training.
Automatic Speech Recognition (ASR) +2
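For reference, the MMI criterion mentioned in the entry above maximizes the posterior of the reference transcription against all competing word sequences (taken from a lattice, or from an approximation of the full search space in the lattice-free case); schematically:

$$\mathcal{F}_{\text{MMI}} = \sum_{r}\log\frac{p(X_{r}\mid W_{r})^{\kappa}\,p(W_{r})}{\sum_{W} p(X_{r}\mid W)^{\kappa}\,p(W)}$$

with acoustic scale $\kappa$, acoustic model $p(X\mid W)$, and language model $p(W)$.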
no code implementations • 1 Jul 2019 • Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney
LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models.
no code implementations • 14 Jun 2019 • Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney
Further, i-vectors were used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set.
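The instantaneous adaptation mentioned above amounts to appending a speaker-level embedding to every acoustic frame before the network input; a minimal sketch with illustrative dimensions (not the exact setup of the paper).

```python
import numpy as np

def append_ivector(features, ivector):
    """features: (frames, feat_dim); ivector: (ivec_dim,) for the current speaker.
    Returns (frames, feat_dim + ivec_dim) by tiling the i-vector over all frames."""
    tiled = np.repeat(ivector[None, :], features.shape[0], axis=0)
    return np.concatenate([features, tiled], axis=1)

frames = append_ivector(np.random.randn(500, 40), np.random.randn(100))
print(frames.shape)  # (500, 140)
```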
no code implementations • 10 May 2019 • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
We explore deep autoregressive Transformer models in language modeling for speech recognition.
no code implementations • 9 May 2019 • Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney
In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers.
Automatic Speech Recognition (ASR) +3
2 code implementations • 8 May 2019 • Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are currently the best published, both for the hybrid DNN/HMM and the attention-based systems.
Ranked #25 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +3
no code implementations • 19 Jun 2018 • Tobias Menne, Ralf Schlüter, Hermann Ney
The proposed adaptation approach is based on the integration of the beamformer, which includes the mask estimation network, and the acoustic model of the ASR system.
Automatic Speech Recognition (ASR) +1
14 code implementations • 8 May 2018 • Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #44 on Speech Recognition on LibriSpeech test-clean (using extra training data)
3 code implementations • 2 Aug 2016 • Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney
In this work we release our extensible and easily configurable neural network training software.
no code implementations • 22 Jun 2016 • Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney
On this task, we get our best result with an 8-layer bidirectional LSTM, and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.
Automatic Speech Recognition (ASR) +1