no code implementations • IWSLT 2016 • Wilfried Michel, Zoltán Tüske, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
In this paper, the RWTH large vocabulary continuous speech recognition (LVCSR) systems developed for the IWSLT-2016 evaluation campaign are described.
no code implementations • 12 Oct 2023 • Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter
Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Oct 2023 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers.
1 code implementation • 4 Oct 2023 • Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney
We investigate recognition results and additionally Viterbi alignments of our models.
no code implementations • 25 Sep 2023 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Empirically, we show that ILM subtraction and sequence discriminative training achieve similar effects across a wide range of experiments on Librispeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context.
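For reference, the ILM subtraction mentioned above is usually realized as a log-linear score combination during decoding; a schematic form (the scales λ are tuning parameters and not values from the paper):

$$\hat{W} = \operatorname*{arg\,max}_{W}\Big[\log p_{\text{transducer}}(W\mid X) + \lambda_{1}\log p_{\text{ELM}}(W) - \lambda_{2}\log p_{\text{ILM}}(W)\Big]$$

where ELM denotes the external language model and ILM the estimated internal language model of the transducer.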
no code implementations • 15 Sep 2023 • Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
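A minimal sketch of the chunking idea described above: a block-diagonal attention mask restricting attention to fixed-size chunks. The chunk size and the helper name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def chunk_attention_mask(num_frames: int, chunk_size: int) -> np.ndarray:
    """Boolean mask (num_frames x num_frames): True where attention is allowed.

    Each frame may only attend to frames inside the same fixed-size chunk,
    which keeps the per-chunk attention cost constant and enables streaming.
    """
    chunk_ids = np.arange(num_frames) // chunk_size
    return chunk_ids[:, None] == chunk_ids[None, :]

# Example: 10 frames, chunks of 4 -> frames 0-3, 4-7, 8-9 attend within their chunk only.
mask = chunk_attention_mask(10, 4)
print(mask.astype(int))
```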
no code implementations • 15 Sep 2023 • Peter Vieting, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach
This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Aug 2023 • Peter Vieting, Ralf Schlüter, Hermann Ney
In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE.
Automatic Speech Recognition (ASR) +1
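As an illustration of what a neural front-end (FE) replacing standard feature extraction can look like, here is a minimal sketch of a convolutional feature extractor operating directly on the raw waveform; the layer sizes are illustrative assumptions and do not reproduce the models compared in the paper.

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Toy learnable front-end: strided 1D convolutions over the raw waveform."""

    def __init__(self, out_dim: int = 80):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(128, out_dim, kernel_size=4, stride=2), nn.ReLU(),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> features: (batch, frames, out_dim)
        return self.layers(wav.unsqueeze(1)).transpose(1, 2)

feats = ConvFeatureExtractor()(torch.randn(2, 16000))  # one second of 16 kHz audio
print(feats.shape)
```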
no code implementations • 21 Jun 2023 • Simon Berger, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach
Modular approaches separate speakers and recognize each of them with a single-speaker ASR system.
Automatic Speech Recognition (ASR) +2
no code implementations • 28 May 2023 • Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple support for decoding in open-vocabulary scenarios.
no code implementations • 3 Mar 2023 • Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Jan 2023 • Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
By further adding neural speaker embeddings, we gain an additional ~3% relative WER improvement on Hub5'00.
no code implementations • 7 Dec 2022 • Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Compared to the N-best-list based minimum Bayes risk objectives, the lattice-free methods achieve a 40%-70% relative training-time speedup with a small degradation in performance.
Automatic Speech Recognition (ASR) +1
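For context, the minimum Bayes risk (MBR) objective mentioned above minimizes the expected error over competing hypotheses; a schematic form (not the exact notation of the paper):

$$\mathcal{F}_{\text{MBR}} = \sum_{r}\sum_{W} p_{\theta}(W\mid X_{r})\,\mathcal{L}(W, W_{r})$$

where $W_r$ is the reference transcription and $\mathcal{L}$ an error count (e.g. word errors); the N-best-list and lattice-free variants differ in how the hypothesis space of the inner sum is approximated.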
no code implementations • 11 Nov 2022 • Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney
Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training.
1 code implementation • 26 Oct 2022 • Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
We restrict the decoder attention to segments to avoid the quadratic runtime of global attention, to generalize better to long sequences, and eventually to enable streaming.
Automatic Speech Recognition (ASR) +1
no code implementations • 26 Oct 2022 • Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney
Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data.
Automatic Speech Recognition (ASR) +3
no code implementations • 24 Oct 2022 • Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney
Language barriers present a great challenge in our increasingly connected and global world.
no code implementations • 26 Jun 2022 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney
In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset.
Automatic Speech Recognition (ASR) +1
no code implementations • 22 Apr 2022 • Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources within a reasonably short time.
no code implementations • 13 Nov 2021 • Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei Zhou, Elma Kerz, Ralf Schlüter
Among the key communicative competencies are the ability to maintain fluency in monologic speech and the ability to produce sophisticated language that argues a position convincingly.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Nov 2021 • Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney
Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
Automatic Speech Recognition (ASR) +2
no code implementations • 5 Nov 2021 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets.
Automatic Speech Recognition (ASR) +1
no code implementations • 18 Oct 2021 • Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
We show on the LibriSpeech (LBS) and Switchboard (SWB) corpora that the model scales for a combination of an attention-based encoder-decoder acoustic model and a language model can be learned as effectively as with manual tuning.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Oct 2021 • Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney
Lastly, it is shown that this technique can be used to effectively perform sequence discriminative training for attention-based encoder-decoder acoustic models on the LibriSpeech task.
Automatic Speech Recognition (ASR) +2
no code implementations • 13 Oct 2021 • Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
1 code implementation • 31 May 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
The peaky behavior of CTC models is well known experimentally.
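As background for the peaky behavior discussed above, CTC marginalizes over all label alignments (including the blank symbol) that collapse to the target sequence; schematically:

$$p_{\text{CTC}}(W\mid X) = \sum_{a\,\in\,\mathcal{B}^{-1}(W)}\ \prod_{t=1}^{T} p(a_{t}\mid X)$$

where $\mathcal{B}$ removes blanks and label repetitions. Peakiness refers to the frame posteriors $p(a_{t}\mid X)$ concentrating on the blank label almost everywhere, with labels emitted only at a few spike positions.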
no code implementations • 21 Apr 2021 • Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney
As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Apr 2021 • Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Apr 2021 • Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlüter
In recent years, automated approaches to assessing linguistic complexity in second language (L2) writing have made significant progress in gauging learner performance, predicting human ratings of the quality of learner productions, and benchmarking L2 development.
no code implementations • 13 Apr 2021 • Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Apr 2021 • Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
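One widely used way to make this implicit ILM explicit for correction, not necessarily the exact variant evaluated in this paper, is to run the label decoder with the encoder/attention context removed, e.g. replaced by a zero (or averaged) context vector:

$$p_{\text{ILM}}(w_{n}\mid w_{1}^{n-1}) \approx p_{\text{dec}}(w_{n}\mid w_{1}^{n-1},\, c=\mathbf{0})$$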
no code implementations • 12 Apr 2021 • Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
We achieve a final word-error-rate of 3.3%/10.0% with a hybrid system on the clean/noisy test-sets, surpassing any previous state-of-the-art systems on Librispeech-100h that do not include unlabeled audio data.
Automatic Speech Recognition (ASR) +3
no code implementations • 9 Apr 2021 • Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney
With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform.
Automatic Speech Recognition (ASR) +2
2 code implementations • 7 Apr 2021 • Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney
We present our transducer model on Librispeech.
Ranked #26 on Speech Recognition on LibriSpeech test-clean (using extra training data)
no code implementations • 30 Mar 2021 • Albert Zeyer, Ralf Schlüter, Hermann Ney
We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.
no code implementations • 24 Nov 2020 • Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney
Direct speech translation is an alternative method to avoid error propagation; however, its performance is often behind the cascade system.
no code implementations • 30 Oct 2020 • Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling.
no code implementations • 20 May 2020 • Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney
After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language models rescoring experiments in automatic speech recognition.
Automatic Speech Recognition (ASR) +3
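A minimal sketch of the neural-LM rescoring step mentioned in the entry above: hypotheses from a first pass are re-ranked with a log-linear combination of the first-pass score and a scaled LM score. The scale, data, and helper names are illustrative assumptions.

```python
def rescore_nbest(nbest, lm_score_fn, lm_scale=0.3):
    """nbest: list of (hypothesis, first_pass_log_score).
    Returns the best hypothesis after adding a scaled LM log-probability."""
    rescored = [
        (hyp, score + lm_scale * lm_score_fn(hyp)) for hyp, score in nbest
    ]
    return max(rescored, key=lambda item: item[1])

# Toy usage with a dummy LM that simply prefers shorter hypotheses.
nbest = [("the cat sat", -12.3), ("the cats at", -12.1)]
best = rescore_nbest(nbest, lm_score_fn=lambda h: -0.5 * len(h.split()))
print(best)
```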
no code implementations • 20 May 2020 • Wilfried Michel, Ralf Schlüter, Hermann Ney
This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training.
Automatic Speech Recognition (ASR) +2
1 code implementation • 19 May 2020 • Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
We compare the original training criterion, with full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves, and speeds up our training.
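The two training criteria compared above can be written schematically as the full sum over alignments versus the Viterbi (maximum) approximation:

$$\mathcal{L}_{\text{full}} = -\log\sum_{a\in\mathcal{A}(W)} p_{\theta}(a\mid X) \qquad \text{vs.} \qquad \mathcal{L}_{\max} = -\log\max_{a\in\mathcal{A}(W)} p_{\theta}(a\mid X)$$

where $\mathcal{A}(W)$ denotes the set of alignments compatible with the transcription $W$.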
1 code implementation • 19 May 2020 • Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on, e.g., byte-pair encoding (BPE).
Automatic Speech Recognition (ASR) +2
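For illustration of the BPE subword units mentioned in the entry above, a toy sketch of the merge procedure on a small word-frequency dictionary; the data and helper names are illustrative, and real systems use dedicated tools rather than this naive string replacement.

```python
from collections import Counter

def most_frequent_pair(vocab):
    """vocab maps a space-separated symbol sequence (a word) to its corpus count."""
    pairs = Counter()
    for word, count in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += count
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): count for word, count in vocab.items()}

vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):  # three BPE merge steps
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(pair, vocab)
    print("merged", pair, "->", vocab)
```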
no code implementations • 15 May 2020 • Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
In this work, we address direct phonetic context modeling for the hybrid deep neural network (DNN)/HMM that does not build on any phone clustering algorithm for the determination of the HMM state inventory.
Automatic Speech Recognition (ASR) +2
no code implementations • 2 Apr 2020 • Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus.
1 code implementation • 19 Dec 2019 • Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
We achieve improvements of up to 33% relative in word-error-rate (WER) over a strong baseline with data-augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.
Automatic Speech Recognition (ASR) +3
no code implementations • 20 Nov 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition.
Automatic Speech Recognition (ASR) +1
no code implementations • EMNLP (IWSLT) 2019 • Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.
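A minimal sketch of the two masking operations SpecAugment applies to the log-mel spectrogram (time warping omitted; mask counts and widths are illustrative assumptions, not the settings of the paper).

```python
import numpy as np

def spec_augment(spectrogram, num_freq_masks=2, freq_width=8,
                 num_time_masks=2, time_width=20, rng=None):
    """spectrogram: (time, freq) array; returns a masked copy."""
    rng = rng or np.random.default_rng()
    spec = spectrogram.copy()
    num_frames, num_bins = spec.shape
    for _ in range(num_freq_masks):          # mask random frequency bands
        width = rng.integers(0, freq_width + 1)
        start = rng.integers(0, max(num_bins - width, 1))
        spec[:, start:start + width] = 0.0
    for _ in range(num_time_masks):          # mask random time spans
        width = rng.integers(0, time_width + 1)
        start = rng.integers(0, max(num_frames - width, 1))
        spec[start:start + width, :] = 0.0
    return spec

masked = spec_augment(np.random.randn(300, 80))
```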
no code implementations • 1 Jul 2019 • Wilfried Michel, Ralf Schlüter, Hermann Ney
This allows for a direct comparison of lattice-based and lattice-free sequence discriminative training criteria such as MMI and sMBR, both using the same language model during training.
Automatic Speech Recognition (ASR) +2
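For reference, the MMI criterion mentioned in the entry above maximizes the posterior of the reference transcription against all competing word sequences (taken from a lattice, or from an approximation of the full search space in the lattice-free case); schematically:

$$\mathcal{F}_{\text{MMI}} = \sum_{r}\log\frac{p(X_{r}\mid W_{r})^{\kappa}\,p(W_{r})}{\sum_{W} p(X_{r}\mid W)^{\kappa}\,p(W)}$$

with acoustic scale $\kappa$, acoustic model $p(X\mid W)$, and language model $p(W)$.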
no code implementations • 1 Jul 2019 • Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney
LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models.
no code implementations • 14 Jun 2019 • Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney
Further, i-vectors were used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set.
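The instantaneous adaptation mentioned above amounts to appending a speaker-level embedding to every acoustic frame before the network input; a minimal sketch with illustrative dimensions (not the exact setup of the paper).

```python
import numpy as np

def append_ivector(features, ivector):
    """features: (frames, feat_dim); ivector: (ivec_dim,) for the current speaker.
    Returns (frames, feat_dim + ivec_dim) by tiling the i-vector over all frames."""
    tiled = np.repeat(ivector[None, :], features.shape[0], axis=0)
    return np.concatenate([features, tiled], axis=1)

frames = append_ivector(np.random.randn(500, 40), np.random.randn(100))
print(frames.shape)  # (500, 140)
```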
no code implementations • 10 May 2019 • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
We explore deep autoregressive Transformer models in language modeling for speech recognition.
no code implementations • 9 May 2019 • Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney
In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers.
Automatic Speech Recognition (ASR) +3
2 code implementations • 8 May 2019 • Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are currently the best published, both for the hybrid DNN/HMM and the attention-based systems.
Ranked #25 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +3
no code implementations • 19 Jun 2018 • Tobias Menne, Ralf Schlüter, Hermann Ney
The proposed adaptation approach is based on the integration of the beamformer, which includes the mask estimation network, and the acoustic model of the ASR system.
Automatic Speech Recognition (ASR) +1
14 code implementations • 8 May 2018 • Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #44 on Speech Recognition on LibriSpeech test-clean (using extra training data)
3 code implementations • 2 Aug 2016 • Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney
In this work we release our extensible and easily configurable neural network training software.
no code implementations • 22 Jun 2016 • Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney
On this task, we get our best result with an 8-layer bidirectional LSTM, and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.
Automatic Speech Recognition (ASR) +1