Search Results for author: Rama Doddipatla

Found 25 papers, 1 papers with code

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

no code implementations • 8 Jan 2024 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech.

Clustering

Paper
Add Code

Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues

no code implementations • 21 Sep 2023 • Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, Svetlana Stoyanchev

Observing that document-grounded response generation via LLMs cannot be adequately assessed by automatic evaluation metrics as they are significantly more verbose, we perform a human evaluation where annotators rate the output of the shared task winning system, the two Chat-GPT variants outputs, and human responses.

Response Generation

Paper
Add Code

Adversarial learning of neural user simulators for dialogue policy optimisation

no code implementations • 1 Jun 2023 • Simon Keizer, Caroline Dockes, Norbert Braunschweiler, Svetlana Stoyanchev, Rama Doddipatla

Reinforcement learning based dialogue policies are typically trained in interaction with a user simulator.

Paper
Add Code

Frame-wise and overlap-robust speaker embeddings for meeting diarization

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

Using a Teacher-Student training approach we developed a speaker embedding extraction system that outputs embeddings at frame rate.

Paper
Add Code

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We introduce a monaural neural speaker embeddings extractor that computes an embedding for each speaker present in a speech mixture.

Paper
Add Code

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition

no code implementations • 24 Apr 2023 • Mohan Li, Rama Doddipatla, Catalin Zorila

In previous works, latency was optimised by truncating the online attention weights based on the hard alignments obtained from conventional ASR models, without taking into account the potential loss of ASR accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

no code implementations • 21 Apr 2023 • Mohan Li, Rama Doddipatla

This paper presents the use of non-autoregressive (NAR) approaches for joint automatic speech recognition (ASR) and spoken language understanding (SLU) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer

no code implementations • 29 Jul 2022 • Cong-Thanh Do, Mohan Li, Rama Doddipatla

The multiple-hypothesis approach yields a relative reduction of 3. 3% WER on the CHiME-4's single-channel real noisy evaluation set when compared with the single-hypothesis approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition

no code implementations • 9 May 2022 • Catalin Zorila, Rama Doddipatla

Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

no code implementations • 3 May 2022 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

In this paper, we explore an improved framework to train a monoaural neural enhancement model for robust speech recognition.

Robust Speech Recognition Speech Enhancement +1

Paper
Add Code

Dialogue Strategy Adaptation to New Action Sets Using Multi-dimensional Modelling

no code implementations • 14 Apr 2022 • Simon Keizer, Norbert Braunschweiler, Svetlana Stoyanchev, Rama Doddipatla

A major bottleneck for building statistical spoken dialogue systems for new domains and applications is the need for large amounts of training data.

Dialogue Management Management +2

Paper
Add Code

Transformer-based Streaming ASR with Cumulative Attention

no code implementations • 11 Mar 2022 • Mohan Li, Shucong Zhang, Catalin Zorila, Rama Doddipatla

In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A study on cross-corpus speech emotion recognition and data augmentation

no code implementations • 10 Jan 2022 • Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, Svetlana Stoyanchev

Models trained on mixed corpora can be more stable in mismatched contexts, and the performance reductions range from 1 to 8% when compared with single corpus models in matched conditions.

Cross-corpus Data Augmentation +1

Paper
Add Code

Monaural source separation: From anechoic to reverberant environments

no code implementations • 15 Nov 2021 • Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach

Impressive progress in neural network-based single-channel speech source separation has been made in recent years.

Paper
Add Code

Towards Handling Unconstrained User Preferences in Dialogue

no code implementations • 17 Sep 2021 • Suraj Pandey, Svetlana Stoyanchev, Rama Doddipatla

A user input to a schema-driven dialogue information navigation system, such as venue search, is typically constrained by the underlying database which restricts the user to specify a predefined set of preferences, or slots, corresponding to the database fields.

Information Retrieval Retrieval

Paper
Add Code

Teacher-Student MixIT for Unsupervised and Semi-supervised Speech Separation

no code implementations • 15 Jun 2021 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

The proposed method first uses mixtures of unseparated sources and the mixture invariant training (MixIT) criterion to train a teacher model.

Speech Separation

Paper
Add Code

Head-synchronous Decoding for Transformer-based Streaming ASR

no code implementations • 26 Apr 2021 • Mohan Li, Catalin Zorila, Rama Doddipatla

Online Transformer-based automatic speech recognition (ASR) systems have been extensively studied due to the increasing demand for streaming applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

no code implementations • 29 Mar 2021 • Cong-Thanh Do, Rama Doddipatla, Thomas Hain

In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

no code implementations • 7 Feb 2021 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments.

Speech Extraction speech-recognition +1

Paper
Add Code

On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

no code implementations • 11 Nov 2020 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

To reduce the influence of reverberation on spatial feature extraction, a dereverberation pre-processing method has been applied to further improve the separation performance.

speech-recognition Speech Recognition +1

Paper
Add Code

Action State Update Approach to Dialogue Management

no code implementations • 9 Nov 2020 • Svetlana Stoyanchev, Simon Keizer, Rama Doddipatla

Utterance interpretation is one of the main functions of a dialogue manager, which is the key component of a dialogue system.

Active Learning Dialogue Management +2

Paper
Add Code

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

1 code implementation • 26 Sep 2019 • Catalin Zorila, Christoph Boeddeker, Rama Doddipatla, Reinhold Haeb-Umbach

Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available.

Speech Enhancement

107

Paper
Code

Top-down training for neural networks

no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Interpreting the top layers as a classifier and the lower layers a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.

speech-recognition Speech Recognition

Paper
Add Code

The USFD Spoken Language Translation System for IWSLT 2014

no code implementations • 13 Sep 2015 • Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, Kashif Shah, Oscar Saz, Madina Hasan, Ghada Alharbi, Lucia Specia, Thomas Hain

The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives a BLEU score of 23. 45 and 14. 75 on the English-to-French and English-to-German speech-to-text translation task with the IWSLT 2014 data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.