Search Results for author: Sundararajan Srinivasan

Found 10 papers, 1 paper with code

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

no code implementations • 14 May 2024 • Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Despite safety guardrails, jailbreaking experiments demonstrate the vulnerability of integrated speech and large language models (SLMs) to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10%, respectively, when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories.

Adversarial Robustness, Instruction Following, +1 more

SpeechVerse: A Large-scale Generalizable Audio Language Model

no code implementations • 14 May 2024 • Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions.

Automatic Speech Recognition, Benchmarking, +4 more

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Automatic Speech Recognition, Speech Recognition, +3 more

Speaker Diarization of Scripted Audiovisual Content

no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

Speaker Diarization, +2 more

Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction

no code implementations • 15 Jun 2023 • Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words.

Automatic Speech Recognition (ASR), +3 more

Device Directedness with Contextual Cues for Spoken Dialog Systems

no code implementations • 23 Nov 2022 • Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins.

Automatic Speech Recognition (ASR), +2 more

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

no code implementations • 10 Dec 2021 • Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero

Also, most existing speech separation models are trained on synthetic mixtures and do not generalize to real conversational data.

Automatic Speech Recognition (ASR), +2 more

Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

no code implementations • 30 Nov 2021 • Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, which is used to condition teacher-student training.

Emotion Classification, Representation Learning, +1 more

Speaker-conversation factorial designs for diarization error analysis

no code implementations • 10 Jun 2021 • Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex.

Clustering, Speaker Diarization, +1 more

Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

no code implementations • 10 Mar 2021 • Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau

Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech.

Accented Speech Recognition, Automatic Speech Recognition, +3 more
