Search Results for author: Rohit Paturi

Found 6 papers, 1 paper with code

SpeechVerse: A Large-scale Generalizable Audio Language Model

no code implementations • 14 May 2024 • Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions.

Automatic Speech Recognition, Benchmarking, +4

Generalized zero-shot audio-to-intent classification

no code implementations • 4 Nov 2023 • Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.

Classification, Goal-Oriented Dialog, +5

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Automatic Speech Recognition, Speech Recognition, +3

Speaker Diarization of Scripted Audiovisual Content

no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

Speaker Diarization, +2
