Search Results for author: Varun Nagaraja

Found 7 papers, 0 papers with code

On The Open Prompt Challenge In Conditional Audio Generation

no code implementations · 1 Nov 2023 · Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra

Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text.

Audio Generation

FoleyGen: Visually-Guided Audio Generation

no code implementations · 19 Sep 2023 · Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

A prevalent problem in video-to-audio (V2A) generation is the misalignment of generated audio with the visible actions in the video.

Audio Generation · Language Modelling

Enhance audio generation controllability through representation similarity regularization

no code implementations · 15 Sep 2023 · Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra

This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training.

Audio Generation · Language Modelling · +2 more
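The abstract describes aligning audio and text representations during training but gives no formula. One common way to express such an alignment is to add a penalty that grows as the two representations point in different directions. The Python sketch below assumes a cosine-similarity penalty with a hypothetical `weight` hyperparameter; the function names are illustrative, and the paper's actual regularizer may be formulated differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def regularized_loss(
    task_loss: float,
    audio_repr: Sequence[float],
    text_repr: Sequence[float],
    weight: float = 0.1,  # hypothetical regularization strength
) -> float:
    """Task loss plus a penalty of weight * (1 - cos(audio, text)).

    The penalty is 0 when the representations are aligned and reaches
    2 * weight when they point in opposite directions.
    """
    return task_loss + weight * (1.0 - cosine_similarity(audio_repr, text_repr))


# Aligned representations add no penalty; orthogonal ones add the full weight.
print(regularized_loss(1.0, [1.0, 0.0], [1.0, 0.0]))
print(regularized_loss(1.0, [1.0, 0.0], [0.0, 1.0], weight=0.5))
```

In a real model the vectors would be pooled encoder outputs and the penalty would be computed over a batch. The scalar version only shows how the extra term reshapes the objective.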

Stack-and-Delay: a new codebook pattern for music generation

no code implementations · 15 Sep 2023 · Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra

In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns.

Language Modelling · Music Generation
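The abstract mentions codebook patterns without defining them. As background, the Python sketch below shows the widely used "delay" pattern described for MusicGen: codebook k is shifted right by k steps, so the K tokens of one frame are spread over K decoding steps. This is not the paper's Stack-and-Delay pattern, whose exact layout the abstract does not give. The function names and the `None` padding value are illustrative assumptions.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def delay_interleave(codes: List[List[int]]) -> Grid:
    """Apply the delay pattern: codebook k is shifted right by k steps.

    codes[k][t] is the token of codebook k at frame t (shape K x T).
    The result has shape K x (T + K - 1); empty slots are None (padding).
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    num_steps = num_frames + num_codebooks - 1
    grid: Grid = [[None] * num_steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            grid[k][t + k] = codes[k][t]
    return grid


def delay_deinterleave(grid: Grid, num_frames: int) -> List[List[Optional[int]]]:
    """Undo the shift to recover the original K x T token stacks."""
    return [[grid[k][t + k] for t in range(num_frames)] for k in range(len(grid))]


codes = [[1, 2, 3], [4, 5, 6]]  # 2 codebooks, 3 frames
grid = delay_interleave(codes)
print(grid)  # [[1, 2, 3, None], [None, 4, 5, 6]]
```

With K codebooks and T frames, the delay pattern needs T + K - 1 autoregressive steps. Fully flattening the stacks into one sequence needs K * T steps, which is the trade-off between parallel and sequential decoding that the abstract refers to.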

Collaborative Training of Acoustic Encoders for Speech Recognition

no code implementations · 16 Jun 2021 · Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets.

Speech Recognition

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations · 5 Apr 2021 · Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

DET achieves accuracy similar to a baseline model, with better latency, on a large in-house dataset by assigning a lightweight encoder to the beginning of an utterance and a full-size encoder to the rest.

Speech Recognition
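The abstract's core idea is routing the first part of an utterance through a lightweight encoder and the remainder through a full-size one. The Python sketch below illustrates only that routing decision. The encoders are hypothetical toy functions, `switch_at` is an assumed frame index, and the sketch ignores streaming state and cross-segment context that a real transducer encoder would need.

```python
from typing import Callable, List

Frame = List[float]
Encoder = Callable[[List[Frame]], List[Frame]]


def dynamic_encode(
    frames: List[Frame],
    light_encoder: Encoder,
    full_encoder: Encoder,
    switch_at: int,
) -> List[Frame]:
    """Encode frames[:switch_at] with the light encoder and the rest with the full one."""
    if switch_at < 0:
        raise ValueError("switch_at must be non-negative")
    head = frames[:switch_at]
    tail = frames[switch_at:]
    # Outputs are concatenated in time order so a downstream decoder sees one sequence.
    return light_encoder(head) + full_encoder(tail)


# Hypothetical stand-ins: each "encoder" just scales its input frames.
def toy_light(frames: List[Frame]) -> List[Frame]:
    return [[0.5 * v for v in f] for f in frames]


def toy_full(frames: List[Frame]) -> List[Frame]:
    return [[2.0 * v for v in f] for f in frames]


print(dynamic_encode([[1.0], [2.0], [3.0]], toy_light, toy_full, switch_at=1))
```

Keeping both encoders behind the same callable interface makes the switch point a single parameter, which is the kind of accuracy-versus-latency knob the title describes.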
