Search Results for author: Chunyang Wu

Found 21 papers, 1 paper with code

Effective internal language model training and fusion for factorized transducer model

no code implementations • 2 Apr 2024 • Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion.

Language Modelling

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

no code implementations • 12 Nov 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.

Question Answering

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

no code implementations • 22 Sep 2023 • Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in sparse monolingual models or a sparse multilingual model (named as Dynamic ASR Pathways).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

End-to-End Speech Recognition Contextualization with Large Language Models

no code implementations • 19 Sep 2023 • Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Overall, we demonstrate that by adding only a handful of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.

Decoder Language Modelling +2

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

no code implementations • 5 Sep 2023 • Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models, by up to 3% relative in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Prompting Large Language Models with Speech Recognition Abilities

no code implementations • 21 Jul 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings.

Abstractive Text Summarization Automatic Speech Recognition +3

Towards Selection of Text-to-speech Data to Augment ASR Training

no code implementations • 30 May 2023 • Shuo Liu, Leda Sari, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Head State Space Model for Speech Recognition

no code implementations • 21 May 2023 • Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches.

Ranked #8 on Speech Recognition on LibriSpeech test-clean

Language Modelling speech-recognition +1

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition Speech Recognition

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

no code implementations • 7 Oct 2021 • Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer

This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution.

speech-recognition Speech Recognition

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios

no code implementations • 6 Apr 2021 • Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

In order to achieve flexible and better accuracy and latency trade-offs, the following techniques are used.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

no code implementations • 6 Apr 2021 • Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations • 5 Apr 2021 • Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

DET achieves accuracy similar to that of a baseline model, with better latency, on a large in-house data set by assigning a lightweight encoder to the beginning of an utterance and a full-size encoder to the rest.

speech-recognition Speech Recognition

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

no code implementations • 27 Oct 2020 • Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao

We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks.

speech-recognition Speech Recognition +1

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

1 code implementation • 21 Oct 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3.01\%$ on test-clean and $7.09\%$ on test-other.

speech-recognition Speech Recognition

Weak-Attention Suppression For Transformer Based Speech Recognition

no code implementations • 18 May 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

no code implementations • 16 May 2020 • Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang

The memory bank stores the embedding information for all the processed segments.

A machine learning method for the large-scale evaluation of urban visual environment

no code implementations • 11 Aug 2016 • Lun Liu, Hui Wang, Chunyang Wu

Given the size of modern cities in the urbanising age, it is beyond the perceptual capacity of most people to develop a good knowledge about the beauty and ugliness of the city at every street corner.

BIG-bench Machine Learning Cultural Vocal Bursts Intensity Prediction

Chinese Coreference Resolution via Ordered Filtering

no code implementations • WS 2012 • Xiaotian Zhang, Chunyang Wu, Hai Zhao

coreference-resolution

Regression with Phrase Indicators for Estimating MT Quality

no code implementations • WS 2012 • Chunyang Wu, Hai Zhao

Machine Translation regression
