no code implementations • 23 Feb 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
Despite this empirical success, the mechanics of training a Transformer to achieve ICL, and the resulting ICL capacity, remain largely elusive due to the technical challenges of analyzing the nonconvex training problems that arise from the nonlinear self-attention and nonlinear activations in Transformers.
1 code implementation • 13 Jan 2024 • A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen
In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST).
Automatic Speech Recognition (ASR) +2
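For intuition, the bilevel idea can be pictured on a toy problem: a lower-level gradient step on an unsupervised objective, followed by an upper-level step on the supervised objective, coupled here through a simple penalty term. This is only a generic penalty-based relaxation of a bilevel program, not the paper's actual BL-JUST algorithm; the function name, `penalty` parameter, and quadratic losses are all illustrative.

```python
def bl_just_sketch(w, sup_grad, unsup_grad, lr=0.05, steps=100, penalty=0.5):
    """Alternating gradient steps on a penalty relaxation of a
    bilevel program: lower level = unsupervised loss, upper level =
    supervised loss regularized toward the lower-level solution."""
    for _ in range(steps):
        w = w - lr * unsup_grad(w)                            # lower-level step
        w = w - lr * (sup_grad(w) + penalty * unsup_grad(w))  # upper-level step
    return w

# Toy quadratics: unsupervised optimum at 1.0, supervised optimum at 3.0.
# The iterates settle somewhere in between, trading off both objectives.
w = bl_just_sketch(0.0,
                   sup_grad=lambda w: w - 3.0,
                   unsup_grad=lambda w: w - 1.0)
```

With these toy losses the fixed point lies between the two optima, illustrating how the penalty weight trades supervised accuracy against fidelity to the unsupervised solution.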
no code implementations • 21 Nov 2023 • Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury
Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.
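The core of per-epoch random subsampling can be shown in a few lines: each epoch, every example is kept independently with some probability, so training touches a different random fraction of the massive dataset on each pass. This is a minimal sketch of one plausible reading of SRS; the function name and `rate` parameter are illustrative, not the paper's API.

```python
import numpy as np

def soft_random_sample(n_examples, rate, rng):
    """Select a random subset of the training set for one epoch.

    Each example is kept independently with probability `rate`, so an
    epoch trains on roughly rate * n_examples examples and different
    epochs see different subsets of the full data."""
    mask = rng.random(n_examples) < rate
    return np.flatnonzero(mask)

rng = np.random.default_rng(0)
# e.g. three epochs over a 10k-example corpus at a 30% sampling rate
epoch_subsets = [soft_random_sample(10_000, 0.3, rng) for _ in range(3)]
```

Over many epochs every example is still visited with high probability, while each individual epoch is much cheaper than a full pass.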
no code implementations • 26 Aug 2023 • Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky
The integration of external personalized context information into document-grounded conversational systems has significant potential business value but has not been well studied.
no code implementations • 27 Feb 2023 • George Saon, Ankit Gupta, Xiaodong Cui
We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models.
no code implementations • 16 Jun 2022 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan
We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T).
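As a rough illustration of what such quantization involves, the sketch below applies per-tensor symmetric uniform quantization to a weight matrix and measures the round-trip error that low-bit inference must tolerate. The paper's actual strategies for RNN-T are more aggressive and more sophisticated; this generic scheme is only for orientation.

```python
import numpy as np

def quantize_symmetric(x, n_bits=8):
    """Per-tensor symmetric uniform quantization: map floats onto the
    signed integer grid [-(2^(b-1)-1), 2^(b-1)-1] with a single scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_symmetric(w)          # int8 weights plus one fp scale
w_hat = q.astype(np.float32) * scale      # dequantized weights
```

The maximum round-trip error is bounded by half a quantization step (`scale / 2`), which is the slack the trained model has to absorb.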
no code implementations • 29 Mar 2022 • Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata
We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
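Length perturbation can be sketched as random frame deletion plus random frame duplication, so each epoch sees the utterance at a slightly different length. The paper's exact scheme may differ (for example, in how positions are chosen); the `drop_rate`/`insert_rate` parameters below are assumed for illustration.

```python
import numpy as np

def length_perturb(frames, drop_rate=0.1, insert_rate=0.1, rng=None):
    """Randomly delete some frames, then randomly duplicate some of
    the survivors — a length-varying augmentation that regularizes
    the acoustic model."""
    rng = rng or np.random.default_rng()
    kept = frames[rng.random(len(frames)) >= drop_rate]
    out = []
    for f, dup in zip(kept, rng.random(len(kept)) < insert_rate):
        out.append(f)
        if dup:
            out.append(f)  # insert by repeating the frame
    return np.asarray(out)

frames = np.arange(100.0)   # stand-in for a sequence of acoustic feature frames
perturbed = length_perturb(frames, rng=np.random.default_rng(2))
```

With real features, `frames` would be a `(T, d)` array of filterbank vectors; the same indexing logic applies.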
no code implementations • 2 Dec 2021 • Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu
We conduct extensive studies over 18 state-of-the-art DL models/tasks and demonstrate that DPSGD often converges in cases where SSGD diverges for large learning rates in the large batch setting.
Automatic Speech Recognition (ASR) +1
no code implementations • 21 Oct 2021 • Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung
Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme.
Automatic Speech Recognition (ASR) +1
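For intuition, a single synchronous step of decentralized SGD on a ring looks like the sketch below: each worker averages its model with its two ring neighbors, then applies its local gradient. The paper's asynchronous variants (fixed and randomized ring patterns, delay-by-one) relax exactly this synchrony; the code is a simplified synchronous stand-in, not ADPSGD itself.

```python
import numpy as np

def ring_average_step(params, grads, lr=0.1):
    """One synchronous decentralized-SGD step on a ring of n workers:
    mix each worker's parameters with its two ring neighbors (uniform
    1/3 weights), then take a local gradient step."""
    n = len(params)
    mixed = [(params[(i - 1) % n] + params[i] + params[(i + 1) % n]) / 3.0
             for i in range(n)]
    return [m - lr * g for m, g in zip(mixed, grads)]

# Toy consensus check: with zero gradients, repeated mixing drives the
# four workers (initialized at 0, 1, 2, 3) toward their common average.
params = [np.array([float(i)]) for i in range(4)]
for _ in range(50):
    params = ring_average_step(params, [np.zeros(1)] * 4)
```

The mixing matrix is doubly stochastic, so repeated steps contract all workers to the average (1.5 here); gradient noise then perturbs this consensus.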
no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).
Automatic Speech Recognition (ASR) +2
no code implementations • 24 Aug 2021 • Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske
By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNN-T ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.
Automatic Speech Recognition (ASR) +2
no code implementations • ACL 2021 • Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan Zhang
With recent advances in natural language processing, state-of-the-art models and datasets have become extremely large in scale, which challenges the application of sample-based explanation methods in many aspects, such as explanation interpretability, efficiency, and faithfulness.
no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
no code implementations • 8 Feb 2021 • Xiaodong Cui, Songtao Lu, Brian Kingsbury
In this paper, we investigate federated acoustic modeling using data from multiple clients.
Federated Learning • Speech Recognition • Sound • Distributed, Parallel, and Cluster Computing • Audio and Speech Processing
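As a point of reference, the server-side aggregation in standard federated averaging (FedAvg) is a weighted average of client models, weighted by each client's local data size. The paper investigates federated acoustic modeling and may use a different aggregation; the sketch below is only the textbook baseline.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg server step: combine client model parameters with weights
    proportional to the number of local training examples each client holds."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy example: two clients with 1-parameter models; the client with
# three times the data pulls the average three times as hard.
avg = fed_avg([np.array([1.0]), np.array([3.0])], client_sizes=[1, 3])
```

In a real ASR setting `client_weights` would be the flattened acoustic-model parameters after local training rounds.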
no code implementations • 3 Feb 2021 • Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang
In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions at different scales.
no code implementations • 4 Jan 2021 • Qiye Liu, Le Wang, Ying Fu, Xi Zhang, Lianglong Huang, Huimin Su, Junhao Lin, Xiaobin Chen, Dapeng Yu, Xiaodong Cui, Jia-Wei Mei, Jun-Feng Dai
The Mermin-Wagner-Coleman theorem predicts no long-range magnetic order at finite temperature in two-dimensional (2D) isotropic systems, but quasi-long-range order with a divergent correlation length at the Kosterlitz-Thouless (KT) transition for planar magnets.
Mesoscale and Nanoscale Physics
no code implementations • NeurIPS 2020 • Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan
In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.
no code implementations • 20 May 2020 • Rui Zhang, Conrad Albrecht, Wei Zhang, Xiaodong Cui, Ulrich Finkler, David Kung, Siyuan Lu
Accurately and globally mapping human infrastructure is an important and challenging task, with applications in routing, regulation-compliance monitoring, and natural disaster response management.
no code implementations • 24 Feb 2020 • Xiaodong Cui, Wei zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung
The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning.
Automatic Speech Recognition (ASR) +1
no code implementations • 4 Feb 2020 • Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny
Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Parallel SGD (AD-PSGD), form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.
no code implementations • ICLR 2020 • Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang
Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.
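The optimistic update underlying OSG can be sketched deterministically: each step uses the current gradient plus a correction `g_t - g_{t-1}` that anticipates the next gradient, which stabilizes min-max (GAN-style) training relative to plain gradient descent. OAdagrad additionally rescales steps by accumulated gradient statistics; the minimization toy below shows only the plain optimistic step, with illustrative names throughout.

```python
def optimistic_gd(grad, w0, lr=0.1, steps=200):
    """Optimistic gradient descent: step along 2*g_t - g_{t-1},
    i.e. the current gradient plus an extrapolation term that
    predicts the next gradient from the last two observations."""
    w, g_prev = w0, None
    for _ in range(steps):
        g = grad(w)
        w -= lr * ((2 * g - g_prev) if g_prev is not None else g)
        g_prev = g
    return w

# Toy quadratic with minimum at w = 3: the gradient of (w - 3)^2 / 2.
w_star = optimistic_gd(lambda w: w - 3.0, w0=0.0)
```

On a simple quadratic the extrapolation term vanishes at the optimum, so the iterates converge to the minimizer just as plain gradient descent would; the benefit of optimism appears on rotational (min-max) vector fields.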
no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei Zhang, Kailash Gopalakrishnan
Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.
no code implementations • NeurIPS 2020 • Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das
Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.
no code implementations • 17 Oct 2019 • Di Chen, Yada Zhu, Xiaodong Cui, Carla P. Gomes
Real-world applications often involve domain-specific and task-based performance objectives that are not captured by the standard machine learning losses, but are critical for decision making.
no code implementations • 9 Aug 2019 • Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon
This paper proposes that the community focus on the MALACH corpus to develop speech recognition systems that are more robust to accents, disfluencies, and emotional speech.
no code implementations • 10 Jul 2019 • Xiaodong Cui, Michael Picheny
In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 10 Jul 2019 • Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny
In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models.
Automatic Speech Recognition (ASR) +2
no code implementations • 10 Jul 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3x as large as the one used in SSGD, thus enabling training at a much larger scale.
Automatic Speech Recognition (ASR) +1
no code implementations • 10 Apr 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny
We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set.
Automatic Speech Recognition (ASR) +1
1 code implementation • NeurIPS 2018 • Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks.
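One generation of a population-based scheme in the spirit of ESGD can be sketched on a toy quadratic: every individual first takes a few SGD steps, then the fittest half survives and is mutated to refill the population. The hyperparameters, selection rule, and mutation here are illustrative stand-ins, not the paper's actual operators.

```python
import numpy as np

def esgd_generation(population, loss, grad, lr=0.05, sgd_steps=20, rng=None):
    """One generation of population-based evolutionary SGD (toy form):
    an SGD phase that improves every individual, then an evolution
    phase that keeps the fittest half and spawns perturbed copies."""
    rng = rng or np.random.default_rng()
    for _ in range(sgd_steps):                       # SGD phase
        population = [w - lr * grad(w) for w in population]
    population = sorted(population, key=loss)        # evolution phase
    survivors = population[: len(population) // 2]
    offspring = [w + rng.normal(scale=0.1) for w in survivors]  # mutation
    return survivors + offspring

loss = lambda w: (w - 2.0) ** 2     # toy fitness: squared distance to optimum 2.0
grad = lambda w: 2.0 * (w - 2.0)
rng = np.random.default_rng(3)
pop = [float(x) for x in rng.normal(size=6)]
for _ in range(3):
    pop = esgd_generation(pop, loss, grad, rng=rng)
```

After a few generations the best individual sits near the optimum; in the real framework each "individual" is a full network, possibly with its own optimizer hyperparameters.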
no code implementations • 17 Oct 2017 • Xiaodong Cui, Vaibhava Goel, George Saon
An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling.
2 code implementations • NeurIPS 2017 • Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang
To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.
Ranked #24 on Sequential Image Classification on Sequential MNIST
no code implementations • 6 Mar 2017 • George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
This then raises two issues: what IS human performance, and how far down can we still drive speech recognition error rates?
Ranked #3 on Speech Recognition on Switchboard + Hub500