no code implementations • 23 Feb 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
Despite this empirical success, the mechanics of training a Transformer to achieve ICL, and the resulting ICL capacity, remain largely elusive due to the technical challenges of analyzing the nonconvex training problems that arise from the nonlinear self-attention and nonlinear activations in Transformers.
1 code implementation • 13 Jan 2024 • A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen
In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST).
Automatic Speech Recognition (ASR) +2
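For intuition, the bilevel idea can be pictured on a toy problem: a lower-level gradient step on an unsupervised objective, followed by an upper-level step on the supervised objective, coupled here through a simple penalty term. This is only a generic penalty-based relaxation of a bilevel program, not the paper's actual BL-JUST algorithm; the function name, `penalty` parameter, and quadratic losses are all illustrative.

```python
def bl_just_sketch(w, sup_grad, unsup_grad, lr=0.05, steps=100, penalty=0.5):
    """Alternating gradient steps on a penalty relaxation of a
    bilevel program: lower level = unsupervised loss, upper level =
    supervised loss regularized toward the lower-level solution."""
    for _ in range(steps):
        w = w - lr * unsup_grad(w)                            # lower-level step
        w = w - lr * (sup_grad(w) + penalty * unsup_grad(w))  # upper-level step
    return w

# Toy quadratics: unsupervised optimum at 1.0, supervised optimum at 3.0.
# The iterates settle somewhere in between, trading off both objectives.
w = bl_just_sketch(0.0,
                   sup_grad=lambda w: w - 3.0,
                   unsup_grad=lambda w: w - 1.0)
```

With these toy losses the fixed point lies between the two optima, illustrating how the penalty weight trades supervised accuracy against fidelity to the unsupervised solution.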
no code implementations • 21 Nov 2023 • Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury
Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.
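The core of per-epoch random subsampling can be shown in a few lines: each epoch, every example is kept independently with some probability, so training touches a different random fraction of the massive dataset on each pass. This is a minimal sketch of one plausible reading of SRS; the function name and `rate` parameter are illustrative, not the paper's API.

```python
import numpy as np

def soft_random_sample(n_examples, rate, rng):
    """Select a random subset of the training set for one epoch.

    Each example is kept independently with probability `rate`, so an
    epoch trains on roughly rate * n_examples examples and different
    epochs see different subsets of the full data."""
    mask = rng.random(n_examples) < rate
    return np.flatnonzero(mask)

rng = np.random.default_rng(0)
# e.g. three epochs over a 10k-example corpus at a 30% sampling rate
epoch_subsets = [soft_random_sample(10_000, 0.3, rng) for _ in range(3)]
```

Over many epochs every example is still visited with high probability, while each individual epoch is much cheaper than a full pass.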
no code implementations • 26 Aug 2023 • Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky
The integration of external personalized context information into document-grounded conversational systems has significant potential business value but has not been well studied.
no code implementations • 27 Feb 2023 • George Saon, Ankit Gupta, Xiaodong Cui
We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models.
no code implementations • 16 Jun 2022 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan
We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T).
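As a rough illustration of what such quantization involves, the sketch below applies per-tensor symmetric uniform quantization to a weight matrix and measures the round-trip error that low-bit inference must tolerate. The paper's actual strategies for RNN-T are more aggressive and more sophisticated; this generic scheme is only for orientation.

```python
import numpy as np

def quantize_symmetric(x, n_bits=8):
    """Per-tensor symmetric uniform quantization: map floats onto the
    signed integer grid [-(2^(b-1)-1), 2^(b-1)-1] with a single scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_symmetric(w)          # int8 weights plus one fp scale
w_hat = q.astype(np.float32) * scale      # dequantized weights
```

The maximum round-trip error is bounded by half a quantization step (`scale / 2`), which is the slack the trained model has to absorb.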
no code implementations • 29 Mar 2022 • Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata
We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
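Length perturbation can be sketched as random frame deletion plus random frame duplication, so each epoch sees the utterance at a slightly different length. The paper's exact scheme may differ (for example, in how positions are chosen); the `drop_rate`/`insert_rate` parameters below are assumed for illustration.

```python
import numpy as np

def length_perturb(frames, drop_rate=0.1, insert_rate=0.1, rng=None):
    """Randomly delete some frames, then randomly duplicate some of
    the survivors — a length-varying augmentation that regularizes
    the acoustic model."""
    rng = rng or np.random.default_rng()
    kept = frames[rng.random(len(frames)) >= drop_rate]
    out = []
    for f, dup in zip(kept, rng.random(len(kept)) < insert_rate):
        out.append(f)
        if dup:
            out.append(f)  # insert by repeating the frame
    return np.asarray(out)

frames = np.arange(100.0)   # stand-in for a sequence of acoustic feature frames
perturbed = length_perturb(frames, rng=np.random.default_rng(2))
```

With real features, `frames` would be a `(T, d)` array of filterbank vectors; the same indexing logic applies.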
no code implementations • 2 Dec 2021 • Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu
We conduct extensive studies over 18 state-of-the-art DL models/tasks and demonstrate that DPSGD often converges in cases where SSGD diverges for large learning rates in the large batch setting.
Automatic Speech Recognition (ASR) +1
no code implementations • 21 Oct 2021 • Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung
Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme.
Automatic Speech Recognition (ASR) +1
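For intuition, a single synchronous step of decentralized SGD on a ring looks like the sketch below: each worker averages its model with its two ring neighbors, then applies its local gradient. The paper's asynchronous variants (fixed and randomized ring patterns, delay-by-one) relax exactly this synchrony; the code is a simplified synchronous stand-in, not ADPSGD itself.

```python
import numpy as np

def ring_average_step(params, grads, lr=0.1):
    """One synchronous decentralized-SGD step on a ring of n workers:
    mix each worker's parameters with its two ring neighbors (uniform
    1/3 weights), then take a local gradient step."""
    n = len(params)
    mixed = [(params[(i - 1) % n] + params[i] + params[(i + 1) % n]) / 3.0
             for i in range(n)]
    return [m - lr * g for m, g in zip(mixed, grads)]

# Toy consensus check: with zero gradients, repeated mixing drives the
# four workers (initialized at 0, 1, 2, 3) toward their common average.
params = [np.array([float(i)]) for i in range(4)]
for _ in range(50):
    params = ring_average_step(params, [np.zeros(1)] * 4)
```

The mixing matrix is doubly stochastic, so repeated steps contract all workers to the average (1.5 here); gradient noise then perturbs this consensus.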
no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).
Automatic Speech Recognition (ASR) +2
no code implementations • 24 Aug 2021 • Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske
By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNN-T ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.
Automatic Speech Recognition (ASR) +2
no code implementations • ACL 2021 • Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan Zhang
With recent advances in natural language processing, state-of-the-art models and datasets have become extremely large in scale, which challenges the application of sample-based explanation methods in many aspects, such as explanation interpretability, efficiency, and faithfulness.
no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
no code implementations • 8 Feb 2021 • Xiaodong Cui, Songtao Lu, Brian Kingsbury
In this paper, we investigate federated acoustic modeling using data from multiple clients.
Federated Learning • Speech Recognition • Sound • Distributed, Parallel, and Cluster Computing • Audio and Speech Processing
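As a point of reference, the server-side aggregation in standard federated averaging (FedAvg) is a weighted average of client models, weighted by each client's local data size. The paper investigates federated acoustic modeling and may use a different aggregation; the sketch below is only the textbook baseline.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg server step: combine client model parameters with weights
    proportional to the number of local training examples each client holds."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy example: two clients with 1-parameter models; the client with
# three times the data pulls the average three times as hard.
avg = fed_avg([np.array([1.0]), np.array([3.0])], client_sizes=[1, 3])
```

In a real ASR setting `client_weights` would be the flattened acoustic-model parameters after local training rounds.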
no code implementations • 3 Feb 2021 • Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang
In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions at different scales.
no code implementations • 4 Jan 2021 • Qiye Liu, Le Wang, Ying Fu, Xi Zhang, Lianglong Huang, Huimin Su, Junhao Lin, Xiaobin Chen, Dapeng Yu, Xiaodong Cui, Jia-Wei Mei, Jun-Feng Dai
The Mermin-Wagner-Coleman theorem predicts no long-range magnetic order at finite temperature in two-dimensional (2D) isotropic systems, but quasi-long-range order with a divergent correlation length at the Kosterlitz-Thouless (KT) transition for planar magnets.
Mesoscale and Nanoscale Physics
no code implementations • NeurIPS 2020 • Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan
In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.
no code implementations • 20 May 2020 • Rui Zhang, Conrad Albrecht, Wei Zhang, Xiaodong Cui, Ulrich Finkler, David Kung, Siyuan Lu
Accurately and globally mapping human infrastructure is an important and challenging task, with applications in routing, regulation-compliance monitoring, and natural disaster response management.
no code implementations • 24 Feb 2020 • Xiaodong Cui, Wei zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung
The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning.
Automatic Speech Recognition (ASR) +1
no code implementations • 4 Feb 2020 • Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny
Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Parallel SGD (AD-PSGD), form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.
no code implementations • ICLR 2020 • Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang
Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.
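The optimistic update underlying OSG can be sketched deterministically: each step uses the current gradient plus a correction `g_t - g_{t-1}` that anticipates the next gradient, which stabilizes min-max (GAN-style) training relative to plain gradient descent. OAdagrad additionally rescales steps by accumulated gradient statistics; the minimization toy below shows only the plain optimistic step, with illustrative names throughout.

```python
def optimistic_gd(grad, w0, lr=0.1, steps=200):
    """Optimistic gradient descent: step along 2*g_t - g_{t-1},
    i.e. the current gradient plus an extrapolation term that
    predicts the next gradient from the last two observations."""
    w, g_prev = w0, None
    for _ in range(steps):
        g = grad(w)
        w -= lr * ((2 * g - g_prev) if g_prev is not None else g)
        g_prev = g
    return w

# Toy quadratic with minimum at w = 3: the gradient of (w - 3)^2 / 2.
w_star = optimistic_gd(lambda w: w - 3.0, w0=0.0)
```

On a simple quadratic the extrapolation term vanishes at the optimum, so the iterates converge to the minimizer just as plain gradient descent would; the benefit of optimism appears on rotational (min-max) vector fields.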
no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei Zhang, Kailash Gopalakrishnan
Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.
no code implementations • NeurIPS 2020 • Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das
Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.
no code implementations • 17 Oct 2019 • Di Chen, Yada Zhu, Xiaodong Cui, Carla P. Gomes
Real-world applications often involve domain-specific and task-based performance objectives that are not captured by the standard machine learning losses, but are critical for decision making.
no code implementations • 9 Aug 2019 • Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon
This paper proposes that the community focus on the MALACH corpus to develop speech recognition systems that are more robust to accents, disfluencies, and emotional speech.
no code implementations • 10 Jul 2019 • Xiaodong Cui, Michael Picheny
In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 10 Jul 2019 • Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny
In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models.
Automatic Speech Recognition (ASR) +2
no code implementations • 10 Jul 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3x as large as the one used in SSGD, thus enabling training at a much larger scale.
Automatic Speech Recognition (ASR) +1
no code implementations • 10 Apr 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny
We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set.
Automatic Speech Recognition (ASR) +1
1 code implementation • NeurIPS 2018 • Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks.
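One generation of a population-based scheme in the spirit of ESGD can be sketched on a toy quadratic: every individual first takes a few SGD steps, then the fittest half survives and is mutated to refill the population. The hyperparameters, selection rule, and mutation here are illustrative stand-ins, not the paper's actual operators.

```python
import numpy as np

def esgd_generation(population, loss, grad, lr=0.05, sgd_steps=20, rng=None):
    """One generation of population-based evolutionary SGD (toy form):
    an SGD phase that improves every individual, then an evolution
    phase that keeps the fittest half and spawns perturbed copies."""
    rng = rng or np.random.default_rng()
    for _ in range(sgd_steps):                       # SGD phase
        population = [w - lr * grad(w) for w in population]
    population = sorted(population, key=loss)        # evolution phase
    survivors = population[: len(population) // 2]
    offspring = [w + rng.normal(scale=0.1) for w in survivors]  # mutation
    return survivors + offspring

loss = lambda w: (w - 2.0) ** 2     # toy fitness: squared distance to optimum 2.0
grad = lambda w: 2.0 * (w - 2.0)
rng = np.random.default_rng(3)
pop = [float(x) for x in rng.normal(size=6)]
for _ in range(3):
    pop = esgd_generation(pop, loss, grad, rng=rng)
```

After a few generations the best individual sits near the optimum; in the real framework each "individual" is a full network, possibly with its own optimizer hyperparameters.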
no code implementations • 17 Oct 2017 • Xiaodong Cui, Vaibhava Goel, George Saon
An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling.
2 code implementations • NeurIPS 2017 • Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang
To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.
Ranked #24 on Sequential Image Classification on Sequential MNIST
no code implementations • 6 Mar 2017 • George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
This then raises two issues: what IS human performance, and how far down can we still drive speech recognition error rates?
Ranked #3 on Speech Recognition on Switchboard + Hub500