no code implementations • 13 Dec 2023 • Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal
We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method.
Automatic Speech Recognition (ASR)
no code implementations • 26 May 2023 • Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
With large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version at the cost of 17% relative word error rate degradation.
Automatic Speech Recognition (ASR)
no code implementations • 24 May 2023 • David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He
With the rapid increase in the size of neural networks, model compression has become an important area of research.
no code implementations • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.
no code implementations • 25 Oct 2022 • Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
We present a streaming-based approach that achieves an acceptable delay with minimal loss in speech conversion quality compared to a reference state-of-the-art non-streaming approach.
1 code implementation • 29 Mar 2022 • Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov
Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios.
Automatic Speech Recognition (ASR)
no code implementations • 23 Mar 2022 • Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno
We also show that learning a speaker-embedding space can scale further and reduce the amount of personalization training data required per speaker.
1 code implementation • 1 Mar 2022 • Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
We present two methods of real-time magnitude spectrogram inversion: streaming Griffin-Lim (GL) and streaming MelGAN.
4 code implementations • 7 May 2021 • Amirali Abdolrashidi, Lisa Wang, Shivani Agrawal, Jonathan Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew
In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves.
3 code implementations • 14 May 2020 • Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirko Visontai, Stella Laurenzo
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones.
Ranked #10 on Keyword Spotting on Google Speech Commands
Audio and Speech Processing, Sound
no code implementations • 19 Oct 2018 • Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, Shiv Naga Prasad Vitaladevuni
We investigate low-bit quantization to reduce the computational cost of deep neural network (DNN) based keyword spotting (KWS).
no code implementations • ICLR 2018 • Oleg Rybakov, Vijai Mohan, Avishkar Misra, Scott LeGrand, Rejith Joseph, Kiuk Chung, Siddharth Singh, Qian You, Eric Nalisnick, Leo Dirac, Runfei Luo
We present a personalized recommender system that uses a neural network to recommend products such as eBooks, audiobooks, mobile apps, video, and music.