speech-recognition

999 papers with code • 0 benchmarks • 2 datasets

Most implemented papers

Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

autoliuweijie/FastBERT 8 Mar 2021

Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others.

ISyNet: Convolutional Neural Networks design for AI accelerator

mindspore-ai/models 4 Sep 2021

To address this problem we propose a measure of hardware efficiency of neural architecture search space, the matrix efficiency measure (MEM); a search space comprising hardware-efficient operations; a latency-aware scaling method; and ISyNet, a set of architectures designed to be fast on specialized neural processing unit (NPU) hardware and accurate at the same time.

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

facebookresearch/salina 3 Apr 2015

Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients.

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

TensorSpeech/TensorFlowASR 7 May 2020

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
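
For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used by the ContextNet authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: words are separated by whitespace; real scoring tools
    # usually also normalize case and punctuation before comparing.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.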

Unsupervised Cross-lingual Representation Learning for Speech Recognition

huggingface/transformers 24 Jun 2020

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

An Overview of Multi-Task Learning in Deep Neural Networks

shenweichen/DeepCTR 15 Jun 2017

Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery.

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

upskyy/Transformer-Transducer 7 Feb 2020

We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
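
Limiting the left context for self-attention can be pictured as a boolean mask in which each frame may attend only to itself and a fixed number of preceding frames. The sketch below builds such a mask in plain Python; it is an illustration of the general idea, not the paper's implementation, and the names `left_context_mask`, `left_context`, and `right_context` are hypothetical.

```python
from typing import List


def left_context_mask(
    num_frames: int, left_context: int, right_context: int = 0
) -> List[List[bool]]:
    """Return mask[t][s] == True where frame t is allowed to attend to frame s."""
    # Assumption: context is counted in frames; right_context=0 gives a
    # fully causal mask suitable for streaming.
    mask = []
    for t in range(num_frames):
        row = [
            t - left_context <= s <= t + right_context
            for s in range(num_frames)
        ]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Each frame sees itself and one frame to its left.
    for row in left_context_mask(num_frames=4, left_context=1):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of attended positions per frame stays constant, so the per-frame attention cost no longer grows with utterance length.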

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

PaddlePaddle/PaddleSpeech 10 Dec 2020

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

microsoft/unilm 26 Oct 2021

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.