TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	TIMIT	vq-wav2vec	Percentage error	11.6	#2


vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

ICLR 2020 · Alexei Baevski, Steffen Schneider, Michael Auli

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community that require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
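
The two quantization options named in the abstract can be illustrated with a minimal, forward-pass-only sketch in plain Python. This is not the authors' implementation: the function names, the toy codebook, and the temperature parameter `tau` are assumptions made for illustration, and training details such as gradient handling and codebook updates are omitted. The k-means variant picks the codebook entry closest to a dense vector; the Gumbel-Softmax variant adds Gumbel noise to per-entry logits and picks the entry with the highest resulting probability.

```python
import math
import random


def kmeans_quantize(z, codebook):
    """Return (index, code) of the codebook entry nearest to z (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, codebook[idx]


def gumbel_softmax_quantize(logits, codebook, tau=1.0, rng=random):
    """Perturb logits with Gumbel noise, apply a softmax at temperature tau,
    and return (index, code, probabilities) for the most probable entry."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, codebook[idx], probs


# Hypothetical toy codebook with three 2-dimensional entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]

km_idx, km_code = kmeans_quantize([0.9, 1.2], codebook)  # km_idx == 1
gs_idx, gs_code, gs_probs = gumbel_softmax_quantize([100.0, 0.0, 0.0], codebook)
```

In both cases the output is an integer index into the codebook, which is what allows a sequence of audio frames to be treated like a sequence of tokens by models that expect discrete inputs.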