XLSR is a multilingual speech representation model built on wav2vec 2.0. It is pretrained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents that is shared across languages. The model is then fine-tuned on labeled data, and experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining. A shared quantization module over the feature-encoder representations produces multilingual quantized speech units, whose embeddings serve as targets for a Transformer trained with contrastive learning. By sharing discrete tokens across languages, the model creates bridges between them.
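The contrastive objective described above can be illustrated with a minimal NumPy sketch: given a Transformer context vector at a masked timestep, the model must identify the true quantized latent among a set of distractors via cosine similarity. This is a simplified illustration, not the paper's implementation; the function name, the temperature value, and the use of plain NumPy are assumptions for clarity.

```python
import numpy as np

def contrastive_loss(context, target, distractors, temperature=0.1):
    """Simplified wav2vec 2.0-style contrastive loss (illustrative sketch).

    context:     context-network output at a masked timestep, shape (d,)
    target:      true quantized latent for that timestep, shape (d,)
    distractors: negative quantized latents, shape (k, d)
    """
    # Candidate set: true target first, then the distractors.
    candidates = np.vstack([target[None, :], distractors])
    # Cosine similarity between the context vector and each candidate.
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context) + 1e-8
    )
    logits = sims / temperature
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy against the true target (index 0).
    return -np.log(probs[0])
```

When the context vector matches the true latent closely, the loss is near zero; a mismatched target among strong distractors yields a large loss, which is the signal that drives pretraining.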
Source: *Unsupervised Cross-lingual Representation Learning for Speech Recognition*
| Task | Papers | Share |
|---|---|---|
| Speech Recognition | 13 | 30.23% |
| Automatic Speech Recognition (ASR) | 7 | 16.28% |
| Language Modelling | 4 | 9.30% |
| Translation | 3 | 6.98% |
| Language Identification | 2 | 4.65% |
| Spoken language identification | 2 | 4.65% |
| Cross-Lingual Transfer | 2 | 4.65% |
| Retrieval | 1 | 2.33% |
| Voice Conversion | 1 | 2.33% |