CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

22 Sep 2022 · Sherif Abdulatif, Ruizhe Cao, Bin Yang

In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously test the generalization ability of the model to unseen noise types and distortions, and we fortify our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitates exploring various architectural changes, specifically regarding the metric discriminator scores and the masking techniques. It is essential to highlight that this is among the earliest works to attempt complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task on the Voice Bank+DEMAND dataset, CMGAN notably exceeds prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.
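The metric discriminator mentioned above follows the MetricGAN idea of training a discriminator to predict a perceptual score (here PESQ) so that the generator can be optimized against a differentiable surrogate of that metric. The following is a minimal, illustrative sketch of such an objective, not the authors' exact CMGAN implementation: it assumes a hypothetical discriminator `D` that takes (enhanced, clean) magnitude spectrograms and regresses a PESQ score normalized to [0, 1]; all function and variable names are placeholders.

```python
# Sketch of a MetricGAN-style objective (assumed setup, not the paper's exact code).
import torch
import torch.nn.functional as F
from pesq import pesq  # pip install pesq


def normalized_pesq(ref_wav, deg_wav, sr=16000):
    """Map wideband PESQ from its [-0.5, 4.5] range to [0, 1]."""
    score = pesq(sr, ref_wav, deg_wav, "wb")
    return (score + 0.5) / 5.0


def discriminator_loss(D, clean_mag, enhanced_mag, clean_wav, enhanced_wav):
    """Train D to predict the normalized PESQ of its input pair.

    The (clean, clean) pair is labeled 1.0 (maximum quality); the
    (enhanced, clean) pair is labeled with the measured PESQ of the
    enhanced waveform against the clean reference.
    """
    target = torch.tensor(
        [
            normalized_pesq(c.cpu().numpy(), e.detach().cpu().numpy())
            for c, e in zip(clean_wav, enhanced_wav)
        ],
        dtype=torch.float32,
    )
    loss_clean = F.mse_loss(
        D(clean_mag, clean_mag).squeeze(-1), torch.ones(len(clean_mag))
    )
    loss_enh = F.mse_loss(D(enhanced_mag.detach(), clean_mag).squeeze(-1), target)
    return loss_clean + loss_enh


def generator_metric_loss(D, enhanced_mag, clean_mag):
    """Push the generator's output toward a perfect predicted metric score."""
    pred = D(enhanced_mag, clean_mag).squeeze(-1)
    return F.mse_loss(pred, torch.ones_like(pred))
```

Normalizing PESQ to [0, 1] keeps the regression targets in a bounded range and lets the generator simply aim for a predicted score of 1.0; the true (non-differentiable) PESQ is only ever computed inside the discriminator's training targets.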

Task                    Dataset              Model   Metric Name             Metric Value   Global Rank
Audio Super-Resolution  VCTK Multi-Speaker   CMGAN   Log-Spectral Distance   0.76           # 1
Speech Enhancement      VoiceBank + DEMAND   CMGAN   PESQ                    3.41           # 5
                                                     CSIG                    4.63           # 4
                                                     CBAK                    3.94           # 3
                                                     COVL                    4.12           # 4
                                                     STOI                    96             # 1
                                                     SSNR                    11.10 dB       # 1
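The log-spectral distance (LSD) figure reported for the VCTK super-resolution benchmark is the standard frame-averaged metric: the RMS difference of the log power spectra of the reference and the estimate, averaged over frames. A small sketch of that definition is shown below; the STFT parameters are an assumption for illustration, not the paper's exact evaluation protocol.

```python
# Illustrative LSD computation (STFT settings are assumed, not the paper's).
import numpy as np
import librosa


def log_spectral_distance(reference, estimate, n_fft=2048, hop_length=512, eps=1e-10):
    """LSD between two time-aligned waveforms; lower is better."""
    ref_spec = np.abs(librosa.stft(reference, n_fft=n_fft, hop_length=hop_length)) ** 2
    est_spec = np.abs(librosa.stft(estimate, n_fft=n_fft, hop_length=hop_length)) ** 2
    log_diff = np.log10(ref_spec + eps) - np.log10(est_spec + eps)
    # RMS across frequency bins, then mean across frames.
    return np.mean(np.sqrt(np.mean(log_diff ** 2, axis=0)))
```

Lower LSD indicates that the reconstructed spectrum is closer to the reference spectrum, which is why the 0.76 entry ranks first on the VCTK multi-speaker benchmark.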
