SubSpectral Normalization for Neural Audio Data Processing

25 Mar 2021 · Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang

Convolutional Neural Networks are widely used across machine learning domains. In image processing, features can be obtained by applying 2D convolution uniformly over the spatial dimensions of the input. In the audio case, however, a frequency-domain input such as a Mel-spectrogram has distinct characteristics along the frequency dimension, so a 2D convolution layer needs a way to treat that dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a separate normalization for each group. SSN also includes an affine transformation that can be applied per group. Our method removes inter-frequency deflection while the network learns frequency-aware characteristics. In experiments with audio data, we observed that SSN efficiently improves the network's performance.
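
Below is a minimal PyTorch sketch of the idea described above, not the authors' reference implementation: the module name `SubSpectralNorm`, the `(batch, channels, frequency, time)` input layout, and the trick of folding sub-bands into `BatchNorm2d` channels are assumptions for illustration. It corresponds to the per-sub-band affine setting (A=Sub), where each (channel, sub-band) pair gets its own scale and shift.

```python
import torch
import torch.nn as nn


class SubSpectralNorm(nn.Module):
    """Split the frequency axis into `spec_groups` sub-bands and
    batch-normalize each sub-band separately, each with its own affine."""

    def __init__(self, channels, spec_groups=4, eps=1e-5):
        super().__init__()
        self.spec_groups = spec_groups
        # One gamma/beta pair per (channel, sub-band) by treating each
        # sub-band as an extra "virtual" channel of a single BatchNorm2d.
        self.bn = nn.BatchNorm2d(channels * spec_groups, eps=eps)

    def forward(self, x):
        # x: (batch, channels, frequency, time);
        # frequency must be divisible by spec_groups.
        n, c, f, t = x.shape
        x = x.view(n, c * self.spec_groups, f // self.spec_groups, t)
        x = self.bn(x)
        return x.view(n, c, f, t)
```

A hypothetical usage on a Mel-spectrogram feature map, e.g. 40 mel bins split into 4 sub-bands of 10 bins each:

```python
ssn = SubSpectralNorm(channels=45, spec_groups=4)
x = torch.randn(8, 45, 40, 101)   # (batch, channels, mel bins, frames)
y = ssn(x)                        # same shape as x: (8, 45, 40, 101)
```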

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Keyword Spotting | Google Speech Commands | res15 w/ SSN(S=4, A=Sub) (2019) | Test Accuracy (%) | 97.5 ±0.15 | #3 |
| Keyword Spotting | Google Speech Commands | res15 w/ SSN(S=4, A=Sub) | Test Accuracy (%) | 96.8 ±0.13 | #2 |
| Keyword Spotting | Google Speech Commands | res8 w/ SSN(S=4, A=Sub) | Test Accuracy (%) | 95.4 ±0.22 | #1 |
| Acoustic Scene Classification | TAU Urban Acoustic Scenes 2019 | CP-ResNet(ch128) w/ SSN(S=2, A=Sub) | Accuracy (%) | 84.1 ±0.20 | #2 |
| Acoustic Scene Classification | TAU Urban Acoustic Scenes 2019 | CP-ResNet(ch64) w/ SSN(S=2, A=Sub) | Accuracy (%) | 83.6 ±0.07 | #1 |
