EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

Jan Schlüter, Gerald Gutenbrunner

2022-07-12

Audio Classification, Pitch Classification, Spoken language identification, Classification, Instrument Recognition

Abstract

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
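For context on why PCEN is costly to parallelize, the following is a minimal NumPy sketch of the standard PCEN formulation (a first-order recursive smoother followed by adaptive gain control and root compression). It is not the authors' code: the function name `pcen`, the default parameter values, and the choice to initialise the smoother with the first frame are all illustrative assumptions. The point is the loop over time, where each smoothed frame depends on the previous one.

```python
import numpy as np


def pcen(energy, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalization on a (time, channels) array.

    Hypothetical helper for illustration; the defaults are assumed values,
    not the settings used in the paper.
    """
    energy = np.asarray(energy, dtype=float)
    smoothed = np.empty_like(energy)
    # Assumption: start the smoother from the first frame.
    m = energy[0]
    # The recursion m[t] = (1 - s) * m[t-1] + s * E[t] is inherently
    # sequential along the time axis, which limits parallelism.
    for t in range(energy.shape[0]):
        m = (1.0 - s) * m + s * energy[t]
        smoothed[t] = m
    # Adaptive gain control followed by root compression.
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r
```

Calling `pcen(filterbank_energies)` on a non-negative array of shape `(frames, channels)` returns an array of the same shape. Everything after the loop is elementwise and parallelizes trivially; only the smoother forces step-by-step processing.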

Results

Task	Dataset	Metric	Value	Model
Dialogue	VoxForge	Accuracy	91.5	LEAF
Dialogue	VoxForge	Accuracy	86.6	EfficientLEAF
Dialogue	VoxForge	Accuracy	85.6	melspect
Spoken Language Understanding	VoxForge	Accuracy	91.5	LEAF
Spoken Language Understanding	VoxForge	Accuracy	86.6	EfficientLEAF
Spoken Language Understanding	VoxForge	Accuracy	85.6	melspect
Audio Classification	Speech Commands	Accuracy	95.2	EfficientLEAF
Audio Classification	Speech Commands	Accuracy	95.1	LEAF
Audio Classification	Speech Commands	Accuracy	95.1	melspect
Audio Classification	CREMA-D	Accuracy	60.2	EfficientLEAF
Audio Classification	CREMA-D	Accuracy	58.8	melspect
Audio Classification	CREMA-D	Accuracy	50.2	LEAF
Audio Classification	BirdCLEF 2021	Accuracy	72.2	EfficientLEAF (8s)
Audio Classification	BirdCLEF 2021	Accuracy	42.9	EfficientLEAF
Audio Classification	BirdCLEF 2021	Accuracy	42.3	LEAF
Audio Classification	BirdCLEF 2021	Accuracy	39.9	melspect
Dialogue Understanding	VoxForge	Accuracy	91.5	LEAF
Dialogue Understanding	VoxForge	Accuracy	86.6	EfficientLEAF
Dialogue Understanding	VoxForge	Accuracy	85.6	melspect
Classification	Speech Commands	Accuracy	95.2	EfficientLEAF
Classification	Speech Commands	Accuracy	95.1	LEAF
Classification	Speech Commands	Accuracy	95.1	melspect
Classification	CREMA-D	Accuracy	60.2	EfficientLEAF
Classification	CREMA-D	Accuracy	58.8	melspect
Classification	CREMA-D	Accuracy	50.2	LEAF
Classification	BirdCLEF 2021	Accuracy	72.2	EfficientLEAF (8s)
Classification	BirdCLEF 2021	Accuracy	42.9	EfficientLEAF
Classification	BirdCLEF 2021	Accuracy	42.3	LEAF
Classification	BirdCLEF 2021	Accuracy	39.9	melspect
Instrument Recognition	NSynth	Accuracy	72.1	melspect
Instrument Recognition	NSynth	Accuracy	71.7	EfficientLEAF
Instrument Recognition	NSynth	Accuracy	69.2	LEAF
