Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

2021-10-26 | Audio Super-Resolution, Bandwidth Extension

Abstract

We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model for bandwidth extension. The proposed architecture simplifies the UNet backbone of TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also use self-supervised pretraining and data augmentation to enhance the quality of bandwidth-extended signals and reduce sensitivity to downsampling methods. Experimental results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance overall performance.
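
The abstract does not spell out how filter augmentation is done. As a rough illustration only, one plausible reading is that the low-resolution training input is produced with a randomly chosen anti-aliasing filter, so the model does not overfit to a single downsampling method. The sketch below assumes that reading; the function name `random_lowpass_downsample`, the filter families, the order range, and the ripple/attenuation values are all hypothetical choices, not details taken from the paper.

```python
import numpy as np
from scipy import signal


def random_lowpass_downsample(x, factor=2, rng=None):
    """Low-pass x with a randomly chosen filter, then keep every `factor`-th sample.

    Hypothetical sketch: filter families and parameter ranges are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = int(rng.integers(2, 9))   # assumed range of filter orders: 2..8
    cutoff = 1.0 / factor             # normalized so that 1.0 is the Nyquist frequency
    kind = rng.choice(["butter", "cheby1", "ellip", "bessel"])

    if kind == "butter":
        sos = signal.butter(order, cutoff, output="sos")
    elif kind == "cheby1":
        sos = signal.cheby1(order, 0.5, cutoff, output="sos")     # 0.5 dB passband ripple
    elif kind == "ellip":
        sos = signal.ellip(order, 0.5, 40, cutoff, output="sos")  # 40 dB stopband attenuation
    else:
        sos = signal.bessel(order, cutoff, output="sos")

    filtered = signal.sosfiltfilt(sos, x)  # zero-phase filtering
    return filtered[::factor]              # decimate after anti-aliasing


# Usage: build a (low-resolution, high-resolution) training pair from one clip.
rng = np.random.default_rng(0)
high_res = rng.standard_normal(16000)      # stand-in for a 1 s speech clip
low_res = random_lowpass_downsample(high_res, factor=2, rng=rng)
```

Drawing a new filter for every training example exposes the model to inputs with different roll-off characteristics, which is the kind of variation the abstract's sensitivity claim is concerned with.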

Results

Task                      Dataset             Metric                 Value  Model
Audio Generation          VCTK Multi-Speaker  Log-Spectral Distance  1.28   TUNet + MSM pre-training
Audio Generation          VCTK Multi-Speaker  Log-Spectral Distance  1.36   TUNet
10-shot image generation  VCTK Multi-Speaker  Log-Spectral Distance  1.28   TUNet + MSM pre-training
10-shot image generation  VCTK Multi-Speaker  Log-Spectral Distance  1.36   TUNet
Audio Super-Resolution    VCTK Multi-Speaker  Log-Spectral Distance  1.28   TUNet + MSM pre-training
Audio Super-Resolution    VCTK Multi-Speaker  Log-Spectral Distance  1.36   TUNet
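
For readers unfamiliar with the metric, log-spectral distance (LSD) is commonly computed as the root-mean-square difference between the log power spectra of the reference and the estimate, taken over frequency bins and then averaged over frames; lower is better. The following is a minimal sketch of that common definition. The STFT settings (`n_fft=2048`, `hop=512`, Hann window) and the `eps` floor are assumptions, since this page does not state the evaluation parameters used for the numbers above.

```python
import numpy as np


def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram |STFT|^2 with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Mean over frames of the RMS log10-power difference over frequency bins."""
    ref = np.log10(stft_power(reference, n_fft, hop) + eps)  # eps avoids log(0)
    est = np.log10(stft_power(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


# Usage: identical signals give a distance of zero; any spectral mismatch raises it.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
print(log_spectral_distance(clean, clean))
print(log_spectral_distance(clean, 0.5 * clean))
```

Both signals must have the same length and sample rate, and each must be at least `n_fft` samples long for this sketch to produce any frames.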

Related Papers

EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
Neural Spectral Band Generation for Audio Coding (2025-06-07)
French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement (2025-06-04)
Learning to Upsample and Upmix Audio in the Latent Domain (2025-05-31)
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension (2025-05-22)
A2SB: Audio-to-Audio Schrodinger Bridges (2025-01-20)
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation (2025-01-18)
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching (2025-01-09)