Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform
Yohannis Telila, Tommaso Cucinotta, Davide Bacciu
2025-05-07 · Music Transcription
Abstract
Automatic music transcription (AMT) is the problem of analyzing an audio recording of a musical piece and detecting the notes being played. AMT is challenging, particularly for polyphonic music, where the goal is to produce a score representation of a piece by analyzing a sound signal containing multiple notes played simultaneously. In this work, we design a processing pipeline that transforms classical piano audio files in .wav format into a music score representation. Features are extracted from the audio signals using the constant-Q transform, and the resulting coefficients are used as input to a convolutional neural network (CNN) model.
Related Papers
- Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription (2025-06-17)
- Dialogue in Resonance: An Interactive Music Piece for Piano and Real-Time Automatic Transcription System (2025-05-22)
- Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio (2025-05-19)
- Music Tempo Estimation on Solo Instrumental Performance (2025-04-25)
- Scalable Approximate Algorithms for Optimal Transport Linear Models (2025-04-06)
- Multi-task learning-based temporal pattern matching network for guitar tablature transcription (2025-04-03)
- D3RM: A Discrete Denoising Diffusion Refinement Model for Piano Transcription (2025-01-09)
- Meta-learning-based percussion transcription and $t\bar{a}la$ identification from low-resource audio (2025-01-08)