Classification of Music Structural Functions using Deep Learning

Abstract

Music Structure Analysis (MSA) is crucial for understanding and exploiting the arrangement of musical compositions in applications such as music information retrieval, multimedia description, and recommendation systems. This paper presents a novel approach to MSA that predicts functional labels (such as verse or chorus) for structural music segments, which can benefit MSA-based applications. The approach is supervised, in contrast to clustering-based methods. Selected pre-trained Convolutional Neural Networks (CNNs), such as VGG, ResNet, and MobileNet, were applied to classify the segments. ResNet50 and DenseNet121 achieved the highest classification accuracy, reaching 87% and 85.16%, respectively. These results highlight the potential of deep learning models for accurate and efficient labeling of music structure segments, opening possibilities for advanced applications in both offline and real-time music analysis scenarios.
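
To illustrate what segment labeling and its accuracy figure involve, the following minimal sketch scores predicted functional labels against reference labels for a single track. The segment boundaries, the labels, and the function name `segment_accuracy` are hypothetical and are not taken from the paper. The abstract does not state how accuracy was computed, so plain per-segment accuracy with given boundaries is assumed here.

```python
from typing import List, Tuple

# Hypothetical segment representation: (start_seconds, end_seconds, functional_label)
Segment = Tuple[float, float, str]


def segment_accuracy(reference: List[Segment], predicted: List[Segment]) -> float:
    """Return the fraction of segments whose predicted label matches the reference.

    Assumption: both lists describe the same segments in the same order,
    so boundaries are given and only the labels are being predicted.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must contain the same number of segments")
    if not reference:
        raise ValueError("no segments to score")
    # Compare only the label field (index 2) of each aligned pair.
    correct = sum(1 for ref, pred in zip(reference, predicted) if ref[2] == pred[2])
    return correct / len(reference)


# Made-up annotations for one track, for illustration only.
reference = [
    (0.0, 15.0, "intro"),
    (15.0, 45.0, "verse"),
    (45.0, 75.0, "chorus"),
    (75.0, 105.0, "verse"),
    (105.0, 135.0, "chorus"),
]
predicted = [
    (0.0, 15.0, "intro"),
    (15.0, 45.0, "verse"),
    (45.0, 75.0, "chorus"),
    (75.0, 105.0, "chorus"),  # a verse misclassified as a chorus
    (105.0, 135.0, "chorus"),
]

accuracy = segment_accuracy(reference, predicted)
print(f"accuracy = {accuracy:.2f}")
```

Unlike a clustering-based method, which would only group the segments under anonymous labels, a supervised classifier outputs the functional names directly, so its predictions can be scored against annotations in this way.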

Published

2025-07-09

Section

Acoustics