Non-intrusive method for audio quality assessment of lossy-compressed music recordings using convolutional neural networks

Authors

  • Aleksandra Kasperuk, Faculty of Computer Science, Białystok University of Technology
  • Sławomir Krzysztof Zieliński, Faculty of Computer Science, Białystok University of Technology

Abstract

Most existing algorithms for objective audio quality assessment are intrusive, as they require access both to an unimpaired reference recording and to the evaluated signal. This requirement excludes them from many practical applications. In this paper, we introduce a non-intrusive audio quality assessment method. The proposed method is intended to account for audio artefacts arising from the lossy compression of music signals. During its development, 250 high-quality uncompressed music recordings were collated. They were subsequently processed using a selection of five popular audio codecs, yielding a repository of 13,000 audio excerpts representing various levels of audio quality. The proposed non-intrusive method was trained on quality scores obtained with a well-established intrusive model (ViSQOL v3). The performance of the trained model was then evaluated against quality scores obtained in subjective listening tests conducted remotely over the Internet. The listening tests were carried out in compliance with the MUSHRA recommendation (ITU-R BS.1534-3). In this study, the following three convolutional neural networks were compared: (1) a model employing 1D convolutional filters, (2) an Inception-based model, and (3) a VGG-based model. The VGG-based model outperformed the model employing 1D convolutional filters in terms of predicting the listening-test scores, reaching a correlation of 0.893. The performance of the Inception-based model was similar to that of the VGG-based model. Moreover, the VGG-based model outperformed the method based on a stacked gated-recurrent-unit deep learning framework recently introduced by Mumtaz et al. (2022).
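For illustration, the sketch below outlines how a VGG-style network of the kind compared in the paper could map a music excerpt to a single predicted quality score. This is not the authors' published code (that is available in the repository cited in the references); the log-mel front-end, input duration, layer widths, and pooling head are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): a VGG-style CNN that
# regresses one quality score from a log-mel spectrogram of a music excerpt.
import torch
import torch.nn as nn
import torchaudio

class VGGQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            # Two 3x3 convolutions followed by 2x2 max pooling, as in VGG
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapse frequency/time axes
        self.head = nn.Linear(128, 1)         # single predicted quality score

    def forward(self, x):                     # x: (batch, 1, n_mels, frames)
        z = self.pool(self.features(x)).flatten(1)
        return self.head(z).squeeze(1)

# Assumed front-end: 48 kHz mono audio -> log-mel spectrogram
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=48000, n_fft=1024, hop_length=512, n_mels=96)

model = VGGQualityNet()
waveform = torch.randn(1, 48000 * 5)          # placeholder 5-second excerpt
features = torch.log(melspec(waveform) + 1e-6).unsqueeze(1)
score = model(features)                       # trained against ViSQOL v3 targets
```

In the workflow described in the abstract, such a network would be fitted to ViSQOL v3 scores (e.g., with a mean-squared-error loss) and subsequently validated against the MUSHRA listening-test scores.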

References

ITU-R BS.1116-3 Recommendation, “Methods for the subjective assessment of small impairments in audio systems,” International Telecommunication Union, Geneva, 2015.

ITU-R BS.1534-3 Recommendation, “Method for the subjective assessment of intermediate quality level of audio systems,” International Telecommunication Union, Geneva, 2015.

C. Sloan, N. Harte, D. Kelly, A.C. Kokaram, and A. Hines, “Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio,” IEEE Transactions on Broadcasting, vol. 63, pp. 693–705, Dec. 2017. https://doi.org/10.1109/TBC.2017.2704421

S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D.J. Inman, “1D convolutional neural networks and applications: A survey,” Mechanical Systems and Signal Processing, vol. 151, 107398, 2021. https://doi.org/10.1016/j.ymssp.2020.107398

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., “Going deeper with convolutions,” in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 1−9, 2015. https://doi.org/10.1109/CVPR.2015.7298594

K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” in Proc. International Conference on Learning Representations (ICLR), arXiv:1409.1556, 2015. https://doi.org/10.48550/arXiv.1409.1556

M. Chinen, F. S. C. Lim, J. Skoglund, N. Gureev, F. O'Gorman, and A. Hines, “ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric,” in Proc. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland, 2020. https://doi.org/10.1109/QoMEX48832.2020.9123150

M. Karjalainen, “A new auditory model for the evaluation of sound quality of audio systems,” in Proc. ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing, Tampa, FL, USA, 1985. https://doi.org/10.1109/ICASSP.1985.1168376

T. Thiede, W. Treurniet, R. Bitto, C. Schmidmer, T. Sporer, J. Beerends, C. Colomes, M. Keyhl, G. Stoll, K. Brandenburg, and B. Feiten, “PEAQ—the ITU standard for objective measurement of perceived audio quality,” J. Audio Eng. Soc., vol. 48, pp. 3−29, 2000. http://www.aes.org/e-lib/browse.cfm?elib=12078

ITU-R BS.1387-2 Recommendation, “Method for objective measurements of perceived audio quality,” International Telecommunication Union, Geneva, 2023.

R. Huber and B. Kollmeier, “PEMO-Q—A New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1902−1911, 2006. https://doi.org/10.1109/TASL.2006.883259

J. M. Kates and K. H. Arehart, “The Hearing-Aid Audio Quality Index (HAAQI),” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, pp. 354−365, 2016. https://doi.org/10.1109/TASLP.2015.2507858

G. Jiang, A. Biswas, C. Bergler, and A. Maier, “InSE-NET: A Perceptually Coded Audio Quality Model based on CNN,” in Proc. 151st Audio Engineering Society Convention, Online, 2021. http://www.aes.org/e-lib/browse.cfm?elib=21478

P. M. Delgado and J. Herre, “Can We Still Use PEAQ? A Performance Analysis of the ITU Standard for the Objective Assessment of Perceived Audio Quality,” in Proc. Twelfth International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland, 2020. https://doi.org/10.1109/QoMEX48832.2020.9123105

R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 54−70, 2023. https://doi.org/10.1109/TASLP.2022.3205757

C. K. A. Reddy, V. Gopal, and R. Cutler, “DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors,” in Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022. https://doi.org/10.1109/ICASSP43922.2022.9746108

A. A. Catellier and S. D. Voran, “WAWEnets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality,” in Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020. https://doi.org/10.1109/ICASSP40776.2020.9054204

S.-W. Fu, Y. Tsao, H.-T. Hwang, and H.-M. Wang, “Quality-Net: An end-to-end non-intrusive speech quality assessment model based on BLSTM,” in Proc. Interspeech, Hyderabad, India, pp. 1873−1877, 2018. https://doi.org/10.48550/arXiv.1808.05344

G. Mittag, B. Naderi, A. Chehadi, and S. Möller, “NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets,” in Proc. Interspeech, Brno, Czechia, pp. 2127−2131, 2021. https://doi.org/10.21437/Interspeech.2021-299

C. Sørensen, J. B. Boldt, and M. G. Christensen, “Validation of the Non-Intrusive Codebook-based Short Time Objective Intelligibility Metric for Processed Speech,” in Proc. Interspeech, Graz, Austria, pp. 4270−4274, 2019. https://doi.org/10.21437/Interspeech.2019-1625

D. Mumtaz, V. Jakhetiya, K. Nathwani, B. N. Subudhi, and S. C. Guntuku, “Nonintrusive Perceptual Audio Quality Assessment for User-Generated Content Using Deep Learning,” IEEE Transactions on Industrial Informatics, vol. 18, pp. 7780−7789, 2022. https://doi.org/10.1109/TII.2021.3139010

K. Organiściak and J. Borkowski, “Single-ended quality measurement of a music content via convolutional recurrent neural networks,” Metrology and Measurement Systems, vol. 27, pp. 721−733, 2020. https://doi.org/10.24425/mms.2020.134849

EBU R.128 Recommendation, “Loudness normalization and permitted maximum level of audio signals,” European Broadcasting Union, Geneva, 2020.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision, vol. 115, pp. 211–252, 2015. https://doi.org/10.1007/s11263-015-0816-y

D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in Proc. 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, pp. 1–15, 2015. https://doi.org/10.48550/arXiv.1412.6980

A. Kasperuk, “Software repository. Nonintrusive audio quality assessment ISSET2023,” GitHub, https://github.com/WaitWhatSon/nonintrusive_audio_quality_assessment_isset2023 (accessed on August 18, 2023).

M. Schoeffler, F. Stöter, B. Edler, and J. Herre, “Towards the Next Generation of Web-based Experiments: A Case Study Assessing Basic Audio Quality Following the ITU-R Recommendation BS.1534 (MUSHRA),” in Proc. 1st Web Audio Conference, Paris, France, 2015.

“The 'Mixing Secrets' Free Multitrack Download Library,” Cambridge Music Technology, https://cambridge-mt.com/ms/mtk/ (accessed on June 10, 2023).

R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello, “MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research,” in Proc. 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan, 2014.

S. K. Zieliński, “On Some Biases Encountered in Modern Audio Quality Listening Tests (Part 2): Selected Graphical Examples and Discussion,” J. Audio Eng. Soc., vol. 64, pp. 55−74, 2016. http://www.aes.org/e-lib/browse.cfm?elib=18105

Published

2024-06-20

Section

ARTICLES / PAPERS / General