Comparative Analysis of Natural and Synthesized Polish Speech

Michał Daniluk; Agnieszka Paula Pietrzak

Authors

Michał Daniluk, Warsaw University of Technology, Institute of Radiocommunication and Multimedia Technology
Agnieszka Paula Pietrzak, Warsaw University of Technology, Institute of Radiocommunication and Multimedia Technology

Abstract

In the evolving field of speech synthesis, naturalness remains as important a factor as intelligibility. This paper presents a comparative analysis of natural and synthesized Polish speech. Four speech synthesizers were examined: Ivona, Mekatron, Notevibes, and ttsmp3. Four methods for assessing the quality of synthesized speech and comparing it with natural speech are presented: the AB test, the Mean Opinion Score (MOS), the logatom articulation test, and MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). Sentence databases and a logatom database were generated with each synthesizer and recorded for natural speech. The results indicate that natural speech consistently scored better than synthesized speech. Among the synthesizers, Notevibes performed best in all comparisons, while Mekatron ranked lowest.
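As an illustration of how one of the listed measures is typically summarized, the sketch below computes a Mean Opinion Score and an approximate 95% confidence interval from listener ratings on the five-point scale of ITU-T P.800. It is a minimal sketch and not the authors' procedure: the function name, the system labels, and every rating value are hypothetical and were invented for this example only.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation (z = 1.96); for small listener panels a
    # Student t quantile would give a wider, more appropriate interval.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, invented for illustration; not data from the paper.
example_ratings = {
    "system_a": [4, 5, 4, 3, 4],
    "system_b": [2, 3, 2, 3, 2],
}

for name, ratings in example_ratings.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the spread among listeners.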

Published

2024-06-20

Issue

Vol. 70 No. 2 (2024)

Section

ARTICLES / PAPERS / General

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
