SCRS Conference Proceedings on Intelligent Systems

Statistical Parametric Speech Synthesis for Punjabi Language using Deep Neural Network

Authors: Harman Singh, Parminder Singh and Manjot Kaur Gill


Publishing Date: 26-04-2022

ISBN: 978-93-91842-08-6

DOI: https://doi.org/10.52458/978-93-91842-08-6-41

Abstract

In recent years, speech technology has advanced considerably, making speech synthesis an active area of research. A Text-To-Speech (TTS) system generates speech from text using a synthesis technique such as concatenative, formant, articulatory, or Statistical Parametric Speech Synthesis (SPSS). This work applies Deep Neural Network (DNN) based SPSS to the Punjabi language. The database used contains 674 audio files and a single text file containing 674 sentences. It was created at the Language Technologies Institute at Carnegie Mellon University (CMU) and is provided under the Festvox distribution. The Ossian toolkit is used as the front end for text processing. Two DNNs are modeled using the Merlin toolkit: the duration DNN maps linguistic features to the duration features of speech, and the acoustic DNN maps linguistic features to acoustic features. Subjective evaluation using the Mean Opinion Score (MOS) shows that the TTS system achieves good naturalness, with a score of 80.2%.
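
The two-network arrangement described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and does not use the Merlin or Ossian APIs: the feature sizes, layer widths, duration scaling, and the helper names (`mlp`, `forward`, `synthesize_features`) are all hypothetical, and the weights are random rather than trained. It only shows the data flow, in which a duration network predicts a frame count per phone and an acoustic network then predicts acoustic features for each frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes, chosen only for illustration.
LING_DIM = 50       # linguistic features per phone
ACOUSTIC_DIM = 60   # acoustic features per frame


def mlp(sizes):
    """Create random (untrained) weights for a feed-forward network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)
    w, b = layers[-1]
    return x @ w + b


# Duration DNN: phone-level linguistic features -> one duration value per phone.
duration_dnn = mlp([LING_DIM, 64, 1])
# Acoustic DNN: frame-level linguistic features plus a within-phone
# position feature -> acoustic features per frame.
acoustic_dnn = mlp([LING_DIM + 1, 64, ACOUSTIC_DIM])


def synthesize_features(phone_feats):
    """Run the two-stage pipeline on an array of shape (n_phones, LING_DIM)."""
    # Stage 1: predict a frame count per phone (toy scaling, at least 1 frame).
    raw = forward(duration_dnn, phone_feats)[:, 0]
    frames = np.maximum(1, np.rint(5.0 * np.exp(raw))).astype(int)

    # Upsample phone-level features to frame level using the predicted durations.
    frame_feats = np.repeat(phone_feats, frames, axis=0)
    position = np.concatenate([np.linspace(0.0, 1.0, n) for n in frames])[:, None]

    # Stage 2: predict acoustic features for every frame.
    acoustic = forward(acoustic_dnn, np.hstack([frame_feats, position]))
    return frames, acoustic


if __name__ == "__main__":
    phones = rng.normal(size=(4, LING_DIM))  # four made-up phones
    frames, acoustic = synthesize_features(phones)
    print(frames.sum(), acoustic.shape)
```

In a real system the acoustic features would be passed to a vocoder to produce the waveform; that step is omitted here.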

Keywords

TTS, SPSS, DNN, Punjabi, Speech Synthesis.

Cite as

Harman Singh, Parminder Singh and Manjot Kaur Gill, "Statistical Parametric Speech Synthesis for Punjabi Language using Deep Neural Network", In: Raju Pal and Praveen Kumar Shukla (eds), SCRS Conference Proceedings on Intelligent Systems, SCRS, India, 2022, pp. 431-441. https://doi.org/10.52458/978-93-91842-08-6-41
