Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Faculty of Media Engineering, Islamic Republic of Iran Broadcast University, Tehran, Iran
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The two main approaches to speech synthesis are the data-based approach and the process-based approach, and each has its own challenges. Unit-selection speech synthesis and statistical parametric speech synthesis are the two dominant synthesis techniques. Naturalness is the main challenge for all speech synthesis approaches. Intonation, speech style, and emotional state all contribute to naturalness, and all of them are considered suprasegmental features. Synthesized speech that carries paralinguistic information is perceptually more believable. Prosody information plays an important role in the quality of the speech synthesized by text-to-speech systems. The first purpose of modern speech synthesizers is text-to-speech conversion, and the second is conveying the emotional states of the text in voice form. In this paper, the two main speech synthesis approaches and their challenges are investigated in detail.