Publications

Automatic prosody prediction and detection with Conditional Random Field (CRF) models

Abstract

While current TTS systems can deliver quite acceptable segmental quality of synthesized speech for voice user interface applications, their prosody is still perceived by users as “robotic” or not expressive. In this paper, we investigate how to improve TTS prosody prediction and detection. Conditional Random Field (CRF), a discriminative probabilistic model for labeling sequential data, is adopted. Rich syntactic and acoustic contextual features are used in building the CRF models. Experiments performed on the Boston University Radio Speech Corpus show that CRF models trained on our proposed rich contextual features can improve the accuracy of prosody prediction and detection in both speaker-dependent and speaker-independent cases. The performance is comparable to or better than the best reported results.
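To illustrate what "labeling sequential data" means in practice, here is a minimal sketch of Viterbi decoding over a linear-chain model, the form CRFs most often take for sequence labeling. It is not the paper's implementation: the linear-chain assumption, the two labels (`accent`, `none`), the example words, and every score in `EMISSIONS` and `TRANSITIONS` are hypothetical values chosen for illustration. In a trained CRF these scores would be weighted sums of feature functions rather than hand-set numbers.

```python
# Hypothetical label set: does a word carry a pitch accent or not?
LABELS = ["accent", "none"]

# Hypothetical per-word label scores for "the president spoke".
# In a trained CRF these would come from weighted syntactic/acoustic features.
EMISSIONS = [
    {"accent": -1.0, "none": 1.0},  # "the"
    {"accent": 2.0, "none": 0.0},   # "president"
    {"accent": 0.5, "none": 0.4},   # "spoke"
]

# Hypothetical label-to-label scores: two accents in a row are penalized.
TRANSITIONS = {
    ("accent", "accent"): -1.0,
    ("accent", "none"): 0.0,
    ("none", "accent"): 0.0,
    ("none", "none"): 0.0,
}


def viterbi(emissions, transitions, labels=LABELS):
    """Return the highest-scoring label sequence for the whole utterance."""
    if not emissions:
        return []
    # best[t][label] = score of the best path ending in `label` at position t
    best = [{lab: emissions[0][lab] for lab in labels}]
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            # Pick the previous label that maximizes path score into `cur`.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def greedy(emissions, labels=LABELS):
    """Label each word independently, ignoring its neighbors."""
    return [max(labels, key=lambda lab: e[lab]) for e in emissions]
```

With these toy scores, labeling each word on its own would mark both "president" and "spoke" as accented, while joint decoding leaves "spoke" unaccented because of the transition penalty. That dependence on neighboring labels is what distinguishes a sequence model from a per-word classifier.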

Metadata

publication
2010 7th International Symposium on Chinese Spoken Language Processing, 135-138, 2010
year
2010
publication date
2010/11/29
authors
Yao Qian, Zhizheng Wu, Xuezhe Ma, Frank Soong
link
https://ieeexplore.ieee.org/abstract/document/5684835/
resource_link
http://www.cs.cmu.edu/~xuezhem/publications/ISCSLP2010.pdf
conference
2010 7th International Symposium on Chinese Spoken Language Processing
pages
135-138
publisher
IEEE