Carnegie Mellon University

Style-Specific Phrasing in Speech Synthesis

thesis
posted on 2025-05-30, 19:47 authored by Alok Parlikar

People pause between words and sentences when they speak. They pause to emphasize content, or to make an utterance more understandable, or just to take a breath. A speech synthesizer should also insert similar pauses to sound natural.

The process of inserting prosodic breaks in an utterance is called Phrasing. Phrasing is a crucial step during speech synthesis because other models of prosody depend on it. Phrasing also helps characterize styles of speech, and synthesizers must adapt their phrasing to different speaking styles.

This thesis presents a data-driven, grammar-based approach for building style-specific phrasing models. We automatically label phrase breaks from speech data and use features over acoustic syntax in our modeling. Experimental results, both objective and subjective, show that these models outperform the prior state of the art across various speaking styles.

The thesis further presents a minimum error-rate training approach that improves the phrasing models by optimizing them directly toward the evaluation criterion, the F-measure. This framework also lets us define a knob that varies the number of phrase breaks produced in an utterance, which can be useful when changing the speaking rate.
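To make the idea of tuning toward the F-measure concrete, here is a minimal sketch. It is not the minimum error-rate training procedure from the thesis: it assumes a hypothetical model that assigns each word boundary a break score, and it picks a single decision threshold by sweeping candidates and keeping the one with the highest F-measure. All names (`f_measure`, `place_breaks`, `best_threshold`) and the toy scores are illustrative assumptions.

```python
from typing import List, Sequence


def f_measure(predicted: Sequence[bool], reference: Sequence[bool],
              beta: float = 1.0) -> float:
    """F-measure of predicted breaks against reference breaks."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def place_breaks(scores: Sequence[float], threshold: float) -> List[bool]:
    """Insert a break wherever the (hypothetical) break score reaches the threshold."""
    return [s >= threshold for s in scores]


def best_threshold(scores: Sequence[float], reference: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold that maximizes the F-measure on held-out data."""
    return max(candidates,
               key=lambda t: f_measure(place_breaks(scores, t), reference))


# Toy data: one score per word boundary, plus hand-labeled reference breaks.
scores = [0.9, 0.2, 0.6, 0.1, 0.4]
reference = [True, False, True, False, False]
candidates = [0.3, 0.5, 0.7]

tuned = best_threshold(scores, reference, candidates)
```

In this simplified setting the threshold doubles as the knob: lowering it below the tuned value yields more breaks, and raising it yields fewer.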

This thesis also discusses modeling not just the placement of phrase breaks, but also their duration. Corpus analysis shows that break durations vary considerably across styles, and we present methods that capture this variation in a way that is perceptually better.

The presented phrasing methods can have a broader impact on intonation models and can enhance the intelligibility of the synthesis of machine translation output. These methods can also be extended to “low-resource” scenarios, such as when building voices for uncommon languages, or for languages that do not have a standardized orthography.

History

Date

2013-12-13

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Alan W. Black
