Carnegie Mellon University
Combining multiple parallel streams for improved speech processing

thesis
posted on 2025-07-09, 21:26 authored by Joao Miranda
In many applications, one has access to distinct but overlapping views of the same information. For instance, a lecture may be supported by slides, a TV series may be accompanied by subtitles, or a conference in one language may be interpreted into another. Since most speech and language processing technologies, such as speech recognition, are imperfect, it is desirable to fuse these different perspectives in order to obtain improved performance.

In this thesis, a general method is presented for combining multiple information streams which are, in part or as a whole, translations of each other. The algorithms developed for this purpose rely both on word lattices, which represent posterior probability distributions over word sequences, and on phrase tables, which map word sequences to their respective translations, to generate an alignment of the different streams. From this alignment, we extract phrase pairs and use them to compute a new most likely decoding of each stream, biased towards phrases in the alignment. This method was used in two different applications: transcription of simultaneously interpreted speeches in the European Parliament, and transcription of lectures supported by slides. In both of these scenarios, we achieved performance improvements when compared with speech-recognition-only baselines. We also demonstrate how recovering acronyms and words that cannot be found in the lattices can be used to enhance overall speech recognition performance, and propose a scheme to add new pronunciations to the recognition lexicon. Both of these techniques are also based on cross-stream information.

We also explored how rich transcription techniques, namely sentence segmentation and the detection and recovery of disfluencies (filled pauses, hesitations, repetitions, etc.), can benefit from the information contained in parallel streams. Cues extracted from other streams were used to supplement existing methods to help solve each of these problems.
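
As a rough illustration of the cross-stream idea described in the abstract, the following toy sketch rescores two parallel streams using phrase pairs that are supported by both. It is not the thesis's implementation: short n-best lists stand in for word lattices, a fixed additive bonus stands in for whatever biasing the thesis actually uses, and every name here (`contains_phrase`, `matched_pairs`, `rescore`, the example sentences, the bonus value) is a hypothetical chosen for the example.

```python
from typing import Dict, List, Set, Tuple

# A hypothesis is (text, log-score from the recognizer). Assumption: an
# n-best list is used here as a simplified stand-in for a word lattice.
Hypothesis = Tuple[str, float]


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as a whole-word sequence."""
    return f" {phrase} " in f" {text} "


def matched_pairs(
    nbest_a: List[Hypothesis],
    nbest_b: List[Hypothesis],
    phrase_table: Dict[str, List[str]],
) -> Set[Tuple[str, str]]:
    """Phrase pairs whose source side appears in some hypothesis of stream A
    and whose translation appears in some hypothesis of stream B."""
    pairs = set()
    for src, targets in phrase_table.items():
        if not any(contains_phrase(text, src) for text, _ in nbest_a):
            continue
        for tgt in targets:
            if any(contains_phrase(text, tgt) for text, _ in nbest_b):
                pairs.add((src, tgt))
    return pairs


def rescore(nbest: List[Hypothesis], phrases: Set[str], bonus: float) -> str:
    """Return the hypothesis text with the best score after adding `bonus`
    for each cross-stream-supported phrase it contains (hypothetical scoring)."""
    def biased(hyp: Hypothesis) -> float:
        text, score = hyp
        return score + bonus * sum(contains_phrase(text, p) for p in phrases)
    return max(nbest, key=biased)[0]


# Invented example data: an English stream and its Portuguese interpretation.
english = [("the european onion voted", -1.0), ("the european union voted", -1.2)]
portuguese = [("a união europeia votou", -0.8), ("a união europeia voltou", -0.9)]
phrase_table = {"european union": ["união europeia"], "voted": ["votou"]}

pairs = matched_pairs(english, portuguese, phrase_table)
best_en = rescore(english, {src for src, _ in pairs}, bonus=0.5)
best_pt = rescore(portuguese, {tgt for _, tgt in pairs}, bonus=0.5)
```

In this invented example, the English recognizer on its own prefers the misrecognized "onion" hypothesis, but the phrase pair supported by the Portuguese stream shifts the choice to the "union" hypothesis, while the Portuguese stream keeps the hypothesis containing "votou".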

History

Date

2016-07-12

Degree Type

  • Dissertation

Thesis Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Alan Black
  • Joao Paulo Neto
