Carnegie Mellon University

Semi-Supervised Learning for Speech Recognition in the Context of Accent Adaptation

journal contribution
posted on 2012-09-01, 00:00 authored by Udhyakumar Nallasamy, Florian Metze, Tanja Schultz

Accented speech that is under-represented in the training data still suffers from a high Word Error Rate (WER) with state-of-the-art Automatic Speech Recognition (ASR) systems. Careful collection and transcription of training data for different accents can address this issue, but it is both time-consuming and expensive. However, for many tasks such as broadcast news or voice search, it is easy to obtain large amounts of audio data from target users with representative accents, albeit without accent labels or even transcriptions. Semi-supervised training has been explored for ASR in the past to leverage such data, but many of these techniques assume homogeneous training and test conditions. In this paper, we experiment with cross-entropy based speaker selection to adapt a source recognizer to a target accent in a semi-supervised manner, using additional data with no accent labels. We compare our technique to self-training based only on confidence scores and show that we obtain significant improvements over the baseline by leveraging additional unlabeled data on two different tasks in Arabic and English.

History

Publisher Statement

Copyright 2012 ISCA

Date

2012-09-01
