Adapting Spoken Dialog Systems Towards Domains and Users
Spoken dialog systems are widely used, for example as voice applications and agents on smartphones and in smart cars. However, speech systems are built on the developers’ understanding of the application domain and of potential users in the field, driven by observations collected by sampling the population at a given time. As a result, the deployed models may not fit real-life usage perfectly, or may no longer be valid as the dynamics of the domain and its users change over time. A system that automatically adapts to its domain and users is therefore desirable. In this thesis, we focus on three realistic problems in language-based communication between humans and machines.
First, current speech systems with a fixed vocabulary have difficulty understanding out-of-vocabulary words (OOVs), which leads to misunderstandings or even task failures. Our approaches learn new words during a conversation, by detecting the presence of OOVs, or even before the conversation, by anticipating new words ahead of time. Our experiments show that OOV-related recognition and understanding errors can therefore be prevented.
Second, cloud-based automatic speech recognition (cloud-ASR) is widely used by current dialog applications, but it lacks the flexibility to adapt to specific domains or users. Our method, which combines hypotheses from 1) a local, adaptive ASR and 2) the cloud-ASR, can provide better recognition accuracy.
Third, when users interact with a dialog system, their intentions may extend beyond the individual domains hosted by the system. However, current multi-domain dialog systems are not aware of the user’s high-level intentions, so they miss opportunities to assist the user in a timely manner or to personalize the interaction experience. We built models that recognize these complex user intentions and enable the system to communicate with the user at the task level, in addition to the individual domain level.
We believe that adaptation at these three levels can improve the quality of human-machine interactions.
- Language Technologies Institute
- Doctor of Philosophy (PhD)