A Spectro-Temporal Framework for Compensation of Reverberation for Speech Recognition
The objective of this thesis is the development of signal processing and analysis techniques that provide substantially improved speech recognition accuracy in highly reverberant environments. Speech is a natural medium of communication for humans, and in the last decade speech technologies such as automatic speech recognition (ASR) and voice response systems have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments introduce noise and reverberation that degrade system performance. The key focus of this thesis is the robustness of ASR to reverberation.
In our work, we first provide a new framework to adequately and efficiently represent the problem of reverberation in speech feature domains. Although the framework incurs modeling approximation errors, it provides a sound basis for developing reverberation compensation algorithms. Building on this framework, we develop a number of dereverberation algorithms. These algorithms reduce the uncertainty involved in dereverberation by exploiting speech knowledge in the form of cepstral auto-correlation, the cepstral distribution, and the non-negativity and sparsity of spectral values. We demonstrate the success of our algorithms under clean-training as well as matched-training conditions.
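A minimal sketch of the kind of feature-domain representation described above: a common approximation in the dereverberation literature models the reverberant short-time power spectrum as a per-frequency convolution of the clean spectrum with a smoothed spectral envelope of the room impulse response. The function name and shapes below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def reverberate_spectrum(X, H):
    """Illustrative feature-domain reverberation model (an assumption,
    not the thesis's exact model): Y[t, f] = sum_k H[k, f] * X[t-k, f].

    X: clean power spectrogram, shape (frames, freq_bins)
    H: spectral envelope of the room impulse response, shape (taps, freq_bins)
    Returns the reverberant power spectrogram Y with the same shape as X.
    """
    T, _ = X.shape
    Y = np.zeros_like(X)
    for k in range(H.shape[0]):
        # Each RIR tap smears a delayed copy of the clean spectrum,
        # independently in every frequency bin.
        Y[k:, :] += H[k, :] * X[:T - k, :]
    return Y
```

Under such a model, dereverberation becomes a per-frequency deconvolution, which is ill-posed; this is where the speech priors mentioned above (cepstral statistics, non-negativity, sparsity) serve to constrain the solution.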
Apart from dereverberation, we also provide an approach to noise robustness via a temporal-difference operation in the speech spectral domain. Through a theoretical analysis, we predict the expected improvement as a shift in the SNR threshold under white-noise conditions. We also empirically quantify and study speech-feature-level distortion as a function of signal-level additive noise.
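The temporal-difference idea can be sketched as follows: subtracting a scaled copy of the previous spectral frame from each frame largely cancels stationary noise, whose spectrum is nearly constant across frames, while preserving the non-stationary energy of speech. The function and the `alpha` parameter are illustrative assumptions.

```python
import numpy as np

def temporal_difference(S, alpha=1.0):
    """Illustrative spectral-domain temporal difference (an assumption,
    not the thesis's exact operator).

    S: power spectrogram, shape (frames, freq_bins)
    alpha: weight on the previous frame (assumed parameter)
    Returns a (frames-1, freq_bins) array; negative differences are
    floored at zero to keep a valid power spectrum.
    """
    D = S[1:, :] - alpha * S[:-1, :]
    return np.maximum(D, 0.0)
```

On a perfectly stationary input the output is zero everywhere, which is the intuition behind the predicted SNR-threshold shift for white noise.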
Finally, we provide a new framework for jointly representing and compensating reverberation and noise. It generalizes the spectral-domain reverberation framework by incorporating an additive noise term. Working under this framework, we combine our dereverberation and noise compensation approaches, both for better dereverberation and for the most challenging recognition task, in which noise and reverberation occur together.
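A minimal sketch of the generalized distortion model described above, under the same illustrative convolutive approximation: the observed spectrum is the reverberated clean spectrum plus an additive noise spectrum. All names and shapes are assumptions for illustration.

```python
import numpy as np

def joint_distortion(X, H, N):
    """Illustrative joint model (an assumption, not the thesis's exact one):
    Y[t, f] = sum_k H[k, f] * X[t-k, f] + N[t, f].

    X: clean power spectrogram, shape (frames, freq_bins)
    H: RIR spectral envelope, shape (taps, freq_bins)
    N: additive noise power spectrum, shape (frames, freq_bins)
    """
    T = X.shape[0]
    Y = np.zeros_like(X)
    for k in range(H.shape[0]):
        # Convolutive reverberation term, per frequency bin.
        Y[k:, :] += H[k, :] * X[:T - k, :]
    # Additive noise term generalizes the reverberation-only model.
    return Y + N
```

Compensation under this model must undo both distortions jointly, which is why the dereverberation and noise compensation approaches are combined rather than applied in isolation.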