Description of acoustic variations by hidden Markov models with tree structure
Journal contribution posted on 01.12.2007 by Satoru Hayamizu, Kai-Fu Lee, Hsiao-Wuen Hon
Abstract: "This paper provides a description of the acoustic variations of speech and its application to a speech recognition system using hidden Markov models. There are many sources of variabilities that affect the realization of a phoneme: phonetic contexts, speakers, stress, speaking rates and so on. Explicit modeling with these sources of variabilities will give more accurate and more detailed phone models, but even with a large amount of speech data, it is necessary to put some structure to the description for robustness. Tree-based HMMs are discussed as one of such structures. Three case studies are presented: HMMs with large VQ codebook sizes, decision tree clustering and speaker-clustering. They are tested on speaker-independent continuous speech recognition experiments with a 1,000 word vocabulary. Trainability and generalizability are discussed based on the experimental results."