Improvements in Language, Lexical, and Phonetic Modeling in Sphinx-II
We studied the effect of various types and amounts of training data on the quality of the derived vocabulary, and used our findings to derive an improved ranking of the words using only 19% of the LM training data. We then studied the conflicting effects of increased vocabulary size on the system's accuracy, and used the result to pick an optimal vocabulary size. A similar analysis of n-gram coverage yielded a very different outcome, with the best system being the one based on the most data. A new implementation of the cache language model was tested and yielded an improvement of approximately 4% on a development test. We also studied a phrase grammar for common acronyms, which had a small but consistently positive effect, yielding a gain of approximately 0.2% (absolute) on the evaluation test set. A change was made in the evaluation of right acoustic contexts for single-phone words, which yielded a consistent 3% relative improvement across multiple development tests. A very simple class grammar was implemented to capture variations in verbalized pronunciation. It, too, had a small but consistently positive effect, delivering an improvement of 0.1% (absolute) on the final evaluation test.
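The coverage side of the vocabulary-size analysis described above can be illustrated with a minimal sketch. It assumes a plain frequency-based word ranking and measures the out-of-vocabulary (OOV) rate on held-out text as the vocabulary grows. The function names (`rank_vocabulary`, `oov_rate`, `oov_curve`) and the toy sentences are hypothetical and are not taken from Sphinx-II. The ranking actually used in this work is not specified here and may differ.

```python
from collections import Counter


def rank_vocabulary(tokens):
    """Rank words by descending training frequency (ties broken alphabetically).

    Assumption: plain frequency ranking, used here only for illustration.
    """
    counts = Counter(tokens)
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def oov_rate(vocab, test_tokens):
    """Fraction of test tokens that are not covered by the vocabulary."""
    if not test_tokens:
        return 0.0
    vocab = set(vocab)
    misses = sum(1 for t in test_tokens if t not in vocab)
    return misses / len(test_tokens)


def oov_curve(ranked, test_tokens, sizes):
    """OOV rate for each candidate vocabulary size, taking the top-n ranked words."""
    return {n: oov_rate(ranked[:n], test_tokens) for n in sizes}


# Toy data, purely illustrative.
train = "the market rose as the dollar fell and the market closed".split()
test = "the dollar rose and the yen fell".split()

ranked = rank_vocabulary(train)
curve = oov_curve(ranked, test, sizes=[1, 3, 8])
for size, rate in curve.items():
    print(f"vocabulary size {size}: OOV rate {rate:.3f}")
```

The sketch captures only one of the two conflicting effects: a larger vocabulary lowers the OOV rate, but it says nothing about the added confusability among words, which has to be measured by running the recognizer itself.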