Bing Jing

Using Different Approaches in Language Modeling to Improve Conversational Speech Recognition

Tuesday, March 18, 2003, 11 AM, 442 Dana

Abstract

Spontaneous conversational speech recognition has been attracting growing interest in the speech community. Language models play an important role in overall recognition accuracy. A common way to train language models for domain-specific recognition is to use in-domain text. However, the amount of in-domain training data is typically insufficient to estimate the language model's parameters reliably (the sparse-data problem), so language models trained only on in-domain text perform poorly.

In this thesis, we investigate different approaches to improving the performance of language models for spontaneous conversational speech, especially the class n-gram interpolated with the conventional word n-gram. We explore using large corpora to obtain classes and then applying those classes to small task-specific corpora to obtain class n-grams. The basic idea is that if there is insufficient data to estimate a conventional word n-gram language model, then classes estimated from the same insufficient data are unlikely to be robust. A large corpus may not match the task-specific word n-gram statistics exactly, but it still has enough data to determine classes that are appropriate for the task-specific domain. We also investigate other approaches, including different smoothing techniques in both word n-gram and class n-gram language model training, compound words, and higher-order n-grams at the word level and the class level. In addition, we investigate using out-of-domain data (news transcription) to improve language model performance for spontaneous conversational speech.
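
As a rough illustration of interpolating a class n-gram with a word n-gram, the sketch below combines an unsmoothed bigram of each kind as P(w | v) = lam * P_word(w | v) + (1 - lam) * P(c(w) | c(v)) * P(w | c(w)). It is not the thesis's implementation: the toy corpus, the hand-written word-to-class map (standing in for classes obtained from a large corpus), the weight lam, and all names are assumptions made for the example, and no smoothing is applied.

```python
from collections import Counter


class InterpolatedBigram:
    """Word bigram linearly interpolated with a class bigram (unsmoothed MLE)."""

    def __init__(self, tokens, word2class, lam=0.5):
        # word2class is assumed to be given, e.g. derived from a larger corpus.
        self.word2class = word2class
        self.lam = lam  # hypothetical interpolation weight for the word model
        classes = [word2class[w] for w in tokens]
        self.word_counts = Counter(tokens)
        self.word_hist = Counter(tokens[:-1])      # tokens that have a successor
        self.word_bigrams = Counter(zip(tokens, tokens[1:]))
        self.class_counts = Counter(classes)
        self.class_hist = Counter(classes[:-1])    # classes that have a successor
        self.class_bigrams = Counter(zip(classes, classes[1:]))

    def word_prob(self, prev, word):
        """Maximum-likelihood P_word(word | prev)."""
        hist = self.word_hist[prev]
        return self.word_bigrams[(prev, word)] / hist if hist else 0.0

    def class_prob(self, prev, word):
        """P(c(word) | c(prev)) * P(word | c(word))."""
        c_prev, c_word = self.word2class[prev], self.word2class[word]
        hist = self.class_hist[c_prev]
        if hist == 0 or self.class_counts[c_word] == 0:
            return 0.0
        p_class = self.class_bigrams[(c_prev, c_word)] / hist
        p_member = self.word_counts[word] / self.class_counts[c_word]
        return p_class * p_member

    def prob(self, prev, word):
        """Linear interpolation of the word and class estimates."""
        return (self.lam * self.word_prob(prev, word)
                + (1.0 - self.lam) * self.class_prob(prev, word))


# Toy task-specific corpus and a hypothetical class map.
tokens = "i fly to boston you go to denver".split()
word2class = {
    "i": "PRON", "you": "PRON",
    "fly": "VERB", "go": "VERB",
    "to": "TO",
    "boston": "CITY", "denver": "CITY",
}
model = InterpolatedBigram(tokens, word2class, lam=0.5)

# "i go" never occurs in the toy corpus, so the word bigram gives it zero,
# but the class bigram (PRON followed by VERB) still assigns it some mass.
unseen = model.prob("i", "go")
seen = model.prob("i", "fly")
```

In this toy setting the unseen pair "i go" receives a nonzero probability only through the class component, which is the effect the class n-gram is meant to provide when in-domain data is sparse. In practice both components would be smoothed and the weight tuned on held-out data.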

Experiments are mainly based on the Switchboard corpus of spontaneous telephone conversations, with out-of-domain text drawn from Broadcast News corpora. The effects of these approaches on improving the language model's performance are presented and discussed.

Thesis Committee:
John Makhoul (advisor)
Dana Brooks
Søren Buus