Language Model Adaptation Using Word Clustering
Disclosure Number: IPCOM000016435D
Original Publication Date: 2003-Feb-08
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue



Building a stochastic language model (LM) for speech recognition and similar applications requires a large corpus from the target task. For some tasks, no sufficiently large corpus is available, which is an obstacle to achieving high recognition accuracy. In this paper, we propose a method that uses large corpora from different tasks to build an LM with higher predictive power than an LM estimated from a small target-task corpus alone. In our experiment, we used transcriptions of air university lectures and articles from the Nikkei newspaper, and compared an existing interpolation-based method with our new method. The results show that our method achieves a 9.71% reduction in perplexity.
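
For readers unfamiliar with the interpolation-based baseline and the perplexity measure mentioned above, the following is a minimal sketch, not the method proposed in this disclosure (the abstract does not describe the word-clustering procedure). It assumes add-alpha smoothed unigram models for brevity, a fixed interpolation weight `lam`, and invented toy corpora; the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_target, p_general, lam):
    """Linear interpolation: lam * P_target(w) + (1 - lam) * P_general(w)."""
    return {w: lam * p_target[w] + (1.0 - lam) * p_general[w] for w in p_target}


def perplexity(model, tokens):
    """Perplexity is exp of the average negative log-probability per token."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data, invented for illustration only (not the corpora from the experiment).
target_small = "the lecture covers speech recognition".split()
general_large = "the market rose the market fell the report covers stocks".split()
held_out = "the lecture covers the market".split()

# Shared vocabulary so that both models assign mass to the same word set.
vocab = set(target_small) | set(general_large) | set(held_out)

p_t = unigram_model(target_small, vocab)
p_g = unigram_model(general_large, vocab)
p_mix = interpolate(p_t, p_g, lam=0.5)  # lam = 0.5 is an arbitrary assumption

for name, model in [("target only", p_t), ("general only", p_g), ("interpolated", p_mix)]:
    print(name, round(perplexity(model, held_out), 3))
```

In practice the interpolation weight would be tuned on held-out target-task data, and the component models would be higher-order n-gram models; both points are simplified away here.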