
Pseudo-Pos Language Model Having a Redefined Vocabulary

IP.com Disclosure Number: IPCOM000065031D
Original Publication Date: 1985-Oct-01
Included in the Prior Art Database: 2005-Feb-19

Publishing Venue

IBM

Related People

Authors:
Jelinek, F

Abstract

A probabilistic model is disclosed that generates words conditioned on strings of symbols drawn from a smaller vocabulary. The smaller vocabulary can correspond to the set of parts of speech, preferably refined according to statistical relationships between word strings in a training text. In speech recognition, character recognition, and Japanese written-character conversion, it is customary to use a language model that relates decisions about words (or word strings) to their context. Probabilistically, the likelihood of a given word string w1, w2, w3, ..., wn is written P(w1, w2, w3, ..., wn). The probabilities of candidate word strings serve as one basis on which recognition or conversion decisions are made. In examining P(w1,w2,w3,...
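The abstract is truncated before it states the disclosure's exact formulation, so the following is only a minimal sketch of the general idea of scoring a word string through a smaller symbol vocabulary. It assumes a hidden class-bigram (HMM-style) formulation, P(w1, ..., wn) = sum over class strings c1, ..., cn of the product of P(ci | ci-1) * P(wi | ci), evaluated with the forward recursion. The class labels stand in for parts of speech and are assigned by hand. The sketch does not implement the statistical refinement of the vocabulary that the disclosure describes. The function names `train` and `string_prob`, the toy corpus, the tag set, and the start symbol `"<s>"` are hypothetical illustrations.

```python
from collections import Counter

def train(tagged_sentences):
    """Relative-frequency estimates for a (hypothetical) class-bigram model.

    tagged_sentences: list of sentences, each a list of (word, cls) pairs.
    "<s>" is an assumed sentence-start class.
    Returns (p_trans, p_emit) with p_trans[(prev_cls, cls)] = P(cls | prev_cls)
    and p_emit[(cls, word)] = P(word | cls).
    """
    trans, history, emit, cls_count = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, cls in sent:
            trans[(prev, cls)] += 1
            history[prev] += 1
            emit[(cls, word)] += 1
            cls_count[cls] += 1
            prev = cls
    p_trans = {(a, b): n / history[a] for (a, b), n in trans.items()}
    p_emit = {(c, w): n / cls_count[c] for (c, w), n in emit.items()}
    return p_trans, p_emit

def string_prob(words, p_trans, p_emit):
    """P(w1..wn) = sum over class strings c1..cn of
    prod_i P(c_i | c_{i-1}) * P(w_i | c_i), computed by the forward recursion."""
    classes = sorted({c for c, _ in p_emit})
    alpha = {"<s>": 1.0}  # alpha[c] = P(w1..wi, c_i = c)
    for w in words:
        new_alpha = {}
        for c in classes:
            e = p_emit.get((c, w), 0.0)
            if e == 0.0:
                continue
            # Probability of reaching class c from any class at the previous position.
            reach = sum(a * p_trans.get((prev, c), 0.0) for prev, a in alpha.items())
            if reach > 0.0:
                new_alpha[c] = reach * e
        alpha = new_alpha
        if not alpha:  # word unseen in training (no smoothing in this sketch)
            return 0.0
    return sum(alpha.values())

# Hypothetical toy training text with hand-assigned part-of-speech classes.
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
p_trans, p_emit = train(corpus)
print(string_prob(["the", "cat", "runs"], p_trans, p_emit))
```

The string "the cat runs" never occurs in the toy corpus, yet it receives a nonzero probability (2/27 here). It is scored through the class transitions DET to N to V, which the corpus does attest. This sharing of statistics across words of the same class is one motivation for predicting from a smaller symbol vocabulary rather than from raw word histories.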