Method of displaying and using single and multiple statuses in a statistical NLU development environment
Disclosure Number: IPCOM000014497D
Original Publication Date: 2001-Oct-01
Included in the Prior Art Database: 2003-Jun-19

Publishing Venue



In statistical natural language understanding (NLU) applications, a corpus of data is used to train the parameters of the statistical model, smooth the statistical models, and then test the resulting models [1]. Some statistical methods are supervised and require the data to be annotated with meaning. For example, in statistical parsing, each sentence is annotated as a parse tree containing tags and labels. Other statistical methods are unsupervised, and use algorithms such as the expectation-maximization algorithm for training and smoothing the parameters of the model [2]. Currently, supervised methods exceed their unsupervised counterparts in accuracy, and the more supervised data that is used, the better the results.

Keeping track of how sentences are used can become problematic as the number of sentences increases. This invention proposes a new concept called a sentence status to keep track of how sentences are used and, where needed, to alter the statuses dynamically as annotation is performed. A sentence status records the annotation state of a sentence.

For example, suppose an NLU system uses a statistical classer and parser, as is the case in ViaVoice Telephone NLU V. 1.2. Because of the complexity involved in parsing each sentence, the designer might require two annotators to annotate a sentence before enabling its use as a training sentence. The designer might relax this requirement for smoothing sentences and require just one annotator. The designer might also decide that all test sentences should be annotated by a single annotator, the one judged to be the best.

This invention proposes a method for defining and changing statuses that support these concepts. The system architect is allowed to define the sentence statuses, and how the statuses change as annotators annotate or change a sentence.
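The annotation requirements in the example above could be written down as a small policy table. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the names REQUIRED_ANNOTATORS, BEST_ANNOTATOR, and is_usable, and the annotator names, are hypothetical and do not appear in the disclosure.

```python
# Hypothetical policy table for the example above: how many distinct
# annotators a sentence needs before it may be used for each purpose.
REQUIRED_ANNOTATORS = {"training": 2, "smoothing": 1, "test": 1}

# Assumption: the designer names one trusted annotator for test sentences.
BEST_ANNOTATOR = "alice"


def is_usable(purpose, annotators):
    """Return True if a sentence annotated by `annotators` may be used for `purpose`."""
    distinct = set(annotators)
    if purpose == "test":
        # Test sentences must be annotated by the single best annotator only.
        return distinct == {BEST_ANNOTATOR}
    return len(distinct) >= REQUIRED_ANNOTATORS[purpose]
```

Under this sketch, a sentence annotated only by "bob" would qualify for smoothing but not for training or testing.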
The information is kept as a finite state machine, where each node corresponds to a status, and each arc corresponds to an action on the sentence that changes the annotation state of the sentence. For example, the following sentence statuses might be used:

    Unannotated
    Annotated by annotator X
    Annotated by N annotator(s)
    Annotated by annotators {X, Y}
    Ignore
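
The finite state machine described above could be represented as a transition table keyed by status and action. This is only an illustrative sketch: the disclosure leaves the statuses and arcs for the system architect to define, so the action names ("annotate", "clear", "ignore"), the cap at two annotators, and the function names below are all assumptions.

```python
# Hypothetical statuses (nodes) and actions (arcs); the source leaves both
# for the system architect to define.
UNANNOTATED = "Unannotated"
ANNOTATED_BY_1 = "Annotated by 1 annotator(s)"
ANNOTATED_BY_2 = "Annotated by 2 annotator(s)"
IGNORE = "Ignore"

# Transition table: (current status, action) -> next status.
# Assumption: each "annotate" action comes from an annotator who has not
# yet annotated the sentence.
TRANSITIONS = {
    (UNANNOTATED, "annotate"): ANNOTATED_BY_1,
    (ANNOTATED_BY_1, "annotate"): ANNOTATED_BY_2,
    (ANNOTATED_BY_1, "clear"): UNANNOTATED,
    (ANNOTATED_BY_2, "clear"): UNANNOTATED,
    (UNANNOTATED, "ignore"): IGNORE,
    (ANNOTATED_BY_1, "ignore"): IGNORE,
    (ANNOTATED_BY_2, "ignore"): IGNORE,
}


def next_status(status, action):
    """Follow one arc of the state machine; reject undefined arcs."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"no arc for action {action!r} from status {status!r}")


def run(actions, start=UNANNOTATED):
    """Apply a sequence of actions to a sentence and return its final status."""
    status = start
    for action in actions:
        status = next_status(status, action)
    return status
```

Keeping the arcs in a plain table, rather than in code branches, matches the idea that the architect can redefine statuses and transitions without changing the logic that applies them.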