
Method of displaying and using single and multiple statuses in a statistical NLU development environment

IP.com Disclosure Number: IPCOM000014497D
Original Publication Date: 2001-Oct-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 2 page(s) / 61K

Publishing Venue

IBM

Abstract

In statistical natural language understanding (NLU) applications, a corpus of data is used to train the parameters of the statistical model, smooth the statistical models, and then test the resulting models [1]. Some statistical methods are supervised and require the data to be annotated with meaning. For example, in statistical parsing, each sentence is annotated with a parse tree containing tags and labels. Other statistical methods are unsupervised and use algorithms such as the expectation-maximization algorithm to train and smooth the parameters of the model [2]. Currently, supervised methods exceed their unsupervised counterparts in accuracy, and the more supervised data is used, the better the results. Keeping track of how each sentence is used becomes difficult as the number of sentences grows. This invention proposes a new concept, the sentence status, to track how sentences are used and, optionally, to alter the statuses dynamically as annotation is performed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.




A sentence status is used to keep track of the annotation state of a sentence. For example, suppose an NLU system uses a statistical classer and parser, as is the case in ViaVoice Telephone NLU V. 1.2. Because parsing each sentence is complex, the designer might require two annotators to annotate a sentence before it can be used as a training sentence. The designer might relax this requirement for smoothing sentences and require just one annotator. The designer might also decide that all test sentences should be annotated by a single annotator, namely the best one available. This invention proposes a method for defining and changing statuses that supports these policies.
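The usage rules in this example can be expressed as a small eligibility check. The sketch below is a hypothetical illustration, not part of the disclosure: the `UsagePolicy` class, its field names, and the annotator IDs are assumptions, and it presumes the set of distinct annotators of each sentence is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical per-use annotation requirements (names are illustrative)."""

    test_annotator: str   # ID of the single designated test annotator
    train_min: int = 2    # distinct annotators required for a training sentence
    smooth_min: int = 1   # distinct annotators required for a smoothing sentence

    def eligible_uses(self, annotators: set[str]) -> set[str]:
        """Return the uses a sentence qualifies for, given who annotated it."""
        uses: set[str] = set()
        if len(annotators) >= self.train_min:
            uses.add("train")
        if len(annotators) >= self.smooth_min:
            uses.add("smooth")
        # Test sentences must be annotated by the designated annotator alone.
        if annotators == {self.test_annotator}:
            uses.add("test")
        return uses
```

This only reports eligibility; deciding which single partition (training, smoothing, or test) a sentence is finally assigned to would be a separate step.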

The system architect is allowed to define the sentence statuses, and how the statuses change as annotators annotate or change a sentence. The information is kept as a finite state machine, where each node corresponds to a status, and each arc corresponds to an action on the sentence that changes the annotation state of the sentence. For example, the following sentence statuses might be used:

  Unannotated
  Annotated by annotator X
  Annotated by N annotator(s)
  Annotated by annotators {X, Y}
  Ignore
  Do Later
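A minimal sketch of such a finite state machine, assuming a count-based scheme capped at two annotators and three hypothetical actions (annotate, ignore, defer). The `Status` and `Action` enums, the `TRANSITIONS` table, and `next_status` are illustrative names, not an API from the disclosure or from ViaVoice; `next_status` is the lookup that would be performed each time an annotator acts on a sentence.

```python
from enum import Enum


class Status(Enum):
    # Nodes of the state machine (count-based variant, capped at 2 annotators).
    UNANNOTATED = "Unannotated"
    ANNOTATED_1 = "Annotated by 1 annotator"
    ANNOTATED_2 = "Annotated by 2 annotators"
    IGNORE = "Ignore"
    DO_LATER = "Do Later"


class Action(Enum):
    # Hypothetical arc labels: what an annotator does to the sentence.
    ANNOTATE = "annotate"  # a new annotator annotates the sentence
    IGNORE = "ignore"      # an annotator marks the sentence as unusable
    DEFER = "defer"        # an annotator postpones the sentence


# Architect-defined arcs: (current status, action) -> next status.
TRANSITIONS: dict[tuple[Status, Action], Status] = {
    (Status.UNANNOTATED, Action.ANNOTATE): Status.ANNOTATED_1,
    (Status.UNANNOTATED, Action.IGNORE): Status.IGNORE,
    (Status.UNANNOTATED, Action.DEFER): Status.DO_LATER,
    (Status.DO_LATER, Action.ANNOTATE): Status.ANNOTATED_1,
    (Status.DO_LATER, Action.IGNORE): Status.IGNORE,
    (Status.ANNOTATED_1, Action.ANNOTATE): Status.ANNOTATED_2,
    (Status.ANNOTATED_1, Action.IGNORE): Status.IGNORE,
}


def next_status(current: Status, action: Action) -> Status:
    """Follow the arc labelled `action` out of `current`; undefined arcs are rejected."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(
            f"no arc for {action.value!r} from {current.value!r}"
        ) from None
```

Keeping the arcs in a plain table means the architect can change the policy (for example, by adding an arc from "Ignore" back to "Unannotated") without touching the lookup code.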

Initially, a sentence is unannotated. When the sentence is annotated by annotator X, its status changes either to "Annotated by 1 annotator" or to "Annotated by annotator X", depending on which statuses the architect has defined. If the sentence is later also annotated by annotator Y, the status changes either to "Annotated by 2 annotators" or to "Annotated by annotators {X,Y}".
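The two naming schemes differ in what they remember: a count-based status records only how many distinct annotators have worked on the sentence, while a set-based status records who they were. The hypothetical helpers below (not from the disclosure) derive each status name from the set of annotators and replay the X-then-Y sequence. Deriving the name from a set, rather than incrementing a counter, assumes that a repeat annotation by the same annotator should not advance the status.

```python
from collections.abc import Callable, Iterable


def count_status(annotators: frozenset[str]) -> str:
    """Count-based name: 'Annotated by N annotator(s)'."""
    n = len(annotators)
    if n == 0:
        return "Unannotated"
    return f"Annotated by {n} annotator" + ("" if n == 1 else "s")


def set_status(annotators: frozenset[str]) -> str:
    """Set-based name: 'Annotated by annotator X' or 'Annotated by annotators {X, Y}'."""
    if not annotators:
        return "Unannotated"
    if len(annotators) == 1:
        (only,) = annotators
        return f"Annotated by annotator {only}"
    return "Annotated by annotators {" + ", ".join(sorted(annotators)) + "}"


def history(order: Iterable[str], namer: Callable[[frozenset[str]], str]) -> list[str]:
    """Status names a sentence passes through as annotators work on it in `order`."""
    seen: frozenset[str] = frozenset()
    names = [namer(seen)]
    for annotator in order:
        seen = seen | {annotator}  # a repeat annotator leaves the set unchanged
        names.append(namer(seen))
    return names
```

For example, `history(["X", "Y"], set_status)` walks from "Unannotated" through "Annotated by annotator X" to "Annotated by annotators {X, Y}".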

The finite state machine is consulted whenever a sentence is annotated by an annotator, and the arc matching that action is followed from the old status to the new status. For example, if a sentence is in the "Annotated by annotator X" status, and then a second annotator decides this sentence should...