
System for building a UIMA pipeline information map

IP.com Disclosure Number: IPCOM000235810D
Publication Date: 2014-Mar-25
Document File: 2 page(s) / 57K

Publishing Venue

The IP.com Prior Art Database

Abstract

In this article we discuss a system and method for building a collection of metadata (referred to as the information map in the rest of the document) for a UIMA pipeline. The information map is built automatically by analyzing events emitted by a running pipeline. The article contains detailed information about the building blocks of the information map and the types of actions that the map makes possible.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.




The Unstructured Information Management Architecture (UIMA) is a specification that standardizes a software framework for performing complex content analytics on unstructured data. The main idea of UIMA is that a document is submitted to a pipeline composed of an ordered set of annotators and controllers. Annotators are invoked sequentially or in parallel; each one adds annotations to the content and records them in the document as it goes.
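The sequential flow described above can be sketched as a toy model. This is a minimal, hypothetical Python simulation, not the Apache UIMA API (where annotations are recorded in a Common Analysis Structure, or CAS): the Document class stands in for the CAS, and the annotator names are invented for illustration. Each annotator runs in order and can read what its predecessors recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a UIMA CAS: the document text plus the
# annotations recorded on it. This is NOT the Apache UIMA API.
@dataclass
class Document:
    text: str
    annotations: List[dict] = field(default_factory=list)

    def add_annotation(self, type_name: str, begin: int, end: int, **features) -> None:
        self.annotations.append({"type": type_name, "begin": begin, "end": end, **features})

# An annotator is modeled as a function that reads the document and adds annotations.
Annotator = Callable[[Document], None]

def token_annotator(doc: Document) -> None:
    # Record each whitespace-separated token with its character offsets.
    pos = 0
    for word in doc.text.split():
        begin = doc.text.index(word, pos)
        end = begin + len(word)
        doc.add_annotation("Token", begin, end)
        pos = end

def capitalized_annotator(doc: Document) -> None:
    # Reads Token annotations written upstream and tags the capitalized ones.
    for ann in list(doc.annotations):  # snapshot: we append while iterating
        if ann["type"] == "Token" and doc.text[ann["begin"]].isupper():
            doc.add_annotation("Capitalized", ann["begin"], ann["end"])

def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    # Sequential flow: each annotator sees everything its predecessors recorded.
    for annotator in annotators:
        annotator(doc)
    return doc

doc = run_pipeline(Document("Who founded IBM in 1911"), [token_annotator, capitalized_annotator])
```

Because capitalized_annotator consumes Token annotations produced upstream, its output depends on another component's behavior; implicit dependencies of this kind are what the information map is intended to make visible.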

For very large UIMA pipelines (for example, the question-answering system IBM Watson*, which may contain up to 300 annotators), it often becomes very hard for developers to understand how individual annotators contribute to the overall system, for the following reasons:
-Invocation of annotators sometimes depends on runtime data found in the document. For example, a temporal annotator may only be invoked if the original question contains time-related data
-Annotators may be developed by different teams in different geographies
-UIMA types may be developed by different teams as well and may get modified over time
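The first point can be illustrated with a small sketch. It assumes a hypothetical guard-based flow in which each step runs only when a predicate over the input returns True; the regular expression is a deliberately crude, assumed test for time-related content, not the logic of any real temporal annotator. The same pipeline definition produces different execution traces for different questions.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical conditional step: (name, guard, action). The guard inspects
# runtime data; the action runs only if the guard returns True.
Step = Tuple[str, Callable[[str], bool], Callable[[str], None]]

# Crude, illustrative test for time-related content (assumption, not real logic).
TIME_PATTERN = re.compile(r"\b(\d{4}|yesterday|today|tomorrow)\b", re.IGNORECASE)

def run(question: str, steps: List[Step]) -> List[str]:
    """Run the steps whose guards pass and return their names in execution order."""
    trace = []
    for name, guard, action in steps:
        if guard(question):
            action(question)
            trace.append(name)
    return trace

steps: List[Step] = [
    ("Tokenizer", lambda q: True, lambda q: None),
    ("TemporalAnnotator", lambda q: bool(TIME_PATTERN.search(q)), lambda q: None),
]

trace_a = run("Who founded IBM in 1911?", steps)
trace_b = run("Who founded IBM?", steps)
```

Inspecting the step list alone does not reveal which annotators run for a given question; only observing the run does.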

While the Unified Modeling Language (UML) serves as an effective general-purpose modeling language for object-oriented software systems, UIMA pipeline component behavior cannot be described effectively with this kind of design notation. UIMA pipeline components are completely decoupled and, as noted above, their behavior depends heavily on pipeline input. Seeing the complete picture therefore requires a method that combines static analysis of component descriptors with runtime analysis of pipeline behavior.
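The complementary roles of static and runtime analysis can be sketched as a set comparison. The sketch assumes that each annotator's declared input and output features have already been extracted from its descriptor (XML parsing is omitted) and that the features it actually read and wrote have been collected at runtime. The class and function names are hypothetical; features are written in the Type:feature form that UIMA uses for feature names.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical, simplified view of an annotator descriptor's declared
# capabilities (in UIMA these live in XML descriptors; parsing is omitted).
@dataclass
class DeclaredCapabilities:
    inputs: Set[str]
    outputs: Set[str]

# Hypothetical runtime observation for one annotator, e.g. collected from events.
@dataclass
class ObservedBehavior:
    read: Set[str]
    written: Set[str]

def compare(declared: DeclaredCapabilities, observed: ObservedBehavior) -> Dict[str, Set[str]]:
    """Contrast static declarations with observed runtime behavior for one annotator."""
    return {
        # Declared outputs that were never produced on the observed input.
        "declared_not_written": declared.outputs - observed.written,
        # Features written at runtime but absent from the descriptor.
        "written_not_declared": observed.written - declared.outputs,
        # Features read at runtime that the descriptor does not list as inputs.
        "read_not_declared": observed.read - declared.inputs,
    }

declared = DeclaredCapabilities(
    inputs={"Token:begin", "Token:end"},
    outputs={"Date:value", "Date:granularity"},
)
observed = ObservedBehavior(
    read={"Token:begin", "Token:end", "Question:focus"},
    written={"Date:value"},
)
report = compare(declared, observed)
```

Here the report flags Date:granularity as declared but not written for this input, and Question:focus as read without being declared; neither finding is available from the descriptor or the runtime trace alone.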

This invention proposes a method for building a UIMA information map that captures fine-grained information about the pipeline:
-Which annotators (local and remote) are run, in which order, and for how long
-Which types and features are written by each annotator, and which features are read (or sought and not found) by each annotator
-Configurable statistics about each feature that can be used to generate metrics about key aspects of that feature. For example, how many times feature structure X had its field Y set to the value Z.
-Relationships between the software system and the hardware running it, for example exactly which computing node an annotator ran on
-An easy-to-navigate tree structure representing the type system
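One way to assemble these pieces from pipeline events is sketched here. The Event schema, the InformationMap class, and the type_tree helper are hypothetical; they assume an instrumented pipeline that emits start, end, read, write, and miss events tagged with the annotator name and computing node. Tracked (feature, value) pairs play the role of the configurable statistics, and the type tree is built from illustrative child-to-supertype pairs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event emitted by an instrumented pipeline (assumed schema, not UIMA).
@dataclass
class Event:
    kind: str                    # "start", "end", "read", "write" or "miss"
    annotator: str
    time: float = 0.0
    node: str = ""
    feature: str = ""            # e.g. "Date:value"
    value: Optional[str] = None

@dataclass
class AnnotatorEntry:
    order: int                   # position of the annotator's first invocation
    node: str                    # node of the first invocation
    duration: float = 0.0        # accumulated over all invocations
    written: Set[str] = field(default_factory=set)
    read: Set[str] = field(default_factory=set)
    missed: Set[str] = field(default_factory=set)

class InformationMap:
    def __init__(self, tracked: Set[Tuple[str, str]]):
        self.tracked = tracked   # configurable statistics: (feature, value) pairs to count
        self.annotators: Dict[str, AnnotatorEntry] = {}
        self.stats: Counter = Counter()
        self._started: Dict[str, float] = {}

    def consume(self, event: Event) -> None:
        # Update the map incrementally, one event at a time.
        if event.kind == "start":
            self._started[event.annotator] = event.time
            if event.annotator not in self.annotators:
                self.annotators[event.annotator] = AnnotatorEntry(
                    order=len(self.annotators), node=event.node)
            return
        entry = self.annotators[event.annotator]
        if event.kind == "end":
            entry.duration += event.time - self._started.pop(event.annotator)
        elif event.kind == "write":
            entry.written.add(event.feature)
            if (event.feature, event.value) in self.tracked:
                self.stats[(event.feature, event.value)] += 1
        elif event.kind == "read":
            entry.read.add(event.feature)
        elif event.kind == "miss":
            entry.missed.add(event.feature)

def type_tree(parents: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert child -> supertype pairs into a navigable parent -> children map."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    return {p: sorted(c) for p, c in children.items()}

events = [
    Event("start", "Tokenizer", time=0.0, node="node-1"),
    Event("write", "Tokenizer", feature="Token:begin"),
    Event("end", "Tokenizer", time=0.5),
    Event("start", "TemporalAnnotator", time=0.5, node="node-2"),
    Event("read", "TemporalAnnotator", feature="Token:begin"),
    Event("miss", "TemporalAnnotator", feature="Question:focus"),
    Event("write", "TemporalAnnotator", feature="Date:granularity", value="year"),
    Event("end", "TemporalAnnotator", time=0.75),
]
info = InformationMap(tracked={("Date:granularity", "year")})
for e in events:
    info.consume(e)
tree = type_tree({"uima.tcas.Annotation": "uima.cas.AnnotationBase",
                  "Token": "uima.tcas.Annotation", "Date": "uima.tcas.Annotation"})
```

Because the map is built only from generic events, the same consumer code could in principle be attached to different pipelines; this is a design assumption of the sketch.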

The invention collects the information and constructs the map in an unsupervised process which is independent of the pipeline being use...