
System And Method To Dynamically Create Optimum Mining Flows In A Strongly Typed Analytics System.

IP.com Disclosure Number: IPCOM000226486D
Publication Date: 2013-Apr-08
Document File: 3 page(s) / 26K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a system and method to dynamically generate and detect flow changes in the miner chains of analytics processing. The system does this by formalizing the notion of a miner and using the additional information, along with processing the queries, to create a set of native types.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 45% of the total text.


In analytics, managing a very large number of miners together with a large number of queries (concepts) is a problem. It is difficult to maintain detailed flows that make miners, Natural Language Processing (NLP) modules, extractors, scorers, and other forms of processing run in the right order and on the right subset of documents. Doing so requires extensive knowledge as well as manual, skill-intensive work, and such flows are often hard to adjust to new queries while keeping the response reliable and timely.

Current methodology and research in the specific application domain can effect changes in the miners and avoid complete processing of the corpus of data from the beginning, such that only the optimum subset of flows needs processing. Most approaches in this space, however, tend to be brute-force: changes in the miners/extractors or in the queries lead to overly long processing on huge data sets before the system arrives at a new consistent state for functional queries. This results in expensive system configuration, significant latency before actual query execution, and an inability to manage the growing set of data and the queries against that data.

The novel idea for the invented system is to dynamically generate and detect flow changes in the miner chains. The system does this by formalizing the notion of a miner and using the additional information, along with processing the queries, to create a set of native types.

A typical/general analytics system has the following parts:


• A Type System that captures all known types that are usable throughout the system

• A Query Processing Engine that takes a query in any syntactic form and generates a set (or a list of unique types) based on the Type System

• A set of Miners (or artifact processing engines), each of which requires a set of input types and produces a set of output types. This is provided to the system as a Contract for each miner. A contract is defined as:


- TiXX
- TiYY
- M:
- ToXX
- ToYY


Ti are input types, To are output types, and M is the miner descriptor, assumed unique within the set {M} of miners known to the system.

XX and YY are used to uniquely identify types in the convention used in this document.
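As an illustration only, a contract of this shape could be represented as a small record that holds the miner descriptor together with its input and output type sets. The sketch below is not taken from the disclosure: the names MinerContract and register are hypothetical, and types are modelled as plain strings for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinerContract:
    """Hypothetical record for one contract: descriptor M, inputs Ti, outputs To."""
    miner: str          # M, assumed unique among the miners known to the system
    inputs: frozenset   # the Ti types the miner requires
    outputs: frozenset  # the To types the miner produces


def register(registry: dict, contract: MinerContract) -> None:
    """Store a contract, rejecting a descriptor that is already registered."""
    if contract.miner in registry:
        raise ValueError(f"duplicate miner descriptor: {contract.miner}")
    registry[contract.miner] = contract


# The contract from the text: TiXX and TiYY go in, ToXX and ToYY come out.
registry = {}
register(
    registry,
    MinerContract(
        miner="M",
        inputs=frozenset({"TiXX", "TiYY"}),
        outputs=frozenset({"ToXX", "ToYY"}),
    ),
)
```

Keeping the type sets immutable makes a contract safe to share between the query processing engine and the flow generator, which is a design choice of this sketch rather than something the disclosure specifies.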

Given this information, the invention attempts to generate an optimum flow graph of miners that need to run in dependency order, with full parallelism where necessary, using a general workflow engine. The flow graph (FLOW) captures flows associated with input artifacts or changes associated with one or more miners requiring partial processing.
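The abbreviated text does not give the flow generation algorithm itself, so the following is only a minimal sketch of one way such a graph might be derived from contracts. It assumes that miner A must precede miner B whenever A outputs a type that B takes as input, selects the miners needed for the types a query resolves to, and groups them into stages whose members could run in parallel. The miner and type names, and the functions build_deps, needed_miners, flow_levels, and affected_by, are all hypothetical.

```python
from collections import defaultdict

# Hypothetical contracts: miner descriptor -> (input types, output types).
CONTRACTS = {
    "tokenizer": ({"RawText"}, {"Token"}),
    "ner": ({"Token"}, {"Entity"}),
    "sentiment": ({"Token"}, {"Sentiment"}),
    "scorer": ({"Entity", "Sentiment"}, {"Score"}),
}


def build_deps(contracts):
    """Map each miner to the miners that produce one of its input types."""
    producers = defaultdict(set)
    for miner, (_, outputs) in contracts.items():
        for t in outputs:
            producers[t].add(miner)
    return {
        miner: {p for t in inputs for p in producers[t]} - {miner}
        for miner, (inputs, _) in contracts.items()
    }


def needed_miners(contracts, deps, query_types):
    """Walk backwards from the query's types to every miner they depend on."""
    wanted = set(query_types)
    stack = [m for m, (_, outs) in contracts.items() if outs & wanted]
    needed = set()
    while stack:
        m = stack.pop()
        if m not in needed:
            needed.add(m)
            stack.extend(deps[m])
    return needed


def flow_levels(deps, miners):
    """Order the given miners into stages; each stage can run in parallel."""
    remaining = {m: deps[m] & miners for m in miners}
    levels = []
    while remaining:
        ready = sorted(m for m, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic miner dependencies")
        levels.append(ready)
        for m in ready:
            del remaining[m]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


def affected_by(deps, changed):
    """Return the changed miners plus every miner downstream of them."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for m, d in deps.items():
            if m not in affected and d & affected:
                affected.add(m)
                grew = True
    return affected


deps = build_deps(CONTRACTS)
# Full flow for a query that resolves to the type "Score".
full_flow = flow_levels(deps, needed_miners(CONTRACTS, deps, {"Score"}))
# Partial flow when only the "ner" miner changes.
partial_flow = flow_levels(deps, affected_by(deps, {"ner"}))
```

In this example full_flow is [["tokenizer"], ["ner", "sentiment"], ["scorer"]], while partial_flow is [["ner"], ["scorer"]], so a change to one miner reruns only that miner and those downstream of it. Whether this layering matches the disclosure's notion of an optimum flow is not something the abbreviated text confirms.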

This algorithm creates a significant advantage over the brute-force or knowledge-intensive approaches. New data types can be added incrementally in a very simple way. Likewise, new miners can be added very easily. In both cases, there is no requirement to have extensive knowledge of the other miners, the other types, or the queries posed on the system. The system itself c...