Data Package Preparation for Aggregating Analytics by Traveling Autonomous Software Agents

IP.com Disclosure Number: IPCOM000236729D
Publication Date: 2014-May-13
Document File: 4 page(s) / 253K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is the Travel-Accumulate-Reduce Approach (TAR) to Big Data, which provides capabilities of both software agents and methods for centralized big data analytics. The goal is to perform analytics on unstructured data of different types stored in different locations in order to identify relationships between the stored objects.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.

In big data analytics, with increasing amounts of unstructured data located in diverse and physically distinct locations and media, it is necessary to find mechanisms that not only move code to data, but also discover previously unknown relationships between data sets. Once these relationships are discovered, a global analytics step is needed. Two-level or multi-level methods, which move computation to the nodes and then apply a set of reduction steps to the intermediate data generated, are good at corralling big data sets and are used when the structure and relationships of the data sets are known.
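The two-level pattern described above (computation moved to each node, followed by a reduction over the intermediate results) can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the node names, the record values, and the helpers `local_count` and `merge` are all hypothetical, and the sketch assumes the record layout is known in advance.

```python
from collections import Counter
from functools import reduce

# Hypothetical per-node data sets; the structure is assumed to be known up front.
NODES = {
    "node_a": ["rain", "sun", "rain"],
    "node_b": ["sun", "sun", "snow"],
}


def local_count(records):
    """Level 1: computation that runs where the data lives."""
    return Counter(records)


def merge(left, right):
    """Level 2: reduction that combines two intermediate results."""
    return left + right


# Each node produces a small intermediate result from its own records.
intermediate = [local_count(records) for records in NODES.values()]

# The reduction steps fold the intermediate results into one global result.
total = reduce(merge, intermediate, Counter())
```

The sketch works only because every node holds the same kind of record, which is the limitation the paragraph above points out.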

Software agents travel through unstructured data sources and discover relationships, but they cannot be expected to carry their discoveries or the data set results of analytics from one node to another; therefore, software agents usually cannot provide a global view of big data analytics.

The novel solution is the Travel-Accumulate-Reduce Approach (TAR) to Big Data. This approach provides capabilities of both software agents and methods for batch big data analytics. A software agent capable of traveling to and executing on appliances attached to big data storage locations first performs analytics on the storage device, and then creates and leaves behind packages of results consisting of smaller data sets (intended for aggregating analytics). These smaller data sets are left at the storage device. The software agent then proceeds to the next storage device.
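The travel and accumulate steps described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `Appliance` and `Package` classes, the `analyze` and `travel` functions, and the `kind` field on each object are hypothetical names introduced for illustration, and the local analytics is a stand-in for whatever the agent actually computes.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    source: str    # name of the appliance where the package was left
    summary: dict  # smaller data set intended for aggregating analytics


@dataclass
class Appliance:
    name: str
    objects: list                                 # data on the attached storage
    packages: list = field(default_factory=list)  # results left behind by agents


def analyze(objects):
    """Stand-in for local analytics: reduce raw objects to a small summary."""
    return {
        "count": len(objects),
        "kinds": sorted({obj["kind"] for obj in objects}),
    }


def travel(appliances):
    """Visit each appliance in turn, leaving one package at each."""
    for appliance in appliances:
        summary = analyze(appliance.objects)
        appliance.packages.append(Package(appliance.name, summary))
        # The agent moves on without carrying the summary with it.


appliances = [
    Appliance("store_1", [{"kind": "video"}, {"kind": "music"}]),
    Appliance("store_2", [{"kind": "sales"}]),
]
travel(appliances)
```

After `travel` returns, each appliance holds its own package and the agent itself holds no results, which matches the constraint that intermediate data stays with the storage.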

Instructions (referred to as code) execute on more than one appliance attached to storage where said code autonomously travels from one appliance to the next.

Analyzed data may be accumulated at each appliance. A host with interest in that data is notified of that package. The traveling agent is capable of making a decision regarding the next destination based on what it has learned, but may not be able to take this intermediate data with it or infer a global picture. It leaves the kernels of what it has learned at the appliance (a package marked for an aggregation host to pick up). A reduction step is performed by a host which, based on notifications of accumulated data at the storage media, has a global picture of the intermediate data accumulated by the traveling agents.
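The notification, next-destination decision, and reduction step described above can be sketched as follows. Everything here is an assumption made for illustration: the `AggregationHost` class, its `notify` and `reduce` methods, the `choose_next` function, and the tag-overlap rule for picking a destination are hypothetical, since the text does not specify how the agent decides or how the host combines packages.

```python
from collections import Counter


class AggregationHost:
    """Host that is notified of packages and later reduces them."""

    def __init__(self):
        self.pending = []  # (appliance name, package) pairs received so far

    def notify(self, appliance_name, package):
        """Record that a package is waiting at the named appliance."""
        self.pending.append((appliance_name, package))

    def reduce(self):
        """Combine all notified packages into one global view."""
        total = Counter()
        for _, package in self.pending:
            total.update(package)
        return total


def choose_next(learned_tags, candidates):
    """Pick the candidate whose advertised tags overlap most with what the
    agent has learned (assumed rule). Ties go to the alphabetically first name."""
    if not candidates:
        return None
    return max(sorted(candidates), key=lambda name: len(candidates[name] & learned_tags))


host = AggregationHost()
host.notify("store_1", {"video": 2, "sales": 1})
host.notify("store_2", {"sales": 4})
global_view = host.reduce()

next_stop = choose_next(
    {"sales"},
    {"store_3": {"weather"}, "store_4": {"sales", "music"}},
)
```

Only the host sees every package, so only the host can produce `global_view`; the agent's decision in `choose_next` uses just the small set of tags it has learned.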

The goal is to perform analytics on unstructured data of different types (e.g., video, music, sales records, weather, etc.) stored in different locations in order to identify relationships between the stored objects. These relationships may not be known a priori and need to be discovered by the code. The stored object data may be large and located in multiple physical or geographical locations. The storage medium of the data has an associated appliance where said traveling code can run and access objects. The code can bring with it the associated permissions and tokens needed to access specific object classes in the storage medium. The Internet Protocol (IP) addr...