
A Method and System for Disparate Data Aggregation and Enrichment

IP.com Disclosure Number: IPCOM000248322D
Publication Date: 2016-Nov-15
Document File: 6 page(s) / 66K

Publishing Venue

The IP.com Prior Art Database

Abstract

A system is disclosed for disparate data aggregation and enrichment in a classified environment. The system uses a series of processes to automatically aggregate, clean, de-conflict, and enrich disparate datasets to provide accurate answers and information to decision-makers.



A Method and System for Disparate Data Aggregation and Enrichment

Disclosed is a system for disparate data aggregation and enrichment in a classified environment. The system uses a series of processes to automatically aggregate, clean, de-conflict, and enrich disparate datasets to provide accurate answers and information to decision-makers.

The system comprises nine interconnected and fully automated components: infrastructure and networking, data ingest and schema creation, data cleaning and error logging, data enrichment, data linking of redundant objects, data merging, database linking, data analysis, and data visualization.
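A minimal sketch, in Python, of how these automated components might be chained into one pipeline. The component names follow the list above; the stage signatures, the record type, and the pass-through bodies are illustrative assumptions, not part of the disclosure (infrastructure and networking is assumed to sit in front of the data pipeline itself).

# Illustrative sketch: stage names come from the disclosure; bodies are
# placeholders that simply pass the dataset through unchanged.

from typing import Any, Callable, Dict, List

Record = Dict[str, Any]          # assumed shape of one ingested object
Dataset = List[Record]
Stage = Callable[[Dataset], Dataset]


def passthrough(name: str) -> Stage:
    """Placeholder stage: a real component would transform the data."""
    def stage(data: Dataset) -> Dataset:
        print(f"running component: {name}")
        return data
    return stage


PIPELINE: List[Stage] = [
    passthrough("data ingest and schema creation"),
    passthrough("data cleaning and error logging"),
    passthrough("data enrichment"),
    passthrough("data linking of redundant objects"),
    passthrough("data merging"),
    passthrough("database linking"),
    passthrough("data analysis"),
    passthrough("data visualization"),
]


def run_pipeline(raw: Dataset) -> Dataset:
    # Each component consumes the dataset produced by the previous one.
    data = raw
    for stage in PIPELINE:
        data = stage(data)
    return data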

Infrastructure and Networking

This system relies on the ability to ingest data across classification networks. In many cases, the full set of data required to answer a particular question is scattered across multiple disconnected networks (e.g., NIPR, SIPR, JWICS). To overcome this challenge, a cross-domain solution passes data through the network security guards into a single repository. By reaching into each of the networks and pulling the data out, the system imposes no additional effort on data owners, i.e., no additional data entry is required. As long as the data is accessible, the system will reach in and retrieve it.
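A minimal sketch of this pull model. The network names come from the text; the endpoint URLs, the fetch mechanism, and the submit_through_guard() stand-in for the cross-domain security guard are all assumptions (a real guard is a separate accredited system).

# Illustrative only: hypothetical endpoints and guard interface.

import urllib.request

# One accessible data endpoint per classification network (hypothetical URLs).
NETWORK_SOURCES = {
    "NIPR": "https://nipr.example/export.json",
    "SIPR": "https://sipr.example/export.json",
    "JWICS": "https://jwics.example/export.json",
}


def submit_through_guard(network: str, payload: bytes) -> None:
    """Stand-in for the security guard that forwards data to the repository."""
    print(f"{network}: forwarded {len(payload)} bytes to the single repository")


def pull_all_sources() -> None:
    # Reach into each network and pull the data out; data owners do nothing.
    for network, url in NETWORK_SOURCES.items():
        with urllib.request.urlopen(url) as resp:  # fetch whatever is accessible
            submit_through_guard(network, resp.read())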

Data Ingest and Schema Creation

The raw data repositories from which this system pulls can take many forms, such as web services, SharePoint, local workstations, web databases, and web pages. This reflects the goal of not imposing additional work on data owners, who would otherwise have to move their data into a new storage repository for ingest. The system ingests from a multitude of repository types and parses that data into a raw data repository.
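A minimal sketch of dispatching ingest by repository type. The source kinds follow the text; the parser bodies, the assumption that web services return JSON and that SharePoint or workstation exports arrive as CSV, and the raw-repository append step are illustrative.

# Illustrative sketch: one parser per repository type named in the text.

import csv
import io
import json
from typing import Any, Dict, List

Record = Dict[str, Any]
RAW_REPOSITORY: List[Record] = []  # stand-in for the raw data repository


def parse_web_service(blob: bytes) -> List[Record]:
    return json.loads(blob)        # assume a JSON-returning web service


def parse_csv_export(blob: bytes) -> List[Record]:
    # covers SharePoint lists or local workstation files exported as CSV
    return list(csv.DictReader(io.StringIO(blob.decode("utf-8"))))


PARSERS = {
    "web_service": parse_web_service,
    "sharepoint": parse_csv_export,
    "workstation": parse_csv_export,
}


def ingest(source_type: str, blob: bytes) -> None:
    """Parse one raw payload and append its records to the raw repository."""
    RAW_REPOSITORY.extend(PARSERS[source_type](blob))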

Since the purpose of this system is to answer questions, a schema is created with the fields pertinent to answering decision-makers' routine questions. Only these fields are passed on to the next function for data cleaning. The remaining fields are left untouched but can later be included if needed to answer new questions. With the schema approach, fewer fields need to be cleaned and less data is passed through the system. This increases overall efficiency and reduces the "noise" that would be created on the back end if hundreds of fields were presented to the end user.
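A minimal sketch of this field projection. The schema fields shown are hypothetical examples of question-relevant fields; a real schema would be drawn up from the decision-makers' actual questions.

# Illustrative sketch: project raw records down to the schema fields only.

from typing import Any, Dict, List

Record = Dict[str, Any]

# Fields pertinent to the decision-makers' routine questions (assumed).
SCHEMA_FIELDS = ["tail_number", "takeoff_time", "landing_time", "origin"]


def project_to_schema(raw_records: List[Record]) -> List[Record]:
    """Pass only schema fields on to cleaning; other fields stay in the
    raw repository and can be added to the schema later for new questions."""
    return [
        {field: rec.get(field) for field in SCHEMA_FIELDS}
        for rec in raw_records
    ]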

Data Cleaning and Error Logging

The data cleaning and error logging process improves the overall data quality of each data source through error correction and by creating data consistency. Typical errors corrected by this process include data entry into the wrong field, misspellings, inconsistent terms, inconsistent formatting, and missing data.

The system corrects these errors using three methods: logic checks, lookup tables, and machine learning algorithms. Logic checks verify whether the entered data makes sense. For instance, if an airplane flight has takeoff and landing information, t...
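A minimal sketch of the first two methods, assuming flight records carry takeoff and landing timestamps. The field names, the specific rule (a landing cannot precede its takeoff), the lookup table contents, and the error-log format are illustrative assumptions, not taken from the disclosure.

# Illustrative logic check and lookup-table correction with error logging.

import logging
from datetime import datetime
from typing import Any, Dict

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("data_cleaning")

# Assumed lookup table mapping inconsistent terms to a canonical form.
AIRCRAFT_TERMS = {"a/c": "aircraft", "acft": "aircraft", "aircraft": "aircraft"}


def clean_flight(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a logic check and a lookup-table correction, logging errors."""
    takeoff = datetime.fromisoformat(record["takeoff_time"])
    landing = datetime.fromisoformat(record["landing_time"])

    # Logic check: the entered data must make sense.
    if landing <= takeoff:
        log.warning("flight %s: landing precedes takeoff",
                    record.get("tail_number"))
        record["landing_time"] = None    # flag as missing for later repair

    # Lookup table: normalize inconsistent terms.
    term = str(record.get("type", "")).lower()
    record["type"] = AIRCRAFT_TERMS.get(term, term)
    return record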